There seems to be an interesting new model architecture for ranking & recommendation, developed by Google Research. It uses Logistic Regression & Deep Learning in a single model.
This is different from ensemble models, where a sub-model is trained separately, and it’s score is used as a feature for the parent model. In this paper, the authors learn a wide model (Logistic Regression, which is trying to “memorize”), and a deep model (Deep Neural Network, which is trying to “generalize”), jointly.
The input to the wide network are standard features, while the deep network uses dense embeddings of the document to be scored, as input.
The main benefits as per the authors, are:
DNNs can learn to over-generalize, while LR models are limited in how much they can memorize from the training data.
Learning the models jointly means that the ‘wide’ and ‘deep’ part are aware of each other, and the ‘wide’ part only needs to augment the ‘deep’ part.
Also, training jointly helps reduce the side of the individual models.
The authors employed this model to recommend apps to be downloaded to the user in Google Play, where they drove up app installs by 3.9% using the Wide & Deep model.
However, the Deep model in itself, drove up installs by 2.9%. It is natural to expect that the ‘wide’ part of the model should help in further improving the metric to be optimized, but it is unclear to me, if the delta could have been achieved by further bulking up the ‘deep’ part (i.e., adding more layers / training bigger dimensional embeddings, which are inputs to the DNNs).