There is a trend in research communities to bring two well-established classes of models together, topic models and latent factor models. By doing so, we may enjoy the ability to analyze text information with topic models and incorporate the collaborative filtering analysis with latent factor models. In this section, I wish to discuss some of these efforts.

Three papers will be covered in this post are listed at the end of the post. Before that, let’s first review what latent factor models are. Latent factor models (LFM) are usually used in collaborative filtering context. Say, we have a user-item rating matrix \( \mathbf{R} \) where \( r_{ij} \) represents the rating user \( i \) gives to item \( j \). Now, we assume for each user \( i \), there is a vector \( \mathbf{u}_{i} \) with the dimensionality \( k \), representing the user in a latent space. Similarly, we assume for each item \( j \), a vector \( \mathbf{v}_{j} \) with the same dimensionality representing the item in a same latent space. Thus, the rating \( r_{ij} \) is therefore represented as:

\[ r_{ij} = \mathbf{u}_{i}^{T} \mathbf{v}_{j} \]This is the basic setting for LFM. In addition to this basic setting, additional biases can be incorporated, see here. For topic models (TM), the simplest case is Latent Dirichlet Allocation (LDA). The story of LDA is like this. For a document \( d \), we first sample a multinomial distribution \( \boldsymbol{\theta}_{d} \), which is a distribution over all possible topics. For each term position \( w \) in the document, we sample a discrete topic assignment \( z \) from \( \boldsymbol{\theta}_{d} \), indicating which topic we use for this term. Then, we sample a term \( v \) from a topic \( \boldsymbol{\beta} \), a multinomial distribution over the vocabulary.

For both LFM and TM, they are methods to reduce original data into latent spaces. Therefore, it might be possible to link them together. Especially, items in the LFM are associated with rich text information. One natural idea is that, for an item \( j \), the latent factor \( \mathbf{v}_{j} \) and its topic proportional parameter \( \boldsymbol{\theta}_{j} \) somehow gets connected. One way is to directly equalize these two variables. Since \( \mathbf{v}_{j} \) is a real-value variable and \( \boldsymbol{\theta}_{j} \) falls into a simplex, we need certain ways to keep these properties. Two possible methods can be used:

- Keep \( \boldsymbol{\theta}_{j} \) and make sure it is in the range of [0, 1] in the optimization process. Essentially put some constraint on the parameter.
- Keep \( \mathbf{v}_{j} \) and use logistic transformation to transfer a real-valued vector into simplex.

Hanhuai and Banerjee showed the second technique in their paper by combining Correlated Topic Model with LFM. Wang and Blei argued that this setting suffers from the limitation that it cannot distinguish topics for explaining recommendations from topics important for explaining content since the latent space is strictly equal. Thus, they proposed a slightly different approach. Namely, each \( \mathbf{v}_{j} \) derives from \( \boldsymbol{\theta}_{j} \) with item-dependent noise:

\[ \mathbf{v}_{j} = \boldsymbol{\theta}_{j} + \epsilon_{j} \] where \( \epsilon_{j} \) is a Gaussian noise.

A different approach is to not directly equal these two quantities but let me impact these each other. One such way explored by Hanhuai and Banerjee is that \( \boldsymbol{\theta}_{j} \) influences how \( \mathbf{v}_{j} \) is generated. More specifically, in Probabilistic Matrix Factorization (PMF) setting, all \( \mathbf{v} \)s are generated by a Gaussian distribution with a fixed mean and variance. Now, by combining LDA, the authors allow different topic has different Gaussian prior mean and variance values. A value similar to \( z \) is firstly generated from \( \boldsymbol{\theta}_{j} \) to decide which mean to use and then generate \( \mathbf{v}_{j} \) from that particular mean and variance.

A totally different direction was taken by Agarwal and Chen. In their fLDA paper, there is no direct relationship between item latent factor and content latent factor. In fact, their relationship is realized by the predictive equation:

\[ r_{ij} = \mathbf{a}^{T} \mathbf{u}_{i} + \mathbf{b}^{T} \mathbf{v}_{j} + \mathbf{s}_{i}^{T} \bar{\mathbf{z}}_{j}

\]where \( \mathbf{a} \), \( \mathbf{b} \) and \(\mathbf{s}_{i} \) are regression weights and \( \bar{\mathbf{z}}_{j} \) is the average topic assignments for item \( j \). Note, \(\mathbf{s}_{i} \) is a user-dependent regression weights. This formalism encodes the notion that all latent factors (including content) will contribute to the rating, not only item and user factors.

In summary, three directions have been taken for integrating TM and LFM:

- Equal item latent factor and topic proportion vector, or make some Gaussian noise.
- Let topic proportion vector to control the prior distribution for item latent factor.
- Let item latent factor and topic assignments, as well as user latent factor, contribute the rating.

Reference:

- Deepak Agarwal and Bee-Chung Chen. 2010.
**fLDA: matrix factorization through latent dirichlet allocation**. In Proceedings of the third ACM international conference on Web search and data mining (WSDM ’10). ACM, New York, NY, USA, 91-100. [PDF] - Hanhuai Shan and Arindam Banerjee. 2010.
**Generalized Probabilistic Matrix Factorizations for Collaborative Filtering**. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM ’10). IEEE Computer Society, Washington, DC, USA, 1025-1030. [PDF] - Chong Wang and David M. Blei. 2011.
**Collaborative topic modeling for recommending scientific articles**. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’11). ACM, New York, NY, USA, 448-456.[PDF]