In this post, I would like to review several existing techniques to binary matrix decomposition.
- Andrew I. Schein, Lawrence K. Saul, and Lyle H. Ungar. A Generalized Linear Model for Principal Component Analysis of Binary Data. Appeared in Proceedings of the 9’th International Workshop on Artificial Intelligence and Statistics. January 3-6, 2003. Key West, FL.
This paper introduced a logistic version of PCA to binary data. The model assumes that each observation is from a single latent factor and there exists multiple latent factors. The model is quite straightforward and the inference is been done by Alternative Least Square.
- Tao Li. 2005. A general model for clustering binary data. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (KDD ’05). ACM, New York, NY, USA, 188-197.
In this paper, the author introduced the problem of “binary data decomposition”. The paper demonstrated several techniques that are popular for normal matrix factorization to binary data, like k-means, spectral clustering. The proposed method is to factorize the binary matrix into two binary matrices, where the binary indicators suggest membership.
- Tomas Singliar and Milos Hauskrecht. 2006. Noisy-OR Component Analysis and its Application to Link Analysis. J. Mach. Learn. Res. 7 (December 2006), 2189-2213.
This paper introduced a probabilistic view of binary data. Like other latent factor models, each observation can be viewed as a sample from multiple binary latent Bernoulli factors, essentially a mixture model. A variational inference is conducted in the paper. The weak part of the paper is that the comparison of the model with PLSA and LDA is not quite convincing.
- Zhongyuan Zhang, Tao Li, Chris Ding, and Xiangsun Zhang. 2007. Binary Matrix Factorization with Applications. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (ICDM ’07). IEEE Computer Society, Washington, DC, USA, 391-400.
This paper indeed introduced a variant of Non-negative Matrix Factorization to binary data, meaning that a binary matrix will be always decomposed into two matrices bounded by 0 to 1. The proposed method is a modification of NMF. However, in a document clustering problem, the performance difference between proposed method and NMF is very small.
- Miettinen, P.; Mielikainen, T.; Gionis, A.; Das, G.; Mannila, H.; , “The Discrete Basis Problem,” Knowledge and Data Engineering, IEEE Transactions on , vol.20, no.10, pp.1348-1362, Oct. 2008.
Miettinen, P.; , “Sparse Boolean Matrix Factorizations,” Data Mining (ICDM), 2010 IEEE 10th International Conference on , vol., no., pp.935-940, 13-17 Dec. 2010
These two papers stated another view of factorization of binary data. Rather than directly using some SVD based or NMF based methods, these papers introduced a “cover” based discrete optimization method to the problem. However, through experiments, the performance advantages over traditional SVD or NMF methods are not very clear. Another drawback of their method is that some other existing methods are difficult to be incorporated with.
- Andreas P. Streich, Mario Frank, David Basin, and Joachim M. Buhmann. 2009. Multi-assignment clustering for Boolean data. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09). ACM, New York, NY, USA, 969-976.
This paper introduced a probabilistic view of the binary data. The observation is assumed to be generated either by “signal” or by “noise”, both are Bernoulli distributions. The switch variable is also sampled from the third Bernoulli distribution. This is essentially a simplified PLSA. The inference is done by deterministic annealing.
- Ata Kaban, Ella Bingham, Factorisation and denoising of 0-1 data: A variational approach, Neurocomputing, Volume 71, Issues 10-12, Neurocomputing for Vision Research; Advances in Blind Signal Processing, June 2008, Pages 2291-2308, ISSN 0925-2312.
This paper is somewhat similar “Noisy-OR” model and Logistic PCA as well. However, unlike Logistic PCA, the proposed model is a mixture model, meaning that a single observation is “generated” by multiple latent factors. The authors put a Beta prior over latent factors and the inference is done by Variational Inference.
Ella Bingham, Ata Kaban, and Mikael Fortelius. 2009. The aspect Bernoulli model: multiple causes of presences and absences. Pattern Anal. Appl. 12, 1 (January 2009), 55-78.
This paper goes back to the assumption that each observation is sampled from a simple factor. The inference is done by EM.
In all, it seems that the performance advantages of specifically designed binary data models are small. However, the biggest advatange of these model is that they can give better interpretations sometimes. For computational models, NMF seems a good approximation. For probablistic models, a modified PLSA or LDA seems quite resonable.