[arXiv] Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims: Recommendations as Treatments: Debiasing Learning and Evaluation. [BibTex] The work is nice and straightforward. The basic idea is to use (or estimate) propensity scores in both evaluation and learning. The results cover not only error-rate metrics but ranking metrics as well. For estimation, the paper shows that Naive Bayes and logistic regression are two simple yet effective ways to estimate propensity scores.
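The core idea can be sketched as an inverse-propensity-scored (IPS) estimator: each observed error is reweighted by the inverse of its probability of being observed, so the estimate is unbiased over the full user-item matrix. A minimal sketch (function name and array layout are my own, assuming propensities are already estimated):

```python
import numpy as np

def ips_mse(preds, ratings, propensities, n_users, n_items):
    """IPS estimate of MSE over the full user-item matrix,
    computed from observed entries only.

    preds, ratings, propensities: 1-D arrays over observed entries,
    where propensities[i] is the probability entry i was observed."""
    errors = (preds - ratings) ** 2
    # Up-weight each observed error by 1/propensity, so ratings that
    # were unlikely to be observed (e.g. for unpopular items) count more.
    return float(np.sum(errors / propensities) / (n_users * n_items))
```

With uniform propensities of 1 this reduces to the naive average error over the matrix; the interesting case is non-uniform observation, where the naive average is biased toward popular items.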
Matt Taddy. Distributed multinomial regression. Annals of Applied Statistics 9, 2015. This paper (and its predecessor by the same author) introduces a framework called “Multinomial Inverse Regression” (MNIR) to model discrete data (primarily word counts). The main idea of the framework is, instead of modeling Y (responses) given X (features), to use Y to predict X; hence the name “inverse regression.” The framework also connects to dimension reduction and LDA. Its main drawback is the computational cost of the log normalization factor.
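To see where that cost comes from, here is a toy version of the multinomial log-likelihood at the heart of MNIR (variable names are my own, not the paper's notation): word probabilities are a softmax of a linear function of the response, so the normalizer sums over the entire vocabulary and couples every word's parameters.

```python
import numpy as np

def mnir_loglik(counts, y, alpha, phi):
    """Log-likelihood of one document under a toy MNIR-style model:
    word probabilities are softmax(alpha + phi * y).

    counts: (V,) word counts; y: scalar response;
    alpha, phi: (V,) per-word intercepts and loadings."""
    eta = alpha + phi * y
    # The log normalization factor: a sum over the whole vocabulary,
    # which is what makes exact inference expensive at scale.
    log_norm = np.log(np.sum(np.exp(eta)))
    return float(np.sum(counts * (eta - log_norm)))
```

Taddy's distributed approach sidesteps this coupling by approximating the multinomial with independent per-word Poisson regressions, which can then be fit in parallel.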
M. Rabinovich and D. Blei. The inverse regression topic model. International Conference on Machine Learning, 2014. [Supplement]. This paper extends the MNIR framework to topic modeling, meaning words are generated from multiple topics. Inference is even more difficult here.
Word Vectors and Embeddings
Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research. 37, 1 (January 2010), 141-188. [DOI] It is a good survey of Vector-Space Models (VSMs), with history and plenty of references. One interesting part is that, in addition to TF-IDF weighting, the paper discusses Pointwise Mutual Information (PMI) weighting schemes, which might relate to other word embedding approaches. The mapping of applications to the VSM methods that suit them is also helpful.
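The PMI weighting the survey discusses is easy to sketch. A common variant is positive PMI (PPMI), which clips negative values to zero; a minimal implementation over a word-by-context count matrix (my own sketch, not code from the paper):

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information weighting of a
    word-by-context co-occurrence matrix (rows: words, cols: contexts)."""
    counts = np.asarray(counts, dtype=float)
    p_xy = counts / counts.sum()                 # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)        # word marginals
    p_y = p_xy.sum(axis=0, keepdims=True)        # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_xy / (p_x * p_y))
    pmi[~np.isfinite(pmi)] = 0.0                 # zero counts -> 0
    return np.maximum(pmi, 0.0)                  # keep only positive PMI
```

This is the kind of reweighted count matrix that, after dimension reduction, connects VSMs to later word-embedding methods.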
Edward Meeds, Remco Hendriks, Said al Faraby, Magiel Bruntink, Max Welling: MLitB: machine learning in the browser. PeerJ Computer Science 1: e11 (2015) [BibTex] The idea of the paper is to introduce a paradigm for training machine learning models through browsers. The architecture consists of a master server, a data server, and clients (with data workers and web workers). Master nodes (servers) also handle the “reduce” step of the learning. The paper does not, however, say anything about whether the models converge or how good they are.
Daniel Crankshaw, Peter Bailis, Joseph E. Gonzalez, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, Michael I. Jordan: The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox. CIDR 2015. [BibTex] The paper introduces a system to manage and serve models at large scale. Its main ingredients include caching predictions, bandits over multiple models (to avoid feedback loops), bootstrapping new users with nearest neighbors, and a balance between offline and online learning. The paper is system-oriented and light on details.
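The “bandits over multiple models” ingredient can be illustrated with a toy epsilon-greedy router (my own sketch, not Velox's actual mechanism): mostly serve the model with the best observed reward, but explore alternatives with some probability so feedback does not lock in one model.

```python
import random

class EpsilonGreedyRouter:
    """Toy epsilon-greedy selection among several deployed models."""

    def __init__(self, n_models, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_models       # serves per model
        self.rewards = [0.0] * n_models    # cumulative reward per model
        self.rng = random.Random(seed)

    def choose(self):
        # With probability epsilon, explore a random model;
        # otherwise exploit the best empirical mean reward.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        means = [r / c if c else 0.0
                 for r, c in zip(self.rewards, self.counts)]
        return max(range(len(means)), key=means.__getitem__)

    def update(self, model, reward):
        self.counts[model] += 1
        self.rewards[model] += reward
```

The exploration term is what breaks the feedback loop: without it, the currently served model is the only one that ever collects feedback.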
Machine Learning Engineering
Hidden Technical Debt in Machine Learning Systems. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison. NIPS 2015. [BibTex] This paper is a must-read for machine learning engineering. It offers a lot of insight and suggestions on why machine learning systems are hard to maintain and where these pain points come from.