# Paper Notes

## Counterfactual Reasoning and Causal Inference

1. [arXiv] Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims. Recommendations as Treatments: Debiasing Learning and Evaluation. [BibTex]
The work is nice and straightforward. The basic idea is to estimate propensity scores and use them in both evaluation and learning. The results cover not only error-rate metrics but ranking metrics as well. For estimation, the paper shows that Naive Bayes and logistic regression are two simple yet powerful methods for estimating propensity scores.
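The debiasing idea above boils down to inverse propensity scoring (IPS): each observed error is reweighted by the inverse probability that the entry was observed at all. A minimal sketch with toy numbers (the variable names and values are illustrative, not from the paper):

```python
import numpy as np

def ips_estimate(errors, propensities, num_users, num_items):
    """IPS estimate of the average prediction error over the full matrix.

    errors:       per-observed-entry errors (e.g. squared error)
    propensities: P(entry was observed), one per observed entry
    """
    return np.sum(errors / propensities) / (num_users * num_items)

# Toy example: 4 observed entries out of a 3x4 rating matrix.
errors = np.array([0.2, 0.1, 0.4, 0.3])
props = np.array([0.8, 0.5, 0.5, 0.2])  # popular items are observed more often
naive = errors.mean()                   # biased: ignores how entries were selected
ips = ips_estimate(errors, props, num_users=3, num_items=4)
print(naive, ips)
```

Entries that were unlikely to be observed get upweighted, which is what removes the selection bias of the naive average.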

## Multinomial Inverse Regression

1. Matt Taddy. Multinomial inverse regression for text analysis. Journal of the American Statistical Association 108, 2013. arXiv paper and rejoinder.
2. Matt Taddy. Distributed multinomial regression. Annals of Applied Statistics 9, 2015.
Both papers introduce a framework called “Multinomial Inverse Regression” (MNIR) to model discrete data (primarily word counts). The main idea of the framework is, instead of modeling Y (responses) given X (features), to use Y to predict X; that is why it is called “inverse regression”. The framework can be linked to dimension reduction and to LDA as well. Its main drawback is the computational difficulty caused by the log-normalization factor.
3. M. Rabinovich and D. Blei.   The inverse regression topic model.   International Conference on Machine Learning, 2014. [Supplement].
This paper extends the MNIR framework to topic modeling, where words are generated from multiple topics. Inference is even more difficult here.
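The computational drawback mentioned above can be seen directly in the multinomial log-likelihood: the log-normalizer sums over the entire vocabulary, coupling all the logits together. A toy sketch (numbers are illustrative; this is the bottleneck, not the papers' estimators):

```python
import numpy as np

def multinomial_loglik(counts, eta):
    """Multinomial log-likelihood of word counts given logits eta
    (up to the multinomial coefficient).

    The log-normalizer log(sum(exp(eta))) involves every vocabulary
    entry, which is the computational difficulty MNIR must work around.
    """
    m = eta.max()
    log_z = np.log(np.sum(np.exp(eta - m))) + m  # numerically stable log-sum-exp
    return counts @ eta - counts.sum() * log_z

counts = np.array([3, 0, 1, 2])          # word counts in one document
eta = np.array([0.5, -1.0, 0.0, 0.2])    # logits for a 4-word vocabulary
print(multinomial_loglik(counts, eta))
```

With a realistic vocabulary of tens of thousands of words, every likelihood (and gradient) evaluation pays this full-vocabulary cost, which motivates the distributed/factorized approach of the 2015 paper.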

## Word Vectors and Embeddings

1. Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research. 37, 1 (January 2010), 141-188. [DOI]
It is a good survey of Vector Space Models (VSMs) with history and plenty of references. One interesting point is that, in addition to TF-IDF weighting, the paper discusses Pointwise Mutual Information (PMI) weighting schemes, which relate to later word-embedding approaches. The catalog of applications and which VSM methods fit each of them is also helpful.
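The PMI weighting discussed in the survey can be computed directly from a word-context co-occurrence matrix. A minimal sketch (the tiny matrix is made up for illustration):

```python
import numpy as np

def pmi(cooc):
    """Pointwise mutual information from a co-occurrence count matrix.

    PMI(w, c) = log( P(w, c) / (P(w) * P(c)) )
    Positive values mean the pair co-occurs more than expected by chance.
    """
    total = cooc.sum()
    pw = cooc.sum(axis=1, keepdims=True) / total   # marginal P(word)
    pc = cooc.sum(axis=0, keepdims=True) / total   # marginal P(context)
    pwc = cooc / total                             # joint P(word, context)
    with np.errstate(divide="ignore"):             # zero counts give -inf
        return np.log(pwc / (pw * pc))

cooc = np.array([[8.0, 2.0],
                 [2.0, 8.0]])
print(pmi(cooc))
```

In practice negative values are often clipped to zero (positive PMI, or PPMI), which is the variant most commonly connected to word-embedding methods.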

## Recommender Systems & Collaborative Filtering

1. Mingxuan Sun, Fuxin Li, Joonseok Lee, Ke Zhou, Guy Lebanon, Hongyuan Zha. Learning Multiple-Question Decision Trees for Cold-Start Recommendation. WSDM 2013: 445-454. [BibTex]
The main idea of the paper is to learn decision trees for cold-start users. However, the way the decision trees are built is somewhat complicated and hard to optimize.

## Rank Aggregation

1. [arXiv] Nihar B. Shah, Martin J. Wainwright: Simple, Robust and Optimal Ranking from Pairwise Comparisons. [BibTex]
The main result of the paper is that the counting method for pairwise comparisons (ranking items by their number of wins) is optimal under certain conditions, and it is also faster than its counterparts.
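The counting method itself is simple enough to sketch in a few lines (the comparison data here is made up for illustration):

```python
from collections import Counter

def counting_rank(items, comparisons):
    """Rank items by their number of pairwise wins (the counting method).

    comparisons: list of (winner, loser) pairs.
    """
    wins = Counter(winner for winner, _ in comparisons)
    return sorted(items, key=lambda item: wins[item], reverse=True)

pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b"), ("c", "b")]
print(counting_rank(["a", "b", "c"], pairs))
```

Each item only needs its win count, so a ranking costs one pass over the comparisons plus a sort, which is what makes the method so much cheaper than likelihood-based alternatives such as fitting a Bradley-Terry model.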

## Distributed Model Management and Serving

1. Edward Meeds, Remco Hendriks, Said al Faraby, Magiel Bruntink, Max Welling. MLitB: Machine Learning in the Browser. PeerJ Computer Science 1: e11 (2015). [BibTex]
The idea of the paper is to introduce a paradigm for training machine learning models through browsers. The architecture consists of a master server, a data server, and clients (with data workers and web workers). The master nodes (servers) also handle the “reduce” step of the learning. The paper does not, however, analyze whether the models converge or how good they turn out to be.
2. Daniel Crankshaw, Peter Bailis, Joseph E. Gonzalez, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, Michael I. Jordan. The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox. CIDR 2015. [BibTex]
The paper introduces a system to manage and serve models at large scale. The main ingredients include caching predictions, bandits over multiple models (to avoid feedback loops), bootstrapping new users with nearest neighbors, and balancing offline and online learning. The paper is system-oriented and light on details.
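The bandit ingredient for choosing among multiple deployed models could look like the epsilon-greedy sketch below. This is my own illustrative sketch, not the system's actual implementation, and all names are made up:

```python
import random

def select_model(models, rewards, counts, eps=0.1):
    """Epsilon-greedy choice among candidate models.

    With probability eps, explore a uniformly random model; otherwise
    exploit the model with the best average observed reward
    (e.g. click-through on its recommendations).
    """
    if random.random() < eps:
        return random.randrange(len(models))
    # Unseen models get +inf so they are tried at least once.
    avg = [r / c if c else float("inf") for r, c in zip(rewards, counts)]
    return max(range(len(models)), key=lambda i: avg[i])

models = ["matrix_factorization", "item_knn", "popularity"]
rewards, counts = [5.0, 2.0, 1.0], [20, 10, 10]
print(models[select_model(models, rewards, counts, eps=0.0)])
```

The exploration step is what breaks the feedback loop: a model that is never served can never gather the feedback needed to show it has improved.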

## Machine Learning Engineering

1. Hidden Technical Debt in Machine Learning Systems D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison. NIPS 2015. [BibTex]
This paper is a must-read for machine learning engineering. It has a lot of insights and suggestions about how machine learning systems become hard to maintain and why and where these pain points arise.

## Combining Distributions

1. Latent Bayesian melding for integrating individual and population models Mingjun Zhong, Nigel Goddard, Charles Sutton. NIPS 2015. [BibTex].
The technique introduced by the paper seems quite peculiar and may not be applicable to many applications, but the discussion of the various methods for combining probability distributions is worth reading.