An Easy Reading Tutorial for Bayesian Non-parametric Models


The “godfather” of LDA, David Blei, recently published a tutorial on Bayesian non-parametric models with one of his students. The tutorial is an easy read and provides a very clear overview of Bayesian non-parametric models. In particular, the Chinese Restaurant Process (CRP) and the Indian Buffet Process are discussed in a very intuitive way. For those interested in the technical details of these models, this tutorial may be just a starting point; the Appendix points to several more formal treatments, including inference algorithms.

One particularly interesting property shown in this tutorial is the “exchangeability” of the CRP, which I restate below.

Let \( c_{n} \) be the table assignment of the \(n\)th customer. A draw from the CRP can be generated by sequentially assigning customers to tables with probability:
\[P(c_{n} = k | \mathbf{c}_{1:n-1}) = \begin{cases}
\frac{m_{k}}{n-1+\alpha}, & \mbox{if } k \leq \mathbf{K}_{+} \mbox{ (i.e., $k$ is a previously occupied table)} \\
\frac{\alpha}{n-1+\alpha}, & \mbox{otherwise (i.e., $k$ is the next unoccupied table)}
\end{cases}\]where \( m_{k} \) is the number of customers sitting at table \( k \), and \( \mathbf{K}_{+} \) is the number of tables for which \( m_{k} > 0 \). The parameter \( \alpha \) is called the concentration parameter. The CRP exhibits an important invariance property: The cluster assignments under this distribution are exchangeable. This means \( p(\mathbf{c}) \) is unchanged if the order of customers is shuffled.
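The sequential seating scheme above is straightforward to simulate. Here is a minimal sketch in Python; the function name `crp_draw` and the convention of labeling tables 0, 1, 2, … in order of first appearance are my own choices, not from the tutorial:

```python
import random

def crp_draw(n_customers, alpha, seed=0):
    """Sample table assignments c_1, ..., c_n from a CRP(alpha) by
    seating customers sequentially; tables are labeled 0, 1, 2, ...
    in order of first appearance."""
    rng = random.Random(seed)
    assignments = []  # c_n for each customer
    counts = []       # m_k: number of customers at each occupied table
    for n in range(1, n_customers + 1):
        # P(occupied table k) = m_k / (n - 1 + alpha)
        # P(next unoccupied table) = alpha / (n - 1 + alpha)
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # customer opens the next unoccupied table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_draw(10, alpha=1.0))
```

A larger \( \alpha \) makes new tables more likely, so draws tend to have more, smaller clusters.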

Consider the joint distribution of a set of customer assignments \( c_{1:N} \). It decomposes according to the chain rule:
\[p(c_{1}, c_{2}, \cdots , c_{N}) = p(c_{1}) p(c_{2} | c_{1}) \cdots p(c_{N} | c_{1}, c_{2}, \cdots , c_{N-1}) \]where each term comes from the equation above. To show that this distribution is exchangeable, we introduce some new notation. Let \( \mathbf{K}(c_{1:N}) \) be the number of groups into which these assignments place the customers, a number between 1 and \( N \). Let \( I_{k} \) be the set of indices of customers assigned to the \(k\)th group, and let \( N_{k} \) be the number of customers in that group. Now, for a particular group \( k \), the product of the chain-rule terms for its customers is:
\[ \frac{\alpha}{I_{k,1}-1+\alpha} \frac{1}{I_{k,2}-1+\alpha} \frac{2}{I_{k,3}-1+\alpha} \cdots \frac{N_{k}-1}{I_{k,N_{k}}-1+\alpha} \]where each term corresponds to one customer in the group. The product of the numerators is \( \alpha (N_{k}-1)!\). Therefore, we have:
\[ p(c_{1}, c_{2}, \cdots , c_{N}) = \prod_{k=1}^{K} \frac{\alpha (N_{k}-1)!}{(I_{k,1}-1+\alpha)(I_{k,2}-1+\alpha)\cdots (I_{k,N_{k}}-1+\alpha)} \]Finally, notice that the union of the \( I_{k} \) across all groups \(k\) contains each index exactly once, because each customer is assigned to exactly one group. This simplifies the denominator and lets us write the joint as:
\[ p(c_{1}, c_{2}, \cdots , c_{N}) = \frac{\alpha^{K} \prod_{k=1}^{K} (N_{k}-1)! }{\prod_{i=1}^{N} (i-1+\alpha)} \]This equation depends only on the number of groups \(K\) and the size \(N_{k}\) of each group, not on the order in which the customers arrived; hence the distribution is exchangeable.
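The derivation can be sanity-checked numerically: the chain-rule product and the closed-form expression should agree, and both should be unchanged under any permutation of the customers. A small sketch, with helper names (`joint_chain_rule`, `joint_closed_form`, `canonicalize`) of my own invention:

```python
import itertools
import math

def joint_chain_rule(c, alpha):
    """p(c_1, ..., c_N) as the product of sequential CRP terms."""
    counts = {}
    p = 1.0
    for n, k in enumerate(c, start=1):
        if k in counts:  # previously occupied table: m_k / (n - 1 + alpha)
            p *= counts[k] / (n - 1 + alpha)
            counts[k] += 1
        else:            # next unoccupied table: alpha / (n - 1 + alpha)
            p *= alpha / (n - 1 + alpha)
            counts[k] = 1
    return p

def joint_closed_form(c, alpha):
    """p(c_1, ..., c_N) from the final formula: depends only on the
    number of groups K and the group sizes N_k."""
    sizes = [list(c).count(k) for k in set(c)]
    num = alpha ** len(sizes) * math.prod(math.factorial(s - 1) for s in sizes)
    den = math.prod(i - 1 + alpha for i in range(1, len(c) + 1))
    return num / den

def canonicalize(c):
    """Relabel tables 0, 1, 2, ... in order of first appearance."""
    seen = {}
    return [seen.setdefault(k, len(seen)) for k in c]

c, alpha = [0, 0, 1, 0, 2, 1], 0.5
p0 = joint_chain_rule(c, alpha)
assert abs(joint_closed_form(c, alpha) - p0) < 1e-12
for perm in itertools.permutations(c):
    assert abs(joint_chain_rule(canonicalize(perm), alpha) - p0) < 1e-12
```

Relabeling after each permutation is needed because the sequential scheme always opens tables in order of first appearance; once relabeled, every ordering of the same partition yields the same probability, as the closed form predicts.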
