The “godfather” of LDA, David Blei, recently published a tutorial on Bayesian Non-parametric Models with one of his students. The whole tutorial is an easy read and provides a very clear overview of Bayesian non-parametric models. In particular, the Chinese Restaurant Process (CRP) and the Indian Buffet Process are discussed in a very intuitive way. For those who are interested in the technical details of these models, this tutorial may be just a starting point, and its appendix points to several more formal treatments, including inference algorithms.

One particularly interesting property shown in this tutorial is the “*exchangeable*” property of the CRP, which I restate below.

Let \( c_{n} \) be the table assignment of the \(n\)th customer. A draw from the CRP can be generated by sequentially assigning observations to classes with probability:

\[P(c_{n} = k | \mathbf{c}_{1:n-1}) = \begin{cases} \frac{m_{k}}{n-1+\alpha}, & \mbox{if } k \leq \mathbf{K}_{+} \mbox{ (i.e., $k$ is a previously occupied table)} \\ \frac{\alpha}{n-1+\alpha}, & \mbox{otherwise (i.e., $k$ is the next unoccupied table)} \end{cases}\]where \( m_{k} \) is the number of customers sitting at table \( k \), and \( \mathbf{K}_{+} \) is the number of tables for which \( m_{k} > 0 \). The parameter \( \alpha \) is called the concentration parameter. The CRP exhibits an important invariance property: the cluster assignments under this distribution are exchangeable, meaning \( p(\mathbf{c}) \) is unchanged if the order of customers is shuffled.
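This sequential seating scheme is easy to simulate. The sketch below is my own illustration (the function name and interface are mine, not from the tutorial): each customer joins table \( k \) with probability proportional to \( m_{k} \), or opens a new table with probability proportional to \( \alpha \).

```python
import random

def crp_draw(num_customers, alpha, seed=None):
    """Sample table assignments c_1, ..., c_N from the CRP prior.

    alpha is the concentration parameter; tables are numbered from 0
    in order of first occupation.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = m_k, number of customers at table k
    for n in range(1, num_customers + 1):
        # Existing table k has weight m_k; the next unoccupied table
        # has weight alpha. Dividing by (n - 1 + alpha) normalizes.
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)  # customer opens a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments
```

Larger \( \alpha \) makes new tables more likely, so draws tend to have more, smaller clusters.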

Consider the joint distribution of a set of customer assignments \( c_{1:N} \). It decomposes according to the chain rule:

\[p(c_{1}, c_{2}, \cdots , c_{N}) = p(c_{1}) p(c_{2} | c_{1}) \cdots p(c_{N} | c_{1}, c_{2}, \cdots , c_{N-1}) \]where each term comes from the equation above. To show that this distribution is exchangeable, we introduce some new notation. Let \( \mathbf{K}(c_{1:N}) \) be the number of groups into which these assignments place the customers, which is a number between 1 and \( N \). Let \( I_{k} \) be the set of indices of customers assigned to the \(k\)th group, and let \( N_{k} \) be the number of customers assigned to that group. Now, for a particular group \( k \), the joint probability of all assignments in this group is:
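The chain-rule product can also be evaluated numerically, one conditional per customer. This is a minimal sketch of my own (not from the tutorial); a label seen for the first time is treated as a new table:

```python
from collections import Counter

def crp_chain_rule(assignments, alpha):
    """Evaluate p(c_1, ..., c_N) as a product of CRP conditionals."""
    prob, counts = 1.0, Counter()
    for n, c in enumerate(assignments, start=1):
        # An occupied table contributes m_k / (n - 1 + alpha),
        # a new table contributes alpha / (n - 1 + alpha).
        prob *= (counts[c] if counts[c] > 0 else alpha) / (n - 1 + alpha)
        counts[c] += 1
    return prob
```

For example, with \( \alpha = 1 \), `crp_chain_rule([0, 0, 1], 1.0)` evaluates \( 1 \cdot \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6} \).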

\[ \frac{\alpha}{I_{k,1}-1+\alpha} \frac{1}{I_{k,2}-1+\alpha} \frac{2}{I_{k,3}-1+\alpha} \cdots \frac{N_{k}-1}{I_{k,N_{k}}-1+\alpha} \]where each factor corresponds to one customer in group \( k \), and \( I_{k,i} \) denotes the index of the \(i\)th customer assigned to that group. The product of the numerators can be rewritten as \( \alpha (N_{k}-1)! \). Therefore, we have:

\[ p(c_{1}, c_{2}, \cdots , c_{N}) = \prod_{k=1}^{K} \frac{\alpha (N_{k}-1)!}{(I_{k,1}-1+\alpha)(I_{k,2}-1+\alpha)\cdots (I_{k,N_{k}}-1+\alpha)} \]Finally, notice that the union of the sets \( I_{k} \) across all groups \(k\) contains each index \( 1, \ldots, N \) exactly once, because each customer is assigned to exactly one group. This simplifies the denominator and lets us write the joint as:

\[ p(c_{1}, c_{2}, \cdots , c_{N}) = \frac{\alpha^{K} \prod_{k=1}^{K} (N_{k}-1)! }{\prod_{i=1}^{N} (i-1+\alpha)} \]This expression depends only on the number of groups \( K \) and the size \( N_{k} \) of each group, not on the order in which the customers arrived; hence the distribution is exchangeable.
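The closed form can be checked numerically against the sequential chain-rule product: permuting the customers changes the individual conditionals, but not their product. A self-contained sketch of my own (function names are mine):

```python
import math
from collections import Counter

def crp_closed_form(assignments, alpha):
    """Joint p(c), depending only on K and the group sizes N_k."""
    N = len(assignments)
    sizes = Counter(assignments).values()
    numerator = alpha ** len(sizes) * math.prod(math.factorial(s - 1) for s in sizes)
    denominator = math.prod(i - 1 + alpha for i in range(1, N + 1))
    return numerator / denominator

def crp_sequential(assignments, alpha):
    """The same joint, evaluated term by term via the chain rule."""
    prob, counts = 1.0, Counter()
    for n, c in enumerate(assignments, start=1):
        prob *= (counts[c] if counts[c] > 0 else alpha) / (n - 1 + alpha)
        counts[c] += 1
    return prob

# Two orderings of the same six customers: group sizes are (3, 2, 1)
# in both, so the joint probabilities agree.
original = [0, 0, 1, 0, 2, 1]
permuted = [1, 0, 0, 1, 0, 2]
assert abs(crp_sequential(original, 0.5) - crp_closed_form(original, 0.5)) < 1e-15
assert abs(crp_sequential(permuted, 0.5) - crp_sequential(original, 0.5)) < 1e-15
```

The sequential evaluation touches the order of arrivals at every step, yet the final product collapses to the order-free closed form above.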