Tricks in Sampling Discrete Distributions (1) – Sampling in Log-Space


One critical component in Gibbs sampling for complex graphical models is to be able to draw samples from discrete distributions. Take latent Dirichlet allocation (LDA) as an example, the main computation focus is to draw samples from the following distribution:
\begin{equation}
P(z_{i} = k \, | \, \mathbf{z}_{-i}, \mathbf{w}, \boldsymbol{\alpha}, \boldsymbol{\beta}) \propto (n_{d,k}^{-i} + \alpha_{k}) \frac{n_{w_{i}, k}^{-i} + \beta_{w_{i}}}{n_{., k}^{-i} + \sum_{v} \beta_{v}}\label{eq:lda}
\end{equation}
where \( n_{d,k}^{-i} \) is the number of tokens in the document \( d \) assigned to the topic \( k \), excluding the token \( i \), \( n_{w_{i}, k}^{-i} \) is the number of times token \( w_{i} \) assigned to the topic \( k \), excluding \( i \), and \( n_{.,k}^{-i} \) is the total number of tokens assigned to the topic \( k \).

So, a straightforward sampling algorithm works as follows:

  1. Let \( c_{k} \) be the right-hand side of Equation \eqref{eq:lda} for topic \( k \), which is an un-normalized probability.
  2. We compute the accumulated weights as: \( C[i] = C[i-1] + c_{i} , \,\, \forall i \in (0, K-1] \) and \( C[0] = c_{0} \).
  3. Draw \( u \sim \mathcal{U}(0, C[K-1] ) \) and find \( t = \arg\min_{i} \left( x_{i} \right) \) where \( x_{i} = C[i] – u \) and \( x_{i} > 0 \)

The last line is essentially to find the minimum index that the array value is greater than the random number \( u \).

One difficulty to deal with \eqref{eq:lda} is that the right hand side might be too small and therefore overflow (thinking about too many near-zero numbers multiplying). Thus, we want to deal with probabilities in log-space. We start to work with:
\begin{equation}
\log P(z_{i} = k \, | \, \mathbf{z}_{-i}, \mathbf{w}, \boldsymbol{\alpha}, \boldsymbol{\beta}) \propto \log (n_{d,k}^{-i} + \alpha_{k}) + \log (n_{w_{i}, k}^{-i} + \beta_{w_{i}}) – \log ( n_{., k}^{-i} + \sum_{v} \beta_{v} )\label{eq:log_lda}
\end{equation}and store in \( c_{k} \) but remember that now each value represents un-normalized log probability. The next step is to compute the accumulated weights, this time as accumulated probabilities but in log-space! Thanks to the trick mentioned in [Notes on Calculating Log Sum of Exponentials], we are able to compute log sum efficiently. Please use the last equation there to compute the accumulated weights. The last step is to draw the random number. We computeĀ \( u \sim \mathcal{U}(0, 1) + \log C[K-1] \) and again find the minimum index that satisfy the array value is greater than \( u \).

Notes:

  1. The log-sampling algorithm for LDA is implemented in Fugue Topic Modeling Package.
  2. Unless you really face the issue of overflow, sampling in log-space is usually much slower than the original space as log and exp are expensive functions to compute.

Leave a comment

Your email address will not be published. Required fields are marked *