<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hong, LiangJie</title>
	<atom:link href="http://www.hongliangjie.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.hongliangjie.com</link>
	<description>Dept. of Computer Science and Engineering at Lehigh University</description>
	<lastBuildDate>Thu, 02 Feb 2012 18:01:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Sorting Tuples in C++</title>
		<link>http://www.hongliangjie.com/2011/10/10/sortin-tuples-in-c/</link>
		<comments>http://www.hongliangjie.com/2011/10/10/sortin-tuples-in-c/#comments</comments>
		<pubDate>Mon, 10 Oct 2011 22:20:44 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Research in General]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=748</guid>
		<description><![CDATA[In this post, I would like to show how to create a tuple object in C++ 11 and how to sort tuples.<p class="read-more"><a href="http://www.hongliangjie.com/2011/10/10/sortin-tuples-in-c/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>In this post, I would like to show how to create a tuple object in C++ 11 and how to sort tuples.</p>
<p>Here is the code that creates a few tuples and sorts them by their second element (the <code>double</code>). It is pretty straightforward.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;iostream&gt;</span>
<span style="color: #339900;">#include &lt;string&gt;</span>
<span style="color: #339900;">#include &lt;vector&gt;</span>
<span style="color: #339900;">#include &lt;tuple&gt;</span>
<span style="color: #339900;">#include &lt;algorithm&gt;</span>
<span style="color: #0000ff;">using</span> <span style="color: #0000ff;">namespace</span> std<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">typedef</span> tuple<span style="color: #000080;">&lt;</span>string,<span style="color: #0000ff;">double</span>,<span style="color: #0000ff;">int</span><span style="color: #000080;">&gt;</span> mytuple<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">// Compare two tuples by their second element (the double).</span>
<span style="color: #0000ff;">bool</span> mycompare <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> mytuple <span style="color: #000040;">&amp;</span>lhs, <span style="color: #0000ff;">const</span> mytuple <span style="color: #000040;">&amp;</span>rhs<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">return</span> get<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">1</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>lhs<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;</span> get<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">1</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>rhs<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">void</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#123;</span>
  vector<span style="color: #000080;">&lt;</span>mytuple<span style="color: #000080;">&gt;</span> data<span style="color: #008080;">;</span>
  data.<span style="color: #007788;">push_back</span><span style="color: #008000;">&#40;</span>make_tuple<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;abc&quot;</span>,<span style="color:#800080;">4.5</span>,<span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  data.<span style="color: #007788;">push_back</span><span style="color: #008000;">&#40;</span>make_tuple<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;def&quot;</span>,<span style="color:#800080;">5.5</span>,<span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  data.<span style="color: #007788;">push_back</span><span style="color: #008000;">&#40;</span>make_tuple<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;wolf&quot;</span>,<span style="color: #000040;">-</span><span style="color:#800080;">3.47</span>,<span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  sort<span style="color: #008000;">&#40;</span>data.<span style="color: #007788;">begin</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,data.<span style="color: #007788;">end</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,mycompare<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span>vector<span style="color: #000080;">&lt;</span>mytuple<span style="color: #000080;">&gt;</span><span style="color: #008080;">::</span><span style="color: #007788;">iterator</span> iter <span style="color: #000080;">=</span> data.<span style="color: #007788;">begin</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> iter <span style="color: #000040;">!</span><span style="color: #000080;">=</span> data.<span style="color: #007788;">end</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> iter<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> get<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">0</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #000040;">*</span>iter<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span> <span style="color: #000080;">&lt;&lt;</span> get<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">1</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #000040;">*</span>iter<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span> <span style="color: #000080;">&lt;&lt;</span> get<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">2</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #000040;">*</span>iter<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>The code compiles successfully with G++ 4.6.1 and the option &#8220;-std=gnu++0x&#8221;.</p>
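<p>As a variation on the listing above, the comparator can also be written as a lambda, and <code>std::tie</code> can be used to sort on more than one element. The following is only a minimal sketch, assuming a compiler with C++11 lambda support; the names <code>Record</code>, <code>sort_by_second</code> and <code>sort_by_third_then_first</code> are hypothetical and do not appear in the original code.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Same element layout as mytuple in the post; the alias name is hypothetical.
using Record = std::tuple<std::string, double, int>;

// Sort by the double (element 1), using a lambda instead of a named function.
inline void sort_by_second(std::vector<Record>& data) {
  auto by_second = [](const Record& lhs, const Record& rhs) {
    return std::get<1>(lhs) < std::get<1>(rhs);
  };
  std::sort(data.begin(), data.end(), by_second);
  // Sanity check: the range is now ordered under the same comparator.
  assert(std::is_sorted(data.begin(), data.end(), by_second));
}

// Sort by the int (element 2) first; ties are broken by the string (element 0).
// std::tie builds tuples of references, which compare lexicographically.
inline void sort_by_third_then_first(std::vector<Record>& data) {
  std::sort(data.begin(), data.end(),
            [](const Record& lhs, const Record& rhs) {
              return std::tie(std::get<2>(lhs), std::get<0>(lhs)) <
                     std::tie(std::get<2>(rhs), std::get<0>(rhs));
            });
}
```

<p>With the three tuples from the post, <code>sort_by_second</code> puts the &#8220;wolf&#8221; entry (-3.47) first, while <code>sort_by_third_then_first</code> puts the &#8220;def&#8221; entry (whose int is -1) first, followed by &#8220;abc&#8221; and &#8220;wolf&#8221;.</p>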
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/10/10/sortin-tuples-in-c/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two Forms of Logistic Regression</title>
		<link>http://www.hongliangjie.com/2011/10/03/two-forms-of-logistic-regression/</link>
		<comments>http://www.hongliangjie.com/2011/10/03/two-forms-of-logistic-regression/#comments</comments>
		<pubDate>Tue, 04 Oct 2011 01:18:29 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Research in General]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=735</guid>
		<description><![CDATA[There are two forms of Logistic Regression used in the literature. In this post, I will build a bridge between these two forms and show that they are equivalent.<p class="read-more"><a href="http://www.hongliangjie.com/2011/10/03/two-forms-of-logistic-regression/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>There are two forms of Logistic Regression used in the literature. In this post, I will build a bridge between these two forms and show that they are equivalent.</p>
<h2>Logistic Function &amp; Logistic Regression</h2>
<p>The logistic function is commonly defined as follows:<br />
\[<br />
P(x) = \frac{1}{1+\exp(-x)} \;\; \qquad (1)<br />
\] where \(x \in \mathbb{R} \) is the variable of the function and \(P(x) \in (0,1)\). One important property of Equation (1) is that:<br />
\[ \begin{eqnarray}<br />
P(-x) &amp;=&amp; \frac{1}{1+\exp(x)} \nonumber \\<br />
&amp;=&amp; \frac{1}{1+\frac{1}{\exp(-x)}} \nonumber \\<br />
&amp;=&amp; \frac{\exp(-x)}{1+\exp(-x)} \nonumber \\<br />
&amp;=&amp; 1 - \frac{1}{1+\exp(-x)} \nonumber \\<br />
&amp;=&amp; 1 - P(x) \; \; \qquad (2)<br />
\end{eqnarray} \]Equation (1), together with property (2), underlies the form of Logistic Regression that is widely used in the literature (e.g., [1,2,3]):<br />
\[ \begin{eqnarray}<br />
P(y = 1 \, | \, \boldsymbol{\beta}, \mathbf{x}) &amp;=&amp; \frac{\exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} \nonumber \\<br />
P(y = 0 \, | \, \boldsymbol{\beta}, \mathbf{x}) &amp;=&amp; \frac{1}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} \;\; \qquad (3)<br />
\end{eqnarray} \] where \(\mathbf{x}\) is a feature vector and \(\boldsymbol{\beta}\) is a coefficient vector. By using Equation (2), we also have:<br />
\[ \begin{equation}<br />
P(y=1 \, | \, \boldsymbol{\beta}, \mathbf{x}) = 1 - P(y=0 \, | \, \boldsymbol{\beta}, \mathbf{x})<br />
\end{equation} \] This formalism of Logistic Regression is used in [1,2], where the labels are \( y \in \{0,1\} \) and the probability takes a different functional form for each label. Another formalism, introduced in [3], unifies the two cases into a single equation by folding the label into the prediction:<br />
\[ \begin{equation}<br />
P(g= \pm 1 \, | \, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + \exp( - g\boldsymbol{\beta}^{T} \mathbf{x})} \;\; \qquad (4)<br />
\end{equation} \]where \( g \in \{\pm 1\} \) is the label for data item \( \mathbf{x} \). It is also easy to verify that \( P(g=1 \, | \, \boldsymbol{\beta}, \mathbf{x}) = 1 - P(g=-1 \, | \, \boldsymbol{\beta}, \mathbf{x}) \).</p>
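<p>As a short sketch of that verification, write \( P(\cdot) \) for the logistic function of Equation (1), evaluated at the scalar \( \boldsymbol{\beta}^{T} \mathbf{x} \), and apply property (2):</p>
<p>\[ P(g=-1 \, | \, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} = P(-\boldsymbol{\beta}^{T} \mathbf{x}) = 1 - P(\boldsymbol{\beta}^{T} \mathbf{x}) = 1 - P(g=1 \, | \, \boldsymbol{\beta}, \mathbf{x}) \]</p>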
<h2>The Equivalence of Two Forms of Logistic Regression</h2>
<p>At first glance, form (3) and form (4) look very different. However, the equivalence between the two can be easily established. Starting from form (3), we have:<br />
\[ \begin{eqnarray}<br />
P(y = 1 \, | \, \boldsymbol{\beta}, \mathbf{x}) &amp;=&amp; \frac{\exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} \nonumber \\<br />
&amp;=&amp; \frac{1}{\frac{1}{\exp(\boldsymbol{\beta}^{T} \mathbf{x})} + 1} \nonumber \\<br />
&amp;=&amp; \frac{1}{\exp(-\boldsymbol{\beta}^{T} \mathbf{x}) + 1} \nonumber \\<br />
&amp;=&amp; P(g= 1 \, | \, \boldsymbol{\beta}, \mathbf{x})<br />
\end{eqnarray} \]We can also establish the equivalence between \( P(y=0 \, | \, \boldsymbol{\beta}, \mathbf{x})\) and \(P(g=-1 \, | \, \boldsymbol{\beta}, \mathbf{x})\) easily by using property (2). Another way to establish the equivalence is from the classification rule. For the form (3), we have the following classification rule:<br />
\[ \begin{eqnarray}<br />
\frac{\frac{\exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})}}{\frac{1}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})}} &amp; &gt; &amp; 1 \;\; \rightarrow \;\; y = 1 \nonumber \\<br />
\exp(\boldsymbol{\beta}^{T} \mathbf{x}) &amp; &gt; &amp; 1 \nonumber \\<br />
\boldsymbol{\beta}^{T} \mathbf{x} &amp; &gt; &amp; 0<br />
\end{eqnarray} \]Exactly the same classification rule can be obtained for form (4):<br />
\[ \begin{eqnarray}<br />
\frac{\frac{1}{1 + \exp( - \boldsymbol{\beta}^{T} \mathbf{x})}}{\frac{1}{1 + \exp( \boldsymbol{\beta}^{T} \mathbf{x})}} &amp; &gt; &amp; 1 \;\; \rightarrow \;\; g = 1 \nonumber \\<br />
\frac{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp( - \boldsymbol{\beta}^{T} \mathbf{x})} &amp; &gt; &amp; 1 \nonumber \\<br />
\exp(\boldsymbol{\beta}^{T} \mathbf{x}) &amp; &gt; &amp; 1 \nonumber \\<br />
\boldsymbol{\beta}^{T} \mathbf{x} &amp; &gt; &amp; 0<br />
\end{eqnarray} \] Therefore, the two forms essentially learn the same classification boundary.</p>
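<p>For completeness, the negative-label case can be written out in the same way. Substituting \( g = -1 \) into form (4) gives the second line of form (3) directly:</p>
<p>\[ P(y = 0 \, | \, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} = \frac{1}{1 + \exp\bigl( -(-1) \, \boldsymbol{\beta}^{T} \mathbf{x} \bigr)} = P(g = -1 \, | \, \boldsymbol{\beta}, \mathbf{x}) \]</p>
<p>In other words, the two labelings correspond through \( g = 2y - 1 \). This mapping is notation introduced here for illustration and is not taken from [1,2,3].</p>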
<h2>Logistic Loss</h2>
<p>Having established the equivalence of the two forms of Logistic Regression, it is convenient to use the second form, as it fits into a general classification framework. Here, we assume \( y \) is the label of a data item and \( \mathbf{x} \) is its feature vector. The classification framework can be formalized as follows:<br />
\[ \begin{equation}<br />
\arg\min_{f} \sum_{i} L\Bigl(y_{i},f(\mathbf{x}_{i})\Bigr)<br />
\end{equation}\]where \(f\) is a hypothesis function and \(L\) is a loss function. For Logistic Regression, we have the following instantiation:<br />
\[ \begin{eqnarray}<br />
f(\mathbf{x}) &amp;=&amp; \boldsymbol{\beta}^{T} \mathbf{x} \nonumber \\<br />
L\Bigl(y,f(\mathbf{x})\Bigr) &amp;=&amp; \log \Bigl( 1 + \exp(-y f(\mathbf{x})) \Bigr)<br />
\end{eqnarray}\]where \(y \in \{ \pm 1 \}\).</p>
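<p>The equivalence and the loss can also be checked numerically. The following C++ sketch is only an illustration: the function names <code>prob_form3</code>, <code>prob_form4</code> and <code>logistic_loss</code> are hypothetical, and <code>z</code> stands for the scalar score \( \boldsymbol{\beta}^{T} \mathbf{x} \). It evaluates the formulas naively, so it assumes scores of moderate magnitude, where <code>exp</code> does not overflow.</p>

```cpp
#include <cassert>
#include <cmath>

// Form (3): labels y in {0, 1}; z is the linear score beta^T x.
inline double prob_form3(int y, double z) {
  assert(y == 0 || y == 1);
  const double denom = 1.0 + std::exp(z);
  return y == 1 ? std::exp(z) / denom : 1.0 / denom;
}

// Form (4): labels g in {-1, +1}; one expression covers both labels.
inline double prob_form4(int g, double z) {
  assert(g == -1 || g == 1);
  return 1.0 / (1.0 + std::exp(-g * z));
}

// Logistic loss log(1 + exp(-g z)), which equals -log of form (4).
inline double logistic_loss(int g, double z) {
  assert(g == -1 || g == 1);
  return std::log1p(std::exp(-g * z));
}
```

<p>For any moderate score, <code>prob_form3(1, z)</code> agrees with <code>prob_form4(1, z)</code>, <code>prob_form3(0, z)</code> agrees with <code>prob_form4(-1, z)</code>, and <code>logistic_loss(g, z)</code> agrees with <code>-log(prob_form4(g, z))</code> up to floating-point error. The last identity shows that minimizing the logistic loss is the same as maximizing the log-likelihood under form (4).</p>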
<h2>References</h2>
<p>[1] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.<br />
[2] Tom M. Mitchell. Machine learning. McGraw Hill series in computer science. McGraw-Hill, 1997.<br />
[3] Jason D. M. Rennie. <a href="http://people.csail.mit.edu/jrennie/writing/lr.pdf" target="_blank">Logistic Regression</a>. http://people.csail.mit.edu/jrennie/writing, April 2003.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/10/03/two-forms-of-logistic-regression/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Random Number Generation with C++ 0x in GCC</title>
		<link>http://www.hongliangjie.com/2011/09/16/random-number-generation-with-c-0x-in-gcc/</link>
		<comments>http://www.hongliangjie.com/2011/09/16/random-number-generation-with-c-0x-in-gcc/#comments</comments>
		<pubDate>Sat, 17 Sep 2011 00:38:41 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Practical Programming]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=725</guid>
		<description><![CDATA[In this post, I would like to explore how random numbers can be generated with the newly added features of C++ 0x in GCC. Rather than explaining the details of C++ 0x, I just post code here: &#8230;<p class="read-more"><a href="http://www.hongliangjie.com/2011/09/16/random-number-generation-with-c-0x-in-gcc/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>In this post, I would like to explore how random numbers can be generated with the newly added features of C++ 0x in GCC.</p>
<p>Rather than explaining the details of C++ 0x, I just post the code here:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;random&gt;</span>
<span style="color: #339900;">#include &lt;iostream&gt;</span>
<span style="color: #339900;">#include &lt;string&gt;</span>
&nbsp;
<span style="color: #0000ff;">using</span> <span style="color: #0000ff;">namespace</span> std<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">int</span> n<span style="color: #008080;">;</span>
  <span style="color: #0000ff;">double</span> p, lambda, shape, mu, sigma<span style="color: #008080;">;</span>
  mt19937 eng<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">time</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  uniform_int_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">int</span><span style="color: #000080;">&gt;</span> uniform_int<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>,<span style="color: #0000dd;">7</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Uniform INT distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> uniform_int<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  uniform_real_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">double</span><span style="color: #000080;">&gt;</span> uniform_real<span style="color: #008000;">&#40;</span><span style="color:#800080;">0.0</span>,<span style="color:#800080;">1.0</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Uniform REAL distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> uniform_real<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  n <span style="color: #000080;">=</span> <span style="color: #0000dd;">5</span><span style="color: #008080;">;</span>
  p <span style="color: #000080;">=</span> <span style="color:#800080;">0.3</span><span style="color: #008080;">;</span>
  binomial_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">int</span><span style="color: #000080;">&gt;</span> binomial<span style="color: #008000;">&#40;</span>n, p<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Binomial distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> binomial<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  lambda <span style="color: #000080;">=</span> <span style="color:#800080;">4.0</span><span style="color: #008080;">;</span>
  exponential_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">double</span><span style="color: #000080;">&gt;</span> exponential<span style="color: #008000;">&#40;</span>lambda<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Exponential distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> exponential<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  shape <span style="color: #000080;">=</span> <span style="color:#800080;">3.0</span><span style="color: #008080;">;</span>
  gamma_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">double</span><span style="color: #000080;">&gt;</span> gamma<span style="color: #008000;">&#40;</span>shape<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Gamma distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> gamma<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  p <span style="color: #000080;">=</span> <span style="color:#800080;">0.5</span><span style="color: #008080;">;</span>
  geometric_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">int</span><span style="color: #000080;">&gt;</span> geometric<span style="color: #008000;">&#40;</span>p<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Geometric distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> geometric<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  mu <span style="color: #000080;">=</span> <span style="color:#800080;">3.0</span><span style="color: #008080;">;</span> sigma <span style="color: #000080;">=</span> <span style="color:#800080;">4.0</span><span style="color: #008080;">;</span>
  normal_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">double</span><span style="color: #000080;">&gt;</span> normal<span style="color: #008000;">&#40;</span>mu, sigma<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Gaussian distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> normal<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  lambda <span style="color: #000080;">=</span> <span style="color:#800080;">7.0</span><span style="color: #008080;">;</span>
  poisson_distribution<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">int</span><span style="color: #000080;">&gt;</span> poisson<span style="color: #008000;">&#40;</span>lambda<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Poisson distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> poisson<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  p <span style="color: #000080;">=</span> <span style="color:#800080;">0.6</span><span style="color: #008080;">;</span>
  bernoulli_distribution bernoulli<span style="color: #008000;">&#40;</span>p<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">for</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> <span style="color: #FF0000;">&quot;[Bernoulli distribution]:&quot;</span> <span style="color: #000080;">&lt;&lt;</span> bernoulli<span style="color: #008000;">&#40;</span>eng<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> endl<span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>Note that the code is very clean in the sense that it needs nothing beyond the standard library.</p>
<p>Please compile with:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">g++ -std=gnu++0x</pre></div></div>

<p>I used GCC 4.6.1.</p>
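<p>The listing above seeds the engine with <code>time(NULL)</code>, so two runs started within the same second produce the same sequence. One possible alternative, sketched below, is to seed from <code>std::random_device</code>, which is also part of <code>&lt;random&gt;</code>. The helper names <code>make_engine</code> and <code>draw_uniform_ints</code> are hypothetical, and how much entropy <code>std::random_device</code> actually provides depends on the platform.</p>

```cpp
#include <cassert>
#include <random>
#include <vector>

// Build a Mersenne Twister engine seeded from std::random_device
// rather than from the wall clock.
inline std::mt19937 make_engine() {
  std::random_device rd;
  return std::mt19937(rd());
}

// Draw `count` integers uniformly from the closed range [lo, hi].
// Passing an engine built from a fixed seed makes the draws reproducible
// on a given standard library implementation.
inline std::vector<int> draw_uniform_ints(std::mt19937& eng, int lo, int hi,
                                          int count) {
  assert(lo <= hi);
  std::uniform_int_distribution<int> dist(lo, hi);
  std::vector<int> out;
  out.reserve(count);
  for (int i = 0; i < count; ++i)
    out.push_back(dist(eng));
  return out;
}
```

<p>For example, <code>std::mt19937 eng = make_engine();</code> followed by <code>draw_uniform_ints(eng, 3, 7, 10)</code> yields ten values between 3 and 7 inclusive, matching the first distribution in the listing. For repeatable experiments, an engine constructed from a fixed seed such as <code>std::mt19937 eng(42);</code> can be passed instead.</p>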
<p>Several references:<br />
1. http://www.johndcook.com/test_TR1_random.html (sort of out-dated)<br />
2. http://www.johndcook.com/cpp_TR1_random.html (sort of out-dated)<br />
3. http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.200x (current status of C++ 0x in GCC)<br />
4. http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt01ch03s02.html (a list of header files of C++ in GCC)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/09/16/random-number-generation-with-c-0x-in-gcc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Axiomatic Analysis and Optimization of Information Retrieval Models</title>
		<link>http://www.hongliangjie.com/2011/09/14/axiomatic-analysis-and-optimization-of-information-retrieval-models/</link>
		<comments>http://www.hongliangjie.com/2011/09/14/axiomatic-analysis-and-optimization-of-information-retrieval-models/#comments</comments>
		<pubDate>Wed, 14 Sep 2011 09:10:03 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[IR]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=720</guid>
		<description><![CDATA[This is an &#8220;unusual&#8221; research aspect of Information Retrieval (IR). By trying to compare and analyze different IR models in a formal way, Axiomatic Framework can show some interesting and even astonishing results of IR models. For instance, it can show &#8230;<p class="read-more"><a href="http://www.hongliangjie.com/2011/09/14/axiomatic-analysis-and-optimization-of-information-retrieval-models/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>This is an &#8220;unusual&#8221; research aspect of Information Retrieval (IR). By comparing and analyzing different IR models in a formal way, the axiomatic framework can show some interesting and even astonishing results about IR models. For instance, it can show that IR models should satisfy a certain number of constraints. If a model fails to satisfy some of them, we can expect it to perform worse. This type of comparison requires no experiments at all, though the claims are indeed supported by empirical studies.</p>
<p>Materials:</p>
<ul>
<li><strong>Axiomatic Analysis and Optimization of Information Retrieval Models</strong> by <a href="http://www.cs.uiuc.edu/~czhai/" target="_blank">ChengXiang Zhai</a> at <a href="http://www.ictir11.org/" target="_blank">ICTIR11</a> [<a href="http://www.hongliangjie.com/wp-content/uploads/2011/09/ictir11-keynote-final.ppt">Slides</a>]</li>
<li>Yuanhua Lv, ChengXiang Zhai. <strong>Lower-Bounding Term Frequency Normalization</strong>. <em>Proceedings of </em><em>the 20th ACM International Conference on Information and Knowledge Management</em><em> </em> (<strong>CIKM&#8217;11</strong>), 2011. [<a href="http://sifaka.cs.uiuc.edu/~ylv2/pub/cikm11-lowerbound.pdf" target="_blank">PDF</a>]</li>
<li>Hui Fang, Tao Tao, and Chengxiang Zhai. 2011. <strong>Diagnostic Evaluation of Information Retrieval Models</strong>. <em>ACM Transactions on Information Systems (<strong>TOIS</strong>)</em> 29, 2, Article 7 (April 2011), 42 pages. [<a href="http://sifaka.cs.uiuc.edu/czhai/pub/tois-diag.pdf" target="_blank">PDF</a>]</li>
<li>Hui Fang, ChengXiang Zhai, <strong>Semantic Term Matching in Axiomatic Approaches to Information Retrieval</strong>. <em>Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval </em>(<strong> SIGIR&#8217;06 </strong>), pages 115-122. [<a href="http://sifaka.cs.uiuc.edu/czhai/pub/sigir06-semantic.pdf" target="_blank">PDF</a>]</li>
<li>Hui Fang, ChengXiang Zhai, <strong>An Exploration of Axiomatic Approach to Information Retrieval</strong>. <em>Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval </em>(<strong> SIGIR&#8217;05 </strong>), 480-487, 2005. [<a href="http://sifaka.cs.uiuc.edu/czhai/pub/sigir05-axiom.pdf" target="_blank">PDF</a>]</li>
<li>Hui Fang, Tao Tao, ChengXiang Zhai, <strong>A formal study of information retrieval heuristics</strong>. <em>Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval </em>(<strong> SIGIR&#8217;04</strong>), pages 49-56, 2004. [<a href="http://sifaka.cs.uiuc.edu/czhai/pub/sigir04-formal.pdf" target="_blank">PDF</a>]</li>
<li><a href="http://www.eecis.udel.edu/~hfang/" target="_blank">Hui Fang</a>&#8216;s PhD dissertation. [<a href="http://www.ideals.illinois.edu/bitstream/handle/2142/11352/An%20Axiomatic%20Approach%20to%20Information%20Retrieval.pdf?sequence=2" target="_blank">PDF</a>]</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/09/14/axiomatic-analysis-and-optimization-of-information-retrieval-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Topic Models meet Latent Factor Models</title>
		<link>http://www.hongliangjie.com/2011/08/30/topic-models-meet-latent-factor-models/</link>
		<comments>http://www.hongliangjie.com/2011/08/30/topic-models-meet-latent-factor-models/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 21:16:44 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Topic Model]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=650</guid>
		<description><![CDATA[In this section, I wish to discuss efforts to combine Topic Models and Latent Factor Models.<p class="read-more"><a href="http://www.hongliangjie.com/2011/08/30/topic-models-meet-latent-factor-models/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">There is a trend in research communities to bring two well-established classes of models together, topic models and latent factor models. By doing so, we may enjoy the ability to analyze text information with topic models and incorporate the collaborative filtering analysis with latent factor models. In this section, I wish to discuss some of these efforts.</p>
<p style="text-align: justify;">The three papers covered in this post are listed at the end. Before that, let&#8217;s first review what latent factor models are. Latent factor models (LFM) are usually used in the collaborative filtering context. Say we have a user-item rating matrix \( \mathbf{R} \) where \( r_{ij} \) represents the rating user \( i \) gives to item \( j \). We assume that for each user \( i \) there is a vector \( \mathbf{u}_{i} \) of dimensionality \( k \), representing the user in a latent space. Similarly, we assume that for each item \( j \) there is a vector \( \mathbf{v}_{j} \) of the same dimensionality, representing the item in the same latent space. The rating \( r_{ij} \) is then represented as:<br />
\[ r_{ij} = \mathbf{u}_{i}^{T} \mathbf{v}_{j} \]This is the basic setting for LFM. On top of it, bias terms can be incorporated; see <a href="http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/a1-koren.pdf" target="_blank">here</a>. For topic models (TM), the simplest case is Latent Dirichlet Allocation (LDA). The generative story of LDA goes like this. For a document \( d \), we first sample a multinomial distribution \( \boldsymbol{\theta}_{d} \), which is a distribution over all possible topics. For each term position \( w \) in the document, we sample a discrete topic assignment \( z \) from \( \boldsymbol{\theta}_{d} \), indicating which topic we use for this term. Then, we sample a term \( v \) from the chosen topic \( \boldsymbol{\beta} \), a multinomial distribution over the vocabulary.</p>
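<p style="text-align: justify;">To make the two building blocks concrete, here is a minimal Python sketch of the LFM rating and the LDA generative story. The function names, the toy topics and the Dirichlet parameter <code>alpha</code> are my own assumptions for illustration (a Dirichlet prior on \( \boldsymbol{\theta}_{d} \) is the standard LDA choice); none of this code comes from the papers.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import random

def predict_rating(u_i, v_j):
    # r_ij = u_i^T v_j: inner product of the user and item vectors
    return sum(a * b for a, b in zip(u_i, v_j))

def sample_dirichlet(alpha, rng):
    # Draw from a Dirichlet by normalizing independent Gamma draws
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_discrete(probs, rng):
    # Draw an index k with probability probs[k]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(n_terms, alpha, beta, rng):
    # beta[k] is topic k, a distribution over the vocabulary
    theta_d = sample_dirichlet(alpha, rng)    # topic proportions of the document
    terms = []
    for _ in range(n_terms):
        z = sample_discrete(theta_d, rng)     # topic assignment for this position
        v = sample_discrete(beta[z], rng)     # term drawn from the chosen topic
        terms.append(v)
    return theta_d, terms

rng = random.Random(0)
toy_topics = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]   # two toy topics over three terms
theta_d, terms = generate_document(5, [0.5, 0.5], toy_topics, rng)
print(predict_rating([1.0, 2.0], [0.5, 0.25]))
print(theta_d, terms)
</pre></div></div>

<p style="text-align: justify;">In a real model the vectors \( \mathbf{u}_{i} \), \( \mathbf{v}_{j} \) and the topics are learned from data; here they are simply fixed by hand.</p>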
<p style="text-align: justify;">Both LFM and TM reduce the original data into latent spaces, so it should be possible to link them together, especially when the items in the LFM are associated with rich text information. One natural idea is that, for an item \( j \), the latent factor \( \mathbf{v}_{j} \) and its topic proportion parameter \( \boldsymbol{\theta}_{j} \) are somehow connected. One way is to directly equate these two variables. Since \( \mathbf{v}_{j} \) is a real-valued vector and \( \boldsymbol{\theta}_{j} \) lies on a simplex, we need some way to reconcile these properties. Two possible methods can be used:</p>
<ol style="text-align: justify;">
<li>Keep \( \boldsymbol{\theta}_{j} \) and make sure it stays on the simplex (entries in [0, 1] that sum to one) during optimization. This essentially puts a constraint on the parameter.</li>
<li>Keep \( \mathbf{v}_{j} \) and use a logistic transformation to map the real-valued vector onto the simplex.</li>
</ol>
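<p style="text-align: justify;">The second option can be sketched with a softmax, which is one common form of the logistic transformation. I am assuming this particular form for illustration; the papers may use a different parameterization, and the function name <code>to_simplex</code> is hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import math

def to_simplex(v):
    # Softmax: exponentiate and normalize so entries are positive and sum to 1
    shift = max(v)                            # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

theta = to_simplex([0.3, -1.2, 2.0])
print(theta, sum(theta))
</pre></div></div>

<p style="text-align: justify;">The optimization can then run freely over the real-valued vector, and the simplex constraint holds by construction.</p>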
<p style="text-align: justify;">Shan and Banerjee demonstrated the second technique in their paper by combining the Correlated Topic Model with LFM. Wang and Blei argued that this setting has a limitation: because the two latent spaces are strictly equal, it cannot distinguish topics that explain recommendations from topics that explain content. Thus, they proposed a slightly different approach in which each \( \mathbf{v}_{j} \) derives from \( \boldsymbol{\theta}_{j} \) with item-dependent noise:<br />
\[ \mathbf{v}_{j} = \boldsymbol{\theta}_{j} + \epsilon_{j} \] where \( \epsilon_{j} \) is Gaussian noise.</p>
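<p style="text-align: justify;">A small sketch of this offset idea follows. It assumes zero-mean noise with the same standard deviation in every dimension; the parameter <code>noise_std</code> and the function name are hypothetical choices of mine, not notation from the paper.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import random

def sample_item_factor(theta_j, noise_std, rng):
    # v_j = theta_j + eps_j, with eps_j drawn from a zero-mean Gaussian
    return [t + rng.gauss(0.0, noise_std) for t in theta_j]

rng = random.Random(1)
theta_j = [0.6, 0.3, 0.1]
v_j = sample_item_factor(theta_j, 0.1, rng)
offset = [v - t for v, t in zip(v_j, theta_j)]   # how far the item drifts from its topics
print(v_j, offset)
</pre></div></div>

<p style="text-align: justify;">With <code>noise_std</code> set to zero the item factor collapses back to \( \boldsymbol{\theta}_{j} \), which recovers the strictly equal setting described above.</p>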
<p style="text-align: justify;">A different approach is not to equate these two quantities directly but to let them influence each other. One such way, explored by Shan and Banerjee, is to let \( \boldsymbol{\theta}_{j} \) influence how \( \mathbf{v}_{j} \) is generated. More specifically, in the Probabilistic Matrix Factorization (PMF) setting, all \( \mathbf{v} \)s are generated from a Gaussian distribution with a fixed mean and variance. By combining PMF with LDA, the authors allow each topic to have its own Gaussian prior mean and variance. A value similar to \( z \) is first generated from \( \boldsymbol{\theta}_{j} \) to decide which topic to use, and \( \mathbf{v}_{j} \) is then generated from that topic&#8217;s mean and variance.</p>
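<p style="text-align: justify;">The two-step generation can be sketched as below. I assume independent Gaussians per dimension, and the per-topic means and standard deviations are made-up numbers; the function name is hypothetical as well.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import random

def sample_item_factor_from_topic(theta_j, means, stds, rng):
    # Pick a topic index from theta_j, then draw v_j from that topic's Gaussian prior
    z = rng.choices(range(len(theta_j)), weights=theta_j, k=1)[0]
    v_j = [rng.gauss(m, s) for m, s in zip(means[z], stds[z])]
    return z, v_j

rng = random.Random(2)
means = [[1.0, 0.0], [-1.0, 2.0]]   # hypothetical per-topic prior means
stds = [[0.1, 0.1], [0.5, 0.5]]     # hypothetical per-topic standard deviations
z, v_j = sample_item_factor_from_topic([0.8, 0.2], means, stds, rng)
print(z, v_j)
</pre></div></div>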
<p style="text-align: justify;">A totally different direction was taken by Agarwal and Chen. In their fLDA paper, there is no direct relationship between the item latent factor and the content latent factor. Instead, their relationship is realized through the predictive equation:<br />
\[ r_{ij} = \mathbf{a}^{T} \mathbf{u}_{i} + \mathbf{b}^{T} \mathbf{v}_{j} + \mathbf{s}_{i}^{T} \bar{\mathbf{z}}_{j}<br />
\]where \( \mathbf{a} \), \( \mathbf{b} \) and \(\mathbf{s}_{i} \) are regression weights and \( \bar{\mathbf{z}}_{j} \) is the average topic assignment for item \( j \). Note that \(\mathbf{s}_{i} \) is a user-dependent regression weight vector. This formalism encodes the notion that all latent factors (including content) contribute to the rating, not only the item and user factors.</p>
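<p style="text-align: justify;">Here is a small sketch that evaluates the predictive equation exactly as it is written in this post, which is not necessarily the full parameterization used in the fLDA paper. All numbers and function names are hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def average_topic_assignment(z_j, n_topics):
    # z_bar_j[k] is the fraction of the item's terms assigned to topic k
    counts = [0] * n_topics
    for z in z_j:
        counts[z] += 1
    return [c / float(len(z_j)) for c in counts]

def predict_rating(a, u_i, b, v_j, s_i, z_j, n_topics):
    # r_ij = a^T u_i + b^T v_j + s_i^T z_bar_j, following the equation above
    z_bar_j = average_topic_assignment(z_j, n_topics)
    return dot(a, u_i) + dot(b, v_j) + dot(s_i, z_bar_j)

r = predict_rating(a=[1.0, 0.0], u_i=[0.5, 2.0],
                   b=[0.0, 1.0], v_j=[3.0, 0.25],
                   s_i=[2.0, -2.0], z_j=[0, 0, 1, 0], n_topics=2)
print(r)
</pre></div></div>

<p style="text-align: justify;">The content enters only through the last term, so two items with the same \( \mathbf{v}_{j} \) can still receive different ratings from the same user if their topic assignments differ.</p>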
<p style="text-align: justify;">In summary, three directions have been taken for integrating TM and LFM:</p>
<ol>
<li style="text-align: justify;">Equate the item latent factor with the topic proportion vector, possibly with added Gaussian noise.</li>
<li style="text-align: justify;">Let the topic proportion vector control the prior distribution of the item latent factor.</li>
<li style="text-align: justify;">Let the item latent factor and the topic assignments, as well as the user latent factor, contribute to the rating.</li>
</ol>
<p style="text-align: justify;">References:</p>
<ul>
<li style="text-align: justify;">Deepak Agarwal and Bee-Chung Chen. 2010. <strong>fLDA: matrix factorization through latent dirichlet allocation</strong>. In Proceedings of the third ACM international conference on Web search and data mining (WSDM &#8217;10). ACM, New York, NY, USA, 91-100. [<a href="http://www.wsdm-conference.org/2010/proceedings/docs/p91.pdf" target="_blank">PDF</a>]</li>
<li style="text-align: justify;">Hanhuai Shan and Arindam Banerjee. 2010. <strong>Generalized Probabilistic Matrix Factorizations for Collaborative Filtering</strong>. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM &#8217;10). IEEE Computer Society, Washington, DC, USA, 1025-1030. [<a href="http://www-users.cs.umn.edu/~shan/icdm10_gpmf.pdf" target="_blank">PDF</a>]</li>
<li style="text-align: justify;">Chong Wang and David M. Blei. 2011. <strong>Collaborative topic modeling for recommending scientific articles</strong>. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD &#8217;11). ACM, New York, NY, USA, 448-456. [<a href="http://www.cs.princeton.edu/~blei/papers/WangBlei2011.pdf" target="_blank">PDF</a>]</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/08/30/topic-models-meet-latent-factor-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Easy Reading Tutorial for Bayesian Non-parametric Models</title>
		<link>http://www.hongliangjie.com/2011/08/27/an-easy-reading-tutorial-for-bayesian-non-parametric-models/</link>
		<comments>http://www.hongliangjie.com/2011/08/27/an-easy-reading-tutorial-for-bayesian-non-parametric-models/#comments</comments>
		<pubDate>Sat, 27 Aug 2011 23:13:44 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Topic Model]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=623</guid>
		<description><![CDATA[The &#8220;god-father&#8221; of LDA, David Blei, recently published a tutorial on Bayesian Non-parametric Models with one of his students. The whole tutorial is easy to read and provides a very clear overview of Bayesian Non-parametric Models. In particular, Chinese Restaurant Process (CRP) and &#8230;<p class="read-more"><a href="http://www.hongliangjie.com/2011/08/27/an-easy-reading-tutorial-for-bayesian-non-parametric-models/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">The &#8220;god-father&#8221; of <a href="http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation" target="_blank">LDA</a>, <a href="http://www.cs.princeton.edu/~blei/" target="_blank">David Blei</a>, recently published a <a href="http://www.cs.princeton.edu/~blei/papers/GershmanBlei2011.pdf" target="_blank">tutorial on Bayesian Non-parametric Models</a> with one of his students. The whole tutorial is easy to read and provides a very clear overview of Bayesian Non-parametric Models. In particular, the Chinese Restaurant Process (CRP) and the Indian Buffet Process are discussed in a very intuitive way. For those who are interested in the technical details of these models, this tutorial may be just a starting point; the Appendix points to several more formal treatments of the models, including inference algorithms.</p>
<p>One particularly interesting property shown in this tutorial is the &#8220;<em>exchangeable</em>&#8221; property of the CRP, which I restate below.</p>
<p style="text-align: justify;">Let \( c_{n} \) be the table assignment of the \(n\)th customer. A draw from the CRP can be generated by sequentially assigning observations to classes with probability:<br />
\[P(c_{n} = k | \mathbf{c}_{1:n-1}) = \begin{cases}<br />
\frac{m_{k}}{n-1+\alpha}, &amp; \mbox{if } k \leq \mathbf{K}_{+} \mbox{ (i.e., $k$ is a previously occupied table)} \\<br />
\frac{\alpha}{n-1+\alpha}, &amp; \mbox{otherwise (i.e., $k$ is the next unoccupied table)}<br />
\end{cases}\]where \( m_{k} \) is the number of customers sitting at table \( k \), and \( \mathbf{K}_{+} \) is the number of tables for which \( m_{k} &gt; 0 \). The parameter \( \alpha \) is called the concentration parameter. The CRP exhibits an important invariance property: The cluster assignments under this distribution are exchangeable. This means \( p(\mathbf{c}) \) is unchanged if the order of customers is shuffled.</p>
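<p style="text-align: justify;">The sequential seating rule is easy to simulate. Below is a minimal sketch; the function name <code>sample_crp</code> and the choice of \( \alpha = 1 \) are assumptions of mine for illustration, not something taken from the tutorial.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import random

def sample_crp(n_customers, alpha, rng):
    # Seat customers one at a time; returns table assignments and table sizes
    assignments = []
    table_sizes = []                      # table_sizes[k] is m_k
    for n in range(1, n_customers + 1):
        # Unnormalized weights: m_k for occupied tables, alpha for a new table.
        # The shared denominator n - 1 + alpha cancels inside rng.choices.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(table_sizes):
            table_sizes.append(1)         # open the next unoccupied table
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes

rng = random.Random(0)
assignments, table_sizes = sample_crp(10, 1.0, rng)
print(assignments, table_sizes)
</pre></div></div>

<p style="text-align: justify;">Larger values of \( \alpha \) make new tables more likely, so the same number of customers tends to spread over more tables.</p>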
<p style="text-align: justify;">Consider the joint distribution of a set of customer assignments \( c_{1:N} \). It decomposes according to the chain rule:<br />
\[p(c_{1}, c_{2}, \cdots , c_{N}) = p(c_{1}) p(c_{2} | c_{1}) \cdots p(c_{N} | c_{1}, c_{2}, \cdots , c_{N-1}) \]where each term comes from the equation above. To show that this distribution is exchangeable, we introduce some new notation. Let \( \mathbf{K}(c_{1:N}) \) be the number of groups in which these assignments place the customers, which is a number between 1 and \( N \). Let \( I_{k} \) be the set of indices of customers assigned to the \(k\)th group, and let \( N_{k} \) be the number of customers assigned to that group. Now, for a particular group \( k \) the joint probability of all assignments in this group is:<br />
\[ \frac{\alpha}{I_{k,1}-1+\alpha} \frac{1}{I_{k,2}-1+\alpha} \frac{2}{I_{k,3}-1+\alpha} \cdots \frac{N_{k}-1}{I_{k,N_{k}}-1+\alpha} \]where each factor corresponds to one customer in the group and \( I_{k,m} \) denotes the index of the \(m\)th customer to join group \( k \). The numerators multiply to \( \alpha (N_{k}-1)!\). Therefore, we have:<br />
\[ p(c_{1}, c_{2}, \cdots , c_{N}) = \prod_{k=1}^{K} \frac{\alpha (N_{k}-1)!}{(I_{k,1}-1+\alpha)(I_{k,2}-1+\alpha)\cdots (I_{k,N_{k}}-1+\alpha)} \]Finally, notice that the union of \( I_{k} \) across all groups \(k\) identifies each index once, because each customer is assigned to exactly one group. This simplifies the denominator and lets us write the joint as:<br />
\[ p(c_{1}, c_{2}, \cdots , c_{N}) = \frac{\alpha^{K} \prod_{k=1}^{K} (N_{k}-1)! }{\prod_{i=1}^{N} (i-1+\alpha)} \]This equation only depends on the number of groups \(\mathbf{K}\) and the size of each group \(\mathbf{N}_{k}\). Since neither changes when the customers are reordered, the joint probability is exchangeable.</p>
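<p style="text-align: justify;">The derivation can be checked numerically. The sketch below multiplies the sequential probabilities for a seating and compares the result with the closed form, then repeats the check for a reordering of the same customers. The function names, the group labels and \( \alpha = 0.5 \) are hypothetical choices for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import math

def crp_joint_chain_rule(assignments, alpha):
    # Multiply the sequential CRP probabilities p(c_n | c_1, ..., c_{n-1})
    counts = {}
    prob = 1.0
    for n, c in enumerate(assignments, start=1):
        m = counts.get(c, 0)
        numerator = m if m else alpha         # alpha when group c is new
        prob *= numerator / (n - 1 + alpha)
        counts[c] = m + 1
    return prob

def crp_joint_closed_form(assignments, alpha):
    # alpha^K * prod_k (N_k - 1)! / prod_i (i - 1 + alpha)
    sizes = {}
    for c in assignments:
        sizes[c] = sizes.get(c, 0) + 1
    numerator = alpha ** len(sizes)
    for n_k in sizes.values():
        numerator *= math.factorial(n_k - 1)
    denominator = 1.0
    for i in range(1, len(assignments) + 1):
        denominator *= i - 1 + alpha
    return numerator / denominator

original = ["a", "a", "b", "a", "c", "b"]
shuffled = ["c", "b", "a", "b", "a", "a"]   # same customers in a different order
for seq in (original, shuffled):
    print(crp_joint_chain_rule(seq, 0.5), crp_joint_closed_form(seq, 0.5))
</pre></div></div>

<p style="text-align: justify;">Both seatings have groups of sizes 3, 2 and 1, so all four printed values should agree up to floating-point rounding.</p>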
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/08/27/an-easy-reading-tutorial-for-bayesian-non-parametric-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simple Geographical Calculations</title>
		<link>http://www.hongliangjie.com/2011/05/27/simple-geographical-calculation/</link>
		<comments>http://www.hongliangjie.com/2011/05/27/simple-geographical-calculation/#comments</comments>
		<pubDate>Fri, 27 May 2011 04:33:35 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Research in General]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=563</guid>
		<description><![CDATA[In this post, I would like to share some simple code for calculating geographical distances from latitude and longitude points obtained from third-party services. This is particularly useful when we wish to compute the average distance users travel from check-in or geo-tagging information on Twitter, for instance. The code is straightforward and simple.<p class="read-more"><a href="http://www.hongliangjie.com/2011/05/27/simple-geographical-calculation/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>In this post, I would like to share some simple code for calculating geographical distances from latitude and longitude points obtained from third-party services. This is particularly useful when we wish to compute the average distance users travel from check-in or geo-tagging information on Twitter, for instance. The code is straightforward and simple.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">math</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
&nbsp;
<span style="color: #808080; font-style: italic;">## Convert a location into 3D coordinates on the unit sphere</span>
<span style="color: #808080; font-style: italic;">## location is a list of [latitude, longitude] in degrees</span>
<span style="color: #808080; font-style: italic;">## return: a list of [x,y,z]</span>
<span style="color: #ff7700;font-weight:bold;">def</span> convert_location_cor<span style="color: black;">&#40;</span>location<span style="color: black;">&#41;</span>:
    x_n = <span style="color: #dc143c;">math</span>.<span style="color: black;">cos</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>location<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> <span style="color: #dc143c;">math</span>.<span style="color: black;">cos</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>location<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    y_n = <span style="color: #dc143c;">math</span>.<span style="color: black;">cos</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>location<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> <span style="color: #dc143c;">math</span>.<span style="color: black;">sin</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>location<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    z_n = <span style="color: #dc143c;">math</span>.<span style="color: black;">sin</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>location<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span>x_n,y_n,z_n<span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">## Convert 3D coordinates into a location</span>
<span style="color: #808080; font-style: italic;">## cor is a list of [x,y,z]</span>
<span style="color: #808080; font-style: italic;">## return: a list of [latitude, longitude] in degrees</span>
<span style="color: #ff7700;font-weight:bold;">def</span> convert_cor_location<span style="color: black;">&#40;</span>cor<span style="color: black;">&#41;</span>:
    r = <span style="color: #dc143c;">math</span>.<span style="color: black;">sqrt</span><span style="color: black;">&#40;</span>cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">*</span> cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> + cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">*</span> cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>+ cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">*</span> cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    lat = <span style="color: #dc143c;">math</span>.<span style="color: black;">asin</span><span style="color: black;">&#40;</span>cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span> / r<span style="color: black;">&#41;</span>
    lon = <span style="color: #dc143c;">math</span>.<span style="color: black;">atan2</span><span style="color: black;">&#40;</span>cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>, cor<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;">## r is a length, not an angle, so it is not part of the returned location</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">degrees</span><span style="color: black;">&#40;</span>lat<span style="color: black;">&#41;</span>,<span style="color: #dc143c;">math</span>.<span style="color: black;">degrees</span><span style="color: black;">&#40;</span>lon<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">## Compute the geographical midpoint of a set of locations</span>
<span style="color: #808080; font-style: italic;">## location_list is a list of locations [location 0, location 1, location 2]</span>
<span style="color: #808080; font-style: italic;">## return: the location of midpoint</span>
<span style="color: #ff7700;font-weight:bold;">def</span> geo_midpoint<span style="color: black;">&#40;</span>location_list<span style="color: black;">&#41;</span>:
    x_list = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    y_list = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    z_list = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>location_list<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        m = convert_location_cor<span style="color: black;">&#40;</span>location_list<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        x_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span>m<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        y_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span>m<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        z_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span>m<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    x_mean = <span style="color: #008000;">sum</span><span style="color: black;">&#40;</span>x_list<span style="color: black;">&#41;</span> / <span style="color: #008000;">float</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>location_list<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    y_mean = <span style="color: #008000;">sum</span><span style="color: black;">&#40;</span>y_list<span style="color: black;">&#41;</span> / <span style="color: #008000;">float</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>location_list<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    z_mean = <span style="color: #008000;">sum</span><span style="color: black;">&#40;</span>z_list<span style="color: black;">&#41;</span> / <span style="color: #008000;">float</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>location_list<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> convert_cor_location<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>x_mean,y_mean,z_mean<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">## Compute the distance between two locations</span>
<span style="color: #808080; font-style: italic;">## a and b are two locations: [lat 1, lon 1] [lat 2, lon 2]</span>
<span style="color: #808080; font-style: italic;">## return: the distance in KM</span>
<span style="color: #ff7700;font-weight:bold;">def</span> geo_distance<span style="color: black;">&#40;</span>a,b<span style="color: black;">&#41;</span>:
    theta = a<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> - b<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
    dist = <span style="color: #dc143c;">math</span>.<span style="color: black;">sin</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>a<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> <span style="color: #dc143c;">math</span>.<span style="color: black;">sin</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>b<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> \
     + <span style="color: #dc143c;">math</span>.<span style="color: black;">cos</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>a<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> <span style="color: #dc143c;">math</span>.<span style="color: black;">cos</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>b<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> <span style="color: #dc143c;">math</span>.<span style="color: black;">cos</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">math</span>.<span style="color: black;">radians</span><span style="color: black;">&#40;</span>theta<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
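    <span style="color: #808080; font-style: italic;">## clamp to [-1, 1] so rounding error cannot push acos out of its domain</span>
    dist = <span style="color: #dc143c;">math</span>.<span style="color: black;">acos</span><span style="color: black;">&#40;</span><span style="color: #008000;">max</span><span style="color: black;">&#40;</span>-<span style="color: #ff4500;">1.0</span>, <span style="color: #008000;">min</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1.0</span>, dist<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    dist = <span style="color: #dc143c;">math</span>.<span style="color: black;">degrees</span><span style="color: black;">&#40;</span>dist<span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;">## degrees of arc to nautical miles (60 per degree), to statute miles (1.1515), to kilometers (1.609344)</span>
    distance = dist <span style="color: #66cc66;">*</span> <span style="color: #ff4500;">60</span> <span style="color: #66cc66;">*</span> <span style="color: #ff4500;">1.1515</span> <span style="color: #66cc66;">*</span> <span style="color: #ff4500;">1.609344</span>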
    <span style="color: #ff7700;font-weight:bold;">return</span> distance
&nbsp;
<span style="color: #808080; font-style: italic;">## main program</span>
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    l_list = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    l_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>-<span style="color: #ff4500;">8.70934</span>,<span style="color: #ff4500;">115.173695</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    l_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>-<span style="color: #ff4500;">8.70934</span>,<span style="color: #ff4500;">115.235514</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    l_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>-<span style="color: #ff4500;">8.591728</span>,<span style="color: #ff4500;">115.235514</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    l_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>-<span style="color: #ff4500;">8.591728</span>,<span style="color: #ff4500;">115.173695</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    midpoint = geo_midpoint<span style="color: black;">&#40;</span>l_list<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span> geo_distance<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>-<span style="color: #ff4500;">8.70934</span>,<span style="color: #ff4500;">115.173695</span><span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span>-<span style="color: #ff4500;">8.70934</span>,<span style="color: #ff4500;">115.235514</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/05/27/simple-geographical-calculation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Must Read for Logistic Regression</title>
		<link>http://www.hongliangjie.com/2011/05/17/a-must-read-for-logistic-regression/</link>
		<comments>http://www.hongliangjie.com/2011/05/17/a-must-read-for-logistic-regression/#comments</comments>
		<pubDate>Tue, 17 May 2011 06:55:23 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Research in General]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=535</guid>
		<description><![CDATA[A simple summary of a must-read for logistic regression and the discussions of simple generative models and discriminative models.<p class="read-more"><a href="http://www.hongliangjie.com/2011/05/17/a-must-read-for-logistic-regression/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">I came across an old technical report written by <a href="http://www.cs.berkeley.edu/~jordan/" target="_blank">Michael Jordan</a> (no, not the basketball guy):</p>
<p style="text-align: justify;">&#8220;<a href="http://www.cs.berkeley.edu/~jordan/papers/uai.ps" target="_blank">Why the logistic function? A tutorial discussion on probabilities and neural networks</a>&#8220;. M. I. Jordan. MIT Computational Cognitive Science Report 9503, August 1995.</p>
<p style="text-align: justify;">The material is amazingly straightforward and easy to understand. It answers (at least partially) a long-standing question of mine: why is the logistic function used in regression? Regardless of how it came to be used in the first place, the report shows that it can actually be derived from a simple binary classification setting where we wish to estimate the posterior probability: \[ P(w_{0}|\mathbf{x}) = \frac{P(\mathbf{x}|w_{0})P(w_{0})}{P(\mathbf{x})} \]<br />
where \( w_{0} \) can be thought of as a class label and \( \mathbf{x} \) as a feature vector. We can expand the denominator and introduce an exponential:<br />
\[ P(w_{0}|\mathbf{x}) = \frac{P(\mathbf{x}|w_{0})P(w_{0})}{P(\mathbf{x}|w_{0})P(w_{0})+P(\mathbf{x}|w_{1})P(w_{1})}=\frac{1}{1+\exp\{-\log a - \log b\}} \]<br />
where \( a=\frac{P(\mathbf{x}|w_{0})}{P(\mathbf{x}|w_{1})} \) and \( b= \frac{P(w_{0})}{P(w_{1})} \). Through nothing more than algebraic manipulation, we already get a flavor of how the logistic function arises from a simple classification problem. Now, if we specify a particular form for the class-conditional densities \( P(\mathbf{x}|w) \), for instance Gaussian distributions, we can recover logistic regression easily.</p>
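<p style="text-align: justify;">As a sketch of that last step (the shared covariance and the symbols \( \mu_{0} \), \( \mu_{1} \), \( \Sigma \), \( \beta \) and \( \gamma \) are my own assumptions for illustration, not notation from this post): suppose both class-conditional densities are Gaussian, with mean vectors \( \mu_{0} \) and \( \mu_{1} \) and a common covariance matrix \( \Sigma \). The normalizing constants and the terms quadratic in \( \mathbf{x} \) then cancel in the log-likelihood ratio:<br />
\[ \log a = (\mu_{0}-\mu_{1})^{T}\Sigma^{-1}\mathbf{x} - \frac{1}{2}\mu_{0}^{T}\Sigma^{-1}\mu_{0} + \frac{1}{2}\mu_{1}^{T}\Sigma^{-1}\mu_{1} \]<br />
Substituting this into the expression above gives<br />
\[ P(w_{0}|\mathbf{x}) = \frac{1}{1+\exp\{-(\beta^{T}\mathbf{x}+\gamma)\}} \]<br />
with \( \beta = \Sigma^{-1}(\mu_{0}-\mu_{1}) \) and \( \gamma = -\frac{1}{2}\mu_{0}^{T}\Sigma^{-1}\mu_{0} + \frac{1}{2}\mu_{1}^{T}\Sigma^{-1}\mu_{1} + \log b \). This is the familiar logistic regression form: a linear function of \( \mathbf{x} \) passed through the logistic function. If the two covariance matrices differ, the quadratic terms no longer cancel and the exponent becomes quadratic in \( \mathbf{x} \).</p>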
<p style="text-align: justify;">However, the point of the report is not just to show where the logistic function comes from, but to show that discriminative and generative models in this particular setting are two sides of the same coin. In addition, Jordan demonstrates that these two sides are <strong><em>NOT</em></strong> equivalent and should be treated carefully when different learning criteria are considered. In general, a simple take-away is that the discriminative model (logistic regression) is more &#8220;robust&#8221;, whereas the generative model might be more accurate if its assumptions are correct.</p>
<p style="text-align: justify;">For more details, please refer to the report.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/05/17/a-must-read-for-logistic-regression/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some Recent Papers About Topic Models</title>
		<link>http://www.hongliangjie.com/2011/05/01/some-recent-papers-about-topic-models/</link>
		<comments>http://www.hongliangjie.com/2011/05/01/some-recent-papers-about-topic-models/#comments</comments>
		<pubDate>Sun, 01 May 2011 19:15:51 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Research in General]]></category>
		<category><![CDATA[Topic Model]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=519</guid>
		<description><![CDATA[In this post, I would like to talk about several recent papers about topic models. These papers do not all follow the same direction in applying or extending topic models. However, some of them are quite interesting and worth discussing here.<p class="read-more"><a href="http://www.hongliangjie.com/2011/05/01/some-recent-papers-about-topic-models/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">In this post, I would like to talk about several recent papers about topic models. These papers do not all follow the same direction in applying or extending topic models. However, some of them are quite interesting and worth discussing here.</p>
<p style="text-align: justify;">The first one is</p>
<p style="padding-left: 30px; text-align: justify;">Enhong Chen, Yanggang Lin, Hui Xiong, Qiming Luo, and Haiping Ma. 2011. <strong>Exploiting probabilistic topic models to improve text categorization under class imbalance</strong>. <em>Journal of Information Processing and Management.</em> 47, 2 (March 2011), 202-214.</p>
<p style="text-align: justify;">The idea is straightforward and simple. The authors proposed a two-step approach to mitigate the problem of class imbalance. The first step is to learn topic models from the existing imbalanced data. Here, a separate set of topics is learned for each class label. Once the models are obtained, synthetic documents (new samples) are drawn from the learned models. This is possible because the topic distributions and word distributions are fixed after the learning process. The number of new samples is determined by the difference in size between the dominant class and the rare class. A more aggressive method, intended to avoid noisily labeled data, is also proposed. The idea is to train a classifier on synthetic samples only, rather than on the original samples. The experimental results demonstrate some performance improvement over other methods proposed to tackle the same problem.</p>
<p style="text-align: justify;">The second paper is</p>
<p style="text-align: justify; padding-left: 30px;">Wayne Xin Zhao, Jing Jiang, Hongfei Yan, and Xiaoming Li. 2010. <strong><a href="http://aclweb.org/anthology/D/D10/D10-1006.pdf" target="_blank">Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid</a></strong>. In <em>Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing</em> (EMNLP &#8217;10). Association for Computational Linguistics, Stroudsburg, PA, USA, 56-65.</p>
<p style="text-align: justify;">The paper is interesting because it demonstrates a method to incorporate term-level features into a topic model. The list of features for each term is embedded through a Maximum Entropy model. The supervised learning part of the model learns the weights of these features, and Gibbs sampling for the topic model then uses these weights. For details, please refer to the paper.</p>
<p style="text-align: justify;">The next one is</p>
<p style="text-align: justify; padding-left: 30px;">Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan and Xiaoming Li. <a href="http://www.mysmu.edu/faculty/jingjiang/papers/ECIR'11.pdf"><strong>Comparing Twitter and traditional media using topic models</strong>.</a> In <em>Proceedings of the 33rd European Conference on Information Retrieval</em> (ECIR&#8217;11) (full paper), 2011.</p>
<p style="text-align: justify;">The paper has several interesting aspects. First, it claims to be the first study to compare topics obtained from Twitter with those from traditional media. The authors use a standard LDA model to discover topics from a New York Times (NYT) corpus and, separately, a modified topic model for Twitter. They then proposed a heuristic method to map Twitter topics onto NYT topics. In addition, they manually assigned <em>topic types</em> to all the topics found by the models. Through all of this, common topics and corpus-specific topics are obtained heuristically. It is a little strange that they do not consider any techniques for mining topics from multiple corpora. Secondly, they do not compare with the method where only LDA is used. Note that the same Twitter-LDA is used in:</p>
<p style="text-align: justify; padding-left: 30px;">Xin Zhao, Jing Jiang, Jing He, Yang Song, Palakorn Achananuparp, Ee-Peng Lim and Xiaoming Li. <a href="http://www.mysmu.edu/faculty/jingjiang/papers/ACL'11.pdf"><strong>Topical keyphrase extraction from Twitter</strong>.</a> To appear in <em>Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</em> (ACL-HLT&#8217;11) (long paper), 2011.</p>
<p style="text-align: justify;">&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/05/01/some-recent-papers-about-topic-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reviews on Binary Matrix Decomposition</title>
		<link>http://www.hongliangjie.com/2011/03/15/reviews-on-binary-matrix-decomposition/</link>
		<comments>http://www.hongliangjie.com/2011/03/15/reviews-on-binary-matrix-decomposition/#comments</comments>
		<pubDate>Tue, 15 Mar 2011 05:31:20 +0000</pubDate>
		<dc:creator>Liangjie Hong</dc:creator>
				<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Research in General]]></category>

		<guid isPermaLink="false">http://www.hongliangjie.com/?p=510</guid>
		<description><![CDATA[In this post, I would like to review several existing techniques for binary matrix decomposition.<p class="read-more"><a href="http://www.hongliangjie.com/2011/03/15/reviews-on-binary-matrix-decomposition/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">In this post, I would like to review several existing techniques for binary matrix decomposition.</p>
<p style="text-align: justify;">&nbsp;</p>
<ul style="text-align: justify;">
<li>Andrew I. Schein, Lawrence K. Saul, and Lyle H. Ungar. <strong><a href="http://www.andrewschein.com/publications/ssu-aistat2003.pdf" target="_blank">A Generalized Linear Model for Principal Component Analysis of Binary Data</a></strong>. Appeared in Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics. January 3-6, 2003. Key West, FL.<br />
This paper introduced a logistic version of PCA for binary data. The model assumes that each observation is from a single latent factor and that there exist multiple latent factors. The model is quite straightforward and the inference is done by alternating least squares.</li>
<li>Tao Li. 2005. <strong><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.125.6643&amp;rep=rep1&amp;type=pdf" target="_blank">A general model for clustering binary data</a></strong>. In <em>Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining</em> (KDD &#8217;05). ACM, New York, NY, USA, 188-197.<br />
In this paper, the author introduced the problem of &#8220;binary data decomposition&#8221;. The paper applied several techniques that are popular for ordinary matrix factorization, such as k-means and spectral clustering, to binary data. The proposed method factorizes the binary matrix into two binary matrices, where the binary indicators suggest membership.</li>
<li>Tomas Singliar and Milos Hauskrecht. 2006. <strong><a href="http://jmlr.csail.mit.edu/papers/volume7/singliar06a/singliar06a.pdf" target="_blank">Noisy-OR Component Analysis and its Application to Link Analysis</a></strong>. <em>J. Mach. Learn. Res.</em> 7 (December 2006), 2189-2213.<br />
This paper introduced a probabilistic view of binary data. Like other latent factor models, each observation can be viewed as a sample from multiple binary latent Bernoulli factors, which is essentially a mixture model. Variational inference is used in the paper. The weak part of the paper is that the comparison of the model with PLSA and LDA is not quite convincing.</li>
<li>Zhongyuan Zhang, Tao Li, Chris Ding, and Xiangsun Zhang. 2007. <strong><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.111.8668&amp;rep=rep1&amp;type=pdf" target="_blank">Binary Matrix Factorization with Applications</a></strong>. In <em>Proceedings of the 2007 Seventh IEEE International Conference on Data Mining</em> (ICDM &#8217;07). IEEE Computer Society, Washington, DC, USA, 391-400.<br />
This paper introduced a variant of Non-negative Matrix Factorization (NMF) for binary data, meaning that a binary matrix is always decomposed into two matrices whose entries are bounded between 0 and 1. The proposed method is a modification of NMF. However, in a document clustering problem, the performance difference between the proposed method and NMF is very small.</li>
<li>Miettinen, P.; Mielikainen, T.; Gionis, A.; Das, G.; Mannila, H.; , &#8220;<strong><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.9941&amp;rep=rep1&amp;type=pdf" target="_blank">The  Discrete Basis Problem,</a></strong>&#8221; <em>Knowledge and Data Engineering, IEEE Transactions  on</em> , vol.20, no.10, pp.1348-1362, Oct. 2008.<br />
Miettinen, P.; , &#8220;<strong><a href="http://www.computer.org/portal/web/csdl/doi/10.1109/ICDM.2010.93" target="_blank">Sparse Boolean Matrix Factorizations</a></strong>,&#8221; <em>Data Mining (ICDM),  2010 IEEE 10th International Conference on</em> , vol., no., pp.935-940, 13-17  Dec. 2010<br />
These two papers presented another view of the factorization of binary data. Rather than directly using SVD-based or NMF-based methods, these papers introduced a &#8220;cover&#8221;-based discrete optimization method for the problem. However, in the experiments, the performance advantages over traditional SVD or NMF methods are not very clear. Another drawback is that their method is difficult to combine with other existing methods.</li>
<li>Andreas P. Streich, Mario Frank, David Basin, and Joachim M. Buhmann. 2009. <strong><a href="http://ml2.inf.ethz.ch/papers/2009/multiassignmentClustering_ICML2009.pdf" target="_blank">Multi-assignment clustering for Boolean data</a></strong>. In <em>Proceedings of the 26th Annual International Conference on Machine Learning</em> (ICML &#8217;09). ACM, New York, NY, USA, 969-976.<br />
This paper introduced a probabilistic view of binary data. Each observation is assumed to be generated either by &#8220;signal&#8221; or by &#8220;noise&#8221;, both of which are Bernoulli distributions. The switch variable is itself sampled from a third Bernoulli distribution. This is essentially a simplified PLSA. The inference is done by deterministic annealing.</li>
<li>Ata Kaban, Ella Bingham, <strong><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.5731&amp;rep=rep1&amp;type=pdf" target="_blank">Factorisation and denoising of 0-1 data: A variational approach</a></strong>, Neurocomputing, Volume 71, Issues 10-12, Neurocomputing for Vision Research; Advances in Blind Signal Processing, June 2008, Pages 2291-2308, ISSN 0925-2312.<br />
This paper is somewhat similar to the &#8220;Noisy-OR&#8221; model and to Logistic PCA. However, unlike Logistic PCA, the proposed model is a mixture model, meaning that a single observation is &#8220;generated&#8221; by multiple latent factors. The authors put a Beta prior over the latent factors and the inference is done by variational inference.<br />
Ella Bingham, Ata Kaban, and Mikael Fortelius. 2009. <strong><a href="http://www.cs.helsinki.fi/u/ebingham/publications/paleo.pdf" target="_blank">The aspect Bernoulli model: multiple causes of presences and absences</a></strong>. <em>Pattern Anal. Appl.</em> 12, 1 (January 2009), 55-78.<br />
This paper goes back to the assumption that each observation is sampled from a single factor. The inference is done by EM.</li>
</ul>
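<p style="text-align: justify;">To make the contrast between two of the views above a bit more concrete, here is a rough sketch in my own notation (the symbols below are assumptions for illustration and are not taken from the papers). Let \( X \in \{0,1\}^{n \times m} \) be the data matrix and \( k \) the number of latent factors. In its simplest form, ignoring any bias terms, a logistic PCA style model keeps real-valued factors \( U \in \mathbb{R}^{n \times k} \) and \( V \in \mathbb{R}^{k \times m} \) and passes their product through the logistic function:<br />
\[ P(x_{ij}=1) = \frac{1}{1+\exp\{-\theta_{ij}\}}, \quad \theta_{ij} = \sum_{l=1}^{k} u_{il}v_{lj} \]<br />
whereas a Boolean, cover-based formulation asks for binary factors \( B \in \{0,1\}^{n \times k} \) and \( C \in \{0,1\}^{k \times m} \) and reconstructs each entry with a logical OR in place of the sum:<br />
\[ \hat{x}_{ij} = \bigvee_{l=1}^{k} (b_{il} \wedge c_{lj}) \]<br />
with \( B \) and \( C \) chosen to minimize the number of entries where \( \hat{x}_{ij} \neq x_{ij} \). The first is a continuous optimization problem, while the second is a discrete one.</p>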
<p style="text-align: justify;">In all, it seems that the performance advantages of models designed specifically for binary data are small. However, the biggest advantage of these models is that they can sometimes give better interpretations. For computational models, NMF seems a good approximation. For probabilistic models, a modified PLSA or LDA seems quite reasonable.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hongliangjie.com/2011/03/15/reviews-on-binary-matrix-decomposition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

