SIGIR 2009 Paper
This is a partial release of the dataset used in the paper “A classification-based approach to question answering in discussion boards,” in SIGIR ’09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, New York, NY, USA, 2009, pp. 171-178.
Warning: 1) This dataset is not maintained, so you will need to clean it and transform it into a more usable format yourself. 2) The question-detection data for the Ubuntu Forum is missing. I will try my best to find it; however, there is no guarantee that this part can be recovered.
The dataset consists of 5 files:
1) ubuntu_threads — content of threads crawled from the Ubuntu Forum
2) ubuntu_answer_label — labels indicating which level (post) is the answer for a particular thread
3) photograph_threads — content of threads crawled from the Photograph Forum
4) photograph_answer_label — labels indicating which level (post) is the answer for a particular thread
5) photograph_question_label — labels indicating whether a thread is a question thread or not (1 represents a question thread and 2 represents a non-question thread)
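The question labels use a two-value code (1 = question, 2 = non-question), which is easy to invert by mistake when converting the data. Below is a minimal Python sketch that decodes those labels into booleans. It assumes each non-empty line of photograph_question_label is a whitespace-separated "<thread_id> <label>" pair. That layout, the name parse_question_labels, and the thread IDs used below are all hypothetical, so adjust the parsing to match the real file.

```python
from typing import Dict

# Encoding stated for photograph_question_label: 1 = question, 2 = non-question.
QUESTION_LABELS = {1: True, 2: False}


def parse_question_labels(text: str) -> Dict[str, bool]:
    """Map each thread ID to True (question thread) or False (non-question).

    HYPOTHETICAL LAYOUT: each non-empty line is "<thread_id> <label>",
    separated by whitespace. The real file layout may differ.
    """
    labels: Dict[str, bool] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(parts)}")
        thread_id, raw = parts
        code = int(raw)
        if code not in QUESTION_LABELS:
            # Reject anything other than the documented 1/2 codes.
            raise ValueError(f"line {line_no}: unexpected label {code}")
        labels[thread_id] = QUESTION_LABELS[code]
    return labels
```

For example, parse_question_labels("t1 1\nt2 2\n") returns {"t1": True, "t2": False}. Raising on unknown codes, rather than silently treating them as non-questions, makes any leftover noise in the unmaintained files visible during cleaning.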
Example format of “ubuntu_threads” and “photograph_threads”: