1. HTML files
Original HTML files crawled from Apple Discussion (https://discussions.apple.com/index.jspa) and the Google Earth Community (http://bbs.keyhole.com/ubb/ubbthreads.php/Cat/0). The ground-truth reply relations are encoded by the indentation structure of the threads in Apple Discussion, and by the .tree.html files in the Google Earth Community.

2. TOKEN files
Information extracted from the original HTML files, originally used in [1]. The metadata of each post includes: post ID ("local ID"_"global ID"), author ("ID"_"screen name"), title, content, time, parent ID ("local ID"_"global ID"), Token List, and POS List.

3. Note
Apple Discussion (https://discussions.apple.com/index.jspa) has since changed its layout and interface design.

4. Reference
[1] Hongning Wang, Chi Wang, ChengXiang Zhai, and Jiawei Han. Learning Online Discussion Structures by Conditional Random Fields. SIGIR 2011, pp. 435-444.
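The post ID, parent ID, and author fields in the TOKEN files (item 2) each join two values with an underscore. The following is a minimal sketch for splitting such fields, assuming the first part (local ID or author ID) never contains an underscore while the second part (for example a screen name) might. The function names and sample values are hypothetical illustrations, not part of the dataset's own tooling.

```python
from typing import NamedTuple, Tuple


class CompositeId(NamedTuple):
    """A '<first>_<second>' field, e.g. a post ID ("local ID"_"global ID")."""
    first: str
    second: str


def split_composite(field: str) -> CompositeId:
    """Split a composite field at its FIRST underscore only.

    Assumption: the first part never contains '_'. Splitting once keeps any
    underscores in the second part (e.g. a screen name) intact.
    """
    first, sep, second = field.strip().partition("_")
    if not sep:
        raise ValueError(f"expected '<a>_<b>', got {field!r}")
    return CompositeId(first, second)


def parse_post_id(field: str) -> Tuple[str, str]:
    """Return (local_id, global_id) for a post ID or parent ID field."""
    parts = split_composite(field)
    return parts.first, parts.second


def parse_author(field: str) -> Tuple[str, str]:
    """Return (author_id, screen_name) for an author field."""
    parts = split_composite(field)
    return parts.first, parts.second
```

For example, with hypothetical values, `parse_post_id("3_1234567")` yields `("3", "1234567")` and `parse_author("42_john_doe")` yields `("42", "john_doe")`. Splitting only at the first underscore is a deliberate choice: it tolerates screen names that contain underscores, but it would mis-split if a local ID ever contained one.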