<XML><RECORDS><RECORD><REFERENCE_TYPE>10</REFERENCE_TYPE><REFNUM>8282</REFNUM><AUTHORS><AUTHOR>Macdonald,C.</AUTHOR><AUTHOR>Ounis,I.</AUTHOR></AUTHORS><YEAR>2006</YEAR><TITLE>The TREC Blogs06 Collection : Creating and Analysing a Blog Test Collection</TITLE><PLACE_PUBLISHED>DCS Technical Report Series</PLACE_PUBLISHED><PUBLISHER>Dept of Computing Science, University of Glasgow</PUBLISHER><PAGES>8</PAGES><ISBN>TR-2006-224</ISBN><LABEL>Macdonald:2006:8282</LABEL><KEYWORDS><KEYWORD>Test Collections</KEYWORD></KEYWORDS><ABSTRACT>The explosion of blogs on the Web in recent years has fostered research interest in the Information Retrieval (IR) and other communities into the properties of the so-called `blogsphere'. However, without any standard test collection available, research has been restricted to unshared collections collected by individual research groups. With the advent of the Blog Track running at TREC 2006, there was a need to create a test collection of blog data, that could be shared among participants and form the backbone of the experiments. Such a collection should be a realistic snapshot of the blogsphere, of enough blogs as to have recognisable properties of the blogsphere, and over a long enough time period that events should be recognisable. In addition, the collection should exhibit other properties of the blogsphere, such as splogs and comment spam. This paper describes the creation of the Blogs06 collection by the University of Glasgow, and reports statistics of the collected data. Moreover, we demonstrate how some characteristics of the collection vary across the spam and non-spam components of the collection.</ABSTRACT><URL>http://www.dcs.gla.ac.uk/~craigm/publications/macdonald06creating.pdf</URL></RECORD></RECORDS></XML>