Next-Generation Test Collections (NGTC '10)
Over the last 15 years, Information Retrieval research corpora have grown more than a thousand-fold in size: from the 1990s TIPSTER collections of hundreds of thousands of full-text articles to the 2009 ClueWeb collection of over a billion web pages, researchers are now working with a nearly unimaginable amount of text. The standard evaluation methodology, the Cranfield paradigm of calculating evaluation measures using test collections, has struggled to keep up: research shows that even test collections for terabyte-sized corpora suffer from unforeseen judgment bias and limited reusability.
This workshop invites cutting-edge research on building realistic, fair, and reusable test collections at the multi-terabyte scale. The goal of the workshop is to map out the critical research questions that need to be asked and the types of collections we need to build in order to answer them.
Organizers
Ian Soboroff, NIST
Ben Carterette, University of Delaware
Virgil Pavlu, Northeastern University