Tools & Data

This page has the following sections:
  1. Data such as topic definitions, relevance judgments, and estimated relevances.
  2. Tools for low-cost evaluation.
  3. Instructions for replicating MQ track results using the statAP and MTC evaluation tools.
  4. Instructions for replicating Web track results using the MTC evaluation tools.
  5. A guide to reusing Web track data.

Data

The 2009 track data is currently hosted by NIST on the 2009 MQ track page.

Some additional files are helpful to replicate Web and Relevance Feedback track results; they are referenced by name in the commands below (e.g., erels_docsim.20001-60000, prels.1-50, erels.catA.1-50, prels.catB.1-50, and erels.catB.1-50).

Tools

The tools referenced below are the statAP evaluation script (statAP_MQ_eval_v4.pl), the MTC evaluate utility, and the checkErels.pl script; usage for each is given in the sections that follow.

Replicating MQ Results

If you participated in the track, you received two additional prels files for your site (prels.mq.base.[site] and prels.mq.reuse.[site]). These are the prels for the queries your site contributed to and the prels for the queries your site was held out of, respectively.

To replicate statMAP results, run the script as follows:

perl statAP_MQ_eval_v4.pl -q prels.mq.base.[site] [run]
perl statAP_MQ_eval_v4.pl -q prels.mq.reuse.[site] [run]
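
For example, with a hypothetical site ID "udel" and a run file named udel.run (both are placeholders, not actual track files), the invocations would be:

perl statAP_MQ_eval_v4.pl -q prels.mq.base.udel udel.run
perl statAP_MQ_eval_v4.pl -q prels.mq.reuse.udel udel.run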
To replicate MTC results, run the program as follows:
evaluate -d [run directory] -q prels.mq.base.[site] -p erels_docsim.20001-60000 -5
evaluate -d [run directory] -q prels.mq.reuse.[site] -p erels_docsim.20001-60000 -5
where [run directory] is a directory containing all of the runs you'd like to evaluate. You can also evaluate individual runs or pairs of runs using -r1 [run] -r2 [run] instead of -d.
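
For example, assuming a hypothetical directory runs/ holding all of your submitted run files (and again using the placeholder site ID udel):

evaluate -d runs/ -q prels.mq.base.udel -p erels_docsim.20001-60000 -5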

Replicating Web Track Results

The ad hoc task of the Web track also used MQ methods to select documents to judge. If you submitted Category B runs, you can evaluate them with both statMAP and MTC (only MTC results were published in the overview paper). If you submitted Category A runs, you can only evaluate them with MTC.

Replicating Category A Evaluation

To replicate MTC results for Category A runs, get the Category A erels file and run the MTC evaluate utility as follows:
evaluate -5 -o 2 -q prels.1-50 -p erels.catA.1-50 -d [run directory]
where [run directory] is a directory containing all of the runs you'd like to evaluate. You can also evaluate individual runs or pairs of runs using -r1 [run] -r2 [run] instead of -d [run directory].
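
For example, a sketch of a pairwise comparison between two hypothetical Category A run files, runA.txt and runB.txt:

evaluate -5 -o 2 -q prels.1-50 -p erels.catA.1-50 -r1 runA.txt -r2 runB.txt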

Replicating Category B Evaluation

To replicate MTC results for Category B runs, first get the Category B relevance judgments and the Category B erels file. Then run the MTC evaluate utility as follows:
evaluate -5 -o 2 -q prels.catB.1-50 -p erels.catB.1-50 -d [run directory]
where [run directory] is a directory containing all of the runs you'd like to evaluate. You can also evaluate individual runs or pairs of runs using -r1 [run] -r2 [run] instead of -d [run directory].
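
For example, to evaluate a single hypothetical Category B run file named mysite.catB.run on its own:

evaluate -5 -o 2 -q prels.catB.1-50 -p erels.catB.1-50 -r1 mysite.catB.run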

You can also use the statAP utility to evaluate Category B runs. To do so, get the Category B relevance judgments and run the statAP utility as follows:

perl statAP_MQ_eval_v4.pl -q prels.catB.1-50 [run]
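
Since statAP scores one run per invocation (it has no equivalent of evaluate's -d option), a minimal shell loop, assuming a hypothetical runs/ directory, can score several Category B runs in sequence:

# Score every run file in the (hypothetical) runs/ directory with statAP
for run in runs/*; do
    perl statAP_MQ_eval_v4.pl -q prels.catB.1-50 "$run"
done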

Reusability, or How to Evaluate New Runs

Because of the way the data was collected, reusing it to evaluate new runs with measures like MAP is unfortunately rather difficult. Our recommendation: use the checkErels.pl script to figure out which of the options above you should take.

Mail Ben Carterette if you have any other questions.