CISC689/489-010: Information Retrieval

Time: M, W 5:00-6:15pm, Spring 2009
Location: Ewing 209
Professor: Ben Carterette (email, web)
Office: 440 Smith Hall
Office hours: T 11:00am-12:00pm, Th 2:00-3:00pm

TA: Rich Burns (email)
Office: 103 Smith Hall
Office hours: M 3:30-5:00pm, T 3:30-5:00pm, W 4:00-5:00pm



Course Description

Information retrieval is the study of computational methods for organizing, analyzing, and searching large quantities of semi-structured data. It is one of the oldest areas of computer science, going back nearly 50 years, and it is as relevant today as ever: the amount of information users must deal with is increasing exponentially, and retrieval methods need to keep pace. IR methods are present in everything from web search engines to spam filtering software to news alerts to recommender systems. You will learn how these methods work, how they are implemented, and why they sometimes fail.

This class is about the theory and design of information retrieval systems: how to pre-process information, how to index it, how to compress it, and how to search it. To this end, there will be an ongoing, multi-part project to design and implement a retrieval engine for searching a portion of Wikipedia. By the end of the class, you will be able to index and search up to 10% of English-language Wikipedia entries.

Prerequisites: CISC 220 (data structures) or equivalent is required. Background in algorithms, linear algebra, and probability and statistics is recommended but not required. CISC 689/489 (AI) is not required (though it cannot hurt). Programming skills in C, C++, or Java are necessary. Unix skills are highly recommended.

Please check this page frequently; homeworks, solutions, and readings will be posted regularly.

News

Wednesday, May 24: Homework 3 solutions posted.
Wednesday, May 24: Homework 2 solutions posted.
Wednesday, May 6: Homework 3 posted. This is due Wednesday, May 13.
Monday, Apr. 20: Homework 2 posted. This is due Wednesday, Apr. 29.
Sunday, Apr. 19: Project phase II part 3 worksheet posted. This is due Friday the 24th.
Sunday, Apr. 19: Homework 1 solutions posted in two parts: problems 1 and 2, and problems 3 and 4 (as an Excel spreadsheet).
Thursday, Apr. 2: Project phase II part 2 worksheet posted. This is due Wednesday the 15th. There is no programming involved, so start early!
Wednesday, Apr. 1: Midterm solutions posted.
Tuesday, Mar. 31: Project phase II part 1 worksheet posted. This is due Monday the 13th.
Thursday, Mar. 19: Midterm review topics posted.
Monday, Mar. 16: Homework 1 posted. This is due Monday the 23rd.
Sunday, Mar. 15: Project phase I part 3 worksheet posted. This is due Monday the 23rd.
Tuesday, Mar. 3: Project phase I part 2 worksheet posted. This is due Wednesday the 11th.
Monday, Feb. 16: Project phase I part 1 worksheet posted. This is due Wednesday the 25th.
Monday, Feb. 9: Project phase 0 worksheet posted. This is due Monday the 16th!