Course Project

The course project gives students hands-on experience in solving novel and practical information retrieval problems. Both research-oriented problems and "deliverables" are welcome. Ideally, the outcome of your project is either publishable, e.g., your (unique) solution to some (interesting/important/new) problem, or tangible, e.g., a prototype system that can be demonstrated. Bonus points will be given to groups that meet either of these criteria. Group work (3-4 students) is required.

General steps

  • Pick a topic
  • Form a team
  • Survey related work
  • Work on the project
  • Write a report
  • Present the project

Your project will be graded based on the following required components:

  • Project proposal (25%)
    • Discuss with the instructor to convince him that your idea is reasonable and worth the effort.
    • Write a two-page proposal that presents your motivation and management plan for your project.
    • Due by the end of the 6th week (Mar. 12th, 11:59pm).
  • Project presentation (40%)
    • A 15-minute presentation (including Q&A) about what you have done for this course project. The format can be tailored to the nature of the project, e.g., a slide presentation and/or a system demo.
    • Performance will be graded by both the instructor and the other students.
  • Project report (35%)
    • Detailed written report of your project, using the required template.
    • The quality requirement is the same as for research papers, i.e., formal written English and rigorous paper format.
    • Due date is the last day of the semester.

The official rubrics for the final report and the project presentation are provided below.

Note that you are required to use the provided templates for your project proposal and final report. See the Resources page for the template and example file.

Each team only needs to submit one copy of its proposal/final report, and all students on the same team will receive the same grade. Please name your submission file in the following way: compID-[compID-]+{proposal, report}.pdf. For example, "hw5x-nw6a-proposal.pdf" and "hw5x-nw6a-report.pdf".
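
The naming rule can be checked mechanically before you submit. The following is a minimal sketch, not an official tool: the helper names are hypothetical, and it assumes that a computing ID consists only of lowercase letters and digits (as in "hw5x") and that one or more IDs may appear in the name.

```python
import re

# Assumption: a computing ID is a run of lowercase letters and digits, e.g. "hw5x".
COMP_ID = r"[a-z0-9]+"
# One or more IDs joined by "-", followed by "-proposal.pdf" or "-report.pdf".
SUBMISSION_NAME = re.compile(rf"{COMP_ID}(?:-{COMP_ID})*-(?:proposal|report)\.pdf")


def is_valid_submission_name(filename):
    """Return True if the filename follows the naming pattern sketched above."""
    return SUBMISSION_NAME.fullmatch(filename) is not None


def submission_filename(comp_ids, kind):
    """Build a submission filename from team members' IDs (hypothetical helper)."""
    if kind not in ("proposal", "report"):
        raise ValueError("kind must be 'proposal' or 'report'")
    if not comp_ids:
        raise ValueError("at least one computing ID is required")
    name = "-".join(comp_ids) + f"-{kind}.pdf"
    if not is_valid_submission_name(name):
        raise ValueError(f"unexpected characters in computing IDs: {comp_ids}")
    return name


print(submission_filename(["hw5x", "nw6a"], "proposal"))  # hw5x-nw6a-proposal.pdf
```

Calling is_valid_submission_name("hw5x-nw6a-report.pdf") returns True, while a name such as "hw5x_nw6a.pdf" is rejected.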

Pick a topic

You can either pick from a list of sample topics provided by the instructor or choose your own topic. Your starting point could be the essays that you wrote for Reading Assignment 1.

Leveraging existing resources is especially encouraged, as it allows you to minimize the amount of work that you have to do and focus on developing your own ideas.

When picking a topic, try to ask yourself the following questions:

  • What exactly is the (research) problem that you want to solve? Will it matter if nobody realizes this problem?
  • What difference could your project make for others?
  • Is there any existing alternative? If so, why do you still want to do it? How is your idea different from theirs? Would people appreciate the difference?
  • What would be the major challenge(s) in this problem? Do you have any specific background or resources for solving the identified problem?
  • What is the minimum goal to be achieved during this semester? (Try to drop everything non-essential and only keep the part that is truly novel.)
  • How do you plan to demonstrate that the method to be developed indeed solves the problem? Empirical experimentation and/or a demo is required, unless you are doing purely theoretical work.

Form a team

You are encouraged to work with other students as a team. Teams should consist of up to four students in total. Teamwork not only gives you some experience in working with others, but also allows you to work on a larger (presumably more important) topic. If your team requires more than four members, please talk to the instructor and convince him.

Note that it is your responsibility to figure out how to contribute to your group project, so you will need to act proactively and in a timely manner if your group leader has not assigned a task to you. The instructor will assume that all team members actively contribute to the project, and the same grade will be given to every group member (unless special treatment is required).

Survey related work

While choosing a topic, it is very important to be aware of whether the problem you would like to tackle has already been solved. If so, you may want to figure out where exactly your novelty is and whether that novelty leads to any benefit to others. Your goal is to go beyond, rather than simply duplicate, the existing work. To minimize your effort, you are encouraged to leverage existing algorithms, toolkits, and other useful resources as much as possible. The instructor can also help you check related work. Please feel free to discuss your plan with the instructor before finalizing your proposal.

Work on the project

You should leverage any existing tools or methods as much as possible. For example, consider using the Lucene toolkit for indexing and searching in a large text corpus; using the Stanford NLP parser or the OpenNLP toolkit for text analysis; using MALLET or WEKA for classification or clustering. There are also many tools available on the Internet. See the Resources page for some useful pointers. Discuss any problems or issues with your teammates or classmates. If you need special support, please let the instructor know.

Consider documenting your work regularly. This way, you will already have a lot of things written down by the end of the semester. In addition, we strongly suggest using version control for your project! Nothing is more frustrating than losing a lot of your hard work, especially if it's close to a deadline.

Present the course project

At the end of the semester, each project team is expected to present their project in class. The purposes of this presentation are to:

  • Let you know about others' projects.
  • Give you an opportunity to practice presentation skills, which are very important for a successful career in both academia and industry.
  • Obtain feedback from others about your project.

In general, your presentation should be structured like a conference presentation, so it should touch on all of the following aspects (text in parentheses states the instructor's expectation):

  • What is the background/motivation of your work? What research question will you address? (Learn how to attract public attention.)
  • Why is this problem important? (Learn how to persuade others.)
  • Is there any existing work? How novel is yours? (Learn how to sell your ideas.)
  • How did you solve this problem? (Learn how to deliver your solution.)
  • How good was your method? (Learn how to quantitatively/qualitatively evaluate your work.)
  • Any ideas for further improvement? (Learn how to look ahead.)

Think about how you can best present your work so as to make it as easy as possible for your audience to understand your main messages. Try to be concise and to the point. Pictures, illustrations, and examples are generally more effective than text for explaining your project. Try to show screenshots and/or plots of your experimental results. Watching some top conference presentations (e.g., KDD, SIGIR, ICML) on VideoLectures will be beneficial.

In order to be fair to all members in the same group, the instructor will randomly pick team members for question answering during the presentation.

Write a project report

You should write your report as if you were writing a regular conference paper. You should address the same questions as those you have addressed in the proposal and presentation, only in more detail. Pay special attention to the challenges that you have solved and your detailed solutions. The basic sections of the report should be the same as those of a conference paper, e.g., abstract, introduction, related work, method, experiments, and conclusion. If you are developing a demo system or toolkit, your report should follow the format of a demo paper.

You are required to use LaTeX for your project report. See the Resources page for the template and example file. The project report must be at most ten pages in that format (there is no minimum length, as long as you feel it is sufficient to prove the merit of your work).

The instructor will provide feedback on your course project during the final presentation if he sees any way to further improve your work, and bonus points will be given immediately.

Topics suggested by the instructor

  • Interactive Recommendation with Neural Models: Deep neural networks have found success in recommender systems, such as predicting the items that a user is likely to interact with in a sequence (a.k.a. sequential recommendation). However, given the complexity of such models, existing neural solutions are all trained offline with log data, which is potentially biased and noisy. Learning the model online via interactions with users would be highly useful and has great practical potential. But how to effectively solicit user feedback to ensure the convergence of model estimation is still largely unknown. The instructor's group has recently developed a suite of online bandit learning solutions for neural models to address the challenge, and we are looking for interested students to collaborate on this important direction of research.

  • Fair Information Retrieval: Algorithmic fairness has attracted increasing attention in research communities across a wide spectrum of areas, including information retrieval. The unfairness issue arises in several important IR application scenarios, mostly concerning the ranked output from an IR system. For example, the ranking function greatly influences which products get purchased, which candidates get a job, and which movies get streamed. We would like to focus on two main application scenarios: a recommender system, where the system recommends an ordered list of items to a user; and a document retrieval system, where the system ranks the documents in descending order of usefulness to a given query. The project is to identify new fairness metrics and new solutions that guarantee the system's fairness.

  • Building a Conversational Recommender System: A great amount of research effort has been devoted to conversational systems (e.g., chatbots) in the natural language processing community, and substantial progress has been achieved in both the generality and quality of such systems. Recommendation is supposed to be a two-way communication between a user and the system, which provides a great opportunity to leverage the successful practices in both domains. In this project, we seek to build a conversational recommendation algorithm, which talks to users and serves them. Various aspects can be targeted, e.g., effectiveness of the generated conversation for recommendations, convenience and efficiency of the system, and engagement of the users.

  • Utility Inference from Users' Revealed Preference: User reviews and their logged interactions with a recommender system are a gold mine for identifying user preference and item utility. However, existing research rigidly treats user-generated data, e.g., review ratings or result clicks, as ground-truth measures of user preference and item utility. This unfortunately ignores the fact that users are not omniscient; oftentimes, they are also learning the utility of items by interacting with them (e.g., purchasing the item). How to model and incorporate this user learning into system learning is an important new area of research. The instructor's group has recently developed some techniques to handle such a problem, and we are looking for interested students to join our exploration.

  • Unbiased Offline Learning to Rank: User clicks, though they can be collected at a large scale with low cost, are known to be biased and noisy. Debiasing techniques have to be applied to ensure the validity of a model trained from such data. Various unbiased offline learning algorithms have been proposed to tackle the problem, and most of them are based on the Inverse Propensity Scoring (IPS) technique. The instructor's group has recently identified a serious issue with using IPS for unbiased offline learning to rank and proposed a simple fix to approximately address the issue. We are looking for interested students to continue this direction of effort and further improve the effectiveness of the proposed solution.

  • Explainable Personalization: The lack of transparency in personalization leaves users in a dilemma: a user can only assess the quality of personalized results by taking the suggested items, e.g., reading the recommended articles; however, in order for him/her to adopt the system's customized results, he/she needs to first build trust in the system. Arguably, the most important value of explanations in an information system is not to convince users to adopt customized results (i.e., promotion), but to allow them to make more informed and accurate decisions about which results to utilize (i.e., satisfaction). If users are convinced to accept personalized results that are subsequently found to be inaccurate, their confidence in the system will rapidly deteriorate. How do we explain the personalized result ranking to users? What should be the criterion for our explanation: how personalization was achieved or why the user should be aware of it? What form of explanation would be the most effective? How do we evaluate the effectiveness of the explanations? All such questions need our attention.

  • Optimal Review Ranking for Improving Shopper's Decision Making: User-generated review content helps shoppers make more informed decisions on online shopping websites. However, due to the large volume of reviews, especially for popular items, it is very hard for anybody to find relevant information easily. Oftentimes, the website only provides some simple, but not very useful, ranking methods, such as ranking by time, rating, or helpfulness votes. Can we provide a more personalized ranking of reviews for each individual user? Various factors can be considered for this purpose, including the user's recent shopping history, browsing history, and properties of the items.

  • A Search Engine for Structured Relational Data: Although modern search engines are designed for indexing and searching unstructured data, such as text and image data, there is still high demand for retrieving structured data in practice. Different from the traditional SQL-like search scheme, we would like to build a search engine for structured relational data with natural language queries. This is of particular importance for ordinary users who need to access large amounts of data. For example, it should support a query like "finding overheated room in Rice Hall" over a building sensor database. Effective query parsing and translation would be the key to this project.

  • Adversarial Attacks of a Personalized Search Engine: Search personalization improves the utility of search engine systems on a per-user basis. It has become a must-have feature in all major search engines. However, it also makes adversarial attacks possible: malicious users can exploit such features to hijack the search engine, i.e., launch an extraction attack to infer other users' recent search queries and clicks, or a manipulation attack to influence other users' search results. There are two sides to this story, attack and defense, both of which are interesting and have so far received little attention in the research community. We can focus on a particular personalization algorithm (e.g., content-based personalization), study its vulnerability, and propose corresponding solutions for attack/defense.

  • Collaborative Sequential Recommendation: Traditional recommendation algorithms take a static view of users' interest, i.e., in a given period of time the same set of items will be repeatedly recommended to users, as they are believed to be most relevant to the users. This is clearly sub-optimal: when a user has purchased a laptop, it is meaningless to keep recommending another laptop. Recent research focus has shifted to sequential recommendation, i.e., the next item to be recommended depends on the user's action sequence. It gives the algorithm full flexibility to be contextual and adaptive. Can we combine this concept with classical collaborative filtering solutions, i.e., leveraging the action sequences across users, to further improve the quality of recommendation?

  • Deep Understanding of Query Intent: To fulfill a complex information need, such as job hunting, users have to issue a series of queries. Analyzing such sequentially submitted queries independently would clearly ignore the important contextual information embedded in the query reformulation chain. Given that Recurrent Neural Network (RNN) models have shown great potential in modeling sequence data, we would like to explore their feasibility in query understanding, especially for a sequence of queries. Can we build RNN models for query suggestion or auto-completion?

  • Personalization or Bias?: Modern search engines customize their search results on a per-user basis to satisfy users' heterogeneous information needs. Typical solutions for personalization promote the results that users searched for or clicked before. As a user might be exploring an unfamiliar topic or be potentially biased towards a specific aspect, will the personalization techniques actually amplify the bias or worsen the situation? Can we study the side effects of search personalization techniques?
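
For readers unfamiliar with the Inverse Propensity Scoring (IPS) technique mentioned in the "Unbiased Offline Learning to Rank" topic, the following is a minimal sketch of the textbook idea only. It is not the instructor's group's method or fix. It assumes a position-based click model in which a click requires the user to examine the result, and the examination probabilities (propensities) and the toy log are made up for illustration.

```python
from collections import defaultdict


def ips_relevance(log, propensity):
    """Estimate per-document relevance from position-biased click logs.

    log: iterable of (doc_id, position, clicked), with clicked in {0, 1}.
    propensity: dict mapping position -> assumed probability that a user
        examines that position (hypothetical values; normally estimated).
    """
    weighted_clicks = defaultdict(float)
    impressions = defaultdict(int)
    for doc_id, position, clicked in log:
        # Up-weight each click by 1 / P(position was examined).
        weighted_clicks[doc_id] += clicked / propensity[position]
        impressions[doc_id] += 1
    return {d: weighted_clicks[d] / impressions[d] for d in impressions}


# Toy log: "a" is always shown at rank 1, "b" at rank 2 (examined half the time).
propensity = {1: 1.0, 2: 0.5}
log = [("a", 1, 1), ("a", 1, 0),
       ("b", 2, 1), ("b", 2, 0), ("b", 2, 0), ("b", 2, 0)]

estimates = ips_relevance(log, propensity)
print(estimates)  # {'a': 0.5, 'b': 0.5}
```

In this toy log the raw click-through rates are 0.5 for "a" and 0.25 for "b", which would suggest that "a" is twice as relevant. After re-weighting by the assumed propensities, both documents receive the same estimate, which shows how the correction removes the position effect when the propensities are right.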

Rubric for project report

Aspects | Score
Strictly follow the provided template | [0-10]
Background and research question were clearly stated in the introduction and the logic and argument were reasonable | [0-10]
Contribution of the work was properly articulated in the introduction | [0-10]
Sufficient discussion of state-of-the-art in related work section | [0-10]
Description of the proposed method was clear, comprehensive, coherent and consistent with the claim in the introduction | [0-20]
Precise description of experiment design and experimental data set | [0-10]
Thorough experimentations that proved all necessary components in the proposed method and detailed analysis of the experimental results | [0-20]
Summarization of the work, reasonable discussion of limitation of the proposed solution and future work | [0-10]

Rubric for project presentation

Aspects | Score
Slide content was clearly visible and self-explanatory | [0-10]
Presenters were confident about their work and clearly explained it to me | [0-10]
Background and research question were clearly highlighted, and the logic and argument were reasonable | [0-10]
There was sufficient discussion of the state-of-the-art and of why we need this new method | [0-10]
Description of the proposed method was clear, comprehensive, coherent and consistent with the claim in the introduction | [0-15]
Thorough experimentations that proved all necessary components in the proposed method and detailed analysis of the experimental results | [0-15]
The presenters managed their time well during the presentation | [0-10]
The presenters did a good job in answering the questions | [0-10]
I like this work! | [0-10]

Schedule for project presentation

Name | Date | Paper Title
Ryan Kann, Malcolm Mashig | May 12 | Re-ranking Search Results via Interactive Disambiguation
Yushun Dong, Sihang Jiang, Simon Zhu, Hanzhi Zhou | May 12 | Promoting Individual Fairness in Graph Neural Networks
Deepak Goel, Zachary Bilmen, Pamela Beardsell, Timothy Han | May 12 | A Search Engine for Structured Relational Data
William Helmrath, David Dimmett, Ram Karri, Jihong Min | May 12 | Building a Personalized Review Ranker for Improving Shoppers Decision Making
Sung Joon Park, Michael Chang, Henry Carscadden | May 12 | Explainable Personalizations
Ramya Bhaskara, James Perry, LaDawna McEnhimer, Farhan Zaman | May 12 | Evaluation of Bias Among Different Age Groups in Google Ads
Rohan Nair, Amar Kulkarni, Christopher Raley, William Wong | May 13 | Explainable recommendation via multi-task learning in opinionated text data
Jay Rothenberger, Sammy Lahrime, Clara Na, Nikash Sethi | May 13 | Political Bias Identification and Reduction in News Searches
Theodore Rose, Daniel Wang, Rajiv Sarvepalli | May 13 | Search Engine Poisoning on Personalized Search Engines
Zhiming Fan, Jingyuan Chou, Wenbo Pang | May 13 | Conversational Recommendation System
Ben Barrett, Jagroop Sarkaria, Shuche Wang, Fred Stoney | May 13 | Alternative Click Models for Learning to Rank
Meng Hua, Siyu Jian, Veena Ramesh, Johannes Johnson | May 13 | A Search Engine for Structured Relational Data
Aldrick Johan, Keshav Ailaney, Wei Wang, Matthew Bacon | May 14 | Top-k Lottery Ranking System for Fairness
Jie Fan, Linyang Du, Jiajia Liang | May 14 | An Explainable Conversational Movie Recommendation System
Shengyuan Piao, Peiyi Yang, Fangzhou Xu, Zice Wei | May 14 | Conversational Restaurant Recommendation System with Yelp Dataset
Mengchen Wang, Zetao Wang, Tianyang Chen, Chenlin Liu | May 14 | Mega: A Matadata-Based Search Engine That Provides Explainable Recommendation for E-Commerce Platforms
Joseph Banks, Edward Noe, Julian Wilson | May 14 | Natural Language Queries for Lou's List
Parv Ahuja, Richard Park | May 14 | What is Fair? An Exploration of Fairness in Search Queries