CAREER: Human-Centric Knowledge Discovery and Decision Optimization

Principal Investigator: Hongning Wang, CAREER#1553568

January 15, 2016 to April 30, 2021

 

Research Objectives

Humans are both producers and consumers of Big Data. Modeling the role of humans in the process of mining Big Data is thus critical for unleashing the vast potential of mined knowledge in various important domains such as health, education, security, and scientific discovery. The objective of this research is to build a human-centric learning framework, which harnesses the power of human-generated Big Data with computational models. This research project also focuses on incorporating the conducted research activities into teaching materials for student training and education in the areas of information retrieval and data mining. In addition, specialized training in data analytics methods and applications is provided to enhance the education in Big Data related techniques for non-STEM and community college students.

Figure 1. Human-centric knowledge discovery and decision optimization. In this loop, improved system utility is produced by an in-depth understanding of humans (i.e., the flow from humans to systems), and humans' decision making is optimized by customized knowledge services (i.e., the flow from systems to humans). The proposed research tasks are labeled at their corresponding positions in the loop.

Research Thrusts

  1. Joint text and behavior analysis. To exploit as many types of human-generated data as possible and capture the dependencies among them, this project develops a set of novel probabilistic generative models to perform integrative analysis of text and behavior data.
  2. Task-based online decision optimization. Traditional static, ad-hoc and passive machine-human interactions are inadequate to optimize humans' dynamic decision making processes. To address this limitation, users' longitudinal information seeking activities are organized into tasks, where new online learning algorithms are applied to proactively infer users' intents and adapt the systems for long-term utility optimization.
  3. Explainable personalization. Existing personalized systems are black boxes to their users. Users typically have little control over how their information is used to personalize systems. To help ordinary users be aware of how the system's behavior is customized and increase their trust in such systems, statistical learning algorithms are built to generate both system-oriented and user-oriented explanations.
  4. System implementation and prototyping. User studies are conducted in a prototype system integrated with all the algorithms developed in this project to evaluate the deployed algorithms. Evaluation and feedback from real users are circulated back to refine the assumptions and design of the developed algorithms.

Expected Outcome

  • Algorithmic solutions for user behavior modeling and online interactive learning.
  • Open source tools and web services that will provide joint analysis of human-generated text data and behavior data in various applications, such as search logs, forum discussions, and opinionated reviews.
  • Annotated corpora and new evaluation metrics that will enable researchers to conduct follow-up research in related domains.

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant CAREER#1553568.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Supported Students

  • Qingyun Wu, PhD, 2015-now, area of research: online learning and bandit algorithms
  • Huazheng Wang, PhD, 2015-2019, area of research: online learning and bandit algorithms
  • Lin Gong, PhD, 2015-2019, area of research: user modeling and sentiment analysis
  • Nan Wang, PhD, 2018-now, area of research: explainable recommendation

Research Progress

  1. Hidden Topic Sentiment Model (Thrust I)
  2. Summary: The simple topic-sentiment mixture assumption in existing topic models prohibits them from finding fine-grained dependency between topical aspects and sentiments. In this work, we build a Hidden Topic Sentiment Model (HTSM) to explicitly capture topic coherence and sentiment consistency in an opinionated text document to accurately extract latent aspects and corresponding sentiment polarities. In HTSM, 1) topic coherence is achieved by enforcing words in the same sentence to share the same topic assignment and modeling topic transition between successive sentences; 2) sentiment consistency is imposed by constraining topic transitions via tracking sentiment changes; and 3) both topic transition and sentiment transition are guided by a parameterized logistic function based on the linguistic signals directly observable in a document. Extensive experiments on four categories of product reviews from both Amazon and NewEgg validate the effectiveness of the proposed model.
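
To make the generative assumption concrete, the sentence-level transition mechanism can be sketched as follows. This is a minimal illustration with hypothetical linguistic features and weights, not the full HTSM inference (which estimates these quantities jointly via Gibbs sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

def transition_prob(signals, weights, bias):
    """Parameterized logistic function: maps linguistic signals observable
    in a sentence (hypothetical features here, e.g. discourse cues or
    sentiment-word counts) to the probability of a topic transition."""
    return 1.0 / (1.0 + np.exp(-(np.dot(weights, signals) + bias)))

def sample_sentence_topics(doc_signals, weights, bias, n_topics, topic_prior):
    """Assign one topic per sentence: all words in a sentence share the
    topic, and the previous sentence's topic is kept unless a transition
    fires -- enforcing topic coherence within the document."""
    topics = []
    for i, signals in enumerate(doc_signals):
        if i == 0 or rng.random() < transition_prob(signals, weights, bias):
            topics.append(int(rng.choice(n_topics, p=topic_prior)))  # new topic
        else:
            topics.append(topics[-1])  # stay on the current topic
    return topics
```

Sentiment consistency is enforced analogously, by constraining which topic transitions are admissible when the tracked sentiment changes.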
    Result Highlights:

    Figure T1.1.1 Perplexity comparison against baseline topic models with increasing amounts of training data on four different review document sets.

    Figure T1.1.2 Sentiment classification comparison against baseline topic models with increasing amounts of training data on four different review document sets.

    Figure T1.1.3 Topic transitions and top words under selected topics estimated by HTSM on the tablet data set.

    Figure T1.1.4 Aspect-based contrastive summarization from HTSM on the tablet data set.

    Dissemination:
    1. Publication:
      • Md Mustafizur Rahman and Hongning Wang. Hidden Topic Sentiment Model. The 25th International World-Wide Web Conference (WWW'2016), p155-165, 2016. (PDF)
      • Yue Wang, Hongning Wang and Hui Fang. Extracting User-Reported Mobile Application Defects from Online Reviews. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), SENTIRE (2017), p422-429, 2017. (PDF)
    2. Code: Java Implementation of HTSM is available here.
    3. Data sets: NewEgg Reviews (JSON, Readme), Amazon Reviews (JSON, Readme), Manually Selected Prior and Seed Words (ZIP)


  3. Accounting for the Correspondence in Commented Data (Thrust I)
  4. Summary: One important way for people to make their voices heard is to comment on the articles they read online, such as news reports and each other's posts. The user-generated comments together with the commented documents form a unique correspondence structure. Properly modeling the dependency in such data is thus vital for obtaining accurate insight into people's opinions and attention.
    In this work, we develop a Commented Correspondence Topic Model (CCTM) to model correspondence in commented text data. We focus on two levels of correspondence. First, to capture topic-level correspondence, we treat the topic assignments in commented documents as the prior for their comments' topic proportions. This captures the thematic dependency between commented documents and their comments. Second, to capture word-level correspondence, we utilize the Dirichlet compound multinomial distribution to model topics. This captures the word repetition patterns within the commented data. By integrating these two aspects, our model demonstrates encouraging performance in capturing the correspondence structure, which yields improved results in modeling user-generated content, spam comment detection, and sentence-based comment retrieval compared with state-of-the-art topic model solutions for correspondence modeling.
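
The word-level correspondence component relies on the Dirichlet compound multinomial (DCM) distribution, which, unlike a plain multinomial, rewards word repetition. A minimal stand-alone sketch of its log-likelihood (an illustration, not taken from the released CCTM code):

```python
from math import lgamma

def dcm_log_likelihood(counts, alpha):
    """Log-likelihood of a word-count vector under the Dirichlet compound
    multinomial (Polya) distribution; alpha holds the per-word
    pseudo-counts of one topic."""
    n, a_sum = sum(counts), sum(alpha)
    ll = lgamma(a_sum) - lgamma(a_sum + n)
    for c, a in zip(counts, alpha):
        ll += lgamma(a + c) - lgamma(a)
    return ll
```

With small pseudo-counts, the DCM assigns higher likelihood to a "bursty" comment that repeats one word than to a comment spreading the same number of words across the vocabulary, which is exactly the repetition pattern in commented data the model exploits.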
    Result Highlights:

    Figure T1.2.1 Illustration of the topics learned by CCTM on the ArsTechnica news dataset. Each box on the first level shows the top 10 words of a global topic; each box on the second level shows the top 10 words from one of six randomly selected article-comment threads. The article's title is labeled under each leaf node. Word clouds highlight the content of the selected articles and comments on the second level. The inferred topic distributions in articles and comments are shown at the bottom of the figure.

    Figure T1.2.2 Spam comment detection based on the inferred topical correspondence in a news comment data set.

    Figure T1.2.3 Retrieval performance in mapping user comments to selected sentences in news articles.

    Dissemination:
    1. Publication:
      • Renqin Cai, Chi Wang and Hongning Wang. Accounting for the Correspondence in Commented Data. The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017), p365-374, 2017. (PDF)
    2. Code: Java Implementation of CCTM is available here.
    3. Chrome Extension: A Chrome Extension for amazon.com that automatically maps users' review comments with the product specifications (currently only works for the "Computers & Accessories" category in Amazon). It can be downloaded here.


  5. Modeling Student Learning Styles in MOOCs (Thrust I)
  6. Summary: The recorded student activities in Massive Open Online Courses (MOOCs) provide us with a unique opportunity to model their learning behaviors, identify their particular learning intents, and enable personalized assistance and guidance in online education. In this work, based on a thorough qualitative study of students' behaviors recorded in two MOOC courses with large student enrollments, we develop a non-parametric Bayesian model to capture students' sequential learning activities in a generative manner. Heterogeneity of learning behaviors is captured by clustering students into latent groups, while homogeneity within each group is characterized by a shared model structure describing the transitional patterns, intensity and temporal distribution of the group's learning activities. Both qualitative and quantitative studies on the two MOOC courses confirmed the effectiveness of the proposed model in identifying students' learning behavior patterns and clustering them into related groups for predictive analysis. The identified student groups accurately predict student retention, course satisfaction and demographics.
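
The transitional patterns that characterize a student group can be illustrated with a simple maximum-likelihood estimate of an activity transition matrix; this is a non-Bayesian simplification of the shared structure L2S learns, with hypothetical activity codes:

```python
import numpy as np

def transition_matrix(sequences, n_actions):
    """Estimate P(next activity | current activity) for one student group
    from its members' integer-coded activity sequences (hypothetically,
    0=lecture, 1=quiz, 2=forum), with add-one smoothing."""
    counts = np.ones((n_actions, n_actions))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

In the full model, such group-level transition structures are learned jointly with the group assignments, along with the intensity and temporal distribution of each activity type.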
    Result Highlights:

    Figure T1.3.1 Latent learning style transition learned by the L2S model in dropout students.

    Figure T1.3.2 Latent learning style transition learned by the L2S model in non-dropout students.

    Figure T1.3.3 Student drop-out prediction performance in the course of Statistical Learning (Winter 2015) from edX.

    Figure T1.3.4 Student drop-out prediction performance in the course of Computer Science 101 (Summer 2014) from edX.

    Dissemination:
    1. Publication:
      • Yuling Shi, Zhiyong Peng and Hongning Wang. Modeling Student Learning Styles in MOOCs. The 26th International Conference on Information and Knowledge Management (CIKM 2017), p979-988, 2017. (PDF)
      • Diheng Zhang, Yuling Shi, Hongning Wang, and Bethany A. Teachman. Predictors of Attrition in a Public Online Interpretation Training Program for Anxiety. 51st Annual Convention of Association for Behavioral and Cognitive Therapies (poster paper), 2017. (Poster)
      • Renqin Cai, Xueying Bai, Yuling Shi, Zhenrui Wang, Parikshit Sondhi and Hongning Wang. Modeling Sequential Online Interactive Behaviors with Temporal Point Process. The 27th International Conference on Information and Knowledge Management (CIKM 2018), p873-882, 2018. (PDF)
    2. Code: Python Implementation of L2S model is available here.


  7. Modeling Social Norms Evolution for Personalized Sentiment Classification (Thrust I)
  8. Summary: Motivated by the findings in social science that people's opinions are diverse and variable while together they are shaped by evolving social norms, we perform personalized sentiment classification via shared model adaptation over time. In our proposed solution, a global sentiment model is constantly updated to capture the homogeneity in how users express opinions, while personalized models are simultaneously adapted from the global model to recognize the heterogeneity of opinions from individuals. Global model sharing alleviates the data sparsity issue, and individualized model adaptation enables efficient online model learning. Extensive experiments are performed on two large review collections from Amazon and Yelp, and encouraging performance gains are achieved against several state-of-the-art transfer learning and multi-task learning based sentiment classification solutions.
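
The shared model adaptation can be sketched as a personalized logistic classifier whose weights are pulled toward the global model by an L2 penalty. This is a simplified batch version of the idea; the paper's solution performs the adaptation online:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adapt_user_model(w_global, X, y, lam=0.1, lr=0.1, epochs=200):
    """Personalized sentiment classifier: minimize logistic loss on the
    user's own reviews plus lam * ||w - w_global||^2, so that sparse
    personal data is regularized by the evolving global (social-norm)
    model rather than by a zero prior."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y) + lam * (w - w_global)
        w -= lr * grad
    return w
```

A user with few labeled reviews stays close to the global model; a user with abundant, distinctive data moves away from it, capturing individual heterogeneity.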
    Result Highlights:

    Figure T1.4.1 Sentiment classification comparison against baseline classifiers on Amazon and Yelp reviews.

    Figure T1.4.2 Online sentiment classification on Amazon reviews.

    Dissemination:
    1. Publication:
      • Lin Gong, Mohammad Al Boni and Hongning Wang. Modeling Social Norms Evolution for Personalized Sentiment Classification. The 54th Annual Meeting of the Association for Computational Linguistics (ACL'2016), p855-865, 2016. (PDF)
      • Lin Gong, Benjamin Haines and Hongning Wang. Clustered Model Adaptation for Personalized Sentiment Analysis. The 26th International World Wide Web Conference (WWW 2017), p937-946, 2017. (PDF)
      • Lin Gong and Hongning Wang. When Sentiment Analysis Meets Social Network: A Holistic User Behavior Modeling in Opinionated Data. The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018), p1455-1464, 2018. (PDF, Video)
    2. Code:
      • A Java Implementation of our multi-task linear model adaptation algorithm can be found here.
      • A Java Implementation of clustered linear model adaptation algorithm can be found here.
      • A Java Implementation of our Holistic User Behavior model can be found here.
    3. Data sets:
      • Our experimentation data sets for our clustered model adaptation can be found here.
      • Our experimentation data sets for our holistic user behavior model can be found here.


  9. Joint Topic and Network Embedding for User Representation Learning (Thrust I)
  10. Summary: User representation learning is vital for capturing diverse user preferences, yet it is also challenging: user intents are latent, scattered across different modalities of user-generated data, and thus not directly measurable. Inspired by the concept of user schema in social psychology, we take a new perspective on user representation learning by constructing a shared latent space that captures the dependency among different modalities of user-generated data. Both users and topics are embedded into the same space to encode users' social connections and text content, facilitating joint modeling of the different modalities via a probabilistic generative framework.
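
How a shared latent space couples the two modalities can be sketched as follows: a single user vector simultaneously determines that user's topic mixture (via affinity to topic embeddings) and link likelihood (via affinity to other users). This is a schematic of the coupling only; JNET embeds these quantities in a full probabilistic generative model:

```python
import numpy as np

def topic_proportions(u, topic_embs):
    """User's topic mixture: softmax of affinities between the user
    embedding and each topic embedding in the shared space."""
    scores = topic_embs @ u
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

def link_prob(u_i, u_j):
    """Probability of a social connection between two users, driven by
    the same embeddings that explain their text content."""
    return 1.0 / (1.0 + np.exp(-(u_i @ u_j)))
```

Because both quantities are functions of the same vectors, observing a user's text sharpens the estimate of their social connections, and vice versa.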
    Result Highlights:

    Figure T1.5.1 Visualization of user embeddings and learnt topics in 2-D space for Yelp (left) and StackOverflow (right).

    Figure T1.5.2 Expert recommendation on StackOverflow.

    Dissemination:
    1. Publication:
      • Lin Gong, Lu Lin, Weihao Song and Hongning Wang. JNET: Learning User Representations via Joint Network Embedding and Topic Embedding. The 13th ACM International Conference on Web Search and Data Mining (WSDM 2020), p205-213, 2020. (PDF)
      • Lu Lin, Lin Gong and Hongning Wang. Learning Personalized Topical Compositions with Item Response Theory. The 12th ACM International Conference on Web Search and Data Mining (WSDM 2019), p609-617, 2019. (PDF)
    2. Code:
      • A Java Implementation of our joint topic and network embedding algorithm can be found here.
    3. Data sets: Our experimentation data sets can be found here.


  11. Contextual Bandits in A Collaborative Environment (Thrust II)
  12. Summary: Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offs between exploration and exploitation with companion side-information. They have been extensively used in many important practical scenarios, such as display advertising and content recommendation. A common practice estimates the unknown bandit parameters pertaining to each user independently. This unfortunately ignores the dependency among users and thus leads to suboptimal solutions, especially for applications with strong social components.
    In this work, we develop a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating. We rigorously prove an improved upper regret bound of the proposed collaborative bandit algorithm compared to conventional independent bandit algorithms. Extensive experiments on both synthetic and three large-scale real-world datasets verified the improvement of our proposed algorithm over several state-of-the-art contextual bandit algorithms.
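
The effect of sharing estimates over the user graph can be sketched with a simplified collaborative LinUCB: each user keeps a ridge-regression estimate, and scoring averages neighbors' estimates through a row-normalized adjacency matrix W. This conveys the idea behind CoLin but is not the exact algorithm analyzed in the paper:

```python
import numpy as np

class CollabLinUCB:
    """Simplified collaborative contextual bandit (sketch of the CoLin
    idea): per-user ridge regression, with neighbors' parameter
    estimates shared through the graph when scoring arms."""
    def __init__(self, n_users, d, W, alpha=0.5, lam=1.0):
        self.A = np.stack([lam * np.eye(d)] * n_users)  # per-user Gram matrices
        self.b = np.zeros((n_users, d))
        self.W, self.alpha = W, alpha

    def theta(self):
        return np.stack([np.linalg.solve(A, b) for A, b in zip(self.A, self.b)])

    def score(self, u, x):
        theta_bar = self.W[u] @ self.theta()  # blend neighbors' estimates
        bonus = self.alpha * np.sqrt(x @ np.linalg.solve(self.A[u], x))
        return x @ theta_bar + bonus          # UCB: estimate + exploration bonus

    def update(self, u, x, reward):
        self.A[u] += np.outer(x, x)
        self.b[u] += reward * x
```

A cold-start user immediately benefits from neighbors' observations, which is why the gains are largest in the cold-start regimes shown below.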
    Result Highlights:

    Figure T2.1.1 Regret comparison with baseline multi-armed bandit algorithms in simulation study.

    Figure T2.1.2 Normalized click-through rate comparison with baseline multi-armed bandit algorithms on the Yahoo data set.

    Figure T2.1.3 Recommendation performance improvement comparison with baselines from warm-start to cold-start on LastFM data set.

    Figure T2.1.4 Recommendation performance improvement comparison with baselines from warm-start to cold-start on Delicious data set.

    Dissemination:
    1. Publication:
      • Qingyun Wu, Huazheng Wang, Quanquan Gu and Hongning Wang. Contextual Bandits in A Collaborative Environment. The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'2016), p529-538, 2016. (PDF)
    2. Code: Python Implementation of CoLin can be found here.
    3. Data sets: Our experimentation data sets (include the simulator, processed Yahoo Frontpage, LastFM and Delicious data sets) can be found here.


  13. Learning Hidden Features for Contextual Bandits (Thrust II)
  14. Summary: Most contextual bandit algorithms simply assume the learner has access to the entire set of features that govern the generation of payoffs from a user to an item. However, in practice it is challenging to exhaust all relevant features ahead of time, and oftentimes, due to privacy or sampling constraints, many factors are unobservable to the algorithm. Failing to model such hidden factors leads a system to make consistently suboptimal predictions.
    In this work, we propose to learn the hidden features for contextual bandit algorithms. Hidden features are explicitly introduced in our reward generation assumption, in addition to the observable contextual features. A scalable bandit algorithm is achieved via coordinate descent, in which closed form solutions exist at each iteration for both hidden features and bandit parameters. Most importantly, we rigorously prove that the developed contextual bandit algorithm achieves a sublinear upper regret bound with high probability, and a linear regret is inevitable if one fails to model such hidden features. Extensive experimentation on both simulations and large-scale real-world datasets verified the advantages of the proposed algorithm compared with several state-of-the-art contextual bandit algorithms and existing ad-hoc combinations between bandit algorithms and matrix factorization methods.
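
The coordinate descent estimation step can be sketched in a batch, single-user form: a ridge regression for the bandit parameters with the hidden item features fixed, then a closed-form update of each item's hidden features. This mirrors the alternating closed-form updates described above but drops the online, confidence-bound machinery:

```python
import numpy as np

def alternating_estimation(X, item_ids, rewards, n_items, d_hidden=1,
                           lam=0.1, iters=50):
    """Fit reward ~= x @ theta + v[item] @ beta by coordinate descent;
    both sub-problems are ridge regressions with closed-form solutions.
    X holds observed context features; V holds learned hidden features."""
    rng = np.random.default_rng(0)
    d_obs = X.shape[1]
    theta, beta = np.zeros(d_obs), rng.normal(scale=0.1, size=d_hidden)
    V = rng.normal(scale=0.1, size=(n_items, d_hidden))
    for _ in range(iters):
        # (1) solve for (theta, beta) with hidden features fixed
        Z = np.hstack([X, V[item_ids]])
        w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ rewards)
        theta, beta = w[:d_obs], w[d_obs:]
        # (2) closed-form ridge update of each item's hidden features
        for i in range(n_items):
            mask = item_ids == i
            if mask.any():
                resid = rewards[mask] - X[mask] @ theta
                V[i] = beta * resid.sum() / (mask.sum() * (beta @ beta) + lam)
    return theta, beta, V
```

Step (2) exploits that the per-item ridge solution lies along beta, giving a scalar closed form; the online algorithm interleaves such updates with UCB-based arm selection.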
    Result Highlights:

    Figure T2.2.1 Regret comparison with baseline multi-armed bandit algorithms in simulation study.

    Figure T2.2.2 Normalized click-through rate comparison with baseline multi-armed bandit algorithms on the Yahoo data set.

    Figure T2.2.3 Recommendation performance improvement comparison with baselines from warm-start to cold-start on LastFM data set.

    Figure T2.2.4 Recommendation performance improvement comparison with baselines from warm-start to cold-start on Delicious data set.

    Dissemination:
    1. Publication:
      • Huazheng Wang, Qingyun Wu and Hongning Wang. Learning Hidden Features for Contextual Bandits. The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), p1633-1642, 2016. (PDF)
      • Huazheng Wang, Qingyun Wu and Hongning Wang. Factorization Bandits for Interactive Recommendation. The Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017). (PDF, Supplement)
    2. Code: Python Implementation of hLinUCB and factorUCB can be found here.
    3. Data sets: Our experimentation data sets (include the simulator, processed Yahoo Frontpage, LastFM and Delicious data sets) can be found here.


  15. Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems (Thrust II)
  16. Summary: Improving long-term user engagement is essential for the success of recommender systems. Existing recommendation algorithms mostly focus on optimizing short-term user engagement metrics, such as immediate user clicks, but with little or no regard for their long-term effects, such as on users' re-visit/return behaviors.
    In this work, we propose to improve long-term user engagement in a recommender system from the perspective of sequential decision optimization, where users' click and return behaviors are directly modeled for online optimization. A bandit-based solution is formulated to balance three competing factors during online learning, including exploitation for immediate click, exploitation for expected future clicks, and exploration of unknowns for model estimation. We rigorously prove that with a high probability our proposed solution achieves a sublinear upper regret bound in maximizing cumulative clicks from a population of users in a given period of time, while a linear regret is inevitable if a user's temporal return behavior is not considered when making the recommendations. Extensive experimentation on both simulations and a large-scale real-world dataset collected from Yahoo frontpage news recommendation log verified the effectiveness and significant improvement of our proposed algorithm compared with several state-of-the-art online learning baselines for recommendation.
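
The trade-off between immediate and future clicks can be illustrated with a simple scoring function over hypothetical linear click and return models; r2Bandit additionally attaches confidence bounds to both estimates to drive exploration:

```python
import numpy as np

def long_term_score(x, theta_click, theta_return, gamma=0.9, horizon=20):
    """Expected discounted clicks from recommending item x: the immediate
    click probability plus clicks collected on future visits, whose
    occurrence depends on the item's effect on user return."""
    p_click = 1.0 / (1.0 + np.exp(-(x @ theta_click)))
    p_return = 1.0 / (1.0 + np.exp(-(x @ theta_return)))
    # truncated geometric series over future visits
    future = sum((gamma * p_return) ** t for t in range(1, horizon + 1))
    return p_click * (1.0 + future)
```

An item with slightly fewer immediate clicks but a stronger effect on user return can thus obtain a higher long-term score, which is exactly the behavior a short-term objective cannot capture.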
    Result Highlights:

    Figure T2.3.1 Click-through rate comparison in the Yahoo frontpage news recommendation module between our r2Bandit and several baseline bandit algorithms.

    Figure T2.3.2 Cumulative clicks over time in the Yahoo frontpage news recommendation module from our r2Bandit versus several baseline bandit algorithms.

    Figure T2.3.3 Average return time comparison in the Yahoo frontpage news recommendation module between our r2Bandit and several baseline bandit algorithms.

    Figure T2.3.4 Reweighted return rate comparison in the Yahoo frontpage news recommendation module between our r2Bandit and several baseline bandit algorithms.

    Dissemination:
    1. Publication:
      • Qingyun Wu, Hongning Wang, Liangjie Hong and Yue Shi. Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems. The 26th International Conference on Information and Knowledge Management (CIKM 2017), p1927-1936, 2017. (PDF)
    2. Code: Python Implementation of r2Bandit can be found here.


  17. Learning Contextual Bandits in a Non-stationary Environment (Thrust II)
  18. Summary: Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems and many other important real-world problems, such as display advertising. However, such algorithms usually assume a stationary reward distribution, which hardly holds in practice as users' preferences are dynamic. This inevitably costs a recommender system consistently suboptimal performance. In this work, we consider the situation where the underlying distribution of rewards remains unchanged over (possibly short) epochs and shifts at unknown time instants. In response, we propose a contextual bandit algorithm that detects possible changes of the environment based on its reward estimation confidence and updates its arm selection strategy accordingly. Rigorous upper regret bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world datasets for recommendation confirm its practical utility in a changing environment.
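
The change-detection mechanism can be sketched as a "badness" monitor: a slave model is deemed expired when its recent out-of-confidence-bound rate exceeds what the confidence level permits. Window size and threshold here are illustrative; the paper derives them from the regret analysis:

```python
from collections import deque

class BadnessMonitor:
    """Track how often observed rewards fall outside a slave bandit
    model's confidence interval; a high recent rate signals that the
    environment has shifted and the slave model should be replaced."""
    def __init__(self, window=50, delta=0.1):
        self.flags = deque(maxlen=window)
        self.delta = delta

    def observe(self, predicted_reward, confidence_bound, reward):
        self.flags.append(abs(reward - predicted_reward) > confidence_bound)

    def badness(self):
        return sum(self.flags) / max(len(self.flags), 1)

    def change_detected(self):
        # in a stationary environment the out-of-bound rate stays below delta
        return len(self.flags) == self.flags.maxlen and self.badness() > self.delta
```

When a change is detected, the master discards the expired slave and spawns a fresh one, while feedback keeps flowing to every slave model that remains admissible.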
    Result Highlights:

    Figure T2.4.1 Illustration of dLinUCB. The master bandit model maintains the 'badness' estimation of slave models over time to detect changes in the environment. At each round, the most promising slave model is chosen to interact with the environment; and the acquired feedback is shared across all admissible slave models for model update.

    Figure T2.4.2 Cumulative regret in a piecewise stationary environment.

    Figure T2.4.3 An illustrative example of context-dependent changes in a piecewise stationary environment. A linear reward assumption is postulated to simplify the illustration.

    Figure T2.4.4 Normalized click-through rate on the Snapchat dataset.

    Dissemination:
    1. Publication:
      • Qingyun Wu, Naveen Iyer and Hongning Wang. Learning Contextual Bandits in a Non-stationary Environment. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p495-504, 2018. (PDF)
      • Qingyun Wu, Huazheng Wang, Yanen Li and Hongning Wang. Dynamic Ensemble of Contextual Bandits to Satisfy Users' Changing Interests. The Web Conference 2019 (WWW 2019), p2080-2090, 2019 (PDF)
    2. Code: Python Implementation of dLinUCB and DenBand can be found here.


  19. Online Learning to Rank (Thrust II)
  20. Summary: Online Learning to Rank (OL2R) algorithms learn from implicit user feedback on the fly. The key to such algorithms is an unbiased estimate of the gradient, which is often (trivially) achieved by uniformly sampling from the entire parameter space. Unfortunately, this leads to high variance in gradient estimation, resulting in high regret during model updates, especially when the dimension of the parameter space is large.
    In this work, we focus on Dueling Bandit Gradient Descent (DBGD) based OL2R algorithms, which constitute a major endeavor in this direction of research. In particular, we aim at reducing the variance of gradient estimation in DBGD-type OL2R algorithms. We project the selected updating direction (i.e., the winning direction) into a space spanned by the feature vectors of the examined documents under the current query (termed the "document space" for short), after an interleaved test. Our key insight is that the result of an interleaved test is solely governed by a user's relevance evaluation over the examined documents. Hence, the true gradient introduced by this test is only reflected in the constructed document space, and components of the proposed gradient which are orthogonal to the document space can be safely removed for variance reduction purposes. We prove that this projected gradient is still an unbiased estimate of the true gradient, and show that this lower-variance gradient estimation results in significant regret reduction. Our proposed method is compatible with all existing DBGD-type OL2R algorithms which rank documents using a linear model. Extensive experimental comparisons with several best-performing DBGD-type OL2R algorithms have confirmed the effectiveness of our proposed method in reducing the variance of gradient estimation and improving overall ranking performance.
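
The projection step itself is straightforward linear algebra; a minimal sketch of restricting the winning direction to the document space:

```python
import numpy as np

def project_to_document_space(u, doc_features):
    """Project DBGD's winning direction u onto the span of the examined
    documents' feature vectors, discarding the orthogonal component that
    the interleaved test result cannot reflect."""
    D = np.asarray(doc_features, float).T   # columns: document feature vectors
    coef, *_ = np.linalg.lstsq(D, u, rcond=None)
    return D @ coef                         # orthogonal projection of u
```

The removed component is orthogonal to every examined document, so it carries no information from the user's clicks; dropping it reduces variance without biasing the gradient estimate.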
    Result Highlights:

    Figure T2.5.1 Illustration of null space based gradient exploration for online learning to rank.

    Figure T2.5.2 Illustration of the model update for DBGD-DSP in a three-dimensional space. Dashed lines represent the trajectory of DBGD following different update directions. ut is the direction selected by DBGD in the 3-d space. Red bases represent the document space St on a 2-d plane. ut is projected onto St to become gt for the model update.

    Figure T2.5.3 Offline NDCG@10 on the Yahoo! dataset under a perfect click model.

    Dissemination:
    1. Publication:
      • Huazheng Wang, Ramsey Langley, Sonwoo Kim, Eric McCord-Snook and Hongning Wang. Efficient Exploration of Gradient Space for Online Learning to Rank. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p145-154, 2018. (PDF)
      • Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu and Hongning Wang. Variance Reduction in Gradient Exploration for Online Learning to Rank. The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019), p835-844, 2019. Best Paper Award (PDF)
    2. Code: Python Implementation of NSGD and DSP algorithms can be found here.


  21. Explainable Recommendation (Thrust III)
  22. Summary: Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction. In this work, we develop a multi-task learning solution for explainable recommendation. Two companion learning tasks of user preference modeling for recommendation and opinionated content modeling for explanation are integrated via a joint tensor factorization. As a result, the algorithm predicts not only a user's preference over a list of items, i.e., recommendation, but also how the user would appreciate a particular item at the feature level, i.e., opinionated textual explanation.
    Later, we integrate regression trees to guide the learning of latent factor models for recommendation, and use the learnt tree structure to explain the resulting latent factors. Specifically, we build regression trees on users and items respectively with user-generated reviews, and associate a latent profile to each node on the trees to represent users and items. With the growth of regression tree, the latent factors are gradually refined under the regularization imposed by the tree structure. As a result, we are able to track the creation of latent profiles by looking into the path of each factor on regression trees, which thus serves as an explanation for the resulting recommendations.
    Extensive experiments on two large collections of Amazon and Yelp reviews demonstrate the advantage of our model over several competitive baseline algorithms. In addition, our extensive user study confirms the practical value of the explainable recommendations generated by our model.
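
The explanation mechanism of the tree-guided model can be sketched as a routing function: each split tests a user attribute derived from reviews, each leaf stores a latent profile, and the path taken doubles as the explanation. Node layout and feature names here are hypothetical:

```python
def route_to_profile(node, user_features):
    """Walk a learned regression tree; return the leaf's latent profile
    together with the split decisions taken, which serve as the
    explanation for that profile's recommendations."""
    path = []
    while "split" in node:               # internal node: test one feature
        feature, threshold = node["split"]
        go_left = user_features[feature] <= threshold
        path.append((feature, "<=" if go_left else ">", threshold))
        node = node["left"] if go_left else node["right"]
    return node["profile"], path         # leaf: latent factors + explanation
```

Because every latent profile is reachable only through a sequence of interpretable splits, the model can justify a recommendation by reciting the path rather than exposing raw latent factors.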
    Result Highlights:

    Figure T3.1.1 Joint tensor decomposition scheme. Task relatedness is captured by sharing latent factors across the tensors; in-task variance is captured by corresponding core tensors.

    Figure T3.1.2 An example user tree constructed for explainable recommendation: Top three levels of our FacT model learnt for restaurant recommendations.

    Dissemination:
    1. Publication:
      • Yiyi Tao, Yiling Jia, Nan Wang and Hongning Wang. The FacT: Taming Latent Factor Models for Explainability with Factorization Trees. The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019), p295-304, 2019. (PDF)
      • Nan Wang, Yiling Jia, Yue Yin and Hongning Wang. Explainable Recommendation via Multi-Task Learning in Opinionated Text Data. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p165-174, 2018. (PDF)
    2. Code:
      • A Python Implementation of our factorization tree algorithm for explainable recommendation can be found here.


  • ReviewMiner: An Aspect-based Review Analytics System (Thrust IV)
  • Summary: We develop an aspect-based sentiment analysis system named ReviewMiner. It analyzes opinions expressed about an entity in an online review at the level of topical aspects to discover each individual reviewer's latent opinion on each aspect as well as his/her relative emphasis on different aspects when forming the overall judgment of the entity. The system personalizes the retrieved results according to users' input preferences over the identified aspects, recommends similar items based on the detailed aspect-level opinions, and summarizes aspect-level opinions in textual, temporal and spatial dimensions. The unique multi-modal opinion summarization and visualization mechanisms provide users with rich perspectives to digest information from user-generated opinionated content for making informed decisions.
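    The sentence-level aspect segmentation at the heart of ReviewMiner can be sketched with a minimal keyword-matching baseline. The aspect lexicons and the review below are hypothetical; the deployed system learns aspect assignments and each reviewer's aspect emphasis from data rather than from fixed keyword lists.

    ```python
    # Hypothetical aspect keyword lexicons for the hotel domain.
    ASPECTS = {
        "location": {"location", "subway", "downtown", "walk"},
        "service": {"staff", "service", "friendly", "rude"},
        "value": {"price", "cheap", "expensive", "value"},
    }

    def assign_aspects(review):
        """Assign each sentence to the aspect with the most keyword hits."""
        assignments = []
        for sentence in review.split("."):
            words = set(sentence.lower().split())
            best = max(ASPECTS, key=lambda a: len(words & ASPECTS[a]))
            if words & ASPECTS[best]:  # skip sentences matching no aspect
                assignments.append((sentence.strip(), best))
        return assignments

    review = ("The staff were friendly and helpful. "
              "Great downtown location, easy walk to the subway. "
              "A bit expensive for the room size.")
    for sentence, aspect in assign_aspects(review):
        print(f"[{aspect}] {sentence}")
    ```

    Once sentences are grouped by aspect, per-aspect opinion scores can be aggregated across reviewers to drive the system's aspect-level ranking, summarization, and comparison views.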
    Result Highlights:
    Entity search result page

    Figure T4.1.1 Entity search result page for the hotel domain. Users can directly issue natural language based queries in the search bar, navigate the map to select hotels in the region, or specify preferences over the identified topical aspects to rank the entities accordingly.

    Review search result page

    Figure T4.1.2 Review search result page. In ReviewMiner, we segment reviews by sentences, assign them into corresponding aspects, and highlight them with different colors. We also visualize the results in temporal dimension and construct aspect-based summarization and comparison across entities.

    Dissemination:
    1. Publication:
      • Derek Wu and Hongning Wang. ReviewMiner: An Aspect-based Review Analytics System. The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017), p1285-1288, 2017. (PDF)
    2. Demo system: ReviewMiner


  • Privacy Protection in Personalized Web Search (Thrust IV)
  • Summary: Modern web search engines exploit users' search history to personalize search results, with a goal of improving their service utility on a per-user basis. But it is this very dimension that leads to the risk of privacy infringement and raises serious public concerns. In this work, we propose a client-centered intent-aware query obfuscation solution for protecting user privacy in a personalized web search scenario. In our solution, each user query is submitted with $l$ additional cover queries and corresponding clicks, which act as decoys to mask users' genuine search intent from a search engine. The cover queries are sequentially sampled from a set of hierarchically organized language models to ensure the coherency of fake search intents in a cover search task. Our approach emphasizes the plausibility of generated cover queries, not only to the current genuine query but also to previous queries in the same task, to increase the complexity for a search engine to identify a user's true intent. We also develop two new metrics from an information theoretic perspective to evaluate the effectiveness of provided privacy protection.
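    The generalization-and-specification sampling over the topic hierarchy can be sketched as follows. The toy topic tree and query lists are hypothetical; the actual IQP algorithm samples from n-gram and neural language models estimated at each node of a hierarchy built over crawled web documents.

    ```python
    import random

    # Hypothetical topic hierarchy: root topics, subtopics, and candidate
    # queries per subtopic (stand-ins for per-node language models).
    TOPIC_TREE = {
        "health": {"fitness": ["beginner running plan", "home workout routine"],
                   "nutrition": ["low sodium recipes", "vitamin d foods"]},
        "travel": {"flights": ["cheap flights to denver", "red eye flight tips"],
                   "hotels": ["pet friendly hotels", "hotel loyalty programs"]},
    }

    def sample_cover_queries(true_topic, l, rng=random):
        """Sample l cover queries, keeping the whole decoy sequence inside
        one coherent subtopic so the fake search task looks plausible."""
        decoy_roots = [t for t in TOPIC_TREE if t != true_topic]
        root = rng.choice(decoy_roots)                 # generalize: pick a decoy topic
        subtopic = rng.choice(list(TOPIC_TREE[root]))  # specify: descend to a subtopic
        return [rng.choice(TOPIC_TREE[root][subtopic]) for _ in range(l)]

    random.seed(1)
    print(sample_cover_queries("health", l=2))
    ```

    Keeping all l decoys within one subtopic mirrors the paper's coherency requirement: a sequence of topically unrelated cover queries would be easy for the search engine to separate from a genuine search task.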
    Based on the proposed algorithm, we develop Hide-n-Seek, an intent-aware privacy protection plugin for personalized web search. In addition to users' genuine search queries, Hide-n-Seek submits k cover queries and corresponding clicks to an external search engine; by mimicking the structure of the true query sequence, these decoys disguise the search intent that would otherwise be grounded and reinforced over a search session. The cover queries are synthesized and randomly sampled from a topic hierarchy, where each node represents a coherent search topic estimated by both n-gram and neural language models constructed over crawled web documents. Hide-n-Seek also personalizes the returned search results by re-ranking them against the genuine user profile developed and maintained on the client side. Through a variety of graphical user interfaces, we present the topic-based query obfuscation mechanism to end users so they can see how their search privacy is protected.
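    The client-side re-ranking step can be sketched as a similarity rescoring against a locally held term profile. The profile weights and result snippets below are hypothetical; the point of the design is that personalization happens entirely on the client, so the genuine profile is never exposed to the (obfuscated) server side.

    ```python
    import math
    from collections import Counter

    # Hypothetical client-side profile: term weights accumulated from the
    # user's genuine queries and clicks; never sent to the search engine.
    profile = Counter({"python": 3.0, "pandas": 2.0, "tutorial": 1.0})

    def score(snippet):
        """Cosine similarity between a result snippet and the user profile."""
        terms = Counter(snippet.lower().split())
        dot = sum(profile[t] * c for t, c in terms.items())
        norm = (math.sqrt(sum(c * c for c in terms.values()))
                * math.sqrt(sum(w * w for w in profile.values())))
        return dot / norm if norm else 0.0

    results = ["pandas tutorial for python beginners",
               "zoo animals: pandas in the wild",
               "python pandas dataframe merge examples"]

    # Re-rank the engine's (intent-blurred) results locally so the user
    # still sees a personalized ordering.
    for snippet in sorted(results, key=score, reverse=True):
        print(round(score(snippet), 3), snippet)
    ```

    In the deployed plugin the profile would be updated continuously from genuine activity only, leaving cover queries and their clicks out of the profile.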
    Result Highlights:
    Workflow for intent-aware query obfuscation

    Figure T4.2.1 Intent-aware query obfuscation framework for privacy protection. The dashed arrows represent the generalization and specification operations on the topic tree to generate sequential cover queries with respect to a user's genuine search task.

    Workflow for Hide-n-Seek

    Figure T4.2.2 Overview of Hide-n-Seek System, a Google Chrome extension for privacy-preserving personalization in web search.

    Word cloud for query obfuscation results

    Figure T4.2.3 Word cloud based on submitted true user queries and cover queries.

    Dissemination:
    1. Publication:
      • Wasi Ahmad, Kai-Wei Chang and Hongning Wang. Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p285-294, 2018. (PDF)
      • Puxuan Yu, Wasi Ahmad and Hongning Wang. Hide-n-Seek: An Intent-aware Privacy Protection Plugin for Personalized Web Search. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), demo track, p1333-1336, 2018. (PDF)
    2. Implementations:
      • IQP query obfuscation algorithm: a Python implementation of the algorithm can be found here.
      • Hide-n-Seek: Chrome Extension


    Research Dissemination