Basic Information:

The UVA "Machine Learning QData" group develops novel machine-learning techniques for important challenges with medical impact, especially those involving complex data sets. We strive to build and share benchmarked datasets and open-source research toolkits.

Table of grant awards we are working on:

Name | Role | Summary | Project Website | Year
NSF CNS-124538 | PI | SaTC: CORE: Small: Generalizing Adversarial Examples in Natural Language | Project Website | 2021-2024
NSF CCF-1900676 | co-PI | SHF: Medium: Rearchitecting Neural Networks for Verification | Project Website | 2019-2023

Selected UVA internal grant awards we are working on or have worked on:

Name | Year
UVA Engineering-In-Medicine (EIM) Research Innovation Awards | 2020-2021
UVA Human and Machine Intelligence Seed Award | 2020
UVA SEAS Research Innovation Award | 2017-2018
UVA Brain Institute Seed Award for Transformative, Collaborative Neuroscience Pilot Grant | 2017-2018
UVA SEAS Cybersec Initiative Seed Award | 2016-2017

Table of external grant awards we have completed:

Name | Role | Summary | Project Website | Year
NSF IIS-1453580 | PI | CAREER: A Data-Driven Network Inference Framework for Context-Conditioned Protein Interaction Graphs | Project Website | 2015-2020
NSF CNS-1619098 | PI | TWC: Small: Automatic Techniques for Evaluating and Hardening Machine Learning Classifiers in the Presence of Adversaries | Project Website | 2016-2019
ONR | co-PI | Deep Learning of Passage Structure for Scalable Semantic Discovery | Project Website | 2015-2018
NSF CNS-1441875 | co-PI | PI Meeting for Secure and Trustworthy Cyberspace: Matching Meeting Attendees Through Text Mining of their Scientific Reports | Project Website | 2014-2015

Legacy research sites:
  • [Our deep learning tools on discrete data types]
  • [Our tools for making machine learning robust and trustworthy]
  • [Our deep learning tools for analyzing sequential data types from bio-medicine]
  • [Our tools for learning dependency networks from heterogeneous samples (by joint structure learning of multiple Graphical Models)]

  • Representative publications:

    Category Representative Papers


    Trustworthy Machine Learning

  • [Our tools for making machine learning robust and trustworthy]
    • "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks", NDSS 2018 (PDF)
    • "Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers", Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi, 2018 IEEE Security and Privacy Workshops (SPW) (PDF)
    • "Automatically Evading Classifiers", NDSS 2016 (PDF) (Talk)
    • "A Theoretical Framework for Robustness of (Deep) Classifiers Against Adversarial Samples", ICLR 2017 (arXiv)
    • "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks" (arXiv)
    • "DeepCloak: Masking Deep Neural Network Models for Robustness against Adversarial Samples", ICLR 2017 workshop (arXiv)
    • "Adversarial Playground: A Visualization Suite for Adversarial Sample Generation", Andrew Norton and Yanjun Qi (arXiv)

    (Deep) Representation Learning on sequential data

    (genome or epigenomic data, product review text, bio-reports text, protein sequence strings, etc.)

  • [Our Tools for Mining Sequential Bio-Data]
  • [Our deep Learning tools on Discrete Structures]
    • "Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin", NeurIPS 2017 (PDF)
    • "DeepDiff: Deep-learning for predicting Differential gene expression from histone modifications", Bioinformatics 2018 (PDF)
    • "DeepChrome: Deep-learning for predicting gene expression from histone modifications", Bioinformatics 2016 (PDF)
    • "Memory Matching Networks for Genomic Sequence Classification", ICLR 2017 (arXiv) (Poster)
    • "GaKCo: a Fast GApped k-mer string Kernel using COunting", ECML 2017 (arXiv)
    • "Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks", PSB 2017 (arXiv)
    • "Deep Motif: Visualizing Genomic Sequence Classifications", International Conference on Learning Representations (ICLR) 2016 (arXiv)
    • "MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-based Protein Structure Prediction", AAAI 2016 (PDF) (Talk)

    • "Unsupervised Feature Learning by Deep Sparse Coding", SDM 2014 (PDF) (Talk)
    • "Deep Learning for Character-based Information Extraction", ECIR 2014 (PDF) (SupplementaryDoc) (Talk)
    • "Learning the Dependency Structure of Latent Factors", NIPS 2012 (PDF) (Talk)
    • "A unified multitask architecture for predicting local protein properties", PLoS ONE 2012 (PDF)
    • "Sentiment Classification with Supervised Sequence Encoder", ECML 2012 (PDF) (Talk)
    • "Sentiment Classification Based on Supervised Latent n-gram Analysis", CIKM 2011 (PDF) (Talk)
    • "Semi-Supervised Abstraction-Augmented String Kernel for Multi-Level Bio-Relation Extraction", ECML 2010 (PDF) (Talk)
    • "Semi-Supervised Bio-Named Entity Recognition with Word-Codebook Learning", SDM 2010 (PDF)
    • "Polynomial Semantic Indexing", NIPS 2009 (PDF)
    • "Semi-Supervised Sequence Labeling with Self-Learned Feature", ICDM 2009 (PDF) (Talk)
    • "Combining Labeled and Unlabeled Data for Word-Class Distribution", CIKM 2009 (PDF)

    Learning Graphs from Data

    (Helping researchers effectively translate aggregated data into knowledge that takes the form of graphs)

  • [Our tools for learning networks from heterogeneous samples (by joint structure learning of Gaussian Graphical Models)]
    • "A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models", ICML 2018 (PDF)
    • "Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure", AISTATS 2018 (PDF)
    • "A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models", AISTATS 2017 (arXiv) (PDF)
    • "A Constrained L1 Minimization Approach for Estimating Multiple Sparse Gaussian or Nonparanormal Graphical Models", Machine Learning Journal 2017 (PDF)
    • "Learning the Dependency Structure of Latent Factors", NIPS 2012 (PDF) (Talk)
    • "Transfer String Kernel for Cross-Context DNA-Protein Binding Prediction", TCBB (arXiv)
    • "Protein Complex Identification by Supervised Graph Clustering", Bioinformatics 2008 (PDF) (Talk)

    Add Weak Supervision in Machine Learning

    • "Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning", P. Cascante-Bonilla, F. Tan, Y. Qi, V. Ordonez, arXiv:2001.06001
    • Yanjun Qi, B. Bai, X. Ning, and P. Kuksa, "Systems and Methods for Semi-supervised Relationship Extraction", Apr. 2014
    • "Semi-Supervised Convolution Graph Kernels for Relation Extraction", SDM 2011 (PDF) (Talk)
    • "Semi-Supervised Multi-Task Learning for Predicting Interactions between HIV-1 and Human Proteins", ECCB 2010 (PDF) (Talk)

    Machine Learning for Health

    • "Kernelized Information-Theoretic Metric Learning for Cancer Diagnosis using High-Dimensional Molecular Profiling Data", TKDD 2016 (PDF)
    • "MAPer: A Multi-scale Adaptive Personalized Model for Temporal Human Behavior Prediction", CIKM 2015 (PDF) (Talk)
    • "Causal Analysis of Inertial Body Sensors for Enhancing Gait Assessment Separability towards Multiple Sclerosis Diagnosis", IEEE Body Sensor Network (BSN) 2015 (PDF) (Talk)
    • "Piecewise Linear Dynamical Model for Action Clustering from Real-World Deployments of Inertial Body Sensors", BodyNets 2014 (PDF) (Talk)
    • "Retrieving Medical Records with sennamed: NEC Labs America at TREC 2012 Medical Record Track", 2012 Text Retrieval Conference (PDF)
