Mental Health And Affective Computing


Anxiety And Depression:

A Weakly Supervised Learning Framework For Detecting Social Anxiety And Depression



Although social anxiety and depression are common, they are often underdiagnosed and undertreated, in part due to difficulties identifying and accessing individuals in need of services. Current assessments rely on client self-report and clinician judgment, which are vulnerable to social desirability and other subjective biases. Identifying objective, nonburdensome markers of these mental health problems, such as features of speech, could help advance assessment, prevention, and treatment approaches. Prior research examining speech detection methods has focused on fully supervised learning approaches employing strongly labeled data. However, strong labeling of individuals high in symptoms or state affect in speech audio data is impractical, in part because it is not possible to identify with high confidence which regions of a long speech recording indicate the person's symptoms or affective state. We propose a weakly supervised learning framework for detecting social anxiety and depression from long audio clips. Specifically, we present a novel feature modeling technique named NN2Vec that identifies and exploits the inherent relationship between speakers' vocal states and symptoms/affective states. Detecting speakers high in social anxiety or depression symptoms using NN2Vec features achieves F-1 scores 17% and 13% higher than those of the best available baselines. In addition, we present a new multiple instance learning adaptation of a BLSTM classifier, named BLSTM-MIL. Our novel framework using NN2Vec features with the BLSTM-MIL classifier achieves F-1 scores of 90.1% and 85.44% in detecting speakers high in social anxiety and depression symptoms.
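The paper defines NN2Vec and BLSTM-MIL precisely; as a rough illustration of the multiple instance learning idea behind BLSTM-MIL, the PyTorch sketch below treats a long clip as a "bag" of short instances for which only a bag-level label (high vs. low symptoms) is known. The feature dimension, pooling choices, and layer sizes are assumptions, not the paper's configuration.

```python
# Minimal sketch (PyTorch) of a BLSTM classifier with multiple instance
# learning (MIL) pooling: a long audio clip is a "bag" of short instances,
# and only the bag-level label is known.
import torch
import torch.nn as nn

class BLSTM_MIL(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        # Bidirectional LSTM encodes each instance's frame sequence.
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)  # per-instance logit

    def forward(self, bag):
        # bag: (num_instances, frames, feat_dim) for one clip
        encoded, _ = self.blstm(bag)          # (inst, frames, 2*hidden)
        pooled = encoded.mean(dim=1)          # average over frames
        inst_logits = self.score(pooled)      # (inst, 1)
        # MIL max pooling: the bag is positive if any instance is.
        bag_logit = inst_logits.max(dim=0).values
        return torch.sigmoid(bag_logit)

model = BLSTM_MIL()
clip = torch.randn(20, 50, 128)   # 20 instances of 50 frames each
prob_high_symptoms = model(clip)  # bag-level probability
```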






Real-Time Distant Speech Emotion Recognition in Indoor Environments


We develop solutions to challenges in each stage of the processing pipeline of a real-time indoor distant speech emotion recognition system in order to reduce the discrepancy between training and test conditions for distant emotion recognition. We use a novel combination of distorted feature elimination, classifier optimization, several signal cleaning techniques, and classifiers trained with synthetic reverberation obtained from a room impulse response generator to improve performance in a variety of rooms with various source-to-microphone distances. Our comprehensive evaluation is based on a popular emotional corpus from the literature, two new customized datasets, and a dataset made of YouTube videos. The two new datasets are the first distance-aware emotional corpora; we created them by (1) injecting room impulse responses collected in a variety of rooms with various source-to-microphone distances into a public emotional corpus, and (2) re-recording the emotional corpus with microphones placed at different distances. The overall performance results show as much as a 15.51% improvement in distant emotion detection over baselines, with a final emotion recognition accuracy ranging between 79.44% and 95.89% for different rooms, acoustic configurations, and source-to-microphone distances. We experimentally evaluate the CPU time of various system components and demonstrate the real-time capability of our system.
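One training-side technique named above is synthesizing reverberant speech with a room impulse response (RIR). The sketch below shows the standard convolution step under assumed file names; the paper's actual RIR generator and augmentation pipeline may differ.

```python
# Illustrative sketch: augmenting clean emotional speech with synthetic
# reverberation by convolving it with a room impulse response (RIR).
# File names are placeholders.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

clean, sr = sf.read("clean_utterance.wav")   # hypothetical clean corpus file
rir, sr_rir = sf.read("room_impulse.wav")    # measured or generated RIR
assert sr == sr_rir

reverberant = fftconvolve(clean, rir, mode="full")[:len(clean)]
# Normalize to avoid clipping, then use as a "distant" training example.
reverberant /= np.max(np.abs(reverberant)) + 1e-9
sf.write("distant_utterance.wav", reverberant, sr)
```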





DER:

Distant Emotion Recognition



Distant emotion recognition (DER) extends the application of speech emotion recognition to the very challenging situations created by variable speaker-to-microphone distances. The performance of conventional emotion recognition systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker, due to a broad variety of effects such as background noise, feature distortion with distance, overlapping speech from other speakers, and reverberation. This paper presents a novel solution for DER that addresses the key challenges by identifying and removing from consideration features that are significantly distorted by distance, creating a novel feature modeling and overlapping speech filtering technique called Emo2vec, and using an LSTM classifier to capture the temporal dynamics of speech states found in emotions. A comprehensive evaluation is conducted on two acted datasets (with an artificially generated distance effect) as well as on a new emotional dataset of spontaneous family discussions with audio recorded from multiple microphones placed at different distances. Our solution achieves an average accuracy of 91.6%, 90.1%, and 89.5% for the emotions happy, angry, and sad, respectively, across various distances, which is more than a 16% average increase in accuracy compared to the best baseline method.
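As a hedged sketch of the distorted-feature-elimination idea (not the paper's exact criterion), one can compute each acoustic feature on matched close-talk and distant recordings of the same utterances and discard features that no longer track their close-talk values; the correlation threshold below is an assumption.

```python
# Sketch: keep only features whose values on distant recordings still
# correlate with their values on close-talk recordings of the same
# utterances. Threshold and feature set are illustrative assumptions.
import numpy as np

def stable_feature_indices(close_feats, distant_feats, min_corr=0.6):
    """close_feats, distant_feats: (utterances, features) matrices
    computed on the same utterances at different distances."""
    keep = []
    for j in range(close_feats.shape[1]):
        r = np.corrcoef(close_feats[:, j], distant_feats[:, j])[0, 1]
        if r >= min_corr:          # feature survives distance distortion
            keep.append(j)
    return keep

close = np.random.rand(200, 40)                  # placeholder features
distant = close + 0.8 * np.random.rand(200, 40)  # distance-distorted copy
kept = stable_feature_indices(close, distant)
```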






DAVE:

Detecting Agitated Vocal Events



DAVE is a comprehensive set of event detection techniques to monitor and detect five important verbal agitations: asking for help, verbal sexual advances, questions, cursing, and talking with repetitive sentences. The novelty of DAVE includes combining acoustic signal processing with three different text mining paradigms to detect verbal events (asking for help, verbal sexual advances, and questions) that require both lexical content and acoustic variations to produce accurate results. To detect cursing and talking with repetitive sentences, we extend word sense disambiguation and sequential pattern mining algorithms. The solutions are applicable to monitoring dementia patients, online video sharing applications, human-computer interaction (HCI) systems, home safety, and other health care applications. A comprehensive performance evaluation across multiple domains includes audio clips collected from 34 real dementia patients, audio data from controlled environments, movies and YouTube clips, online data repositories, and healthy residents in real homes. The results show significant improvement over baselines and high accuracy for all five vocal events.
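As one illustrative component, detecting "talking with repetitive sentences" can be sketched as counting near-identical normalized sentences in a sliding window over transcribed speech. DAVE's actual approach extends sequential pattern mining, so the window size and threshold below are placeholder assumptions.

```python
# Rough sketch: flag repetitive talk by counting near-identical
# normalized sentences in a sliding window of transcribed speech.
from collections import Counter
import re

def is_repetitive(sentences, window=10, min_repeats=3):
    normalized = [re.sub(r"[^a-z ]", "", s.lower()).strip()
                  for s in sentences]
    for i in range(max(1, len(normalized) - window + 1)):
        counts = Counter(normalized[i:i + window])
        if counts and counts.most_common(1)[0][1] >= min_repeats:
            return True
    return False

transcript = ["I want to go home", "where is my bag",
              "I want to go home", "I want to go home"]
print(is_repetitive(transcript, window=4, min_repeats=3))  # True
```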






Home Wireless Sensing System for Monitoring Nighttime Agitation and Incontinence in Patients with Alzheimer’s Disease


Patients with Alzheimer’s Disease (AD) often experience urinary incontinence and agitation during sleep. There is some evidence that these phenomena are related, but the relationships (and the subsequent opportunity for caregiver intervention) have never been formally studied. In this work, the relationships among the times of occurrence of nighttime agitation, sleep continuity and duration, and urinary incontinence are identified for persons with AD using innovative, non-invasive technology. Deployments in 12 homes demonstrate both the utility of the technical monitoring system and the discovered correlations between agitation and incontinence for these 12 AD patients. Implications for possible interventions are discussed, and lessons learned are presented along with technical, non-technical, and health care implications.
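As a simple, hypothetical illustration of the kind of temporal relationship studied (not the study's analysis code), one can count how often an incontinence event follows a nighttime agitation event within a fixed window, given timestamped events from the home sensors:

```python
# Illustrative sketch: count agitation events followed by an incontinence
# event within a fixed time window. Window length is a made-up assumption.
from datetime import datetime, timedelta

def co_occurrences(agitation_times, incontinence_times,
                   window=timedelta(minutes=30)):
    hits = 0
    for a in agitation_times:
        if any(a <= i <= a + window for i in incontinence_times):
            hits += 1
    return hits

agit = [datetime(2015, 3, 1, 2, 10), datetime(2015, 3, 1, 4, 5)]
incont = [datetime(2015, 3, 1, 2, 25)]
print(co_occurrences(agit, incont))  # 1
```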





SocialSense:

A Collaborative Mobile Platform for Speaker and Mood Identification



SocialSense is a collaborative smartphone-based speaker and mood identification and reporting system that uses a user's voice to detect and log their speaking and mood episodes. SocialSense works collaboratively with nearby phones running the app, periodically sending and receiving speaking and mood vectors to and from other users present in a social interaction setting, thus tracking the global speaking episodes of all users along with their moods. In addition, it utilizes a novel event-adaptive dynamic classification scheme for speaker identification, which updates the speaker classification model every time one or more users enter or leave the scenario, ensuring the classifier is always current with respect to user presence. Evaluation shows that dynamic classifiers improve speaker identification accuracy by 30% compared to traditional static speaker identification systems, with a 10% to 43% performance boost in various noisy environments. SocialSense also improves mood classification accuracy by 4% to 20% compared to baseline approaches. Energy consumption experiments show a daily device lifetime of 10 to 14 hours.
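A minimal sketch of the event-adaptive dynamic classification idea, assuming enrolled per-user voice features and a generic sklearn classifier (SocialSense's actual features and model may differ): rebuild the speaker classifier only when a user joins or leaves, so it always covers exactly the users present.

```python
# Sketch: retrain the speaker classifier on roster-change events only.
import numpy as np
from sklearn.svm import SVC

class DynamicSpeakerID:
    def __init__(self):
        self.enrolled = {}      # user -> (n_samples, n_features) array
        self.clf = None

    def update_roster(self, present_users):
        # Called whenever phones report that users entered or left.
        X = np.vstack([self.enrolled[u] for u in present_users])
        y = np.concatenate([[u] * len(self.enrolled[u])
                            for u in present_users])
        self.clf = SVC().fit(X, y)   # classifier over current users only

    def identify(self, features):
        return self.clf.predict(features.reshape(1, -1))[0]

sid = DynamicSpeakerID()
sid.enrolled = {"alice": np.random.rand(20, 13),
                "bob": np.random.rand(20, 13)}
sid.update_roster(["alice", "bob"])
print(sid.identify(np.random.rand(13)))
```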






MOBI-COG:

A Mobile Application for Instant Screening of Dementia Using the Mini-Cog Test


MOBI-COG is an application that runs on a mobile device, such as a tablet or a smartphone, and provides an automated, instant dementia screening service. The MOBI-COG App is a complete automation of a widely used 3-minute dementia screening test called the Mini-Cog test, which is administered by primary caregivers for quick screening of dementia in the elderly. Besides asking the patient to remember and then recall a set of three words, the test involves a free-hand clock drawing task. The MOBI-COG App automates all of these steps, including automatic assessment of the correctness of a clock drawn on the touch screen of a mobile device. We train the MOBI-COG App with over 1000 touch-drawn clocks and show that the system is capable of detecting and recognizing digits in less than 100 ms, in situ (i.e., without the help of any back-end server), with 99.53% accuracy, and is robust to changes in people, sizes of the drawn digits, and screen sizes of the mobile devices. We perform a usability study of MOBI-COG involving eight healthy human subjects and show that the system performs all three steps of the test effectively. We also provide a summary of users' comments on the application.
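One plausible way such in-situ digit recognition can run in milliseconds without a back-end server is template matching on normalized touch strokes; the sketch below is an assumption-laden illustration, not the app's actual recognizer.

```python
# Sketch: resample each touch stroke to a fixed number of points,
# normalize for position and scale, and match against labeled templates
# with 1-nearest neighbor.
import numpy as np

def normalize(stroke, n=32):
    stroke = np.asarray(stroke, dtype=float)       # (points, 2) touch trace
    idx = np.linspace(0, len(stroke) - 1, n).astype(int)
    s = stroke[idx]                                # resample to n points
    s -= s.mean(axis=0)                            # translation invariance
    s /= np.abs(s).max() + 1e-9                    # scale invariance
    return s.ravel()

def recognize(stroke, templates):
    # templates: list of (label, normalized_vector) pairs
    q = normalize(stroke)
    return min(templates, key=lambda t: np.linalg.norm(q - t[1]))[0]
```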





Kintense:

A robust, accurate, real-time and evolving system for detecting aggressive actions from streaming 3D skeleton data



Kintense is a robust, accurate, real-time, and evolving system for detecting aggressive actions such as hitting, kicking, pushing, and throwing from streaming 3D skeleton joint coordinates obtained from Kinect sensors. Kintense uses a combination of: (1) an array of supervised learners to recognize a predefined set of aggressive actions, (2) an unsupervised learner to discover new aggressive actions or refine existing ones, and (3) human feedback to reduce false alarms and to label potential aggressive actions. This paper describes the design and implementation of Kintense and provides empirical evidence that the system is 11%–16% more accurate and 10%–54% more robust to changes in distance, body orientation, speed, and person when compared to standard techniques such as dynamic time warping (DTW) and posture-based gesture recognizers. We deploy Kintense in two multi-person households and demonstrate how it evolves to discover and learn unseen actions, achieves up to 90% accuracy, runs in real time, and reduces false alarms with up to 13 times fewer user interactions than a typical system.
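As a hedged illustration of how skeleton-based recognition can be made robust to distance and body orientation (one plausible approach, not necessarily Kintense's internals), each frame's joints can be expressed relative to the torso and normalized by body size before classification; the joint indices below are hypothetical.

```python
# Sketch: distance- and scale-invariant features from streaming
# 3D skeleton joints, aggregated over a sliding window.
import numpy as np

TORSO, HEAD = 0, 3   # hypothetical Kinect joint indices

def frame_features(joints):
    """joints: (20, 3) array of 3D joint coordinates for one frame."""
    rel = joints - joints[TORSO]            # translation invariance
    body_size = np.linalg.norm(joints[HEAD] - joints[TORSO])
    return (rel / (body_size + 1e-9)).ravel()   # scale invariance

def window_features(frames):
    # Stack per-frame features plus velocities over a sliding window;
    # the result feeds a supervised classifier.
    feats = np.array([frame_features(f) for f in frames])
    velocity = np.diff(feats, axis=0)
    return np.concatenate([feats.mean(axis=0), velocity.mean(axis=0)])
```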






KinSpace:

Passive Obstacle Detection via Kinect


Falls are a significant problem for the elderly living independently in the home, and many falls occur due to household objects left in open spaces. We present KinSpace, a passive obstacle detection system for the home. KinSpace uses a Kinect sensor to learn the open space of an environment by observing residents' walking patterns. It then monitors the open space for obstacles that are potential tripping hazards and notifies the residents accordingly. KinSpace uses real-time depth data and human-in-the-loop feedback to adjust its understanding of the open space of an environment. We present a 5,000-frame deployment dataset spanning multiple homes and classes of objects, with results showing the effectiveness of our underlying technical solutions in identifying open spaces and obstacles. Both lab testing and a deployment in an actual home show roughly 80% accuracy for both open space detection and obstacle detection, even in the presence of many real-world issues. Consequently, this new technology shows great potential to reduce the risk of falls in the home due to environmental hazards.
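A minimal sketch of the underlying idea, under assumed data formats and made-up thresholds: learn an "open space" occupancy grid from observed foot positions, remember the empty-floor depth there, and later flag open-space cells whose depth deviates, indicating a potential tripping hazard.

```python
# Sketch: occupancy grid learned from walking patterns, then obstacle
# detection as depth deviation within the learned open space.
import numpy as np

GRID = (64, 48)
open_space = np.zeros(GRID, dtype=bool)
floor_depth = np.full(GRID, np.nan)

def learn_open_space(foot_cells, depth_map):
    # foot_cells: grid cells where a resident's feet were observed walking
    for (i, j) in foot_cells:
        open_space[i, j] = True
        floor_depth[i, j] = depth_map[i, j]   # remember empty-floor depth

def find_obstacles(depth_map, tol=0.05):
    # Cells in learned open space that are now closer than the floor.
    deviation = np.where(open_space, floor_depth - depth_map, 0.0)
    return np.argwhere(deviation > tol)
```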






Publications

  • Asif Salekin, Jeremy W. Eberle, Jeffrey J. Glenn, Bethany A. Teachman, and John A. Stankovic. 2018. A Weakly Supervised Learning Framework for Detecting Social Anxiety and Depression, ACM Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT), Vol. 2, No. 2, Article 81 (June 2018), 26 pages. (and Ubicomp 2018) [pdf]

  • M. Ahmed, Z. Chen, E. Fass, and J. Stankovic, Real-Time Distant Speech Emotion Recognition in Indoor Environments, MobiQuitous, Nov. 2017. [pdf]

  • A. Salekin, Z. Chen, M. Ahmed, J. Lach, D. Metz, K. de la Haye, B. Bell, and J. Stankovic, Distant Emotion Recognition, ACM Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT), Vol. 1, Issue 3, Sept. 2017, 96:1-96:24. (Ubicomp 2017) [pdf]

  • A. Salekin, H. Wang, K. Williams, and J. Stankovic, DAVE: Detecting Agitated Vocal Events, IEEE CHASE, July 2017. [pdf]

  • A. Salekin, H. Wang, and J. Stankovic, Demo Abstract: KinVocal: Detecting Agitated Vocal Events, ACM Sensys, Nov. 2015. [pdf]

  • J. Gong, K. Rose, I. Emi, J. Specht, E. Hoque, D. Fan, S. Dandu, R. Dickerson, Y. Perkounkova, J. Lach, and J. Stankovic, Home Wireless Sensing System for Monitoring Incontinence and Sleep Agitation, Wireless Health, Oct. 2015. [pdf]

  • M. Ahmed, S. Kenkeremath, and J. Stankovic, SocialSense: A Collaborative Mobile Platform for Speaker and Mood Identification, EWSN, Feb. 2015. [pdf]

  • S. Nirjon, I. Emi, A. Mondol, A. Salekin, and J. Stankovic, MOBI-COG: A Mobile Application for Instant Screening of Dementia Using the Mini-Cog Test, Wireless Health, Oct. 2014. [pdf]

  • S. Nirjon, C. Greenwood, C. Torres, S. Zhou, J. Stankovic, H. Yoon, H. Ra, C. Basaran, T. Park, and S. Son, Kintense: A Robust, Accurate, Real-Time and Evolving System for Detecting Aggressive Actions from Streaming 3D Skeleton Data, PerCom, March 2014. (acceptance rate 14%). [pdf]

  • C. Greenwood, S. Nirjon, J. Stankovic, H. Yoon, H. Ra, T. Park, and S. Son, KinSpace: Passive Obstacle Detection via Kinect, EWSN, Feb. 2014. [pdf]



People

John A. Stankovic (PI)

BP America Professor, Director, Link Lab

Department of Computer Science

University of Virginia


Asif Salekin

Ph.D. Candidate

Department of Computer Science

University of Virginia


Mohsin Ahmed

Ph.D. Candidate

Department of Computer Science

University of Virginia


Zeya Chen

Ph.D. Candidate

Department of Computer Science

University of Virginia


Ifat Afrin Emi

Ph.D. Candidate

Department of Computer Science

University of Virginia


Md Abu Sayeed Mondol

Ph.D. Candidate

Department of Computer Science

University of Virginia