AMHAT

The Autonomous Mental Health Assessment Tool (AMHAT), developed by the Data Science Innovations Lab at SUNY Canton, is a multimodal, privacy-preserving pipeline for stress screening.

Overview of AMHAT

AMHAT is a multimodal assessment framework that integrates speech, text, and detailed human–computer interaction signals to screen for psychological stress in a privacy-preserving way.

The system focuses on offline, privacy-first processing: raw audio, video, and text are not stored. Instead, AMHAT retains only derived features that are necessary for modeling, which reduces risk while preserving predictive power.
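
As a minimal sketch of this derive-then-discard pattern, the snippet below computes a handful of summary statistics from an audio buffer and keeps only the resulting vector. The function name and feature choices are illustrative assumptions, not AMHAT's actual extraction code.

```python
import numpy as np

def extract_acoustic_features(audio: np.ndarray, sr: int) -> np.ndarray:
    """Derive a compact feature vector from raw audio; only the returned
    features are retained, the raw signal is never written to disk."""
    frame_len = int(0.025 * sr)                      # 25 ms analysis frames
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    log_energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    # Crude pause proxy: fraction of frames well below average energy.
    pause_ratio = float(np.mean(log_energy < log_energy.mean() - log_energy.std()))
    return np.array([log_energy.mean(), log_energy.std(), pause_ratio])

# Only derived features leave this scope; no raw audio is stored.
feats = extract_acoustic_features(np.random.randn(16000 * 5), sr=16000)
```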

AMHAT is designed both for controlled laboratory studies and for scalable deployment in real-world, cognition-aware applications, including mental wellness tools, digital therapeutics, and stress-aware interfaces.

AMHAT multimodal overview illustration

AMHAT consumes acoustic speech properties, linguistic features, and rich interaction signals such as keystroke timing and mouse dynamics. The modeling pipeline is uncertainty-aware and robust to missing modalities, which enables deployment in imperfect and heterogeneous environments without storing raw content.
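
One common way to obtain the missing-modality robustness described above is masked late fusion, where per-modality stress probabilities are averaged over whichever streams are present. The sketch below assumes that general pattern; the modality names, weights, and probabilities are placeholders, not AMHAT's actual architecture.

```python
from typing import Dict, Optional

def fuse_predictions(probs: Dict[str, Optional[float]],
                     weights: Dict[str, float]) -> float:
    """Masked late fusion: average per-modality stress probabilities,
    renormalizing the weights over the modalities actually present."""
    available = {m: p for m, p in probs.items() if p is not None}
    if not available:
        raise ValueError("at least one modality must be present")
    total = sum(weights[m] for m in available)
    return sum(weights[m] * p for m, p in available.items()) / total

# Hypothetical weights; the lexical stream is missing for this segment.
weights = {"acoustic": 0.25, "lexical": 0.20, "hci": 0.55}
p = fuse_predictions({"acoustic": 0.62, "lexical": None, "hci": 0.81}, weights)
print(f"fused stress probability: {p:.3f}")   # 0.751
```

Because the weights are renormalized over available streams, a dropped modality degrades the estimate gracefully instead of breaking inference.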

AMHAT Project Team

Dr. Mehdi Ghayoumi

Principal Investigator, Lab Director

Prof. Tiffany Forsythe

Scientific Advisor

Cory Liu

Data Science Researcher

Elena M. Nye

Data Science Researcher

Dena Barmas

Data Science Researcher

Anthony Marrero

Data Science Researcher

Outputs and Sample Results

ROC Curves — Unimodal vs Fusion
Receiver operating characteristic (ROC) curves for acoustic, lexical, human–computer interaction (HCI), and fused models. The fused model achieves the highest AUROC (0.955), improving substantially over acoustic (0.733) and lexical (0.662) baselines, and slightly over the strong HCI-only model (0.912).
Precision–Recall Curves — Unimodal vs Fusion
Precision–recall curves comparing the same models. Fusion yields the highest area under the precision–recall curve (AUPRC = 0.958), indicating strong performance in identifying stressed segments under class imbalance, with HCI-only (0.882) as the next-best modality.
DET Curves — Unimodal vs Fusion
Detection error tradeoff (DET) curves showing false negative rate versus false positive rate. Across operating points, the fused model consistently achieves lower error rates than unimodal baselines, capturing the benefit of combining complementary acoustic, lexical, and interaction features.
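
For reference, the ROC, precision–recall, and DET operating points in these figures can be reproduced from per-segment scores with standard scikit-learn utilities. The labels and scores below are synthetic placeholders, not AMHAT outputs.

```python
import numpy as np
from sklearn.metrics import (roc_curve, roc_auc_score,
                             precision_recall_curve, average_precision_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                    # placeholder labels
y_score = np.clip(0.4 * y_true + rng.normal(0.3, 0.25, size=1000), 0.0, 1.0)

fpr, tpr, _ = roc_curve(y_true, y_score)                  # ROC points
fnr = 1.0 - tpr                                           # DET plots FNR vs FPR
precision, recall, _ = precision_recall_curve(y_true, y_score)

print(f"AUROC = {roc_auc_score(y_true, y_score):.3f}")
print(f"AUPRC = {average_precision_score(y_true, y_score):.3f}")
```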
Calibration Reliability — Unimodal vs Fusion
Reliability (calibration) plot of predicted stress probabilities versus observed frequencies. Acoustic and lexical models obtain expected calibration error (ECE) near 0.04–0.05, while HCI and fused models show mild overconfidence in some regions. This motivates further calibration refinement for deployment.
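
Expected calibration error, as referenced above, can be computed by binning predicted probabilities and comparing mean confidence to observed positive rate per bin. This is a generic sketch of the standard equal-width-bin definition, not AMHAT's evaluation code.

```python
import numpy as np

def expected_calibration_error(y_true: np.ndarray, y_prob: np.ndarray,
                               n_bins: int = 10) -> float:
    """Weighted average gap between mean predicted probability and
    observed positive rate within equal-width probability bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (y_prob >= lo) & ((y_prob < hi) if hi < 1.0 else (y_prob <= hi))
        if not in_bin.any():
            continue
        gap = abs(y_prob[in_bin].mean() - y_true[in_bin].mean())
        ece += in_bin.mean() * gap                  # weight by bin occupancy
    return float(ece)
```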
Calibration Histogram — Fusion model
Histogram of fused-model stress probabilities. Most samples fall in low to moderate probability ranges, with a smaller cluster of high-risk predictions. This distribution reflects realistic stress prevalence in the synthetic cohort and supports threshold tuning for screening versus triage use cases.
Threshold Sweep — Fusion model
Threshold sweep for the fused model, showing sensitivity, specificity, F1 score, and accuracy as the decision threshold varies. Around a threshold of 0.45–0.55, the model balances false positives and false negatives and maximizes F1, which is a practical region for screening in laboratory and pilot deployments.
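
A sweep of this kind can be reproduced directly from predicted probabilities; the sketch below computes the same four metrics at each candidate threshold.

```python
import numpy as np

def threshold_sweep(y_true: np.ndarray, y_prob: np.ndarray,
                    thresholds=np.linspace(0.05, 0.95, 19)):
    """Sensitivity, specificity, F1, and accuracy at each threshold."""
    rows = []
    for t in thresholds:
        pred = (y_prob >= t).astype(int)
        tp = int(np.sum((pred == 1) & (y_true == 1)))
        tn = int(np.sum((pred == 0) & (y_true == 0)))
        fp = int(np.sum((pred == 1) & (y_true == 0)))
        fn = int(np.sum((pred == 0) & (y_true == 1)))
        sens = tp / max(tp + fn, 1)
        spec = tn / max(tn + fp, 1)
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        acc = (tp + tn) / len(y_true)
        rows.append((float(t), sens, spec, f1, acc))
    return rows
```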
Permutation importance for acoustic features
Permutation-based importance for the top 10 acoustic features. Several pause-related and variability features (for example, a_14 and a_5) produce the largest drops in AUROC when permuted, indicating that prosodic instability and pausing patterns are key acoustic markers of stress in this setting.
Permutation importance for lexical features
Permutation importance for lexical features. Diversity and stress-linked lexical bins (such as l_6, l_15, and l_14) have the highest impact on AUROC, reflecting that reduced lexical variety and shifts in content categories accompany elevated stress in AMHAT-style tasks.
Permutation importance for HCI features
Permutation importance for interaction (HCI) features. Timing irregularities, bursty corrections, and heavy-tail latency statistics (for example, h_3 and h_12) drive the strongest degradation in AUROC when shuffled, emphasizing the diagnostic value of fine-grained interaction dynamics.
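
The permutation importances in these three figures follow the standard recipe: shuffle one feature column, re-score, and record the AUROC drop. A generic sketch is below (scikit-learn also provides sklearn.inspection.permutation_importance for the same purpose); the model and data here are assumed inputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_permutation_importance(model, X: np.ndarray, y: np.ndarray,
                                 n_repeats: int = 10, seed: int = 0):
    """Mean AUROC drop when each feature column is shuffled; larger
    drops mark features the model depends on more heavily."""
    rng = np.random.default_rng(seed)
    base = roc_auc_score(y, model.predict_proba(X)[:, 1])
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])             # break the feature-label link
            drops[j] += base - roc_auc_score(y, model.predict_proba(X_perm)[:, 1])
    return drops / n_repeats
```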
Confusion matrix for the fused model at threshold 0.5
Confusion matrix for the fused model at threshold 0.5. The system correctly classifies 490 non-stress and 477 stress segments, with 57 false positives and 56 false negatives, yielding a balanced, high-accuracy operating point suitable as a default for offline experimental analyses.
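
For concreteness, the summary statistics implied by these counts can be checked directly:

```python
# Counts reported above: TN = 490, FP = 57, FN = 56, TP = 477.
tn, fp, fn, tp = 490, 57, 56, 477
total = tn + fp + fn + tp                     # 1080 segments
accuracy = (tp + tn) / total                  # ~0.895
sensitivity = tp / (tp + fn)                  # ~0.895 (recall on stress)
specificity = tn / (tn + fp)                  # ~0.896
print(f"accuracy={accuracy:.3f}  sensitivity={sensitivity:.3f}  "
      f"specificity={specificity:.3f}")
```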

Publication and Poster

Code, Data Pipelines, and Resources

The AMHAT codebase includes data ingestion utilities, feature extraction scripts, model training and evaluation pipelines, and experiment configuration files. Documentation and example notebooks support reproducibility and adaptation to new datasets.

The public repository also links to de-identified, synthetic examples that demonstrate AMHAT feature representations while preserving participant privacy.

View AMHAT on GitHub

Contact and Collaboration

We welcome collaborations with researchers, clinicians, and industry partners who are interested in privacy-preserving stress and mental health assessment, multimodal sensing, and human-centered AI systems.

For inquiries about AMHAT, datasets, or potential collaborations, please contact the project lead using the email below.