Concordance of AI-Assisted Sleep Scoring with Expert Manual Scoring

June 11, 2026
10:07 am
Research Publication

Chen Zusman¹, Dr. Saar Lanir-Azaria¹

¹ Dormotech Medical

Abstract

The Dormo Sleep Staging AI project is an ongoing research effort focused on developing a clinically grounded, automated sleep staging model for polysomnographic (PSG) recordings acquired in both home and laboratory environments. The proposed framework leverages multimodal physiological data, including EEG, EOG, chin EMG, respiratory airflow and effort signals, photoplethysmography (PPG), SpO₂, and body movement measurements. At its core, the system employs an EEG-centered architecture that prioritizes direct neurophysiological markers of sleep while integrating complementary physiological signals to improve robustness under real-world recording conditions and variable signal quality.

As model development continues and additional annotated PSG recordings are incorporated into training and validation cohorts, further improvements in performance, robustness, and generalizability are anticipated. This work aims to establish the foundation for scalable, reproducible, and clinically deployable automated PSG analysis capable of supporting sleep specialists, reducing the burden of manual scoring, and expanding access to high-quality sleep assessment.

1. Introduction

Sleep staging is the classification of polysomnographic (PSG) recordings into discrete sleep-wake states: Wake (W), N1, N2, N3, and REM (R), as defined by standardized scoring guidelines (AASM, 2024; Iber et al., 2007). Accurate sleep staging is clinically essential for diagnosing sleep apnea, insomnia, narcolepsy, and other sleep disorders, and for quantifying treatment response.

Manual sleep staging performed by trained, AASM-certified sleep technologists remains the clinical reference standard (AASM, 2024; Iber et al., 2007). However, it is labor-intensive and time-consuming, requiring between 45 minutes and two hours per recording (Stephansen et al., 2018), and is subject to inter-scorer variability (Rosenberg & Van Hout, 2013). Published studies have demonstrated agreement rates of approximately 80-85% even among experienced scorers, with the greatest variability occurring during stage transitions and N1 sleep (Rosenberg & Van Hout, 2013; Danker-Hopfe et al., 2009). As demand for sleep diagnostics continues to increase, there is a growing need for scalable and reproducible automated sleep staging solutions.

Many existing automated sleep staging systems, particularly those designed for home-based monitoring (Chambon et al., 2018; Yildirim et al., 2019; Phan et al., 2019), rely primarily on peripheral physiological signals such as heart-rate variability (HRV), photoplethysmography (PPG), respiratory measurements, and body movement (Radha et al., 2019; de Zambotti et al., 2019; Fonseca et al., 2020). While these signals provide useful information regarding sleep state, they represent indirect correlates of sleep architecture (Martin et al., 2021). In contrast, EEG remains the physiological foundation of clinical sleep staging and the primary signal used by human experts when applying AASM scoring criteria (AASM, 2024; Berry et al., 2012).

The Dormo Sleep Staging AI project is an ongoing research effort focused on developing an EEG-centered, multimodal automated sleep staging framework for both home and laboratory PSG recordings. By combining direct neurophysiological information derived from EEG with complementary physiological signals, including EOG, EMG, respiratory measurements, PPG, SpO₂, and motion sensing, the project aims to develop a robust, clinically interpretable, and scalable AI-based solution capable of reducing the burden of manual scoring while maintaining alignment with established AASM sleep staging principles.

2. Methods

2.1 Study Design and Data Collection

PSG recordings are collected using the Dormotech DormoVision X™ multi-sensor wearable platform in both home and laboratory environments. The study dataset consists of single-night, PSG-equivalent recordings accompanied by expert manual sleep-stage annotations performed according to AASM 2024 guidelines. All recordings are scored within the Dormotech scoring platform by trained, AASM-certified sleep technologists prior to model development, so that the reference labels are not influenced by model output.

The study is designed to support the development and validation of a clinically robust automated sleep staging framework across diverse real-world conditions. The dataset intentionally captures substantial variability in signal quality, recording environments, and physiological presentations, reflecting the heterogeneity encountered in routine clinical practice.

2.2 Feature Extraction

The proposed framework is based on a multimodal feature representation derived from EEG, EOG, chin EMG, respiratory airflow and effort signals, photoplethysmography (PPG), SpO₂, accelerometry, and demographic variables. EEG serves as the primary modality for sleep-stage inference, providing direct neurophysiological markers of sleep architecture through spectral features aligned with standard AASM frequency bands, along with band ratios and signal quality–aware representations. Complementary physiological signals provide contextual information related to eye movements, muscle tone, autonomic activity, respiration, oxygen saturation, and body motion, and are incorporated as supporting inputs to enhance robustness under variable recording conditions rather than as standalone predictors. The framework is designed to combine EEG-driven physiological fidelity with multimodal robustness, enabling consistent and interpretable sleep staging across heterogeneous real-world PSG recordings.
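As an illustration of the kind of EEG spectral features and band ratios described above, the following is a minimal sketch and not the Dormotech implementation. The band edges, the 100 Hz sampling rate, the 30-second epoch length, the choice of ratios, and the names `BANDS`, `band_powers`, and `band_ratios` are all assumptions made for this example. It requires NumPy.

```python
import numpy as np

# Illustrative EEG bands in Hz. These edges are an assumption for this sketch,
# not the band definitions used in the study.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "sigma": (12.0, 16.0),
    "beta": (16.0, 30.0),
}


def band_powers(epoch, fs):
    """Return relative spectral power per band for one single-channel EEG epoch."""
    x = np.asarray(epoch, dtype=float)
    x = x - x.mean()                      # remove DC offset
    window = np.hanning(x.size)           # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)

    # Normalize by total power in the 0.5-30 Hz range covered by BANDS.
    total = spectrum[(freqs >= 0.5) & (freqs < 30.0)].sum()
    powers = {}
    for name, (low, high) in BANDS.items():
        mask = (freqs >= low) & (freqs < high)
        powers[name] = float(spectrum[mask].sum() / total) if total > 0 else 0.0
    return powers


def band_ratios(powers, eps=1e-12):
    """Return two example band ratios (hypothetical choices for illustration)."""
    return {
        "delta_beta": powers["delta"] / (powers["beta"] + eps),
        "theta_alpha": powers["theta"] / (powers["alpha"] + eps),
    }


# Synthetic 30-second epoch at an assumed 100 Hz: a strong 2 Hz (delta)
# component plus a weaker 20 Hz (beta) component.
fs = 100
t = np.arange(30 * fs) / fs
epoch = np.sin(2 * np.pi * 2.0 * t) + 0.2 * np.sin(2 * np.pi * 20.0 * t)

powers = band_powers(epoch, fs)
ratios = band_ratios(powers)
print(powers)
print(ratios)
```

On this synthetic epoch the delta band dominates, as would be expected for a slow-wave-like signal. A signal quality-aware pipeline would additionally need artifact detection before features like these are trusted, which this sketch does not attempt.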

3. Results

Preliminary evaluation of the AI-based framework was conducted on held-out polysomnographic recordings, using expert manual scoring performed according to AASM 2024 guidelines as ground truth. Figures 1 and 2 present epoch-by-epoch hypnogram comparisons between expert scoring (top panel) and AI model predictions (bottom panel), illustrating representative high-performing cases under real-world PSG conditions.

Sessions 1100 (89.1% accuracy, Figure 1) and 1155 (83.6% accuracy, Figure 2) demonstrate strong concordance between AI-generated sleep staging and expert annotations across major sleep stages and overall sleep architecture, including accurate identification of sleep stage transitions and preservation of macro-structural sleep patterns. These examples provide an initial qualitative demonstration of AI model behavior in heterogeneous sleep recordings.
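For readers who want to reproduce this kind of epoch-by-epoch comparison on their own recordings, the sketch below computes overall accuracy, Cohen's kappa, and per-stage agreement from two aligned label sequences. It is a generic illustration, not the evaluation code used in this study. The source reports accuracy only, so kappa and per-stage agreement are additions for this example, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def epoch_agreement(expert, model):
    """Return (accuracy, Cohen's kappa) for two aligned lists of stage labels."""
    if len(expert) != len(model) or not expert:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(expert)

    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(e == m for e, m in zip(expert, model)) / n

    # Chance agreement from the marginal label frequencies of each scorer.
    expert_counts = Counter(expert)
    model_counts = Counter(model)
    labels = set(expert) | set(model)
    expected = sum(expert_counts[s] * model_counts[s] for s in labels) / (n * n)

    kappa = (observed - expected) / (1.0 - expected) if expected < 1.0 else 1.0
    return observed, kappa


def per_stage_agreement(expert, model):
    """For each expert-scored stage, return the fraction of epochs the model matched."""
    totals = Counter(expert)
    hits = Counter(e for e, m in zip(expert, model) if e == m)
    return {stage: hits[stage] / totals[stage] for stage in totals}


# Toy example with ten epochs (made-up labels, not study data).
expert = ["W", "W", "N1", "N2", "N2", "N2", "N3", "N3", "R", "R"]
model = ["W", "W", "N2", "N2", "N2", "N2", "N3", "N2", "R", "R"]

accuracy, kappa = epoch_agreement(expert, model)
per_stage = per_stage_agreement(expert, model)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")
print(per_stage)
```

In this toy example 8 of 10 epochs match, and both disagreements involve N1 or N3 being scored as N2. Per-stage agreement exposes that pattern, while overall accuracy alone hides it. Kappa is included because accuracy can be inflated when one stage, typically N2, dominates the night.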

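Figure 1. Hypnogram comparison for Session 1100 (89.1% accuracy): expert scoring (top panel) and AI model predictions (bottom panel).

Figure 2. Hypnogram comparison for Session 1155 (83.6% accuracy): expert scoring (top panel) and AI model predictions (bottom panel).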

4. Conclusions

This work demonstrates the feasibility of an AI-based, EEG-centered multimodal framework for automated sleep staging in real-world polysomnographic recordings. In the representative sessions presented, the system shows strong agreement with expert manual scoring (83.6% and 89.1% accuracy), comparable to the approximately 80-85% agreement reported among trained sleep technologists, supporting its clinical relevance as an automated sleep analysis tool.

Importantly, these findings represent an early-stage validation of an ongoing research and development effort aimed at scalable, clinically grounded sleep staging. The results support the potential of EEG-driven AI models, augmented by complementary physiological signals, to enable reliable sleep architecture assessment under heterogeneous ambulatory and laboratory conditions.

Continued expansion of the dataset, including increased cohort diversity and broader clinical representation, together with further refinement of the AI architecture, is expected to improve robustness, generalizability, and overall performance. This work establishes the foundation for a deployable, interpretable, and scalable system for automated PSG analysis with the potential to significantly reduce the burden of manual scoring and improve access to sleep diagnostics.

References

AASM. (2024). The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. American Academy of Sleep Medicine.
Berry, R. B., Brooks, R., Gamaldo, C. E., et al. (2012). The AASM scoring manual and interscorer reliability. Journal of Clinical Sleep Medicine.
Iber, C., Ancoli-Israel, S., Chesson, A. L., & Quan, S. F. (2007). The AASM Manual for the Scoring of Sleep and Associated Events. American Academy of Sleep Medicine.
Rosenberg, R. S., & Van Hout, S. (2013). The American Academy of Sleep Medicine inter-scorer reliability program: sleep stage scoring. Journal of Clinical Sleep Medicine, 9(1), 81–87.
Drouin, H., et al. (2015). Variability in human sleep stage scoring and its implications. Sleep Medicine.
Chambon, S., Galtier, M. N., Arnal, P. J., Wainrib, G., & Gramfort, A. (2018). A deep learning architecture for sleep stage classification using multimodal time-series. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(12), 2146–2156.
Yildirim, O., et al. (2019). Automated sleep staging using deep neural networks with single-channel EEG. IEEE Journal of Biomedical and Health Informatics.
Phan, H., Andreotti, F., Cooray, N., Chen, O. Y. A., & De Vos, M. (2019). Joint classification and prediction CNN for sleep staging. IEEE Transactions on Biomedical Engineering.
Ghassemi, M., et al. (2021). A review of deep learning approaches in sleep staging. IEEE Reviews in Biomedical Engineering.
Radha, M., Fonseca, P., Moreau, A., et al. (2019). Sleep stage classification from heart rate variability and actigraphy using machine learning. Sleep.
de Zambotti, M., Rosas, L., Colrain, I. M., & Baker, F. C. (2019). Wearable sleep technology in clinical and research settings. Nature and Science of Sleep, 11, 47–60.
Fonseca, P., et al. (2020). Sleep staging using multimodal biosignals: opportunities and limitations. Physiological Measurement.
Martin, T., et al. (2021). Photoplethysmography-based sleep staging: performance and limitations. Sleep Medicine Reviews.
Stephansen, J. B., et al. (2018). Neural network analysis of sleep stages enables efficient diagnosis of narcolepsy. Nature Communications, 9, 5229.
National Sleep Research Resource. (2018). Sleep data resources and clinical applications. NSRR.
