
class yasa.SleepStatsAgreement(ref_data, obs_data, *, ref_scorer='Reference', obs_scorer='Observed', agreement=1.96, confidence=0.95, alpha=0.05, verbose=True, bootstrap_kwargs={})

Evaluate agreement between sleep statistics reported by two different scorers. Evaluation includes bias and limits of agreement (each with its confidence intervals), various plotting options, and calibration functions for correcting biased values from the observed scorer.

Features include:

* Get summary calculations of bias, limits of agreement, and their confidence intervals.
* Test statistical assumptions of bias, limits of agreement, and their confidence intervals, and apply corrective procedures when the assumptions are not met.
* Get bias and limits of agreement in a string-formatted table.
* Calibrate new data to correct for biases in observed data.
* Return individual calibration functions.
* Visualize discrepancies for outlier inspection.
* Visualize Bland-Altman plots.
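As a rough illustration of the two central quantities, the sketch below computes a parametric bias and limits of agreement for a single sleep statistic using only the standard library. The sample values and variable names are hypothetical, and the sign convention (observed minus reference) is an assumption; this is not the class's internal code.

```python
import statistics

# Hypothetical total sleep time (minutes) for five sessions
ref = [420.0, 390.0, 455.0, 400.0, 430.0]   # reference scorer
obs = [425.0, 388.0, 460.0, 409.0, 431.0]   # observed scorer

# Per-session differences (assumed convention: observed minus reference)
diffs = [o - r for o, r in zip(obs, ref)]

agreement = 1.96                      # multiple of the SD, as in the signature
bias = statistics.mean(diffs)         # mean difference
sd = statistics.stdev(diffs)          # sample SD of the differences
lower_loa = bias - agreement * sd     # lower limit of agreement
upper_loa = bias + agreement * sd     # upper limit of agreement

print(f"bias={bias:.2f}, LoA=[{lower_loa:.2f}, {upper_loa:.2f}]")
```

A bias near zero with narrow limits suggests the two scorers are largely interchangeable for that statistic; wide limits indicate large session-to-session disagreement even when the average difference is small.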

Added in version 0.7.0.


ref_data : pandas.DataFrame

A pandas.DataFrame with sleep statistics from the reference scorer. Rows are unique observations and columns are unique sleep statistics.

obs_data : pandas.DataFrame

A pandas.DataFrame with sleep statistics from the observed scorer. Rows are unique observations and columns are unique sleep statistics. Shape, index, and columns must be identical to ref_data.
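Because the two inputs must share shape, index, and columns, it can help to check and align them before building the object. The following is a minimal sketch with made-up data; `is_aligned` is a hypothetical helper, not part of yasa.

```python
import pandas as pd

# Hypothetical sleep statistics for three sessions from two scorers
ref_data = pd.DataFrame({"TST": [420.0, 390.0, 455.0], "WASO": [35.0, 50.0, 20.0]})
obs_data = pd.DataFrame({"WASO": [30.0, 55.0, 22.0], "TST": [425.0, 388.0, 460.0]})

def is_aligned(ref: pd.DataFrame, obs: pd.DataFrame) -> bool:
    """Return True when shape, index, and columns all match (hypothetical helper)."""
    return bool(
        ref.shape == obs.shape
        and ref.index.equals(obs.index)
        and ref.columns.equals(obs.columns)
    )

aligned_before = is_aligned(ref_data, obs_data)   # column order differs here

# Reorder the observed rows and columns to follow the reference layout
obs_data = obs_data.reindex(index=ref_data.index, columns=ref_data.columns)
aligned_after = is_aligned(ref_data, obs_data)
```

Reindexing only reorders existing labels; if a label is missing from the observed data it would be filled with NaN, so a check for missing values afterwards is a sensible extra step.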


ref_scorer : str

Name of the reference scorer.

obs_scorer : str

Name of the observed scorer.

agreement : float

Multiple of the standard deviation used to set the limits of agreement. The default is 1.96, which corresponds to 95% limits of agreement (the range expected to contain 95% of the differences) if the differences are normally distributed.

Note: agreement is adjusted for regression-modeled limits of agreement.
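The sketch below shows where the default of 1.96 comes from and one plausible form of the adjustment for regression-modeled limits. The `sqrt(pi / 2)` factor is an assumption based on the approach of Bland and Altman (1999) for modeling the standard deviation from absolute residuals; it is consistent with the 2.46 multiplier that appears in the example table, but the source does not state the formula.

```python
import math
from statistics import NormalDist

coverage = 0.95
# Two-sided standard normal quantile for the requested coverage
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

# Assumed adjustment: the mean absolute deviation of a normal variable is
# SD * sqrt(2 / pi), so an SD estimated from absolute residuals is rescaled
# by sqrt(pi / 2) before applying the agreement multiple.
adjusted = 1.96 * math.sqrt(math.pi / 2)

print(round(z, 2), round(adjusted, 2))  # prints: 1.96 2.46
```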


confidence : float

Confidence level, expressed as a proportion (the default 0.95 gives 95% intervals), for the confidence intervals applied to bias and limits of agreement. The same level is used for both standard and bootstrapped confidence intervals.
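To make the role of the confidence level concrete, here is a minimal percentile-bootstrap sketch for the confidence interval of a bias, written with the standard library. `bootstrap_ci` is a hypothetical helper and the percentile method is an assumption chosen for brevity; the class may use a different bootstrap variant.

```python
import random
import statistics

def bootstrap_ci(values, confidence=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap CI for the mean of `values` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and collect the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper

diffs = [5.0, -2.0, 5.0, 9.0, 1.0, 3.0, -1.0, 4.0]   # hypothetical differences
low95, high95 = bootstrap_ci(diffs, confidence=0.95)
low80, high80 = bootstrap_ci(diffs, confidence=0.80)
```

Lowering the confidence level narrows the interval, which is why the 80% bounds fall inside the 95% bounds for the same resamples.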


alpha : float

Alpha cutoff used for all assumption tests.
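The sketch below shows how a single alpha cutoff can turn test p-values into pass/fail flags. The choice of tests (a one-sample t-test for bias and a Shapiro-Wilk test for normality) is an assumption made for illustration; the source does not state which tests the class runs.

```python
from scipy import stats

alpha = 0.05
# Hypothetical observed-minus-reference differences with an obvious offset
diffs = [2.1, 1.9, 2.0, 2.2, 1.8, 2.05, 1.95, 2.1]

# One-sample t-test against zero: a small p-value suggests systematic bias
p_bias = stats.ttest_1samp(diffs, popmean=0).pvalue
# Shapiro-Wilk test: a small p-value suggests non-normal differences
p_normal = stats.shapiro(diffs).pvalue

# An assumption "passes" when its test is not significant at alpha
assumptions = {
    "unbiased": bool(p_bias > alpha),
    "normal": bool(p_normal > alpha),
}
```

With these values the differences cluster tightly around 2, so the bias test is clearly significant and the "unbiased" flag is False.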

verbose : bool or str

Verbosity level. The logging levels are 'debug', 'info', 'warning', 'error', and 'critical'. For most users the choice is between 'info' (or verbose=True, the default shown in the signature) and 'warning' (or verbose=False), which prints only warning and error messages.


Sleep statistics that are identical between scorers are removed from analysis.
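A minimal sketch of this kind of filtering, assuming "identical" means that every session has exactly the same value for both scorers. The data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data: time in bed (TIB) matches exactly, total sleep time does not
ref_data = pd.DataFrame({"TIB": [600.0, 600.0], "TST": [420.0, 390.0]})
obs_data = pd.DataFrame({"TIB": [600.0, 600.0], "TST": [425.0, 388.0]})

# Flag statistics whose values match in every session
identical = (ref_data == obs_data).all(axis=0)
kept = identical.index[~identical].tolist()

# Keep only the statistics that can actually show disagreement
ref_kept = ref_data[kept]
obs_kept = obs_data[kept]
```

Statistics with no differences carry no information about agreement and would produce zero-variance inputs to the assumption tests, which is one reason to drop them up front.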

Many steps here are influenced by guidelines proposed in Menghini et al., 2021 [Menghini2021]. See https://sri-human-sleep.github.io/sleep-trackers-performance/AnalyticalPipeline_v1.0.0.html



[Menghini2021] Menghini, L., Cellini, N., Goldstone, A., Baker, F. C., & de Zambotti, M. (2021). A standardized framework for testing the performance of sleep-tracking technology: step-by-step guidelines and open-source code. SLEEP, 44(2), zsaa170. https://doi.org/10.1093/sleep/zsaa170


>>> import pandas as pd
>>> import yasa
>>> # Generate fake reference and observed datasets with similar sleep statistics
>>> ref_scorer = "Henri"
>>> obs_scorer = "Piéron"
>>> ref_hyps = [yasa.simulate_hypnogram(tib=600, scorer=ref_scorer, seed=i) for i in range(20)]
>>> obs_hyps = [h.simulate_similar(scorer=obs_scorer, seed=i) for i, h in enumerate(ref_hyps)]
>>> # Generate sleep statistics from hypnograms using EpochByEpochAgreement
>>> eea = yasa.EpochByEpochAgreement(ref_hyps, obs_hyps)
>>> sstats = eea.get_sleep_stats()
>>> ref_sstats = sstats.loc[ref_scorer]
>>> obs_sstats = sstats.loc[obs_scorer]
>>> # Create SleepStatsAgreement instance
>>> ssa = yasa.SleepStatsAgreement(ref_sstats, obs_sstats)
>>> ssa.summary().round(1).head(3)
variable   bias_intercept             ...   uloa_parm
interval           center lower upper ...      center lower upper
sleep_stat                            ...
%N1                  -5.4 -13.9   3.2 ...         6.1   3.7   8.5
%N2                 -27.3 -49.1  -5.6 ...        12.4   7.2  17.6
%N3                  -9.1 -23.8   5.5 ...        20.4  12.6  28.3
>>> ssa.get_table().head(3)[["bias", "loa"]]
                      bias                            loa
%N1                   0.25  Bias ± 2.46 * (-0.00 + 1.00x)
%N2         -27.34 + 0.55x   Bias ± 2.46 * (0.00 + 1.00x)
%N3                   1.38   Bias ± 2.46 * (0.00 + 1.00x)
>>> ssa.assumptions.head(3)
            unbiased  normal  constant_bias  homoscedastic
%N1             True    True           True          False
%N2             True    True          False          False
%N3             True    True           True          False
>>> ssa.auto_methods.head(3)
            bias   loa    ci
%N1         parm  regr  parm
%N2         regr  regr  parm
%N3         parm  regr  parm
>>> ssa.get_table(bias_method="parm", loa_method="parm").head(3)[["bias", "loa"]]
             bias            loa
%N1          0.25    -5.55, 6.06
%N2         -0.23  -12.87, 12.40
%N3          1.38  -17.67, 20.44
>>> new_hyps = [h.simulate_similar(scorer="Kelly", seed=i) for i, h in enumerate(obs_hyps)]
>>> new_sstats = pd.Series(new_hyps).map(lambda h: h.sleep_statistics()).apply(pd.Series)
>>> new_sstats = new_sstats[["N1", "TST", "WASO"]]
>>> new_sstats.round(1).head(5)
     N1    TST   WASO
0  42.5  439.5  147.5
1  84.0  550.0   38.5
2  53.5  489.0  103.0
3  57.0  469.5  120.0
4  71.0  531.0   69.0
>>> # Calibrate the new scorer's statistics using the biases estimated above
>>> new_stats_calibrated = ssa.calibrate(new_sstats, bias_method="auto")
>>> new_stats_calibrated.round(1).head(5)
     N1    TST   WASO
0  42.9  433.8  150.0
1  84.4  544.2   41.0
2  53.9  483.2  105.5
3  57.4  463.8  122.5
4  71.4  525.2   71.5
>>> import matplotlib.pyplot as plt
>>> ax = ssa.plot_discrepancies_heatmap()
>>> ax.set_title("Sleep statistic discrepancies")
>>> plt.tight_layout()
>>> ssa.plot_blandaltman()
__init__(ref_data, obs_data, *, ref_scorer='Reference', obs_scorer='Observed', agreement=1.96, confidence=0.95, alpha=0.05, verbose=True, bootstrap_kwargs={})


__init__(ref_data, obs_data, *[, ...])

calibrate(data[, bias_method, adjust_all])

Calibrate a DataFrame of sleep statistics from a new scorer based on observed biases in obs_data/obs_scorer.


Return a function for calibrating a specific sleep statistic, based on observed biases in obs_data/obs_scorer.

get_table([bias_method, loa_method, ...])

Return a DataFrame with bias, loa, bias_ci, loa_ci as string equations.


Return a DataFrame that includes all calculated metrics:

* Parametric bias
* Parametric lower and upper limits of agreement
* Regression intercept and slope for modeled bias
* Regression intercept and slope for modeled limits of agreement
* Lower and upper confidence intervals for all metrics
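The calibration methods listed above can be pictured as removing an estimated bias from new values. The sketch below is one plausible form, not the library's implementation: `make_calibration_func` is a hypothetical helper, the coefficients are illustrative, and treating the new value itself as the predictor of the modeled bias is an assumption.

```python
def make_calibration_func(intercept, slope=0.0):
    """Return a function that removes a modeled bias from a value.

    Hypothetical helper: the bias is modeled as intercept + slope * x,
    and slope=0.0 reduces to a constant (parametric) bias.
    """
    def calibrate_value(x):
        return x - (intercept + slope * x)
    return calibrate_value

# Constant bias of +5.7 minutes (hypothetical): every value shifts down by 5.7
calibrate_tst = make_calibration_func(intercept=5.7)
# Value-dependent bias (hypothetical coefficients)
calibrate_n2 = make_calibration_func(intercept=-27.34, slope=0.55)

tst_corrected = calibrate_tst(439.5)
n2_corrected = calibrate_n2(50.0)
```

Returning a closure keeps the fitted coefficients bundled with the correction, so the same function can be applied to any new value or mapped over a column.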



assumptions

A pandas.DataFrame containing boolean values indicating the pass/fail status of all statistical tests performed to test assumptions.


auto_methods

A pandas.DataFrame containing the methods applied when 'auto' is selected.


A long-format pandas.DataFrame containing all raw sleep statistics from ref_data and obs_data.


The number of sessions.


The name of the observed scorer.


The name of the reference scorer.


Return a list of all sleep statistics included in the agreement analyses.
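The relationship between the assumption results and the methods chosen under 'auto' can be sketched as a simple mapping. This is an inference from the example output (where a failed constant_bias check coincides with a "regr" bias and a failed homoscedastic check with a "regr" loa), not a documented rule; `choose_methods` is a hypothetical helper and the "boot" label is an assumption.

```python
def choose_methods(constant_bias: bool, homoscedastic: bool, normal: bool) -> dict:
    """Pick a method per metric from assumption results (hypothetical helper)."""
    return {
        # Constant bias: a single mean difference suffices; otherwise model it
        "bias": "parm" if constant_bias else "regr",
        # Homoscedastic differences: fixed limits; otherwise modeled limits
        "loa": "parm" if homoscedastic else "regr",
        # Normal differences: parametric intervals; otherwise bootstrap
        # ("boot" is an assumed label, it does not appear in the example output)
        "ci": "parm" if normal else "boot",
    }

# Inputs taken from the %N1 and %N2 rows of the example assumptions table
n1_methods = choose_methods(constant_bias=True, homoscedastic=False, normal=True)
n2_methods = choose_methods(constant_bias=False, homoscedastic=False, normal=True)
```

Under this mapping the two rows reproduce the corresponding rows of the example auto_methods table.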