yasa.EpochByEpochAgreement

class yasa.EpochByEpochAgreement(ref_hyps, obs_hyps)

Evaluate agreement between two hypnograms or two collections of hypnograms.

Evaluation includes averaged agreement scores, one-vs-rest agreement scores, agreement scores summarized across all sleep and summarized by sleep stage, and various plotting options to visualize the two hypnograms simultaneously. See examples for more detail.
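To make the distinction between an overall score and one-vs-rest scores concrete, here is a minimal sketch in plain Python. It does not use yasa: the helper `one_vs_rest_scores` and the toy stage lists are hypothetical stand-ins for illustration, and the library's own computation may differ in detail.

```python
def one_vs_rest_scores(ref, obs, beta=1.0):
    """Per-stage precision, recall, and F-beta (hypothetical helper, not yasa's API).

    Each stage is scored as "this stage" versus "every other stage".
    """
    scores = {}
    for stage in sorted(set(ref) | set(obs)):
        pairs = list(zip(ref, obs))
        tp = sum(r == stage and o == stage for r, o in pairs)  # both say stage
        fp = sum(r != stage and o == stage for r, o in pairs)  # only observed says stage
        fn = sum(r == stage and o != stage for r, o in pairs)  # only reference says stage
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        b2 = beta**2
        denom = b2 * precision + recall
        fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
        scores[stage] = {
            "fbeta": fbeta,
            "precision": precision,
            "recall": recall,
            "support": tp + fn,  # number of reference epochs in this stage
        }
    return scores


# Toy data: one label per epoch from a reference and an observed scorer.
ref = ["WAKE", "N1", "N2", "N2", "N3", "REM", "REM", "WAKE"]
obs = ["WAKE", "N2", "N2", "N2", "N2", "REM", "N2", "WAKE"]

by_stage = one_vs_rest_scores(ref, obs)
accuracy = sum(r == o for r, o in zip(ref, obs)) / len(ref)  # overall agreement
```

In this toy example the observed scorer over-calls N2, so N2 has perfect recall but low precision, a pattern that a single overall accuracy value would hide.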

Added in version 0.7.0.

Parameters:
ref_hyps : iterable of yasa.Hypnogram

A collection of reference hypnograms (i.e., those considered ground-truth).

Each yasa.Hypnogram in ref_hyps must have the same scorer.

If a dict, the keys are used to generate unique sleep session IDs. For any other iterable (e.g., list or tuple), unique sleep session IDs are generated automatically.

obs_hyps : iterable of yasa.Hypnogram

A collection of observed hypnograms (i.e., those to be evaluated).

Each yasa.Hypnogram in obs_hyps must have the same scorer, and this scorer must differ from the scorer of the hypnograms in ref_hyps.

If a dict, key values must match those of ref_hyps.

Important

It is assumed that the order of hypnograms is the same in ref_hyps and obs_hyps. For example, the third hypnogram in ref_hyps and the third hypnogram in obs_hyps must come from the same sleep session, and they must differ only in their scorers.
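The pairing rules described for ref_hyps and obs_hyps can be sketched as follows. This is an illustration only, not yasa's implementation: `pair_sessions` is a hypothetical helper, plain lists stand in for yasa.Hypnogram objects, and numbering auto-generated IDs from 1 is an assumption based on the `sleep_id` values shown in the examples.

```python
def pair_sessions(ref_hyps, obs_hyps):
    """Return {session_id: (ref, obs)} for two parallel collections.

    Hypothetical helper: dict inputs are matched by key, any other
    iterable is matched by position.
    """
    if isinstance(ref_hyps, dict):
        if not isinstance(obs_hyps, dict) or set(ref_hyps) != set(obs_hyps):
            raise ValueError("obs_hyps must be a dict with the same keys as ref_hyps")
        return {key: (ref_hyps[key], obs_hyps[key]) for key in ref_hyps}

    ref_list, obs_list = list(ref_hyps), list(obs_hyps)
    if len(ref_list) != len(obs_list):
        raise ValueError("ref_hyps and obs_hyps must contain the same number of sessions")
    # Positional pairing: the i-th reference is compared with the i-th observation.
    # Starting the IDs at 1 is an assumption, not documented behavior.
    return {i: pair for i, pair in enumerate(zip(ref_list, obs_list), start=1)}


by_position = pair_sessions([["W", "N1"], ["W", "N2"]], [["W", "W"], ["N2", "N2"]])
by_key = pair_sessions({"night_a": ["W"], "night_b": ["N2"]}, {"night_a": ["W"], "night_b": ["N1"]})
```

With positional pairing, a shuffled list silently compares the wrong sessions, which is why passing dicts with explicit keys is the safer choice when the two collections are built separately.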

See also

For comparing just two hypnograms, use yasa.Hypnogram.evaluate().

Notes

Many steps here are influenced by guidelines proposed in Menghini et al., 2021 [Menghini2021]. See https://sri-human-sleep.github.io/sleep-trackers-performance/AnalyticalPipeline_v1.0.0.html

References

[Menghini2021]

Menghini, L., Cellini, N., Goldstone, A., Baker, F. C., & de Zambotti, M. (2021). A standardized framework for testing the performance of sleep-tracking technology: step-by-step guidelines and open-source code. SLEEP, 44(2), zsaa170. https://doi.org/10.1093/sleep/zsaa170

Examples

>>> import yasa
>>> ref_hyps = [yasa.simulate_hypnogram(tib=600, scorer="Human", seed=i) for i in range(10)]
>>> obs_hyps = [h.simulate_similar(scorer="YASA", seed=i) for i, h in enumerate(ref_hyps)]
>>> ebe = yasa.EpochByEpochAgreement(ref_hyps, obs_hyps)
>>> agr = ebe.get_agreement()
>>> agr.head(5).round(2)
          accuracy  balanced_acc  kappa   mcc  precision  recall     f1
sleep_id
1             0.31          0.26   0.07  0.07       0.31    0.31   0.31
2             0.33          0.33   0.14  0.14       0.35    0.33   0.34
3             0.35          0.24   0.06  0.06       0.35    0.35   0.35
4             0.22          0.21   0.01  0.01       0.21    0.22   0.21
5             0.21          0.17  -0.06 -0.06       0.20    0.21   0.21
>>> ebe.get_agreement_bystage().head(12).round(3)
                fbeta  precision  recall  support
stage sleep_id
WAKE  1         0.391      0.371   0.413    189.0
      2         0.299      0.276   0.326    184.0
      3         0.234      0.204   0.275    255.0
      4         0.268      0.285   0.252    321.0
      5         0.228      0.230   0.227    181.0
      6         0.407      0.384   0.433    284.0
      7         0.362      0.296   0.467    287.0
      8         0.298      0.519   0.209    263.0
      9         0.210      0.191   0.233    313.0
      10        0.369      0.420   0.329    362.0
N1    1         0.185      0.185   0.185    124.0
      2         0.121      0.131   0.112    160.0
>>> ebe.get_confusion_matrix(sleep_id=1)
YASA   WAKE  N1   N2  N3  REM
Human
WAKE     78  24   50   3   34
N1       23  23   43  15   20
N2       60  58  183  43  139
N3       30  10   50   5   32
REM      19   9  121  50   78
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(figsize=(6, 3), constrained_layout=True)
>>> ebe.plot_hypnograms(sleep_id=10)
>>> fig, ax = plt.subplots(figsize=(6, 3))
>>> ebe.plot_hypnograms(
...     sleep_id=8, ax=ax, obs_kwargs={"color": "red", "lw": 2, "ls": "dotted"}
... )
>>> plt.tight_layout()
>>> session = 8
>>> fig, ax = plt.subplots(figsize=(6.5, 2.5), constrained_layout=True)
>>> style_a = dict(alpha=1, lw=2.5, ls="solid", color="gainsboro", label="Michel")
>>> style_b = dict(alpha=1, lw=2.5, ls="solid", color="cornflowerblue", label="Jouvet")
>>> legend_style = dict(
...     title="Scorer", frameon=False, ncol=2, loc="lower center", bbox_to_anchor=(0.5, 0.9)
... )
>>> ax = ebe.plot_hypnograms(
...     sleep_id=session, ref_kwargs=style_a, obs_kwargs=style_b, legend=legend_style, ax=ax
... )
>>> acc = ebe.get_agreement().multiply(100).at[session, "accuracy"]
>>> ax.text(
...     0.01, 1, f"Accuracy = {acc:.0f}%", ha="left", va="bottom", transform=ax.transAxes
... )

When comparing only two hypnograms, use the yasa.Hypnogram.evaluate method:

>>> hypno_a = yasa.simulate_hypnogram(tib=90, scorer="RaterA", seed=8)
>>> hypno_b = hypno_a.simulate_similar(scorer="RaterB", seed=9)
>>> ebe = hypno_a.evaluate(hypno_b)
>>> ebe.get_confusion_matrix()
RaterB  WAKE  N1  N2  N3
RaterA
WAKE      71   2  20   8
N1         1   0   9   0
N2        12   4  25   0
N3        24   0   1   3
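As a worked check on how a confusion matrix relates to agreement scores, the sketch below recomputes accuracy and Cohen's kappa by hand from the RaterA/RaterB matrix printed above. It uses only the standard library; the variable names are illustrative, and the result has not been compared against the values yasa itself would report.

```python
# Counts copied from the RaterA (rows) x RaterB (columns) matrix above.
matrix = {
    "WAKE": {"WAKE": 71, "N1": 2, "N2": 20, "N3": 8},
    "N1": {"WAKE": 1, "N1": 0, "N2": 9, "N3": 0},
    "N2": {"WAKE": 12, "N1": 4, "N2": 25, "N3": 0},
    "N3": {"WAKE": 24, "N1": 0, "N2": 1, "N3": 3},
}
stages = list(matrix)

n_epochs = sum(sum(row.values()) for row in matrix.values())
agreed = sum(matrix[s][s] for s in stages)  # diagonal: both raters chose the same stage
accuracy = agreed / n_epochs

# Chance agreement: product of each rater's marginal proportions, summed over stages.
row_totals = {s: sum(matrix[s].values()) for s in stages}
col_totals = {s: sum(matrix[r][s] for r in stages) for s in stages}
expected = sum(row_totals[s] * col_totals[s] for s in stages) / n_epochs**2

# Cohen's kappa: observed agreement corrected for chance agreement.
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

The two raters agree on 99 of 180 epochs (accuracy 0.55), but because both label most epochs as WAKE, much of that agreement is expected by chance and kappa is considerably lower.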
__init__(ref_hyps, obs_hyps)

Methods

__init__(ref_hyps, obs_hyps)

get_agreement([sample_weight, scorers])

Return a pandas.DataFrame of weighted (i.e., averaged) agreement scores.

get_agreement_bystage([beta])

Return a pandas.DataFrame of unweighted (i.e., one-vs-rest) agreement scores.

get_confusion_matrix([sleep_id, agg_func])

Return a ref_hyp/obs_hyp confusion matrix from either a single session or all sessions concatenated together.

get_sleep_stats()

Return a pandas.DataFrame of sleep statistics for each hypnogram derived from both reference and observed scorers.

multi_scorer(df, scorers)

Compute multiple agreement scores from a 2-column dataframe (an optional 3rd column may contain sample weights).

plot_hypnograms([sleep_id, legend, ax, ...])

Plot the two hypnograms of one session overlapping on the same axis.

summary([by_stage])

Return group-level agreement scores.
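The difference between a single-session confusion matrix and one built from all sessions concatenated together can be sketched with a simple counter. This is a conceptual illustration, not yasa's implementation: `confusion_counts` and the toy `sessions` dict are hypothetical, and the sketch ignores the `agg_func` option of get_confusion_matrix.

```python
from collections import Counter


def confusion_counts(sessions, sleep_id=None):
    """Count (reference, observed) stage pairs (hypothetical helper).

    With a sleep_id, only that session is counted; otherwise the epochs of
    all sessions are pooled, as if the hypnograms were concatenated.
    """
    ids = [sleep_id] if sleep_id is not None else list(sessions)
    counts = Counter()
    for sid in ids:
        ref, obs = sessions[sid]
        counts.update(zip(ref, obs))  # one (ref, obs) pair per epoch
    return counts


# Toy data: session ID -> (reference labels, observed labels).
sessions = {
    1: (["WAKE", "N2", "N2", "REM"], ["WAKE", "N2", "N1", "REM"]),
    2: (["WAKE", "N2", "N3", "N3"], ["N1", "N2", "N3", "N2"]),
}

single = confusion_counts(sessions, sleep_id=1)
pooled = confusion_counts(sessions)
```

Each key is a (row, column) cell of the matrix, so `pooled[("N2", "N2")]` is the number of epochs, across both sessions, that both scorers labeled N2.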

Attributes

data

A pandas.DataFrame including all hypnograms.

n_sleeps

The number of unique sleep sessions.

obs_scorer

The name of the observed scorer.

ref_scorer

The name of the reference scorer.