Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats. The signals were recorded with 16-bit resolution at 250 Hz with a fixed, chest-mounted single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. A team of 20 technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to the automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek to use a data-driven approach to find these differences, which cardiologists have anecdotally observed. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats.


Methods

Our data is collected by the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded with 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides may allow us to improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps to catch anomalous events earlier than currently possible.

The dataset is derived from recordings of 11,000 patients who used the CardioSTAT device at various medical centers, predominantly in Ontario, Canada. While the device captures ECG data for up to two weeks, most prescribed wear durations were one week.

The data is analyzed by Icentia's team of 20 technologists, who perform annotation using proprietary analysis tools. Initial beat detection is performed automatically. A technologist then analyses the record, labelling beat and rhythm types in a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it enters the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We segment each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This longer time context was informed by discussions with technologists: the context is useful for rhythm detection. We made it a power of two with a middle sample to allow for easier convolution stack parameterization. From each patient's list of segments, we randomly select 50 segments together with their labels. The goal here is to reduce the size of the dataset while maintaining a fair representation of each patient.
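The segmentation and sub-sampling described above can be sketched as follows. This is only an illustration, not the authors' pipeline: it assumes non-overlapping segments, drops a trailing partial segment, and keeps the selected segments in chronological order. The function names and the seed are hypothetical.

```python
import random

SEGMENT_LEN = 2**20 + 1    # 1,048,577 samples per segment
SAMPLE_RATE_HZ = 250
MAX_SEGMENTS = 50          # segments kept per patient


def segment_bounds(n_samples, segment_len=SEGMENT_LEN):
    """Return (start, end) sample indices of each full segment.

    Assumption: segments do not overlap and a trailing partial
    segment is discarded.
    """
    return [(s, s + segment_len)
            for s in range(0, n_samples - segment_len + 1, segment_len)]


def sample_segments(bounds, k=MAX_SEGMENTS, seed=0):
    """Pick at most k segments at random, returned in chronological order."""
    rng = random.Random(seed)
    return sorted(rng.sample(bounds, min(k, len(bounds))))


# Duration of one segment in minutes (roughly 70)
segment_minutes = SEGMENT_LEN / SAMPLE_RATE_HZ / 60

# Example: a one-week recording sampled at 250 Hz
one_week = 7 * 24 * 3600 * SAMPLE_RATE_HZ
bounds = segment_bounds(one_week)
kept = sample_segments(bounds)
print(len(bounds), "segments available,", len(kept), "kept")
```

A recording shorter than 50 segments simply keeps every full segment, which is in line with the note in the statistics that not all patients have 50 segments.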

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically over the whole recording, such as the placement of the probes or patient-specific noise, rather than isolated events.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns that indicate a disease while ignoring signal-specific characteristics such as a unique signal amplitude. Looking at trends within the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

Statistic | Value (units)
Number of patients | 11,000
Number of labeled beats | 2,774,054,987
Sample rate | 250 Hz
Segment size | $2^{20}+1$ = 1,048,577 samples
Total number of segments | 541,794 (not all patients have enough data for 50 segments)
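As a quick sanity check, the figures above can be combined to estimate the overall scale of the dataset. The sketch below assumes each sample is stored as 2 bytes (16-bit) without compression and ignores header and annotation files; the variable names are illustrative.

```python
N_PATIENTS = 11_000
N_SEGMENTS = 541_794
SEGMENT_LEN = 2**20 + 1      # samples per segment
SAMPLE_RATE_HZ = 250
BYTES_PER_SAMPLE = 2         # assumption: 16-bit samples, uncompressed

total_samples = N_SEGMENTS * SEGMENT_LEN
total_hours = total_samples / SAMPLE_RATE_HZ / 3600
total_terabytes = total_samples * BYTES_PER_SAMPLE / 1e12
mean_segments_per_patient = N_SEGMENTS / N_PATIENTS

print(f"{total_hours:,.0f} hours of signal")
print(f"{total_terabytes:.2f} TB of raw samples")
print(f"{mean_segments_per_patient:.2f} segments per patient on average")
```

Under these assumptions the raw signal comes to a little over 1.1 TB, which is in line with the total uncompressed size reported for the files.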

Beats are annotated in ann.symbol at the R timepoint of the QRS complex. The timepoint in rec.signal for each annotation is found in ann.sample. The table below shows the counts for beats over the entire dataset. There are also annotations with a '+' symbol, which simply indicate that a rhythm annotation is present (see the next table).

Symbol | Beat description | Count
N | Normal | 2,061,141,216
S | ESSV (PAC): premature or ectopic supraventricular beat, premature atrial contraction | 19,346,728
V | ESV (PVC): premature ventricular contraction | 17,203,041
Q | Undefined: unclassifiable beat | 676,364,002
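To illustrate how the beat symbols might be tallied for one segment, here is a minimal sketch that works on plain Python lists. It assumes the lists come from the `sample` and `symbol` attributes of an annotation loaded with wfdb; the helper names and the toy data are hypothetical.

```python
from collections import Counter

# Beat symbols listed in the table; '+' marks a rhythm annotation only
BEAT_SYMBOLS = {"N", "S", "V", "Q"}


def count_beats(symbols):
    """Count beat annotations, skipping rhythm-only '+' entries."""
    return Counter(s for s in symbols if s in BEAT_SYMBOLS)


def beat_positions(samples, symbols, wanted):
    """Return the sample indices (R timepoints) of beats of one type."""
    return [pos for pos, sym in zip(samples, symbols) if sym == wanted]


# Toy annotation data standing in for ann.sample and ann.symbol
samples = [12, 230, 455, 610, 702, 930]
symbols = ["+", "N", "N", "V", "N", "S"]

counts = count_beats(symbols)
pvc_positions = beat_positions(samples, symbols, "V")
print(dict(counts), pvc_positions)
```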

Rhythms are annotated in ann.aux_note at each timepoint. For example, a normal sinusal rhythm starts with a '(N' annotation and ends with a ')' annotation. The entire sequence in between is annotated as a normal sinusal rhythm. The table below gives the count of annotated regions of each type; a region may span one beat or thousands.

Symbol | Rhythm label | Count
(N ... ) | NSR (Normal sinusal rhythm) | 16,083,158
(AFIB ... ) | AFib (Atrial fibrillation) | 848,564
(AFL ... ) | AFlutter (Atrial flutter) | 313,251
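The open and close markers can be turned into labelled intervals with a small parser. The following sketch assumes that the inputs are the `sample` and `aux_note` lists of a loaded annotation, that beats without a rhythm marker carry an empty note, and that regions are not nested. Stripping spaces and null characters is a defensive assumption, and the function name is hypothetical.

```python
def rhythm_intervals(samples, aux_notes):
    """Convert '(LABEL' ... ')' markers into (label, start, end) tuples.

    start and end are the sample indices of the opening and closing
    annotations. A region still open at the end of the data is ignored.
    """
    intervals = []
    label, start = None, None
    for pos, note in zip(samples, aux_notes):
        note = note.strip(" \x00")  # assumption: notes may carry padding
        if note.startswith("(") and len(note) > 1:
            label, start = note[1:], pos
        elif note == ")" and label is not None:
            intervals.append((label, start, pos))
            label, start = None, None
    return intervals


# Toy data standing in for ann.sample and ann.aux_note
samples = [10, 200, 400, 650, 900, 1200]
notes = ["(N", "", "", ")", "(AFIB", ")"]
regions = rhythm_intervals(samples, notes)
print(regions)
```

Because a segment is a window cut from a longer recording, a region may begin before the segment or end after it. How such cases are encoded is not specified here, so the sketch only reports regions that both open and close within the data it is given.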

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

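import wfdb

data_path = "path/to/icentia11k"  # placeholder: local root folder of the dataset
patient_id = 9000
segment_id = 0
start = 2000    # first sample to read
length = 1024   # number of samples to read

# Patients are grouped into folders by thousands, e.g. p05 holds p05000 to p05999
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start + length)
# shift_samps=True makes annotation positions relative to sampfrom
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start + length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15, 4))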
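To work through every segment of a patient, the record names can be generated up front. The sketch below assumes that patients are grouped into folders by thousands (as the p05 folder listing suggests) and that segments are numbered s00 to s49. The helper names and the `data` root are hypothetical.

```python
def patient_dir(data_path, patient_id):
    """Folder of one patient, e.g. <data_path>/p09/p09000 for patient 9000."""
    group = patient_id // 1000  # assumption: one group folder per 1,000 patients
    return f"{data_path}/p{group:02d}/p{patient_id:05d}"


def segment_record_names(data_path, patient_id, max_segments=50):
    """Candidate wfdb record names (without extension) for one patient."""
    base = patient_dir(data_path, patient_id)
    return [f"{base}/p{patient_id:05d}_s{seg:02d}" for seg in range(max_segments)]


names = segment_record_names("data", 9000)
print(names[0])
print(names[-1])
```

Since not all patients have 50 segments, it is safer to check that the header file (the record name followed by `.hea`) exists before passing a name to `wfdb.rdrecord`.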

Limitations

It should be noted that, since the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular effort was made on patient selection, other than that data collection was conducted over the years 2017 and 2018.


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des données (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Québec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ecg signals. Computers in biology and medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder Navigation: <base>/p05
Patient directories p05000 through p05999 (1,000 directories, one per patient).