Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats. The signals were recorded with 16-bit resolution at 250 Hz with a fixed chest-mounted single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. Twenty technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinus Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to the automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek to use a data-driven approach to find these differences, which cardiologists have anecdotally observed. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats.


Methods

Our data is collected by the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded with 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides can help improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps catch anomalous events earlier than is currently possible.

The dataset is derived from data provided by 11,000 patients who used the CardioSTAT device, predominantly in Ontario, Canada, at various medical centers. While the device can capture ECG data for up to two weeks, most patients were prescribed one week of wear.

The data is analyzed by Icentia's team of 20 technologists, who performed annotation using proprietary analysis tools. Initial beat detection is performed automatically; a technologist then analyzes the record, labelling beat and rhythm types in a full-disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it enters the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We divide each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes at 250 Hz). This long time context was informed by discussions with technologists, who find the context useful for rhythm detection. We chose a power of two plus one, so that each segment has a middle sample, to make convolution stacks easier to parameterize. From each patient's list of segments, we randomly select 50 segments together with their labels. The goal is to reduce the size of the dataset while maintaining a fair representation of each patient.
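The segment arithmetic and the random selection can be sketched as follows. This is only an illustration, not the authors' pipeline: it assumes non-overlapping segments, discards a trailing partial segment, and uses a hypothetical helper name (`sample_segments`) and a fixed seed, none of which are specified in the description above.

```python
import random

SAMPLE_RATE_HZ = 250      # sampling rate stated in the description
SEGMENT_LEN = 2**20 + 1   # 1,048,577 samples per segment
MAX_SEGMENTS = 50         # segments kept per patient


def segment_minutes(n_samples=SEGMENT_LEN, fs=SAMPLE_RATE_HZ):
    """Duration of one segment in minutes."""
    return n_samples / fs / 60


def sample_segments(n_record_samples, seed=0):
    """Return (start, end) sample ranges of the randomly chosen segments.

    Assumption: segments do not overlap and a trailing partial segment
    is dropped; the original procedure may differ.
    """
    n_full = n_record_samples // SEGMENT_LEN
    starts = [i * SEGMENT_LEN for i in range(n_full)]
    rng = random.Random(seed)
    chosen = sorted(rng.sample(starts, min(MAX_SEGMENTS, n_full)))
    return [(s, s + SEGMENT_LEN) for s in chosen]


# Hypothetical one-week recording at 250 Hz.
week_samples = 7 * 24 * 3600 * SAMPLE_RATE_HZ
segments = sample_segments(week_samples)
print(f"{segment_minutes():.1f} minutes per segment")
print(f"{len(segments)} segments selected")
```

A record shorter than 50 full segments simply yields fewer segments, which is consistent with the total segment count being below 11,000 × 50.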

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically rather than as isolated events, such as the placement of the probes or patient-specific noise.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns that indicate a disease while ignoring noise in the signal, such as a unique signal amplitude. Looking at trends in the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

| Statistic | # (units) |
|---|---|
| Number of patients | 11,000 |
| Number of labeled beats | 2,774,054,987 |
| Sample rate | 250 Hz |
| Segment size | $2^{20}+1$ = 1,048,577 samples |
| Total number of segments | 541,794 (not all patients have enough data for 50 segments) |

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The sample index into the record signal (rec.p_signal) for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. Annotations with a '+' symbol are not beats; they only indicate that a rhythm annotation is present (see the next table).

| Symbol | Beat Description | Count |
|---|---|---|
| N | Normal | 2,061,141,216 |
| S | ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction | 19,346,728 |
| V | ESV (PVC): Premature ventricular contraction | 17,203,041 |
| Q | Undefined: Unclassifiable beat | 676,364,002 |
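As a quick consistency check, the four beat counts can be summed and compared with the total number of labeled beats reported in the aggregate statistics. The short sketch below uses only the numbers given in this section and also prints each class's share, which makes the strong class imbalance explicit.

```python
# Beat counts copied from the table above.
beat_counts = {
    "N": 2_061_141_216,  # normal
    "S": 19_346_728,     # premature atrial contraction (PAC)
    "V": 17_203_041,     # premature ventricular contraction (PVC)
    "Q": 676_364_002,    # undefined / unclassifiable
}

total = sum(beat_counts.values())
fractions = {sym: n / total for sym, n in beat_counts.items()}

print(f"total labeled beats: {total:,}")
for sym, frac in fractions.items():
    print(f"{sym}: {frac:.2%}")
```

The sum matches the reported 2,774,054,987 labeled beats. Roughly three quarters of the beats are normal, about a quarter are unclassifiable, and PAC and PVC beats each account for less than 1%.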

Rhythms are annotated in ann.aux_note at the timepoints where they start and end. For example, a normal sinus rhythm starts with a '(N' annotation and ends with a ')' annotation; the entire sequence in between is annotated as normal sinus rhythm. Below are the counts of annotated regions for each rhythm; a region may span one beat or thousands.

| Symbol | Rhythm Labels | Count |
|---|---|---|
| (N ... ) | NSR (Normal sinus rhythm) | 16,083,158 |
| (AFIB ... ) | AFib (Atrial fibrillation) | 848,564 |
| (AFL ... ) | AFlutter (Atrial flutter) | 313,251 |
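The open/close convention for rhythm annotations can be turned into explicit intervals with a small parser. The sketch below is an assumption-laden illustration: `rhythm_intervals` is a hypothetical helper, the inputs are plain lists shaped like ann.sample and ann.aux_note, and a region that is still open at the end of the input (for example because only part of a segment was read) is silently dropped.

```python
def rhythm_intervals(samples, aux_notes):
    """Pair each '(LABEL' note with the next ')' note.

    samples   : sample indices, one per annotation (as in ann.sample)
    aux_notes : strings of the same length (as in ann.aux_note)
    Returns a list of (label, start_sample, end_sample) tuples.
    """
    intervals = []
    current = None  # (label, start) of the currently open region, if any
    for sample, note in zip(samples, aux_notes):
        # Defensive cleanup: some readers pad notes with whitespace or NULs.
        note = note.strip(" \t\n\x00")
        if note.startswith("("):
            current = (note[1:], int(sample))
        elif note == ")" and current is not None:
            label, start = current
            intervals.append((label, start, int(sample)))
            current = None
    return intervals


# Made-up annotations for illustration: beats without a rhythm note carry ''.
demo_samples = [10, 250, 480, 700, 950, 1200]
demo_notes = ["(N", "", "", ")", "(AFIB", ")"]
demo_intervals = rhythm_intervals(demo_samples, demo_notes)
print(demo_intervals)
```

With an annotation object loaded through wfdb, the call would be `rhythm_intervals(ann.sample, ann.aux_note)`.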

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

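To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

import wfdb

data_path = '.'  # assumption: directory that contains the pXX folders; adjust as needed
patient_id = 9000
segment_id = 0
start = 2000    # first sample to read
length = 1024   # number of samples to read
# Patients are grouped into folders by thousands (p09 holds p09000 to p09999).
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start+length)
# shift_samps=True makes the annotation sample indices relative to sampfrom.
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start+length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15,4))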
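For loading many records, it may help to wrap the naming scheme in a helper. The function below is a hypothetical convenience, not part of wfdb or the dataset tooling; it assumes that patient folders are grouped by thousands, as the p09/p09000 example suggests, and that patient and segment identifiers are zero-padded to five and two digits.

```python
def record_path(data_path, patient_id, segment_id):
    """Build the record name (without extension) for one segment.

    Assumption: the top-level folder is the patient ID divided by 1000,
    zero-padded to two digits (patient 9000 lives in p09).
    """
    group = patient_id // 1000
    return (
        f"{data_path}/p{group:02d}/p{patient_id:05d}/"
        f"p{patient_id:05d}_s{segment_id:02d}"
    )


print(record_path("data", 9000, 0))
print(record_path("data", 2999, 49))
```

The returned string can be passed directly to wfdb.rdrecord and wfdb.rdann as the record name.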

Limitations

Since the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, whereas the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular effort was made on patient selection, except that the data was collected during 2017 and 2018.


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des donnees (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ecg signals. Computers in biology and medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder <base>/p02 contains the 1,000 patient directories p02000 through p02999.