Database Open Access

PADS - Parkinsons Disease Smartwatch dataset

Julian Varghese Alexander Brenner Lucas Plagwitz Catharina van Alen Michael Fujarski Tobias Warnecke

Published: March 25, 2024. Version: 1.0.0


When using this resource, please cite:
Varghese, J., Brenner, A., Plagwitz, L., van Alen, C., Fujarski, M., & Warnecke, T. (2024). PADS - Parkinsons Disease Smartwatch dataset (version 1.0.0). PhysioNet. https://doi.org/10.13026/m0w9-zx22.

Additionally, please cite the original publication:

Varghese, J., Brenner, A., Fujarski, M., van Alen, C.M., Plagwitz, L., & Warnecke, T. (2024). Machine Learning in the Parkinson's disease smartwatch (PADS) dataset. npj Parkinsons Dis. 10, 9.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Parkinson’s disease (PD) is the second-most common neurodegenerative disorder, while incidence and worldwide burden are further increasing. In the era of digital health transformation, smart devices and mobile sensors, including smartphones and smartwatches, can provide an affordable source to capture and analyse digital objective biomarkers. These can aid in early diagnosis and studying phenotypical characteristics. The Parkinson’s Disease Smartwatch (PADS) dataset comprises clinical assessments of a broad spectrum of PD patients, other similar movement disorders and healthy controls. The assessments were recorded using a smart-device-based system consisting of two smartwatches and one smartphone. The two smartwatches were worn on each of the patient’s wrists and their sensors synchronously recorded 11 interactive movement tasks that were designed by expert neurologists to provoke subtle changes in movement pathologies. In total, 5159 measurement steps of 469 individuals were captured. The PADS dataset includes all acceleration and rotation sensor signals, as well as details on movement steps, demographics, medical history, and PD-specific non-motor symptoms. We believe that our extensively annotated dataset provides a well-suited data foundation for the training, validation, and optimisation of future technology- and sensor-driven systems for movement disorders.


Background

The dataset was created in the context of a cross-sectional prospective study conducted from 2018 to 2021 to research digital biomarkers of Parkinson’s disease (PD). A custom application running on consumer smart devices and integrating a centralised database, which we refer to as the smart device system (SDS), was used to record interactive neurological assessments (see original study design [1]). All sessions took place at the outpatient clinic of movement disorders at the University Hospital Münster in Germany. Three participant groups were recorded: 1) PD patients, 2) differential diagnoses (DD), including essential tremor, atypical Parkinsonism, secondary causes of Parkinsonism, and multiple sclerosis, and 3) healthy controls (HC). Guided by a study nurse, each subject completed a self-reported digital questionnaire and performed active movement tasks. These tasks have the potential benefit of provoking subtle movement pathologies, such as tremor induced by cognitive stress. Annotations on symptoms, demographics and medical history complement the records.


Methods

Data acquisition consisted of two parts: 1) self-completion of the electronic questionnaire and 2) active movement-based assessment, both guided by the central app on the smartphone. The questionnaire yields information about age, height, weight, gender, kinship with PD, and effect of alcohol on tremor. Additionally, 30 yes/no answers about PD-specific non-motor symptoms (PDNMS) based on the PDNMS questionnaire from the International Parkinson and Movement Disorder Society [2] are included (Table 1). The movement-based assessment steps were performed by the participants while seated in an armchair and wearing two wrist-worn smartwatches (Apple Watch Series 4). Each participant conducted 11 different movement tasks of 10 to 20 seconds in length (Table 2). During the executions, the smartwatch sensors recorded acceleration and rotation signals. Before each recording, the assisting study nurses checked whether the tasks had been understood correctly. Board-certified neurologists confirmed all diagnoses, which are based on ICD-10 codes. The correctness of the labels and raw data was checked throughout the study. Cases with an uncertain class assignment were corrected and erroneous records were removed. Personal data was pseudonymised: all subjects were randomly assigned a unique identifier. All date-time variables in the smartwatch records were shifted to start from 0.

Table 1: Non-motor symptoms questions categorised.

| Category                 | Question number | Symptom                   |
|--------------------------|-----------------|---------------------------|
| Gastrointestinal tract   | 1               | Dribbling                 |
|                          | 3               | Swallowing                |
|                          | 4               | Vomiting                  |
|                          | 5               | Constipation              |
|                          | 6               | Bowel incontinence        |
|                          | 7               | Bowel emptying incomplete |
| Urinary tract            | 8               | Urgency                   |
|                          | 9               | Nocturia                  |
| Pain                     | 10              | Pains                     |
| Miscellaneous            | 11              | Weight                    |
|                          | 28              | Sweating                  |
|                          | 29              | Diplopia                  |
| Apathy/attention/memory  | 12              | Remembering               |
|                          | 13              | Loss of interest          |
|                          | 15              | Concentrating             |
| Distortion of perception | 2               | Taste/smelling            |
|                          | 14              | Hallucinations            |
|                          | 30              | Delusions                 |
| Depression/anxiety       | 16              | Sad, blues                |
|                          | 17              | Anxiety                   |
| Sexual function          | 18              | Sex drive                 |
|                          | 19              | Sex difficulty            |
| Cardiovascular           | 20              | Dizzy                     |
|                          | 21              | Falling                   |
|                          | 27              | Swelling                  |
| Sleep/fatigue            | 22              | Daytime sleepiness        |
|                          | 23              | Insomnia                  |
|                          | 24              | Intense vivid dreams      |
|                          | 25              | Acting out during dreams  |
|                          | 26              | Restless legs             |

Table 2: Smartwatch-based assessment steps. Duration in seconds.

| Step | Duration | Description | Task category |
|------|----------|-------------|---------------|
| 1a | 20 | Resting with closed eyes while sitting, positioning standardised to Zhang et al. [3] | Resting |
| 1b | 20 | Resting while patient is calculating serial sevens. | Resting |
| 2 | 10 | Lift and extend arms according to Zhang et al. [3] | Postural |
| 3 | 10 | Keep arms lifted. | Postural |
| 4 | 10 | Hold one-kilogram weight in each hand for 5 s. Start with the right hand. Then, have the arm rested again as in 1a. | Postural |
| 5 | 10 | Point index finger to the examiner's lifted hand. Start with the right index finger, then the left. Repeat the movement. | Kinetic |
| 6 | 10 | Drink from glass. Grasp an empty glass as if drinking from it. Start with the right hand. Then repeat with the left hand. | Kinetic |
| 7 | 10 | Cross and extend both arms. | Kinetic |
| 8 | 10 | Bring both index fingers to each other. | |
| 9 | 10 | Tap own nose with index finger. Start with the right, then with left index. Then extend the arms. | Kinetic |
| 10 | 20 | Entrainment. The examiner stomps on the ground, setting the pace. Start stomping with the right foot according to the pace. Leave the arms extended during the movement. Repeat this with the left foot. | Postural |


Data Description

The dataset holds the following folders and files:

pads_dataset
├── movement
│   ├── observation_001.json
│   ├── observation_002.json
│   ├── ...
│   ├── observation_469.json
│   ├── timeseries
│   │   ├── 001_CrossArms_LeftWrist.txt
│   │   ├── 001_CrossArms_RightWrist.txt
│   │   ├── ...
│   │   ├── 469_TouchNose_RightWrist.txt
├── questionnaire
│   ├── questionnaire_response_001.json
│   ├── questionnaire_response_002.json
│   ├── ...
│   ├── questionnaire_response_469.json
├── patients
│   ├── patient_001.json
│   ├── patient_002.json
│   ├── ...
│   ├── patient_469.json
├── scripts
│   ├── run_preprocessing.py
│   ├── ...
├── preprocessed
│   ├── movement
│   │   ├── ...
│   ├── questionnaire
│   │   ├── ...
│   ├── file_list.csv
│   │   ├── ...

The dataset comprises 469 individual participants that are numbered from 1 to 469, assigning each sample a unique identifier. The two data modalities recorded in our study (questionnaire and movement data) are organised in separate folders. One JSON file is stored per id and modality.
movement/observation_001.json: The file holds all relevant meta information and links to the smartwatch records of the movement tasks for participant 001. The format is based on the proposed Time Series Data Format (TSDF) from Claes et al. [4]. The smartwatches recorded acceleration (in g) and rotation (in rad/s) data with a sampling rate of 100 Hz. The channel encodes the sensor and the axis. All individual records are listed under "session" and reference the .txt file that holds the time series data. For each record, the entry "rows" defines how many sample points are stored in the referenced .txt file. Further, each recorded channel and its corresponding unit are named. An excerpt of the JSON file can be seen below:

{
  "resource_type": "observation",
  "subject_id": "001",
  "study_id": "PADS",
  "device_id": "Apple Watch Series 4",
  "id": "Neurological Assessment",
  "endianness": "little",
  "sampling_rate": 100,
  "data_type": "float",
  "bits": 32,
  "session": [
    {
      "record_name": "Relaxed",
      "rows": 2048,
      "records": [
        {
          "device_location": "LeftWrist",
          "channels": [
            "Accelerometer_X",
            "Accelerometer_Y",
            "Accelerometer_Z",
            "Gyroscope_X",
            "Gyroscope_Y",
            "Gyroscope_Z"
          ],
          "units": [
            "g",
            "g",
            "g",
            "rad/s",
            "rad/s",
            "rad/s"
          ],
          "file_name": "bins/001_Relaxed_LeftWrist.bin"
        },
        {
          "device_location": "RightWrist",
          "channels": [
            "Accelerometer_X",
            "Accelerometer_Y",
            "Accelerometer_Z",
            "Gyroscope_X",
            "Gyroscope_Y",
            "Gyroscope_Z"
          ],
          "units": [
            "g",
            "g",
            "g",
            "rad/s",
            "rad/s",
            "rad/s"
          ],
          "file_name": "bins/001_Relaxed_RightWrist.bin"
        }
      ]
    },
...
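
The nested layout of the metadata can be walked with a few lines of Python. The following is a minimal sketch, not the authors' loader: the helper name list_records is hypothetical, and the embedded JSON is a shortened stand-in for the excerpt above (channel lists abbreviated). The duration of each record follows from "rows" divided by "sampling_rate".

```python
import json


def list_records(observation):
    """Flatten the nested "session" list into one entry per wrist recording."""
    sampling_rate = observation["sampling_rate"]  # samples per second
    flat = []
    for task in observation["session"]:
        # Duration in seconds follows from the row count and the sampling rate.
        duration_s = task["rows"] / sampling_rate
        for record in task["records"]:
            flat.append({
                "task": task["record_name"],
                "wrist": record["device_location"],
                "channels": record["channels"],
                "file_name": record["file_name"],
                "duration_s": duration_s,
            })
    return flat


# Shortened stand-in for movement/observation_001.json (channels abbreviated).
example = json.loads("""
{
  "subject_id": "001",
  "sampling_rate": 100,
  "session": [
    {
      "record_name": "Relaxed",
      "rows": 2048,
      "records": [
        {"device_location": "LeftWrist",
         "channels": ["Accelerometer_X", "Gyroscope_X"],
         "file_name": "bins/001_Relaxed_LeftWrist.bin"},
        {"device_location": "RightWrist",
         "channels": ["Accelerometer_X", "Gyroscope_X"],
         "file_name": "bins/001_Relaxed_RightWrist.bin"}
      ]
    }
  ]
}
""")

records = list_records(example)
for entry in records:
    print(entry["task"], entry["wrist"], entry["duration_s"], entry["file_name"])
```

With the real file, the same function could be applied to the dictionary returned by json.load.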

movement/timeseries/…: The folder contains all text files that hold the actual smartwatch records. Each column represents one channel, with its signal values stored in successive rows. The values are stored as comma-separated values with a precision of 10 decimal places. Each file stores 1024 data points, which corresponds to 10.24 seconds of time series data at 100 Hz. Some assessments were recorded for 20.48 seconds. For simpler processing, they can be cut into two equal parts to fit all records into one 2D matrix format. The names can be extended with suffixes "1" and "2" respectively and the data can then be treated like independent assessment steps.
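
The splitting step described above can be sketched as follows. This is an illustration under assumptions: one sample per line with six comma-separated channel values, and the helper names parse_timeseries and split_in_halves are hypothetical. Synthetic values stand in for a real file.

```python
def parse_timeseries(text):
    """Parse comma-separated text into a list of rows of floats."""
    return [
        [float(value) for value in line.split(",")]
        for line in text.splitlines()
        if line.strip()
    ]


def split_in_halves(rows, record_name):
    """Split a long record into two equal parts named with suffixes 1 and 2."""
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows")
    half = len(rows) // 2
    return {f"{record_name}1": rows[:half], f"{record_name}2": rows[half:]}


# Synthetic stand-in for a 20.48 s record: 2048 rows, 6 channels each,
# written with 10 decimal places like the files described above.
synthetic_text = "\n".join(
    ",".join(f"{0.001 * i:.10f}" for _ in range(6)) for i in range(2048)
)

rows = parse_timeseries(synthetic_text)
parts = split_in_halves(rows, "Relaxed")
print({name: len(part) for name, part in parts.items()})
```

Each half then has 1024 rows and fits the same matrix shape as the 10.24-second records.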

questionnaire/questionnaire_response_001.json: The file holds the questionnaire data including all PDNMS answers. For each question, the question text and the answer are stored in the list "item". The file is structured as follows:

{
  "resource_type": "questionnaire_response",
  "subject_id": "001",
  "study_id": "PADS",
  "id": "Non-motor Symptoms",
  "questionnaire_name": "NMS",
  "item": [
    {
      "link_id": "01",
      "text": "Dribbling of saliva during the daytime",
      "answer": false
    },
    {
      "link_id": "02",
      "text": "Loss or change in your ability to taste or smell",
      "answer": false
    },
...
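
As an illustration of how the answers might be summarised, the sketch below counts "yes" answers per symptom category. The question-to-category mapping is transcribed from Table 1. The helper name positives_per_category and the sample answers are hypothetical, and it is assumed that "link_id" is the zero-padded question number.

```python
from collections import Counter

# Question number -> category, transcribed from Table 1.
CATEGORY_BY_QUESTION = {}
for category, questions in {
    "Gastrointestinal tract": [1, 3, 4, 5, 6, 7],
    "Urinary tract": [8, 9],
    "Pain": [10],
    "Miscellaneous": [11, 28, 29],
    "Apathy/attention/memory": [12, 13, 15],
    "Distortion of perception": [2, 14, 30],
    "Depression/anxiety": [16, 17],
    "Sexual function": [18, 19],
    "Cardiovascular": [20, 21, 27],
    "Sleep/fatigue": [22, 23, 24, 25, 26],
}.items():
    for number in questions:
        CATEGORY_BY_QUESTION[number] = category


def positives_per_category(response):
    """Count the questions answered with true, grouped by Table 1 category."""
    counts = Counter()
    for item in response["item"]:
        if item["answer"]:
            # Assumption: link_id is the zero-padded question number.
            counts[CATEGORY_BY_QUESTION[int(item["link_id"])]] += 1
    return counts


# Hypothetical answers, for illustration only.
sample_response = {
    "item": [
        {"link_id": "01", "answer": False},
        {"link_id": "02", "answer": True},
        {"link_id": "14", "answer": True},
        {"link_id": "22", "answer": True},
    ]
}

counts = positives_per_category(sample_response)
print(dict(counts))
```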

patients/patient_001.json: The file holds all relevant metadata of the patient, including age, height, weight and gender information:

{
  "resource_type": "patient",
  "id": "001",
  "study_id": "PADS",
  "condition": "Healthy",
  "disease_comment": "-",
  "age_at_diagnosis": 56,
  "age": 56,
  "height": 173,
  "weight": 78,
  "gender": "male",
  "handedness": "right",
  "appearance_in_kinship": true,
  "appearance_in_first_grade_kinship": true,
  "effect_of_alcohol_on_tremor": "Unknown"
}
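
Because every participant has one JSON file per folder with a zero-padded three-digit identifier, the file paths for a given participant can be derived from the number alone. The sketch below assumes the folder layout shown in the tree above. The helper names subject_files and load_json are hypothetical, and the demonstration writes a stub file to a temporary directory rather than reading the real dataset.

```python
import json
import tempfile
from pathlib import Path


def subject_files(root, subject_id):
    """Build the paths of the three JSON files belonging to one participant."""
    sid = f"{subject_id:03d}"  # identifiers are zero-padded: 1 -> "001"
    root = Path(root)
    return {
        "patient": root / "patients" / f"patient_{sid}.json",
        "questionnaire": root / "questionnaire" / f"questionnaire_response_{sid}.json",
        "movement": root / "movement" / f"observation_{sid}.json",
    }


def load_json(path):
    """Read one JSON file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration on a throwaway directory instead of the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    paths = subject_files(tmp, 1)
    paths["patient"].parent.mkdir(parents=True)
    stub = {"id": "001", "condition": "Healthy"}
    paths["patient"].write_text(json.dumps(stub), encoding="utf-8")
    patient = load_json(paths["patient"])
    print(paths["patient"].name, patient["condition"])
```

For the real data, root would point at the extracted pads_dataset directory.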

scripts/…: The folder holds exemplary code that demonstrates how data can be accessed using Python.

scripts/run_preprocessing.py: The script extracts a processed version of the dataset that can directly be used for machine learning applications. The pre-processed data is used in our machine learning evaluation, see the git repository [5]. Since the process requires a relatively large amount of computing time, the pre-computed files are already contained in the "preprocessed" directory.

preprocessed/…: Contains the pre-processed files and a simple filelist in csv format that gives an overview of all samples.

preprocessed/movement/…: Each binary file holds all movement-based records of the respective subject. Since some assessments were recorded for 20.48 seconds, they were cut into two equal parts (10.24 seconds) to fit into one 2D matrix format. The names were extended with suffixes "1" and "2" respectively and are treated like independent assessment steps. Channel names are generated by combining elements from the following sets:

  • task = ["Relaxed1", "Relaxed2", "RelaxedTask1", "RelaxedTask2", "StretchHold", "HoldWeight", "DrinkGlas", "CrossArms", "TouchNose", "Entrainment1", "Entrainment2"]
  • wrist = ["Left", "Right"]
  • sensor = ["Accelerometer", "Gyroscope"]
  • axis = ["X", "Y", "Z"]

Channels are referred to by the task name, followed by the wrist, sensor and axis name in this order. The channel name is then composed by joining the descriptors with an underscore. Thus, the first 15 channel names are given as follows:

  1. "Relaxed1_Left_Acceleration_X": Record of the first task "Relaxed" (first half) from the left wrist, signal data from the acceleration sensor (x axis).
  2. "Relaxed1_Left_Acceleration_Y": Record of the first task "Relaxed" (first half) from the left wrist, signal data from the acceleration sensor (y axis).
  3. "Relaxed1_Left_Acceleration_Z": Record of the first task "Relaxed" (first half) from the left wrist, signal data from the acceleration sensor (z axis).
  4. "Relaxed1_Right_Acceleration_X": Record of the first task "Relaxed" (first half) from the right wrist, signal data from the acceleration sensor (x axis).
  5. "Relaxed1_Right_Acceleration_Y": Record of the first task "Relaxed" (first half) from the right wrist, signal data from the acceleration sensor (y axis).
  6. "Relaxed1_Right_Acceleration_Z": Record of the first task "Relaxed" (first half) from the right wrist, signal data from the acceleration sensor (z axis).
  7. "Relaxed1_Left_Rotation_X": Record of the first task "Relaxed" (first half) from the left wrist, signal data from the rotation sensor (x axis).
  8. "Relaxed1_Left_Rotation_Y": Record of the first task "Relaxed" (first half) from the left wrist, signal data from the rotation sensor (y axis).
  9. "Relaxed1_Left_Rotation_Z": Record of the first task "Relaxed" (first half) from the left wrist, signal data from the rotation sensor (z axis).
  10. "Relaxed1_Right_Rotation_X": Record of the first task "Relaxed" (first half) from the right wrist, signal data from the rotation sensor (x axis).
  11. "Relaxed1_Right_Rotation_Y": Record of the first task "Relaxed" (first half) from the right wrist, signal data from the rotation sensor (y axis).
  12. "Relaxed1_Right_Rotation_Z": Record of the first task "Relaxed" (first half) from the right wrist, signal data from the rotation sensor (z axis).
  13. "Relaxed2_Left_Acceleration_X": Record of the first task "Relaxed" (second half) from the left wrist, signal data from the acceleration sensor (x axis).
  14. "Relaxed2_Left_Acceleration_Y": Record of the first task "Relaxed" (second half) from the left wrist, signal data from the acceleration sensor (y axis).
  15. "Relaxed2_Left_Acceleration_Z": Record of the first task "Relaxed" (second half) from the left wrist, signal data from the acceleration sensor (z axis).
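
The naming scheme can be reproduced programmatically. The sketch below is an assumption-laden illustration rather than the authors' code: it uses the sensor spellings that appear in the enumerated names ("Acceleration", "Rotation") instead of those in the sensor set, and it assumes the iteration order visible in the first 15 names (task, then sensor, then wrist, then axis) continues for all remaining tasks. The function name channel_names is hypothetical.

```python
from itertools import product

# Task list as given above.
TASKS = ["Relaxed1", "Relaxed2", "RelaxedTask1", "RelaxedTask2", "StretchHold",
         "HoldWeight", "DrinkGlas", "CrossArms", "TouchNose",
         "Entrainment1", "Entrainment2"]
WRISTS = ["Left", "Right"]
# Assumption: spelling taken from the enumerated channel names.
SENSORS = ["Acceleration", "Rotation"]
AXES = ["X", "Y", "Z"]


def channel_names():
    """Generate names as task_wrist_sensor_axis in the enumerated order."""
    # product() varies its last argument fastest, so the axis changes first,
    # then the wrist, then the sensor, and finally the task.
    return [
        f"{task}_{wrist}_{sensor}_{axis}"
        for task, sensor, wrist, axis in product(TASKS, SENSORS, WRISTS, AXES)
    ]


names = channel_names()
for index, name in enumerate(names[:15], start=1):
    print(index, name)
```

Under these assumptions the scheme yields 11 × 2 × 2 × 3 = 132 channel names in total.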

Usage Notes

This dataset contains simultaneous two-handed smartwatch records of active assessments designed by neurologists to trigger tremor characteristics. Assessment steps were performed consecutively in the order they are listed above (Table 2). At the start of every recording, the smartwatches gave a vibration notification. Therefore, we recommend cutting out approximately the first 0.5 seconds of each time series.
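
At the 100 Hz sampling rate, 0.5 seconds corresponds to 50 samples. A minimal sketch of the trimming step follows. The function name trim_start is hypothetical, and a record is assumed to be a list of sample rows.

```python
def trim_start(rows, sampling_rate_hz=100, trim_seconds=0.5):
    """Drop the samples recorded during the initial vibration notification."""
    n_drop = round(sampling_rate_hz * trim_seconds)  # 50 samples at 100 Hz
    return rows[n_drop:]


# Synthetic 10.24 s record: 1024 rows with 6 channel values each.
record = [[0.0] * 6 for _ in range(1024)]
trimmed = trim_start(record)
print(len(record), "->", len(trimmed))  # prints: 1024 -> 974
```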
One drawback of the dataset is the imbalanced class distribution. For the application and evaluation of machine learning, we therefore recommend using class-balancing mechanisms and measuring performance in terms of balanced accuracy. Although the dataset is balanced in terms of age distribution, there are still demographic differences in the study data. Since the cohort consists mainly of patients on routine visits and their accompanying persons, the gender distribution is influenced by prevalence, which is known to be higher in men for PD. To account for this imbalance, e.g. in machine learning evaluation, we propose extracting a stratified subset that is additionally matched by gender.
We used the pre-processed data for machine learning, see our git repository [5] that holds all relevant code. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits for training and test sets (via 5-fold cross-validation).
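
To illustrate why balanced accuracy is the recommended metric for imbalanced classes, the sketch below computes it as the mean of the per-class recalls and compares it with plain accuracy. The class counts are invented for the example and do not reflect the dataset. The labels reuse the group abbreviations PD, DD and HC.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class carries the same weight."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Invented toy labels: a classifier that always predicts the majority class.
y_true = ["PD"] * 6 + ["DD"] * 2 + ["HC"] * 2
y_pred = ["PD"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"accuracy={accuracy:.2f} balanced_accuracy={balanced:.2f}")
```

The majority-class predictor reaches 60% plain accuracy here but only one third balanced accuracy, which is the chance level for three classes.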


Ethics

The study (ClinicalTrials.gov ID: NCT03638479) was approved by the ethical board of the University of Münster and the physician’s chamber of Westphalia-Lippe (Reference number: 2018-328-f-S).


Acknowledgements

The work was funded by the Innovative Medical Research Fund (Innovative Medizinische Forschung, I-VA111809) of the University of Münster. We thank the Department of Neurology at the University Hospital Münster for integrating the study.


Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Varghese, J., Niewöhner, S., Soto-Rey, I., Schipmann-Miletić, S., Warneke, N., Warnecke, T., & Dugas, M. (2019). A smart device system to identify new phenotypical characteristics in movement disorders. Frontiers in Neurology, 10, 48.
  2. Chaudhuri, K. R., Martinez-Martin, P., Schapira, A. H., Stocchi, F., Sethi, K., Odin, P., Brown, R. G., Koller, W., Barone, P., MacPhee, G., & others. (2006). International multicenter pilot study of the first comprehensive self-completed nonmotor symptoms questionnaire for Parkinson’s disease: The NMSQuest study. Movement Disorders: Official Journal of the Movement Disorder Society, 21(7), 916–923.
  3. Zhang, B., Huang, F., Liu, J., & Zhang, D. (2018). A novel posture for better differentiation between Parkinson’s tremor and essential tremor. Frontiers in Neuroscience, 12, 317.
  4. Claes, K., Ticcinelli, V., Badawy, R., Raykov, Y. P., Evers, L. J. W., & Little, M. A. (2022). TSDF: A simple yet comprehensive, unified data storage and exchange format standard for digital biosensor data in health applications (arXiv:2211.11294). arXiv. https://doi.org/10.48550/arXiv.2211.11294
  5. PADS Project Git repository. Brenner, A. Available from: https://imigitlab.uni-muenster.de/published/pads-project [Accessed 8th February 2024]

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License


Files

Total uncompressed size: 1.4 GB.
