Database Restricted Access

TAME Pain: Trustworthy AssessMEnt of Pain from Speech and Audio for the Empowerment of Patients

Tu-Quyen Dao, Eike Schneiders, Jennifer Williams, John Robert Bautista, Tina Seabrooke, Ganesh Vigneswaran, Rishik Kolpekwar, Ritwik Vashistha, Arya Farahi

Published: Jan. 21, 2025. Version: 1.0.0


When using this resource, please cite:
Dao, T., Schneiders, E., Williams, J., Bautista, J. R., Seabrooke, T., Vigneswaran, G., Kolpekwar, R., Vashistha, R., & Farahi, A. (2025). TAME Pain: Trustworthy AssessMEnt of Pain from Speech and Audio for the Empowerment of Patients (version 1.0.0). PhysioNet. https://doi.org/10.13026/20e2-1g10.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Precise pain assessment is essential for medical professionals to provide appropriate treatment. However, not every patient can verbalize their pain due to various reasons, such as speech disorders or language barriers. In these cases, medical practitioners must rely on non-verbal signs to determine the pain level. The TAME Pain project aims to pave the way for the development of reliable pain assessment tools through advanced audio analysis.

We aim to create a comprehensive dataset that captures acoustic signals for accurately predicting pain levels. This dataset, approved by the University of Texas at Austin's institutional review board (IRB number: STUDY00004954), enables investigation of whether acoustic and non-acoustic signals extracted from healthy individuals subjected to pain can reliably indicate pain levels. We augment the dataset by annotating every audio file, covering every sentence spoken by the participants, with details such as background and foreground noise, speech errors, and non-speech vocal features. These annotations enable thorough audio analysis, facilitate pain studies, and aid in identifying both speech and non-speech pain cues. The dataset provides a resource for researchers and developers working on pain assessment technologies.

The data collection and creation efforts are based at the University of Texas at Austin, with collaborative input from the University of Nottingham and the University of Southampton in the UK.


Background

Pain is a symptom found across hundreds of pathological disorders and is the most common reason that people seek medical attention [1]. Accurate assessment and timely treatment are therefore critical, as studies have shown that pain states can induce neuroplastic changes with adverse effects if mismanaged [2]. In the case of acute post-operative pain, inadequate management stemming from inaccurate pain assessment can lead to chronic pain, neurochemical alterations, psychological changes, or drug abuse [3]. Avoiding these outcomes requires a mutual understanding between patient and healthcare worker regarding the patient's pain level, a task often limited by conventional pain assessment methods.

Traditionally, pain is self-reported by the patient, but this subjective method often leads to discrepancies in patient care due to communication barriers posed by cultural differences, intellectual disability, and pathological conditions [4]. Seeking a more precise and inclusive method of pain assessment, other studies have identified notable features in behavioral indicators (e.g., facial expressions [5] and infant cries [6]) and diagnostic tests (e.g., heart activity (ECG) [7], skin conductance (EDA) [8], and neuromuscular activity (EMG) [9]) under acute pain. However, with the recent increase in telemedicine use [10], these measures of pain are becoming impractical in remote settings. Research has therefore recently turned to audio data for pain assessment, a non-invasive method applicable to non-verbal populations.

However, the use of audio data as an objective pain assessment tool is complicated by the lack of large, high-quality, annotated datasets needed to develop the technology. Several studies have demonstrated potential biomarkers in adult speech, but they relied on small datasets that are not publicly accessible, limiting advancement in this field. Our dataset addresses this shortcoming, providing 7,044 audio files from 51 participants that pair speech data with self-reported pain levels elicited through a Cold Pressor Task (CPT).


Methods

Participants:

Through flyers, emails, and snowball sampling, we invited individuals to complete an online screening survey to determine their eligibility. To qualify for the study, individuals were required to be between 18 and 35 years old, fluent in English, and covered by health insurance. To minimize CPT risk [11], individuals were excluded if they self-reported any of the following: high blood pressure; heart or circulation problems; dysthymia; cardiovascular disorders; a history of Raynaud's syndrome, fainting, seizures, or frostbite; an open cut, sore, or bone fracture on or near either hand; neurological disorders; diabetes; epilepsy; or pregnancy. Contact and demographic information (age, gender, race/ethnicity) was collected from those who met these requirements, and a team member contacted them to schedule an appointment to participate in the study.

All participants were briefed on the details of the study, including its aims, procedures, and their individual rights, such as the right to discontinue the task at any point, before signing a consent form. Each participant's blood pressure was also measured at the data collection site to ensure safety for the CPT procedure; participation was permitted only if the reading did not exceed 130/80 mmHg. After completing the study, participants were compensated with a USD 25 gift card.

Data Collection:

Data collection was carried out in an enclosed room, approximately 10x10 feet, with a setup [12] that included two plastic containers arranged side by side on a table. One container held cold water (0-4°C) for the CPT to induce mild pain; the other held warm water (34-37°C) for the control condition and was supplemented with hot water to maintain its temperature. Each container was fitted with a digital water thermometer to maintain the target temperature range, a water pump to circulate the water, and a detachable separator to reserve space for the participant's hand. The two conditions were identical apart from the water temperature. The setup also included a monitor, placed approximately 1 meter in front of the participant, on which the sentences to be read were displayed. The primary microphone, a Røde Wireless PRO close-talking lapel mic, was worn on a lanyard at a distance of approximately 10 inches from the speaker. A chair was provided for the participant to sit in during the study.

During the procedure, each participant completed each experimental condition, warm (W) or cold (C), with each hand, left (L) or right (R), for a total of four tasks. Participants were first assigned to one of four groups to randomize the order of conditions: (1) LC-LW-RC-RW, (2) LW-LC-RW-RC, (3) RC-RW-LC-LW, and (4) RW-RC-LW-LC. A baseline hand temperature was then taken for each participant using an infrared thermometer. Each task required the participant to submerge a hand in the water, palm facing up, while performing a speaking task prompted by the monitor. A condition lasted three minutes or until the participant voluntarily withdrew their hand. Following each CPT, participants placed the hand in warm water to return it to its baseline temperature before continuing to the next condition. For the speaking task, the monitor displayed one sentence at a time, drawn in randomized order from the Harvard Sentences [13], for the participant to read aloud. An intermediate pain statement, shown as the sixth utterance and every fifth utterance thereafter, read, "On a scale from 1 to 10, the pain I feel right now is ___." and was used to continuously monitor the participant's pain throughout the task. Each utterance was recorded from the primary mic and saved as a separate .wav file (16-bit mono PCM, 16 kHz). In total, we collected 7,044 audio files from the 51 participants.
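To make the pain-statement schedule concrete, the short Python sketch below (an illustration only; 1-based utterance numbering within a condition is an assumption) computes which utterance positions carry a pain statement.

```python
# Illustrative sketch: pain statements occur on the 6th utterance of a condition
# and every 5th utterance thereafter; all other utterances are Harvard Sentences.
# 1-based numbering within a condition is an assumption for this example.

def is_pain_statement(utt_num: int) -> bool:
    """Return True if the 1-based utterance position is a pain-statement prompt."""
    return utt_num >= 6 and (utt_num - 6) % 5 == 0

# Within one condition, the pain-statement positions are 6, 11, 16, 21, ...
print([n for n in range(1, 31) if is_pain_statement(n)])  # [6, 11, 16, 21, 26]
```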

Annotation Process:

Audio files were first trimmed using voice activity detection (VAD) and then annotated manually. Pain ratings extracted from the pain statements were assigned to each utterance by backward extrapolation to the preceding batch of Harvard Sentences utterances. If a pain statement was unavailable for a batch, the rating from an adjacent pain statement was used. Any pain ratings of 0 were revised to 1 to keep the scale at 1-10 and were flagged as revised pain levels. After this process, five of the 7,044 audio files remain without a pain label. Detailed notes were made for audio files with an audible audio feature or a labeling technicality. Each audio file was then assigned an action label to broadly indicate its quality on a discrete 0-4 scale, where 0 indicates a high-quality file virtually free of confounding features and 4 indicates a low-quality file highly likely to confound data processing.
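As an illustration of this labeling logic (not the original annotation code), the sketch below backward-extrapolates each pain statement's rating onto the preceding batch of Harvard Sentences utterances and applies the 0-to-1 revision; the list-of-dicts structure and field names are hypothetical.

```python
# Hypothetical sketch of the labeling logic described above; field names and the
# list-of-dicts structure are illustrative, not the dataset's internal format.

def assign_pain_labels(utterances):
    """Backward-extrapolate each pain rating onto the preceding batch of sentences."""
    labeled, pending = [], []
    for utt in utterances:
        if utt.get("reported_pain") is not None:   # this utterance is a pain statement
            rating = utt["reported_pain"]
            revised = rating == 0
            rating = max(rating, 1)                # ratings of 0 are revised to 1 (scale is 1-10)
            for item in pending + [utt]:
                item["pain_level"] = rating
                item["revised_pain_level"] = revised
            labeled.extend(pending + [utt])
            pending = []
        else:                                      # a Harvard Sentence awaiting its label
            pending.append(utt)
    # Trailing utterances without a following pain statement would take an adjacent
    # rating or remain unlabeled, as described above.
    labeled.extend(pending)
    return labeled
```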

Our research process aligned with a Responsible Research and Innovation (RRI) approach based on the AREA (Anticipate, Reflect, Engage, Act) framework and was ethically approved by the Institutional Review Board of the University of Texas at Austin (IRB number: STUDY00004954).


Data Description

The TAME Pain dataset comprises 7,039 annotated utterances from the 51 individuals who completed the study, totaling approximately 311 minutes of audio recordings. Each utterance is labeled with a self-reported pain level on a 1-10 scale. These pain levels are further grouped into three classification schemes: binary (No Pain vs. Pain), three-class (Mild, Moderate, Severe), and condition-based (Cold vs. Warm), facilitating diverse analytical approaches.
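This description does not state the cut-points used for the binary and three-class groupings, so the sketch below uses illustrative thresholds only; consult the released metadata for the definitions actually used.

```python
# Illustrative only: these cut-points are assumptions, not the dataset's official
# definitions. Check meta_audio.csv and the accompanying documentation before use.

def binary_label(pain_level: int) -> str:
    """Map a 1-10 self-reported pain level to No Pain vs. Pain (assumed threshold)."""
    return "No Pain" if pain_level <= 2 else "Pain"

def three_class_label(pain_level: int) -> str:
    """Map a 1-10 pain level to Mild / Moderate / Severe (assumed cut-points)."""
    if pain_level <= 3:
        return "Mild"
    if pain_level <= 6:
        return "Moderate"
    return "Severe"
```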

Accompanying the audio recordings are two primary metadata files: meta_audio.csv and meta_participant.csv. The meta_audio.csv file provides detailed information for each audio file, including participant identification, experimental condition, utterance number and ID, pain levels, audio duration, quality ratings (ranging from 0 to 4, with 0 denoting the highest quality), and notes on any annotations related to audio disturbances or errors. The meta_participant.csv file offers demographic insights, detailing participants' gender distribution (26 female, 22 male, and 3 non-binary), age range (average age of 21.33 years with a standard deviation of 4.18 years), and racial/ethnic backgrounds (5 Hispanic/Latino, 27 Asian, 1 Black or African American, 14 White, 4 Two or More Races). Additionally, this file records the completion status of each experimental condition.

To enhance the dataset's utility, a dedicated Labels folder contains seven distinct CSV files that categorize specific sources of noise recorded during data collection, such as external disturbances, speech errors, audio cut-outs, audible breaths, and instances where pain ratings were copied or missing. These annotations could be useful for researchers aiming to preprocess the data accurately and account for potential confounding factors. Descriptive statistics reveal that most utterances (4,658) maintain high audio quality (Action Label 0), while smaller proportions exhibit varying quality issues.
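For example, a preprocessing step might collect the identifiers flagged in some of these label files into an exclusion set, as in the sketch below (the identifier column name is an assumption; check the actual headers in Labels/*.csv).

```python
# Sketch under assumptions: the identifier column is assumed to be named 'file_id';
# verify against the actual column headers in the Labels/*.csv files.
from pathlib import Path
import pandas as pd

labels_dir = Path("Labels")
flagged = ["External_Disturbances.csv", "Speech_Errors_and_Disturbances.csv"]

exclude = set()
for name in flagged:
    df = pd.read_csv(labels_dir / name)
    exclude.update(df["file_id"])          # assumed identifier column name

print(f"{len(exclude)} utterances flagged for exclusion")
```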

Four resources from this study are (1) audio recordings, (2) audio file summary information, (3) participant data, and (4) annotation of audio file data.

  1. Audio Recordings (mic1_trim_v1.zip): A .zip file containing 7,044 audio files in .wav format, organized into 51 folders corresponding to the 51 participants. Each audio file is named PID.COND.UTTNUM.UTTID, where PID identifies the participant, COND labels the condition, UTTNUM gives the position of the utterance within that condition, and UTTID identifies the Harvard Sentence (UTTID=99999 denotes a pain statement); see the parsing sketch after this list.
  2. Audio metadata (meta_audio.csv): The audio metadata consists of 7,044 rows corresponding to 7,044 audio files and contains columns describing audio file identification, pain level, audio duration, action label, and notes for each row.
  3. Participant metadata (meta_participant.csv): The participant metadata consists of 51 rows corresponding to 51 participants and contains columns reporting PID, demographic information (gender, age, and race/ethnicity), folder size, number of files, total audio duration, and condition completion for each participant.
  4. Annotation of Audio Data (Labels/*.csv): A folder of seven .csv files that compile only the audio files carrying annotations in the notes column, grouped by category for analysis: (1) External_Disturbances.csv, where a noise unrelated to the speaker's vocalization was heard; (2) Speech_Errors_and_Disturbances.csv, where the speaker made a vocalized mistake; (3) Audio_Cut_Out.csv, where part(s) of the prompted sentence was cut out; (4) Audible_Breath.csv, where an audible inhale or exhale was heard; (5) No_Pain_Rating_So_Copied.csv, where a pain statement was unavailable and the rating was taken from an adjacent pain statement; (6) No_Assigned_Sentence.csv, where the prompted sentence was not spoken; and (7) No_Pain_Rating.csv, which lists the five audio files without a pain label. Each file contains columns for audio file identification and notes; the first six also contain an action label column, and only External_Disturbances.csv contains a noise relation column labeling each annotation as a foreground or background disturbance.
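A minimal parsing sketch for the naming convention described in item 1 follows, assuming the four fields are dot-separated and the .wav extension is stripped first.

```python
# Minimal sketch for parsing the PID.COND.UTTNUM.UTTID naming convention.
# Assumes the four fields are dot-separated and that '.wav' is the only suffix.
from pathlib import Path

PAIN_STATEMENT_UTTID = "99999"   # per the convention described above

def parse_audio_filename(path: str) -> dict:
    """Split an audio filename into its PID, COND, UTTNUM, and UTTID fields."""
    pid, cond, uttnum, uttid = Path(path).stem.split(".")
    return {
        "pid": pid,
        "condition": cond,
        "utterance_number": int(uttnum),
        "utterance_id": uttid,
        "is_pain_statement": uttid == PAIN_STATEMENT_UTTID,
    }
```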

Usage Notes

This database can support the advancement of AI-driven pain assessment technologies, as it can be analyzed for audio features that objectively indicate pain. While data collection was performed in an audio-controlled environment, some audio files contain disturbances produced externally or by the speaker. The raw dataset has not been cleaned of these disturbances, so we recommend using the action label scores for quality control when analyzing the data. For example, audio files with an action label of 4 should be removed, as they can confound analysis results; if a research application has a low tolerance for confounding audio features, use should be limited to files with an action label of 0 (see the sketch below). Additionally, our annotations can be used to develop noise identification, classification, and removal algorithms.
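For instance, the sketch below loads meta_audio.csv and applies both levels of quality control (the column name 'action_label' is an assumption; check the actual header in meta_audio.csv).

```python
# Sketch under assumptions: the quality column is assumed to be named 'action_label';
# verify against the actual headers in meta_audio.csv.
import pandas as pd

meta = pd.read_csv("meta_audio.csv")

# Baseline quality control: drop files with the lowest quality rating (action label 4).
usable = meta[meta["action_label"] < 4]

# Strict quality control for low noise tolerance: keep only the cleanest files (label 0).
cleanest = meta[meta["action_label"] == 0]

print(len(usable), "files after removing action label 4")
print(len(cleanest), "files with action label 0")
```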

Furthermore, our manual annotations provide a detailed resource for model training and validation. Although the annotation process involved collaborative input, a single author (TD) carried out the primary labeling tasks. The process is further limited in that the labeling author was not blinded to the pain ratings or experimental conditions while annotating.

Conducting experiments in controlled, quiet environments helps eliminate external confounding factors and provides a reliable foundation for studying the biomarkers of interest. While such environments do not replicate the acoustic conditions of real-world settings, data collected under these conditions can be augmented or combined with simulated noise profiles to reflect specific practical scenarios. The dataset therefore allows adaptation to diverse applications by systematically introducing noise while maintaining the integrity of the original recordings. Using a controlled environment for the initial data collection supports a structured framework for extending the applicability of the research across varied contexts without requiring extensive additional data collection.
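As a minimal sketch of such augmentation, the example below mixes a noise profile into an utterance at a target signal-to-noise ratio; the file names are placeholders and the helper is not part of the released dataset.

```python
# Minimal augmentation sketch: mix a noise recording into an utterance at a target SNR.
# File names are placeholders; this is illustrative, not part of the dataset tooling.
import numpy as np
import soundfile as sf

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at the requested signal-to-noise ratio (in dB)."""
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]        # match the speech length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

speech, sr = sf.read("example_utterance.wav")          # placeholder utterance file
noise, _ = sf.read("simulated_ward_noise.wav")         # placeholder noise profile
sf.write("augmented_utterance.wav", add_noise(speech, noise, snr_db=10.0), sr)
```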


Release Notes

Version 1.0.0: Initial release


Ethics

Our dataset, designed to advance the development of pain detection and classification models for decision support tools, was collected in accordance with approved ethical standards. The study protocol received approval from the Institutional Review Board of the University of Texas at Austin (IRB number: STUDY00004954). All participants provided written informed consent prior to participating in the study, explicitly agreeing to the public sharing of their data. Before data collection, each individual was thoroughly informed about the study's objectives, the types of data being collected, and the intention to make the data available in a public repository.

While the dataset aims to improve pain management, we acknowledge potential ethical challenges, including the risk of biased model outputs due to the relatively small and less diverse participant pool. We also recognize the possibility of misuse, such as the development of harmful tools. Given that the dataset is shared with the research community, we strongly urge adherence to the ethical principles of non-maleficence (do no harm) and beneficence (do good). To mitigate these concerns, our research activities were guided by the Responsible Research and Innovation (RRI) approach, based on the AREA (Anticipate, Reflect, Engage, Act) framework. RRI workshops, facilitated by Liz Dowthwaite, informed our ethical application and project objectives, ensuring that our work aligns with ethical best practices.


Acknowledgements

This work was supported by Good Systems, a research grand challenge at the University of Texas at Austin; the Engineering and Physical Sciences Research Council [grant number EP/V00784X/1]; and the UKRI Trustworthy Autonomous Systems Hub, Responsible AI UK [grant number EP/Y009800/1]. The authors would like to thank Prof. Sarvapali (Gopal) Ramchurn, Prof. Joel Fischer, Prof. Sharon Strover, Dr. Liz Dowthwaite, Dr. Rohan Chandra, and Dr. Anna-Maria Piskopani for their helpful feedback during this work.


Conflicts of Interest

Jennifer Williams reports a relationship with The Alan Turing Institute that includes consulting or advisory roles. Jennifer Williams reports a relationship with MyVoice AI that includes employment. Jennifer Williams has patent #US20230186896A1 and patent #US20220405363A1 pending to MyVoice AI Ltd. Regarding prior employment at MyVoice AI Ltd, Jennifer Williams was employed part-time while the work in this manuscript was undertaken (ending in February 2024) and has no remaining interactions or restrictive covenants; however, two patents on voice-based biometric identification are pending on which Jennifer Williams is listed as lead inventor and MyVoice AI Ltd is the assignee. Those two patents are not topically related to the work in this manuscript. The remaining authors declare no competing interests.


References

  1. Wager TD. Managing pain. In Cerebrum: the Dana Forum on Brain Science 2022 Mar (Vol. 2022). Dana Foundation.
  2. Petersen-Felix S, Curatolo M. Neuroplasticity-an important factor in acute and chronic pain. Swiss medical weekly. 2002 Jun 1;132(2122):273-8.
  3. Small C, Laycock HJ. Acute postoperative pain management. British Journal of Surgery. 2020 Jan;107(2):e70-80.
  4. Karcioglu O, Topacoglu H, Dikme O, Dikme O. A systematic review of the pain scales in adults: which to use? The American journal of emergency medicine. 2018 Apr 1;36(4):707-14.
  5. Prkachin KM. Assessing pain by facial expression: facial expression as nexus. Pain Research and Management. 2009;14(1):53-8.
  6. Mittal VK. Discriminating features of infant cry acoustic signal for automated detection of cause of crying. In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2016 Oct 17 (pp. 1-5). IEEE.
  7. Meissner A, Gorgels AP, Wellens HJ. The value of the ECG for decision-making at first medical contact in the patient with acute chest pain. Netherlands Heart Journal. 2010 Jun;18:301-6.
  8. Kong Y, Posada-Quintero HF, Chon KH. Real-time high-level acute pain detection using a smartphone and a wrist-worn electrodermal activity sensor. Sensors. 2021 Jun 8;21(12):3956.
  9. Jiang M, Mieronkoski R, Syrjälä E, Anzanpour A, Terävä V, Rahmani AM, Salanterä S, Aantaa R, Hagelberg N, Liljeberg P. Acute pain intensity monitoring with the classification of multiple physiological parameters. Journal of clinical monitoring and computing. 2019 Jun 1;33:493-507.
  10. Colbert GB, Venegas-Vera AV, Lerma EV. Utility of telemedicine in the COVID-19 era. Reviews in cardiovascular medicine. 2020 Dec 30;21(4):583-7.
  11. McIntyre MH, 23andMe Research Team, Kless A, Hein P, Field M, Tung JY. Validity of the cold pressor test and pain sensitivity questionnaire via online self-administration. PLoS One. 2020 Apr 16;15(4):e0231697.
  12. Lakhsassi L, Borg C, Martusewicz S, van der Ploeg K, de Jong PJ. The influence of sexual arousal on subjective pain intensity during a cold pressor test in women. PLoS One. 2022 Oct 5;17(10):e0274331.
  13. Rothauser EH. IEEE recommended practice for speech quality measurements. IEEE Transactions on Audio and Electroacoustics. 1969;17(3):225-46.

Access

Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
PhysioNet Restricted Health Data License 1.5.0

Data Use Agreement:
PhysioNet Restricted Health Data Use Agreement 1.5.0
