Resources


Database Credentialed Access

PIFIR: PET-CT Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jeremy Ong, Ramin Alipour, Leon Worth, Monica Slavin, Karin Thursky, Karin Verspoor

A corpus of PET-CT reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 27, 2025. Version: 1.0.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128,000 LLM-generated disease mentions from the UMLS disorder group. The corpus aims to improve disease entity recognition and normalization in biomedical and clinical texts.

nlp named entity recognition machine learning data augmentation entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, Joanne Teh, Leon Worth, Monica Slavin, Karin Thursky, Karin Verspoor

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2


Database Credentialed Access

MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization

Asad Aali, Dave Van Veen, Yamin Arefeen, Jason Hom, Christian Bluethgen, Eduardo Pontes Reis, Sergios Gatidis, Namuun Clifford, Joseph Daws, Arash Tehrani, Jangwon Kim, Akshay Chaudhari

This dataset presents a collection of preprocessed and labeled clinical notes derived from "MIMIC-IV-Note", and aims to facilitate the development of ML models focused on summarizing brief hospital courses (BHC) from clinical notes.

natural language processing clinical notes brief hospital course machine learning text summarization

Published: Feb. 3, 2025. Version: 1.2.0


Database Restricted Access

Gout Emergency Department Chief Complaint Corpora

John David Osborne, Tobias O'Leary, Amy Mudano, James Booth, Giovanna Rosas, Gurusai Sujitha Peramsetty, Anthony Knighton, Jeff Foster, Ken Saag, Maria Ioana Danila

A corpus of chief complaints tagged with both predicted gout flare status and chart-reviewed gout flare status. Ideal as input for masked language model training, to supplement lengthy clinical text notes.

gout emergency department nlp

Published: Oct. 19, 2020. Version: 1.0


Database Restricted Access

Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels

Kendall Park, Rory Sayres, Andrew Sellergren, Tom Pollard, Fayaz Jamil, Timo Kohlberger, Charles Lau, Atilla Kiraly

This work further refines the CheXpert labels in MIMIC-CXR-JPG 2.0.0 by filtering with Med-PaLM 2, followed by manual review and verification by three US board-certified radiologists.

mimic-cxr labels

Published: Feb. 4, 2025. Version: 1.0.0


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, Alya Ahmad, Anne Links, Somnath Saha, Mary Catherine Beach, Mark Dredze

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records. Trained on MIMIC-IV discharge notes.

clinical natural language processing domain transfer bias stigmatizing language large language models mimic

Published: Nov. 6, 2023. Version: 1.0.0


Database Credentialed Access

RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain

Pavel Blinov, Aleksandr Nesterov, Galina Zubkova, Arina Reshetnikova, Vladimir Kokh, Chaitanya Shivade

RuMedNLI is the full Russian-language counterpart of the MedNLI dataset.

natural language inference recognizing textual entailment russian language

Published: April 1, 2022. Version: 1.0.0