# Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Version 1.0.0
## Data Access
The dataset is available on [PhysioNet](https://physionet.org/). To find it, search for "Synthetic Mention Corpora for Disease Entity Recognition and Normalization".
## Data Description
### SYNTHETIC_MENTIONS.CSV
The corpus consists of one dataset, SYNTHETIC_MENTIONS.csv, a comma-separated values (CSV) file with two columns: "cui" and "matched_output".
The "cui" column contains the UMLS Concept Unique Identifier (CUI) of the disease highlighted in the synthetic mention.
The "matched_output" column contains the synthetic mention, with the disease name enclosed between the tags `<1CUI>` and `</1CUI>`.
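For entity recognition, the tagged mentions can be converted into plain text plus character offsets. The sketch below assumes that each mention contains exactly one pair of the literal tags `<1CUI>` and `</1CUI>`; if the tags turn out to embed the actual CUI instead, only `TAG_PATTERN` would need to change. The function name `to_ner_example` and the sample sentence are hypothetical and are not taken from the corpus.

```python
import re

# Assumption: each mention wraps exactly one disease name in the literal tags
# <1CUI> and </1CUI>. The non-greedy group stops at the first closing tag.
TAG_PATTERN = re.compile(r"<1CUI>(.*?)</1CUI>", re.DOTALL)


def to_ner_example(matched_output: str) -> tuple[str, int, int, str]:
    """Convert one tagged mention into (plain_text, start, end, disease_text).

    plain_text is the mention with the tags removed, and
    plain_text[start:end] is the disease name, so the offsets can feed a
    span-based NER format. Raises ValueError if no tagged span is present.
    """
    match = TAG_PATTERN.search(matched_output)
    if match is None:
        raise ValueError("no <1CUI>...</1CUI> span found")
    disease_text = match.group(1)
    # Text before the opening tag is unchanged, so after stripping the tags
    # the disease span starts where the opening tag used to be.
    start = match.start()
    plain_text = matched_output[:start] + disease_text + matched_output[match.end():]
    return plain_text, start, start + len(disease_text), disease_text


# Hypothetical example sentence, not taken from the corpus.
print(to_ner_example("Patient has a history of <1CUI>type 2 diabetes</1CUI> since 2010."))
# → ('Patient has a history of type 2 diabetes since 2010.', 25, 40, 'type 2 diabetes')
```

Pairing each result with the row's "cui" value gives a (span, concept) example suitable for both recognition and normalization.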
### LICENSE.txt
The license under which the project is distributed.
### SHA256SUMS.txt
SHA-256 checksums for verifying the integrity of the downloaded files.
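Here is a minimal sketch of checking the downloaded files against SHA256SUMS.txt using only Python's standard library. It assumes the file follows the `sha256sum` layout, with one `<hex digest>  <file name>` pair per line. The helper names `sha256_of` and `verify_checksums` are hypothetical. On systems with GNU coreutils, running `sha256sum -c SHA256SUMS.txt` from the download directory performs the same check.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sums_file: Path) -> dict[str, bool]:
    """Check every file listed in a sha256sum-style manifest.

    Assumes each non-empty line is "<hex digest>  <relative path>"; paths are
    resolved relative to the manifest's directory. Returns a mapping from each
    listed path to whether the file exists and its digest matches.
    """
    results = {}
    base = sums_file.parent
    for line in sums_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # "*" marks binary mode in sha256sum output
        target = base / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```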
### dd.txt
Data dictionary for SYNTHETIC_MENTIONS.CSV, describing the dataset.
## Data Use
The dataset can be loaded with any CSV reader, for example pandas in Python.
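Here is a minimal loading sketch that uses Python's standard-library `csv` module. It assumes the file is UTF-8 encoded and has a header row naming the "cui" and "matched_output" columns. `load_mentions` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_mentions(path: Path) -> list[dict[str, str]]:
    """Read the corpus into a list of {"cui": ..., "matched_output": ...} rows.

    csv.DictReader handles quoted fields that contain commas, which free-text
    mentions may include. "utf-8-sig" reads plain UTF-8 and also strips a
    leading byte-order mark if one is present (encoding is an assumption).
    """
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        missing = {"cui", "matched_output"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing expected columns: {sorted(missing)}")
        return [dict(row) for row in reader]
```

With pandas, `pandas.read_csv("SYNTHETIC_MENTIONS.csv")` returns a DataFrame with the same two columns.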