# Synthetic Mention Corpora for Disease Entity Recognition and Normalization

## Data Access

The data are available on [PhysioNet](https://physionet.org/); search for "Synthetic Mention Corpora for Disease Entity Recognition and Normalization".

## Data Description

### SYNTHETIC_MENTIONS.csv

The corpus consists of one dataset: SYNTHETIC_MENTIONS.csv. It is a comma-separated values (CSV) file with two columns:

- "cui": the UMLS CUI (Concept Unique Identifier) of the disease highlighted in the synthetic mention.
- "matched_output": the synthetic mention, with the disease name enclosed between an opening tag `<1CUI>` and a closing tag.

### LICENSE.txt

The license for the project.

### SHA256SUMS.txt

SHA256 checksums for verifying the integrity of the downloaded files.

### dd.txt

The data dictionary for SYNTHETIC_MENTIONS.csv, describing the dataset's columns.

## Data Use

The dataset can be loaded into any program or script with a standard CSV loader, such as pandas for Python.
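
As an illustration of the loading step, here is a minimal sketch that uses Python's standard-library `csv` module (pandas' `read_csv` would serve equally well). The function names `load_mentions` and `extract_disease` are hypothetical helpers, not part of the dataset. The closing tag is not legible in this README, so the value `</1CUI>` used below is an assumption, as is the UTF-8 encoding in the usage comment; check dd.txt for the actual format.

```python
import csv
import re

OPEN_TAG = "<1CUI>"
# Assumption: the closing tag is not shown in this README; verify against dd.txt.
CLOSE_TAG = "</1CUI>"


def load_mentions(handle):
    """Read an open CSV handle and return a list of (cui, matched_output) pairs."""
    reader = csv.DictReader(handle)
    # Fail early if the two documented columns are not present.
    missing = {"cui", "matched_output"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return [(row["cui"], row["matched_output"]) for row in reader]


def extract_disease(text, open_tag=OPEN_TAG, close_tag=CLOSE_TAG):
    """Return the text between the first tag pair, or None if no pair is found."""
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


# Usage (assumes the file is in the working directory and is UTF-8 encoded):
# with open("SYNTHETIC_MENTIONS.csv", newline="", encoding="utf-8") as handle:
#     rows = load_mentions(handle)
# cui, mention = rows[0]
# disease_name = extract_disease(mention)
```

Passing the tags as parameters keeps the extraction step easy to adjust if the actual closing tag differs from the one assumed here.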