medcat.utils.preprocess_umls
Module Contents
Classes
UMLS - Pre-process UMLS release files.
Attributes
- medcat.utils.preprocess_umls._DEFAULT_COLUMNS: list = ['CUI', 'LAT', 'TS', 'LUI', 'STT', 'SUI', 'ISPREF', 'AUI', 'SAUI', 'SCUI', 'SDUI', 'SAB', 'TTY',...
- medcat.utils.preprocess_umls._DEFAULT_SEM_TYPE_COLUMNS: list = ['CUI', 'TUI', 'STN', 'STY', 'ATUI', 'CVF']
- medcat.utils.preprocess_umls._DEFAULT_MRHIER_COLUMNS: list = ['CUI', 'AUI', 'CXN', 'PAUI', 'SAB', 'RELA', 'PTR', 'HCD', 'CVF']
- medcat.utils.preprocess_umls.medcat_csv_mapper: dict
- class medcat.utils.preprocess_umls.UMLS(main_file_name, sem_types_file, allow_languages=['ENG'], sep='|')
Pre-process UMLS release files.
- Parameters:
main_file_name (str) – Path to the main file (probably MRCONSO.RRF).
sem_types_file (str) – Path to the semantic types file (probably MRSTY.RRF).
allow_languages (list) – Languages to keep; rows in all other languages are filtered out. Defaults to just English (['ENG']).
sep (str) – The separator used within the files. Defaults to '|'.
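As a rough illustration of what the `sep` and `allow_languages` parameters control, the sketch below parses pipe-separated RRF-style rows with the standard `csv` module and keeps only the allowed languages. It is not MedCAT's implementation: the three-column layout `TOY_COLUMNS`, the helper names `read_rrf` and `filter_languages`, and the sample rows are all hypothetical, and the real MRCONSO.RRF has many more columns.

```python
import csv
import io
from typing import Dict, Iterable, List, Sequence

# Hypothetical, reduced column layout for this illustration only;
# the real MRCONSO.RRF layout is much wider.
TOY_COLUMNS = ["CUI", "LAT", "STR"]


def read_rrf(lines: Iterable[str], columns: List[str], sep: str = "|") -> List[Dict[str, str]]:
    """Parse separator-delimited RRF-style lines into dicts keyed by column name."""
    rows = []
    # QUOTE_NONE: treat quote characters inside strings as ordinary text.
    for fields in csv.reader(lines, delimiter=sep, quoting=csv.QUOTE_NONE):
        # Each RRF line ends with a trailing separator, which yields one extra empty field.
        if fields and fields[-1] == "":
            fields = fields[:-1]
        rows.append(dict(zip(columns, fields)))
    return rows


def filter_languages(rows: List[Dict[str, str]], allow_languages: Sequence[str] = ("ENG",)) -> List[Dict[str, str]]:
    """Keep only the rows whose LAT (language) value is allowed."""
    return [row for row in rows if row["LAT"] in allow_languages]


# Toy rows, not real UMLS content.
sample = io.StringIO(
    "C0000001|ENG|Heart attack|\n"
    "C0000001|GER|Herzinfarkt|\n"
    "C0000002|ENG|Fever|\n"
)
kept = filter_languages(read_rrf(sample, TOY_COLUMNS))
print([row["STR"] for row in kept])  # ['Heart attack', 'Fever']
```

Disabling quote handling matters because UMLS strings may contain literal `"` characters, which the default CSV dialect would otherwise try to interpret.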
- to_concept_df()
Create a concept DataFrame. The input files are expected to use the default column names (see _DEFAULT_COLUMNS and _DEFAULT_SEM_TYPE_COLUMNS).
- Returns:
pd.DataFrame – The resulting DataFrame
- Return type:
pandas.DataFrame
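How to_concept_df combines the main file with the semantic types file is not spelled out here. The sketch below assumes the two are joined on CUI, so that each concept row gains the TUIs listed for its CUI in MRSTY. The helper `attach_type_ids`, the left-join behaviour and the toy rows are assumptions for illustration, not the method's documented logic.

```python
from collections import defaultdict
from typing import Dict, List


def attach_type_ids(concepts: List[Dict[str, str]], sem_types: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return copies of the concept rows with a 'TUI' list added, looked up by CUI."""
    tuis_by_cui: Dict[str, List[str]] = defaultdict(list)
    for row in sem_types:
        tuis_by_cui[row["CUI"]].append(row["TUI"])
    # Assumed left-join: concepts without a semantic-type row get an empty list
    # instead of being dropped.
    return [{**row, "TUI": list(tuis_by_cui.get(row["CUI"], []))} for row in concepts]


# Toy rows, not real UMLS content.
concepts = [{"CUI": "C0000001", "STR": "Heart attack"}, {"CUI": "C0000003", "STR": "Unknown term"}]
sem_types = [{"CUI": "C0000001", "TUI": "T047"}]
result = attach_type_ids(concepts, sem_types)
print(result)
```

An inner join, which would drop concepts that have no semantic type, is an equally plausible choice; the sketch picks the left join only to make the difference visible.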
- map_umls2snomed()
Map to SNOMED-CT.
Currently uses the SCUI column. At the time of writing, this is equal to the CODE column, but this may not be the case in the future.
- Returns:
pd.DataFrame – DataFrame that contains the SCUI (source CUI) as well as the UMLS CUI for each applicable concept
- Return type:
pandas.DataFrame
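The selection map_umls2snomed describes can be sketched as: keep rows from a SNOMED CT source and return the unique (SCUI, CUI) pairs. The SAB value `SNOMEDCT_US`, the helper `map_to_snomed` and the toy rows are assumptions for illustration; the method's actual source filter is not documented here.

```python
from typing import Dict, List, Set, Tuple

# Assumed SAB value for the US edition of SNOMED CT.
SNOMED_SAB = "SNOMEDCT_US"


def map_to_snomed(rows: List[Dict[str, str]], sab: str = SNOMED_SAB) -> List[Tuple[str, str]]:
    """Return unique (SCUI, CUI) pairs, in first-seen order, for rows from one source."""
    seen: Set[Tuple[str, str]] = set()
    pairs: List[Tuple[str, str]] = []
    for row in rows:
        if row["SAB"] != sab:
            continue
        pair = (row["SCUI"], row["CUI"])
        # Several atoms (names) can share one concept, so drop repeated pairs.
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Toy rows, not real UMLS content.
rows = [
    {"CUI": "C0000001", "SAB": "SNOMEDCT_US", "SCUI": "1000001"},
    {"CUI": "C0000001", "SAB": "SNOMEDCT_US", "SCUI": "1000001"},
    {"CUI": "C0000001", "SAB": "ICD10", "SCUI": ""},
    {"CUI": "C0000002", "SAB": "SNOMEDCT_US", "SCUI": "1000002"},
]
print(map_to_snomed(rows))
```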
- map_umls2icd10()
Map to ICD-10.
- Available SABs (source abbreviations) that contain 'ICD10':
CCSR_ICD10CM - Clinical Classifications Software Refined for ICD-10-CM
CCSR_ICD10PCS - Clinical Classifications Software Refined for ICD-10-PCS
DMDICD10 - ICD-10 German
ICD10AE - ICD-10, American English Equivalents
ICD10AMAE - ICD-10, Australian Modification, Americanized English Equivalents
ICD10AM - ICD-10, Australian Modification
ICD10DUT - ICD10, Dutch Translation
ICD10PCS - ICD-10 Procedure Coding System
ICD10 - International Classification of Diseases and Related Health Problems, Tenth Revision
ICPC2ICD10DUT - ICPC2-ICD10 Thesaurus, Dutch Translation
ICPC2ICD10ENG - ICPC2-ICD10 Thesaurus
MTHICPC2ICD10AE - ICPC2E-ICD10 Thesaurus, American English Equivalents
Currently only the 'ICD10' source is used, although the others may be relevant as well.
To use any of the other sources listed above, use the map_umls2source method instead.
- Returns:
pd.DataFrame – DataFrame that has the ICD-10 codes
- Return type:
pandas.DataFrame
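Whether a source is selected by an exact SAB match or by a substring match changes which of the SABs listed above are included. The sketch below contrasts the two. The helper names are hypothetical, and the exact comparison map_umls2icd10 performs internally is not documented here.

```python
from typing import List

# SAB values taken from the list above (a subset, for brevity).
SABS = ["CCSR_ICD10CM", "DMDICD10", "ICD10AE", "ICD10AM", "ICD10PCS", "ICD10", "MTHICPC2ICD10AE"]


def sabs_containing(sabs: List[str], needle: str = "ICD10") -> List[str]:
    """Substring match: every SAB whose name contains the needle."""
    return [sab for sab in sabs if needle in sab]


def sabs_equal_to(sabs: List[str], target: str = "ICD10") -> List[str]:
    """Exact match: only the SAB that is exactly the target."""
    return [sab for sab in sabs if sab == target]


print(sabs_containing(SABS))  # all seven entries
print(sabs_equal_to(SABS))    # ['ICD10']
```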
- map_umls2source(sources)
Allows mapping to an arbitrary source, or set of sources, identified by their SAB values.
- Parameters:
sources (Union[str, List[str]]) – The source or sources to include.
- Returns:
pd.DataFrame – DataFrame that has the target source codes
- Return type:
pandas.DataFrame
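The documented `sources` type, Union[str, List[str]], means one SAB or several can be passed. The sketch below shows one way to normalise that argument before filtering rows by SAB. The helper `map_to_sources` and the toy rows are hypothetical and are not the method's implementation.

```python
from typing import Dict, List, Union


def map_to_sources(rows: List[Dict[str, str]], sources: Union[str, List[str]]) -> List[Dict[str, str]]:
    """Keep the rows whose SAB is one of the requested sources."""
    # A bare string would otherwise be treated as a collection of single characters.
    if isinstance(sources, str):
        sources = [sources]
    wanted = set(sources)
    return [row for row in rows if row["SAB"] in wanted]


# Toy rows, not real UMLS content.
rows = [
    {"CUI": "C0000001", "SAB": "ICD10", "CODE": "X00"},
    {"CUI": "C0000001", "SAB": "ICD10AM", "CODE": "X00"},
    {"CUI": "C0000002", "SAB": "SNOMEDCT_US", "CODE": "1000002"},
]
print(len(map_to_sources(rows, "ICD10AM")))             # 1
print(len(map_to_sources(rows, ["ICD10", "ICD10AM"])))  # 2
```

Converting the list to a set keeps each membership check constant-time, which matters on a file the size of MRCONSO.RRF.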
- get_pt2ch()
Generates a parent-to-children dict.
The resulting dictionary maps a CUI to a list of the CUIs that have that CUI as their parent.
Note: this expects the MRHIER.RRF file to exist in the same folder as the MRCONSO.RRF file.
- Raises:
ValueError – If the MRHIER.RRF file wasn’t found
- Returns:
dict – The dictionary mapping each parent CUI to its children.
- Return type:
dict
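get_pt2ch is described only by its output. The sketch below builds such a parent-to-children dict from MRHIER-style rows. It assumes each row's parent is identified by PAUI (the AUI of the atom's immediate parent, a column listed in _DEFAULT_MRHIER_COLUMNS) and that the parent's CUI can be resolved through an AUI-to-CUI lookup. Here that lookup comes from the same toy rows. In practice MRCONSO.RRF, which also has AUI and CUI columns, would be a fuller source. The helper `parent_to_children` and all data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def parent_to_children(hier_rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each parent CUI to the CUIs whose PAUI points at one of its atoms."""
    # Assumed AUI -> CUI lookup built from the same rows (MRCONSO would be fuller).
    aui_to_cui = {row["AUI"]: row["CUI"] for row in hier_rows}
    pt2ch: Dict[str, List[str]] = defaultdict(list)
    for row in hier_rows:
        parent_cui = aui_to_cui.get(row["PAUI"])
        if parent_cui is None or parent_cui == row["CUI"]:
            continue  # parent atom unknown here (e.g. a root), or a self-reference
        # One concept can have several atoms under the same parent; keep it once.
        if row["CUI"] not in pt2ch[parent_cui]:
            pt2ch[parent_cui].append(row["CUI"])
    return dict(pt2ch)


# Toy MRHIER-style rows (only the columns the sketch needs).
hier = [
    {"CUI": "C_root", "AUI": "A1", "PAUI": ""},
    {"CUI": "C_child1", "AUI": "A2", "PAUI": "A1"},
    {"CUI": "C_child2", "AUI": "A3", "PAUI": "A1"},
    {"CUI": "C_grand", "AUI": "A4", "PAUI": "A2"},
    {"CUI": "C_child1", "AUI": "A5", "PAUI": "A1"},
]
print(parent_to_children(hier))
```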
- medcat.utils.preprocess_umls.umls