:py:mod:`medcat.utils.preprocess_umls`
======================================

.. py:module:: medcat.utils.preprocess_umls


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.preprocess_umls.UMLS


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.preprocess_umls._DEFAULT_COLUMNS
   medcat.utils.preprocess_umls._DEFAULT_SEM_TYPE_COLUMNS
   medcat.utils.preprocess_umls._DEFAULT_MRHIER_COLUMNS
   medcat.utils.preprocess_umls.medcat_csv_mapper
   medcat.utils.preprocess_umls.umls


.. py:data:: _DEFAULT_COLUMNS
   :type: list
   :value: ['CUI', 'LAT', 'TS', 'LUI', 'STT', 'SUI', 'ISPREF', 'AUI', 'SAUI', 'SCUI', 'SDUI', 'SAB', 'TTY',...

.. py:data:: _DEFAULT_SEM_TYPE_COLUMNS
   :type: list
   :value: ['CUI', 'TUI', 'STN', 'STY', 'ATUI', 'CVF']

.. py:data:: _DEFAULT_MRHIER_COLUMNS
   :type: list
   :value: ['CUI', 'AUI', 'CXN', 'PAUI', 'SAB', 'RELA', 'PTR', 'HCD', 'CVF']

.. py:data:: medcat_csv_mapper
   :type: dict

.. py:class:: UMLS(main_file_name, sem_types_file, allow_languages = ['ENG'], sep = '|')

   Pre-process UMLS release files.

   :param main_file_name: Path to the main file (probably MRCONSO.RRF)
   :type main_file_name: str
   :param sem_types_file: Path to the semantic types file (probably MRSTY.RRF)
   :type sem_types_file: str
   :param allow_languages: Languages to filter by. Defaults to just English (['ENG']).
   :type allow_languages: list
   :param sep: The separator used within the files. Defaults to '|'.
   :type sep: str

   .. py:method:: __init__(main_file_name, sem_types_file, allow_languages = ['ENG'], sep = '|')

   .. py:method:: to_concept_df()

      Create a concept DataFrame.
      The default column names are expected.

      :Returns: **pd.DataFrame** -- The resulting DataFrame

   .. py:method:: map_umls2snomed()

      Map to SNOMED-CT.

      Currently uses the SCUI column. At the time of writing, this is equal
      to the CODE column, but this may not be the case in the future.

      :Returns: **pd.DataFrame** -- DataFrame that contains the SCUI (source CUI) as well as the UMLS CUI for each applicable concept

   .. py:method:: map_umls2icd10()

      Map to ICD-10.

      Available SABs that contain 'ICD10':

      - ``CCSR_ICD10CM``: Clinical Classifications Software Refined for ICD-10-CM
      - ``CCSR_ICD10PCS``: Clinical Classifications Software Refined for ICD-10-PCS
      - ``DMDICD10``: ICD-10 German
      - ``ICD10AE``: ICD-10, American English Equivalents
      - ``ICD10AMAE``: ICD-10, Australian Modification, Americanized English Equivalents
      - ``ICD10AM``: ICD-10, Australian Modification
      - ``ICD10DUT``: ICD10, Dutch Translation
      - ``ICD10PCS``: ICD-10 Procedure Coding System
      - ``ICD10``: International Classification of Diseases and Related Health Problems, Tenth Revision
      - ``ICPC2ICD10DUT``: ICPC2-ICD10 Thesaurus, Dutch Translation
      - ``ICPC2ICD10ENG``: ICPC2-ICD10 Thesaurus
      - ``MTHICPC2ICD10AE``: ICPC2E-ICD10 Thesaurus, American English Equivalents

      Currently only 'ICD10' is used, although others may be relevant as well.
      To use one of the other sources listed above, call the
      ``map_umls2source`` method instead.

      :Returns: **pd.DataFrame** -- DataFrame that has the ICD-10 codes

   .. py:method:: map_umls2source(sources)

      Allows mapping to an arbitrary source or set of sources.

      :param sources: The source or sources to include.
      :type sources: Union[str, List[str]]

      :Returns: **pd.DataFrame** -- DataFrame that has the target source codes

   .. py:method:: get_pt2ch()

      Generates a parent to children dict.

      It goes through all the < # TODO

      The resulting dictionary maps a CUI to a list of CUIs that
      consider that CUI as their parent.

      Note: this expects the MRHIER.RRF file to exist in the same
      folder as the MRCONSO.RRF file.

      :raises ValueError: If the MRHIER.RRF file wasn't found

      :Returns: **dict** -- A dictionary mapping each parent CUI to a list of its child CUIs.

.. py:data:: umls
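
The parent-to-children mapping described for ``get_pt2ch`` can be illustrated with a small standard-library sketch. This is not MedCAT's implementation: the helper names ``parse_rows`` and ``build_pt2ch`` are hypothetical, the identifiers are made up rather than real UMLS content, and it assumes that each parent atom (``PAUI``) is resolved to a CUI through an AUI-to-CUI lookup taken from the main file. Only the column order is taken from ``_DEFAULT_MRHIER_COLUMNS``.

```python
from collections import defaultdict

# Column order copied from _DEFAULT_MRHIER_COLUMNS in this module.
MRHIER_COLUMNS = ['CUI', 'AUI', 'CXN', 'PAUI', 'SAB', 'RELA', 'PTR', 'HCD', 'CVF']


def parse_rows(lines, columns, sep='|'):
    """Split separator-delimited lines into dicts keyed by column name.

    RRF-style lines end with a trailing separator; zip() drops the
    resulting extra empty field because it stops at the shorter input.
    """
    rows = []
    for line in lines:
        parts = line.rstrip('\n').split(sep)
        rows.append(dict(zip(columns, parts)))
    return rows


def build_pt2ch(aui2cui, hier_rows):
    """Map each parent CUI to a sorted list of CUIs that name it as parent."""
    pt2ch = defaultdict(set)
    for row in hier_rows:
        # Assumption: the parent atom (PAUI) is resolved via an AUI -> CUI lookup.
        parent_cui = aui2cui.get(row['PAUI'])
        child_cui = row['CUI']
        # Skip rows with an unknown or missing parent, and self-references.
        if parent_cui is None or parent_cui == child_cui:
            continue
        pt2ch[parent_cui].add(child_cui)
    return {parent: sorted(children) for parent, children in pt2ch.items()}


# Made-up identifiers for illustration only, not real UMLS content.
aui2cui = {'A1': 'C1', 'A2': 'C2', 'A3': 'C3'}
hier_lines = [
    'C2|A2|1|A1|SRC||A1|||',
    'C3|A3|1|A1|SRC||A1|||',
    'C1|A1|1||SRC|||||',  # root entry: empty PAUI, so it has no parent
]

pt2ch = build_pt2ch(aui2cui, parse_rows(hier_lines, MRHIER_COLUMNS))
print(pt2ch)  # {'C1': ['C2', 'C3']}
```

Collecting children in a set before sorting keeps the result free of duplicates when the same parent and child pair appears in more than one hierarchy row.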