medcat.utils.preprocess_snomed
Module Contents
Classes
Snomed – Pre-process SNOMED CT release files.
Functions
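parse_file(filename, first_row_header=True, columns=None)
get_all_children(sctid, pt2ch) – Retrieves all the children of a given SNOMED CT ID (SCTID) from a given parent-to-child mapping (pt2ch) via the “IS A” relationship.
get_direct_refset_mapping(in_dict) – Uses the output from Snomed.map_snomed2icd10 or Snomed.map_snomed2opcs4 to map each SNOMED CUI to the prioritised list of target ontology CUIs.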
- medcat.utils.preprocess_snomed.parse_file(filename, first_row_header=True, columns=None)
- medcat.utils.preprocess_snomed.get_all_children(sctid, pt2ch)
Retrieves all the children of a given SNOMED CT ID (SCTID) from a given parent-to-child mapping (pt2ch) via the “IS A” relationship. In a MedCAT model, pt2ch is stored in the CDB's additional info and can be accessed with cat.cdb.addl_info['pt2ch'].
- Parameters:
sctid (int) – The SCTID whose children need to be retrieved.
pt2ch (dict) – A dictionary containing the parent-to-child relationships in the form {parent_sctid: [list of child sctids]}.
- Returns:
list – A list of unique SCTIDs that are children of the given SCTID.
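The traversal described above can be illustrated with a minimal sketch. It is not the library's implementation: collect_descendants is a hypothetical stand-in, the identifiers in toy_pt2ch are made up, and the sketch assumes that "all the children" means every concept reachable through the mapping. Whether the real function includes the starting SCTID in its result is not stated here; this sketch leaves it out.

```python
from collections import deque


def collect_descendants(sctid, pt2ch):
    """Hypothetical stand-in for get_all_children: breadth-first walk of pt2ch."""
    seen = set()
    queue = deque(pt2ch.get(sctid, []))
    while queue:
        child = queue.popleft()
        if child in seen:
            # A concept can have several parents; visit it only once.
            continue
        seen.add(child)
        queue.extend(pt2ch.get(child, []))
    # Sorted only to make the result deterministic for this example.
    return sorted(seen)


# Toy hierarchy with made-up identifiers (not real SCTIDs).
toy_pt2ch = {1: [2, 3], 2: [4], 3: [4, 5]}
descendants = collect_descendants(1, toy_pt2ch)
print(descendants)
```

With a real model, the same kind of dictionary would come from cat.cdb.addl_info['pt2ch'] as noted above.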
- medcat.utils.preprocess_snomed.get_direct_refset_mapping(in_dict)
This method takes the output of Snomed.map_snomed2icd10 or Snomed.map_snomed2opcs4, removes the metadata, and maps each SNOMED CUI to the prioritised list of target ontology CUIs.
The input dict is expected to be in the following format:
- Keys are SNOMED CT CUIs.
- Values are lists of dictionaries, where each list item has (at least):
  - a key ‘code’ that specifies the target ontology CUI
  - a key ‘mapPriority’ that specifies the priority
- Parameters:
in_dict (dict) – The input dict.
- Returns:
dict – The map from each SNOMED CUI to the prioritised list of target ontology CUIs.
- Return type:
dict
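To make the input and output shapes concrete, here is a small sketch of the transformation. It is an illustration, not the library's code: direct_mapping_sketch is a hypothetical name, the CUIs and codes are invented, and it assumes that mapPriority values can be converted to integers and that a lower number means a higher priority.

```python
def direct_mapping_sketch(in_dict):
    """Hypothetical stand-in: keep only target codes, ordered by mapPriority."""
    out = {}
    for cui, entries in in_dict.items():
        # Assumption: mapPriority is int-like and lower values come first.
        ordered = sorted(entries, key=lambda e: int(e["mapPriority"]))
        # Drop all metadata and keep just the target ontology code.
        out[cui] = [e["code"] for e in ordered]
    return out


# Invented example data in the documented input format.
toy_input = {
    "C1": [
        {"code": "B20", "mapPriority": "2", "mapAdvice": "example advice"},
        {"code": "A10", "mapPriority": "1", "mapAdvice": "example advice"},
    ],
    "C2": [{"code": "Z99", "mapPriority": "1"}],
}
direct = direct_mapping_sketch(toy_input)
print(direct)
```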
- class medcat.utils.preprocess_snomed.Snomed(data_path, uk_ext=False, uk_drug_ext=False, au_ext=False)
Pre-process SNOMED CT release files.
This class is used to create a SNOMED CT concept DataFrame ready for MedCAT CDB creation.
- Parameters:
au_ext (bool) – Specifies whether the version is an AU release.
- data_path
Path to the unzipped SNOMED CT folder.
- Type:
str
- release
Release of SNOMED CT folder.
- Type:
str
- uk_ext
Specifies whether the version is a SNOMED UK extension released after 2021. Defaults to False.
- Type:
bool, optional
- uk_drug_ext
Specifies whether the version is a SNOMED UK drug extension. Defaults to False.
- Type:
bool, optional
- au_ext
Specifies whether the version is an AU release. Defaults to False.
- Type:
bool, optional
- __init__(data_path, uk_ext=False, uk_drug_ext=False, au_ext=False)
- Parameters:
au_ext (bool) – Specifies whether the version is an AU release.
- to_concept_df()
Create a SNOMED CT concept DataFrame.
Creates a SNOMED CT concept DataFrame ready for MedCAT CDB creation. Checks whether the version is a UK extension release and sets the file names for the concept and description snapshots accordingly. The divergent release format of the UK Drug Extension (>v2021) is handled via the uk_drug_ext variable.
- Returns:
pandas.DataFrame – SNOMED CT concept DataFrame.
- list_all_relationships()
List all SNOMED CT relationships.
SNOMED CT provides a rich set of inter-relationships between concepts.
- Returns:
list – List of all SNOMED CT relationships.
- relationship2json(relationshipcode, output_jsonfile)
Convert a single relationship map structure to a JSON file.
- Parameters:
relationshipcode (str) – A single SCTID or unique concept identifier of the relationship type.
output_jsonfile (str) – Name of JSON file output.
- Returns:
file – JSON file of relationship mapping.
- map_snomed2icd10()
This function maps SNOMED CT concepts to ICD-10 codes using the refset mappings provided in the SNOMED CT release package.
- Returns:
dict – A dictionary containing the SNOMED CT to ICD-10 mappings including metadata.
- map_snomed2opcs4()
This function maps SNOMED CT concepts to OPCS-4 codes using the refset mappings provided in the SNOMED CT release package.
It calls the internal function _map_snomed2refset() to get the DataFrame containing the OPCS-4 mappings, then converts that DataFrame to a dictionary using the internal function _refset_df2dict().
- Raises:
AttributeError – If OPCS-4 mappings aren’t available.
- Returns:
dict – A dictionary containing the SNOMED CT to OPCS-4 mappings including metadata.
- Return type:
dict
- _check_path_and_release()
This function checks the path and release of the SNOMED CT data provided. It looks for the “Snapshot” folder within the data path, and if it’s not found, it looks for any folder containing the name “SnomedCT”. It then stores the path and release in separate lists. If no valid paths are found, it raises a FileNotFoundError.
- Returns:
tuple – A tuple of two lists: the first lists the paths where the data is located, and the second lists the releases of the data.
- Raises:
FileNotFoundError – If the path to the SNOMED CT directory is incorrect.
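The lookup order described above can be sketched as follows. This is a hedged illustration only: find_snapshot_dirs is a hypothetical function, and using the folder name as the release label is an assumption of this sketch, since the text does not say how the release string is derived.

```python
import os
import tempfile


def find_snapshot_dirs(data_path):
    """Hypothetical sketch of the lookup order described above."""
    paths, releases = [], []
    if os.path.isdir(os.path.join(data_path, "Snapshot")):
        # The data path itself contains a "Snapshot" folder.
        paths.append(data_path)
        # Assumption: the folder name stands in for the release label.
        releases.append(os.path.basename(os.path.normpath(data_path)))
    else:
        # Otherwise look for sub-folders whose name contains "SnomedCT".
        for name in sorted(os.listdir(data_path)):
            candidate = os.path.join(data_path, name)
            if "SnomedCT" in name and os.path.isdir(candidate):
                paths.append(candidate)
                releases.append(name)
    if not paths:
        raise FileNotFoundError(f"No SNOMED CT release found under {data_path}")
    return paths, releases


# Demonstration on a throwaway directory tree with an invented folder name.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "SnomedCT_ToyRelease", "Snapshot"))
    found_paths, found_releases = find_snapshot_dirs(root)

    empty = os.path.join(root, "empty")
    os.makedirs(empty)
    try:
        find_snapshot_dirs(empty)
        raised = False
    except FileNotFoundError:
        raised = True

print(found_releases, raised)
```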
- _refset_df2dict(refset_df)
This function takes a SNOMED refset DataFrame as an input and converts it into a dictionary. The DataFrame should contain the columns ‘referencedComponentId’, ‘mapTarget’, ‘mapGroup’, ‘mapPriority’, ‘mapRule’, and ‘mapAdvice’.
- Parameters:
refset_df (pd.DataFrame) – DataFrame containing the refset data
- Returns:
dict – A mapping with SNOMED CT codes as keys and lists of refset metadata dictionaries as values.
- Return type:
dict
- _map_snomed2refset()
Maps SNOMED CT concepts to refset mappings provided in the SNOMED CT release package.
This function maps SNOMED CT concepts using the refset mappings in the Snapshot/Refset/Map directory. The refset mappings can be either ICD-10 codes in international releases or OPCS-4 codes for the SNOMED UK extension, if available.
- Returns:
pd.DataFrame – DataFrame containing SNOMED CT to refset mappings and metadata.
OR
tuple – Tuple of DataFrames containing SNOMED CT to refset mappings and metadata (ICD-10, OPCS-4), if uk_ext is True.