medcat.cdb_maker
Module Contents
Classes
- CDBMaker – Given a CSV as shown in https://github.com/CogStack/MedCAT/tree/master/examples/<example> it creates a CDB or updates an existing one.
Attributes
- medcat.cdb_maker.PH_REMOVE
- medcat.cdb_maker.logger
- class medcat.cdb_maker.CDBMaker(config, cdb=None)
Bases:
object
Given a CSV as shown in https://github.com/CogStack/MedCAT/tree/master/examples/<example> it creates a CDB or updates an existing one.
- Parameters:
config (medcat.config.Config) – Global config for MedCAT.
cdb (medcat.cdb.CDB) – If set, the CDBMaker will update the existing CDB with new concepts from the CSV (Default value None).
- __init__(config, cdb=None)
- Parameters:
config (medcat.config.Config) –
cdb (Optional[medcat.cdb.CDB]) –
- Return type:
None
- reset_cdb()
This re-creates the internal CDB from scratch, based on the same config.
This is necessary if you want to call prepare_csvs multiple times on the same CDBMaker instance.
- Return type:
None
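The reset pattern described above can be sketched as follows. This is a minimal illustration, not part of MedCAT: `build_separate_cdbs` is a hypothetical helper, and it assumes only that the object passed in exposes `prepare_csvs` and `reset_cdb` as documented here.

```python
def build_separate_cdbs(maker, csv_groups):
    """Build one CDB per group of CSV paths using a single maker.

    `maker` is assumed to be a CDBMaker-like object exposing
    `prepare_csvs` and `reset_cdb`; this helper is hypothetical.
    """
    cdbs = []
    for i, paths in enumerate(csv_groups):
        if i > 0:
            # Drop the concepts accumulated by the previous build so they
            # do not leak into the next CDB.
            maker.reset_cdb()
        cdbs.append(maker.prepare_csvs(paths))
    return cdbs
```

The reset is skipped before the first build on purpose: if the maker was constructed with an existing `cdb` to update, calling `reset_cdb` up front would replace it with a fresh one.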
- prepare_csvs(csv_paths, sep=',', encoding=None, escapechar=None, index_col=False, full_build=False, only_existing_cuis=False, **kwargs)
Compile one or multiple CSVs into a CDB.
- Note: This method generally reuses the same CDB instance.
If you call prepare_csvs multiple times on the same CDBMaker, concepts from earlier calls are likely to leak into later ones. To reset the CDB, call reset_cdb.
- Parameters:
csv_paths (Union[pd.DataFrame, List[str]]) – An array of paths to the CSV files that should be processed. Can also be an array of pd.DataFrames.
sep (str) – A custom separator for the CSV files, if necessary (Default value ‘,’).
encoding (Optional[str]) – Encoding to be used for reading the CSV file (Default value None).
escapechar (Optional[str]) – Escape char for the CSV (Default value None).
index_col (bool) – Index column for pandas read_csv (Default value False).
full_build (bool) – If False, only the core portions of the CDB will be built (the ones required for the functioning of MedCAT). If True, everything will be added to the CDB; this usually includes concept descriptions, various forms of names, etc. Note that this option produces a much larger CDB (Default value False).
only_existing_cuis (bool) – If True no new CUIs will be added, but only linked names will be extended. Mainly used when enriching names of a CDB (e.g. SNOMED with UMLS terms) (Default value False).
kwargs (Any) – Will be passed to pandas for CSV reading
- Note:
Examples of the CSV used to make the CDB can be found on [GitHub](link)
- Returns:
CDB – CDB with the new concepts added.
- Return type:
medcat.cdb.CDB
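A minimal sketch of how prepare_csvs might be driven end to end. Several things here are assumptions rather than facts from this page: the `cui` and `name` column names are taken to match the example CSVs linked above, `write_example_csv` and `build_cdb` are hypothetical helpers, and constructing a CDBMaker is assumed to need a working MedCAT installation, including whatever language model the config points to. The MedCAT imports are kept inside `build_cdb` so the CSV helper can be run on its own.

```python
import csv
import tempfile
from pathlib import Path


def write_example_csv(directory):
    """Write a tiny concept CSV and return its path.

    The `cui` and `name` columns are an assumption about the expected
    layout; check the linked examples for the authoritative format.
    """
    path = Path(directory) / "concepts.csv"
    rows = [
        {"cui": "C0001", "name": "kidney failure"},
        {"cui": "C0001", "name": "renal failure"},
        {"cui": "C0002", "name": "diabetes"},
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["cui", "name"])
        writer.writeheader()
        writer.writerows(rows)
    return str(path)


def build_cdb(csv_paths):
    """Compile the given CSV files into a new CDB (hypothetical helper)."""
    # Imported lazily so that write_example_csv works without MedCAT.
    from medcat.cdb_maker import CDBMaker
    from medcat.config import Config

    maker = CDBMaker(Config())
    # sep and full_build are shown with their documented defaults.
    return maker.prepare_csvs(csv_paths, sep=",", full_build=False)
```

With MedCAT installed, usage would look like `cdb = build_cdb([write_example_csv(tempfile.mkdtemp())])`.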
- destroy_pipe()
- Return type:
None