medcat.cdb_maker

Module Contents

Classes

CDBMaker

Given a CSV as shown in https://github.com/CogStack/MedCAT/tree/master/examples/<example>, it creates a CDB or updates an existing one.

Attributes

PH_REMOVE

logger

medcat.cdb_maker.PH_REMOVE
medcat.cdb_maker.logger
class medcat.cdb_maker.CDBMaker(config, cdb=None)

Bases: object

Given a CSV as shown in https://github.com/CogStack/MedCAT/tree/master/examples/<example>, it creates a CDB or updates an existing one.

Parameters:
  • config (medcat.config.Config) – Global config for MedCAT.

  • cdb (medcat.cdb.CDB) – If set, the CDBMaker will update the existing CDB with the new concepts in the CSV (Default value None).
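
The two construction modes described by these parameters can be sketched as follows. This is a minimal illustration, assuming MedCAT is installed; the helper name make_cdb_maker is hypothetical, and depending on the MedCAT version the constructor may also need a spaCy model named in the config to be available. The imports are deferred into the function so that the helper can be defined without MedCAT present.

```python
def make_cdb_maker(existing_cdb=None):
    """Return a CDBMaker that builds a new CDB, or extends existing_cdb if given.

    Hypothetical helper for illustration; not part of MedCAT.
    """
    # Imported lazily so that defining this helper does not require MedCAT.
    from medcat.config import Config
    from medcat.cdb_maker import CDBMaker

    config = Config()  # global MedCAT config with default settings

    # With cdb=None (the default) the maker starts from its own new CDB.
    # Passing a CDB makes later prepare_csvs calls add concepts to that CDB.
    return CDBMaker(config, cdb=existing_cdb)
```

Calling make_cdb_maker() corresponds to the "create a CDB" case, while make_cdb_maker(existing_cdb=some_cdb) corresponds to the "update an existing one" case.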

__init__(config, cdb=None)
Return type:

None

reset_cdb()

This re-creates the internal CDB from the same config.

This is necessary if you want to call prepare_csvs multiple times on the same CDBMaker instance without concepts from earlier calls carrying over.

Return type:

None
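
As a sketch of when reset_cdb matters, the helper below builds one CDB per group of CSV files with a single maker, resetting between groups. The function name build_separate_cdbs is hypothetical; it only relies on the reset_cdb and prepare_csvs methods documented here, so any object exposing those two methods will work.

```python
def build_separate_cdbs(maker, csv_groups):
    """Build one CDB per group of CSV paths using a single CDBMaker-like object.

    Hypothetical helper for illustration; not part of MedCAT.
    """
    cdbs = []
    for i, csv_paths in enumerate(csv_groups):
        if i > 0:
            # Start from a new internal CDB so that concepts from the
            # previous group do not carry over into this one.
            maker.reset_cdb()
        cdbs.append(maker.prepare_csvs(csv_paths))
    return cdbs
```

The first group is built without a reset so that a maker constructed with an existing CDB still extends it on the first call; whether that is the desired behaviour depends on the use case.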

prepare_csvs(csv_paths, sep=',', encoding=None, escapechar=None, index_col=False, full_build=False, only_existing_cuis=False, **kwargs)

Compile one or multiple CSVs into a CDB.

Note: This method generally reuses the same CDB instance across calls.

If you use the same CDBMaker and call prepare_csvs multiple times, concepts from earlier calls are likely to leak into later ones. To reset the CDB, call reset_cdb.

Parameters:
  • csv_paths (Union[pd.DataFrame, List[str]]) – A list of paths to the CSV files that should be processed. Can also be a list of pd.DataFrames.

  • sep (str) – If necessary, a custom separator for the CSV files (Default value ‘,’).

  • encoding (Optional[str]) – Encoding to be used for reading the CSV file (Default value None).

  • escapechar (Optional[str]) – Escape char for the CSV (Default value None).

  • index_col (bool) – Index column for pandas read_csv (Default value False).

  • full_build (bool) – If False, only the core portions of the CDB will be built (the ones required for MedCAT to function). If True, everything will be added to the CDB; this usually includes concept descriptions, various forms of names, etc. Note that this option produces a much larger CDB (Default value False).

  • only_existing_cuis (bool) – If True no new CUIs will be added, but only linked names will be extended. Mainly used when enriching names of a CDB (e.g. SNOMED with UMLS terms) (Default value False).

  • kwargs (Any) – Will be passed to pandas for CSV reading

Return type:

medcat.cdb.CDB

Note

csv:

Examples of the CSV used to make the CDB can be found on [GitHub](link)

Returns:

CDB – CDB with the new concepts added.
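
A minimal end-to-end sketch of the call described above: write a small CSV, then compile it into a core CDB. The column names cui and name are an assumption modelled on the example CSVs, so check the linked examples for the columns your MedCAT version expects. The helper names and the CUI values in the usage note are hypothetical.

```python
import csv

# Assumed column names; verify against the example CSVs before relying on them.
FIELDNAMES = ["cui", "name"]


def write_concept_csv(path, rows):
    """Write (cui, name) pairs to a comma-separated file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDNAMES)
        writer.writerows(rows)


def build_core_cdb(maker, csv_path):
    """Compile one CSV into a core CDB using a CDBMaker-like object."""
    return maker.prepare_csvs(
        [csv_path],        # a list of paths, even for a single file
        sep=",",           # matches the delimiter used by write_concept_csv
        encoding="utf-8",  # matches the encoding used by write_concept_csv
        full_build=False,  # build only the core portions of the CDB
    )
```

For example, write_concept_csv("concepts.csv", [("C0000001", "kidney failure")]) followed by build_core_cdb(maker, "concepts.csv"), where maker is a CDBMaker instance, would return the CDB with that concept added.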

destroy_pipe()
Return type:

None