`medcat.cdb_maker`

Module Contents

Classes

CDBMaker

Given a CSV as shown in https://github.com/CogStack/MedCAT/tree/master/examples/<example>, it creates a CDB or updates an existing one.

Attributes

`PH_REMOVE`
`logger`

medcat.cdb_maker.PH_REMOVE

medcat.cdb_maker.logger

class medcat.cdb_maker.CDBMaker(config, cdb=None)

Bases: object

Given a CSV as shown in https://github.com/CogStack/MedCAT/tree/master/examples/<example>, it creates a CDB or updates an existing one.

Parameters:

config (medcat.config.Config) – Global config for MedCAT.
cdb (medcat.cdb.CDB) – If set, the CDBMaker will update the existing CDB with new concepts from the CSV (Default value None).

__init__(config, cdb=None)

Parameters:

config (medcat.config.Config) –
cdb (Optional[medcat.cdb.CDB]) –

Return type:

None

reset_cdb()

Re-creates the internal CDB from the same config.

Call this before each additional prepare_csvs call on the same CDBMaker instance if the resulting CDBs should be independent; otherwise concepts from earlier calls remain in the CDB.

Return type:

None
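
As a rough sketch of the reset pattern described above, the helper below builds one CDB per group of CSV files from a single maker. `build_independent_cdbs` and its arguments are hypothetical names introduced for illustration; only `reset_cdb` and `prepare_csvs` come from the documented API. It assumes `maker` is an already constructed CDBMaker, and that a CDB returned before a reset is left untouched because the reset creates a new internal CDB.

```python
def build_independent_cdbs(maker, csv_groups, **prepare_kwargs):
    """Build one CDB per group of CSV paths (hypothetical helper).

    `maker` is assumed to be a CDBMaker instance; `csv_groups` is a list
    whose items are lists of CSV paths. Extra keyword arguments are
    forwarded unchanged to `prepare_csvs`.
    """
    cdbs = []
    for csv_paths in csv_groups:
        # Reset first so concepts from the previous group do not leak in.
        # Note: this also resets before the first group, which would likely
        # discard a CDB that was passed to the CDBMaker constructor.
        maker.reset_cdb()
        cdbs.append(maker.prepare_csvs(csv_paths, **prepare_kwargs))
    return cdbs
```

If the goal is instead to accumulate several CSVs into one CDB, no reset is needed: pass all paths in a single `prepare_csvs` call.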

prepare_csvs(csv_paths, sep=',', encoding=None, escapechar=None, index_col=False, full_build=False, only_existing_cuis=False, **kwargs)

Compile one or multiple CSVs into a CDB.

Note: This method generally reuses the same CDB instance. If you call prepare_csvs multiple times on the same CDBMaker, concepts from prior calls are likely to leak into later ones. To reset the CDB, call reset_cdb.

Parameters:

csv_paths (Union[pd.DataFrame, List[str]]) – A list of paths to the CSV files that should be processed. Can also be a list of pd.DataFrames.
sep (str) – Custom separator for the CSV files, if needed (Default value ',').
encoding (Optional[str]) – Encoding to be used for reading the CSV file (Default value None).
escapechar (Optional[str]) – Escape char for the CSV (Default value None).
index_col (bool) – Index column for pandas read_csv (Default value False).
full_build (bool) – If False, only the core portions of the CDB will be built (the ones required for MedCAT to function). If True, everything will be added to the CDB, which usually includes concept descriptions, various forms of names, etc. Note that this option produces a much larger CDB (Default value False).
only_existing_cuis (bool) – If True, no new CUIs will be added; only the names linked to existing CUIs will be extended. Mainly used when enriching the names of a CDB (e.g. SNOMED with UMLS terms) (Default value False).
kwargs (Any) – Will be passed to pandas for CSV reading

Return type:

medcat.cdb.CDB

Note

**kwargs: Will be passed to pandas for CSV reading.
csv: Examples of the CSV used to make the CDB can be found on [GitHub](link).

Returns:

CDB – CDB with the new concepts added.
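
The following is a minimal sketch of how the pieces above might fit together, not a definitive recipe. The column names `cui` and `name` are assumed from the MedCAT example CSVs, the CUIs and terms are made-up placeholders, and `write_concepts_csv` and `build_cdb` are hypothetical helper names. MedCAT is imported lazily inside `build_cdb`, so the CSV helper works on its own; constructing a CDBMaker may additionally require the spaCy model named in the config to be installed.

```python
import csv
import tempfile
from pathlib import Path

# Placeholder concepts: two names for one CUI and one name for another.
# Column names are an assumption based on the MedCAT example CSVs.
ROWS = [
    {"cui": "C0000001", "name": "kidney failure"},
    {"cui": "C0000001", "name": "renal failure"},
    {"cui": "C0000002", "name": "fever"},
]


def write_concepts_csv(path, rows=ROWS):
    """Write a small comma-separated concept file and return its path as str."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["cui", "name"])
        writer.writeheader()
        writer.writerows(rows)
    return str(path)


def build_cdb(csv_paths, full_build=False):
    """Compile the given CSV files into a new CDB using a default Config."""
    # Lazy imports: only needed when a CDB is actually built.
    from medcat.config import Config
    from medcat.cdb_maker import CDBMaker

    maker = CDBMaker(Config())
    # sep, encoding and other options keep their documented defaults here.
    return maker.prepare_csvs(csv_paths, full_build=full_build)
```

With MedCAT installed, `build_cdb([write_concepts_csv("concepts.csv")])` would be expected to return a `medcat.cdb.CDB` containing the concepts from the file.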


destroy_pipe()

Return type:

None
