:py:mod:`medcat.cdb_maker`
==========================

.. py:module:: medcat.cdb_maker


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.cdb_maker.CDBMaker


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.cdb_maker.PH_REMOVE
   medcat.cdb_maker.logger


.. py:data:: PH_REMOVE

.. py:data:: logger

.. py:class:: CDBMaker(config, cdb = None)

   Bases: :py:obj:`object`

   Given a CSV as shown in https://github.com/CogStack/MedCAT/tree/master/examples/ it creates a CDB or updates an existing one.

   :param config: Global config for MedCAT.
   :type config: medcat.config.Config
   :param cdb: If set, the `CDBMaker` will update the existing `CDB` with new concepts from the CSV (Default value `None`).
   :type cdb: medcat.cdb.CDB

   .. py:method:: __init__(config, cdb = None)

   .. py:method:: reset_cdb()

      Re-create the internal CDB from the same config. This is necessary if you want to call `prepare_csvs` multiple times on the same `CDBMaker` instance.

   .. py:method:: prepare_csvs(csv_paths, sep = ',', encoding = None, escapechar = None, index_col = False, full_build = False, only_existing_cuis = False, **kwargs)

      Compile one or multiple CSVs into a CDB.

      Note: This class/method generally uses the same instance of the CDB. If you use the same `CDBMaker` and call `prepare_csvs` multiple times, concepts from prior calls are likely to leak into new ones. To reset the CDB, call `reset_cdb`.

      :param csv_paths: An array of paths to the CSV files that should be processed. Can also be an array of pd.DataFrames.
      :type csv_paths: Union[pd.DataFrame, List[str]]
      :param sep: Custom separator for the CSV files, if needed (Default value ',').
      :type sep: str
      :param encoding: Encoding to be used for reading the CSV file (Default value `None`).
      :type encoding: Optional[str]
      :param escapechar: Escape char for the CSV (Default value `None`).
      :type escapechar: Optional[str]
      :param index_col: Index column for pandas read_csv (Default value `False`).
      :type index_col: bool
      :param full_build: If False, only the core portions of the CDB will be built (the ones required for the functioning of MedCAT). If True, everything will be added to the CDB; this usually includes concept descriptions, various forms of names etc. Take care, as this option produces a much larger CDB (Default value `False`).
      :type full_build: bool
      :param only_existing_cuis: If True, no new CUIs will be added; only the linked names will be extended. Mainly used when enriching names of a CDB (e.g. SNOMED with UMLS terms) (Default value `False`).
      :type only_existing_cuis: bool
      :param kwargs: Will be passed to pandas for CSV reading.
      :type kwargs: Any

      .. note::

         \*\*kwargs:
             Will be passed to pandas for CSV reading.

         csv:
             Examples of the CSV used to make the CDB can be found on [GitHub](link)

      :Returns: **CDB** -- CDB with the new concepts added.

   .. py:method:: destroy_pipe()
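
A minimal sketch of the workflow documented above, assuming MedCAT is installed together with whatever its `Config` requires. The column names `cui` and `name`, the CUIs and names in `ROWS`, and the helper names `write_concept_csv` and `build_cdb` are illustrative assumptions, not part of this module; check the example CSVs linked above for the actual layout.

.. code-block:: python

   import csv
   import tempfile
   from pathlib import Path

   # Assumed column names and made-up concepts, for illustration only.
   ROWS = [
       {"cui": "C0000001", "name": "kidney failure"},
       {"cui": "C0000001", "name": "renal failure"},
       {"cui": "C0000002", "name": "heart attack"},
   ]


   def write_concept_csv(path, rows):
       """Write concept rows to a comma-separated CSV with a header line."""
       with open(path, "w", newline="", encoding="utf-8") as handle:
           writer = csv.DictWriter(handle, fieldnames=["cui", "name"])
           writer.writeheader()
           writer.writerows(rows)
       return path


   def build_cdb(csv_paths, full_build=False):
       """Build a CDB from CSV paths using a fresh CDBMaker (needs MedCAT)."""
       # Imported lazily so the CSV helper above works without MedCAT.
       from medcat.cdb_maker import CDBMaker
       from medcat.config import Config

       maker = CDBMaker(Config())
       # sep, encoding and full_build are the documented prepare_csvs parameters.
       return maker.prepare_csvs(
           csv_paths, sep=",", encoding="utf-8", full_build=full_build
       )


   # Write the example CSV to a temporary directory.
   csv_path = write_concept_csv(Path(tempfile.mkdtemp()) / "concepts.csv", ROWS)

With MedCAT available, `cdb = build_cdb([str(csv_path)])` would return the populated CDB. Because the sketch creates a new `CDBMaker` per call, nothing carries over between builds; when reusing one instance instead, call `reset_cdb` between `prepare_csvs` calls.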