:py:mod:`medcat.utils.model_creator`
====================================

.. py:module:: medcat.utils.model_creator


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   medcat.utils.model_creator.create_cdb
   medcat.utils.model_creator.create_vocab
   medcat.utils.model_creator.train_unsupervised
   medcat.utils.model_creator.create_models
   medcat.utils.model_creator.main


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.model_creator.DEFAULT_UNIGRAM_TABLE_SIZE
   medcat.utils.model_creator.logger
   medcat.utils.model_creator.parser


.. py:data:: DEFAULT_UNIGRAM_TABLE_SIZE
   :value: 100000000


.. py:data:: logger


.. py:function:: create_cdb(concept_csv_file, medcat_config)

   Create a concept database from a CSV file.

   :param concept_csv_file: Path to the CSV file containing all concepts and their synonyms.
   :type concept_csv_file: pathlib.Path
   :param medcat_config: MedCAT configuration object.
   :type medcat_config: medcat.config.Config

   :Returns: **medcat.cdb.CDB** -- MedCAT concept database containing the list of entities and synonyms, without context embeddings.


.. py:function:: create_vocab(cdb, training_data_list, medcat_config, output_dir, unigram_table_size)

   Create a vocabulary for word embeddings and spell checking from a list of training documents and a CDB.

   :param cdb: MedCAT concept database containing the list of entities and synonyms.
   :type cdb: medcat.cdb.CDB
   :param training_data_list: List of example documents.
   :type training_data_list: list
   :param medcat_config: MedCAT configuration object.
   :type medcat_config: medcat.config.Config
   :param output_dir: Output directory to which the vocabulary and ``data.txt`` (required to create the vocabulary) are written.
   :type output_dir: pathlib.Path
   :param unigram_table_size: Size of the unigram table to be initialized before creating the vocabulary.
   :type unigram_table_size: int

   :Returns: **medcat.vocab.Vocab** -- MedCAT vocabulary created from the CDB and the training documents.


.. py:function:: train_unsupervised(cdb, vocab, config, output_dir, training_data_list)

   Perform unsupervised training and save the updated CDB.

   The CDB passed in is updated in place with context embeddings during training; the updated CDB is also returned.

   :param cdb: MedCAT concept database containing the list of entities and synonyms.
   :type cdb: medcat.cdb.CDB
   :param vocab: MedCAT vocabulary created from the CDB and the training documents.
   :type vocab: medcat.vocab.Vocab
   :param config: MedCAT configuration object.
   :type config: medcat.config.Config
   :param output_dir: Output directory to which the updated CDB is written.
   :type output_dir: pathlib.Path
   :param training_data_list: List of example documents.
   :type training_data_list: list

   :Returns: **medcat.cdb.CDB** -- MedCAT concept database containing the list of entities and synonyms, as well as context embeddings.


.. py:function:: create_models(config_file)

   Create the MedCAT CDB and Vocab models.

   :param config_file: Path to the model creator configuration file, which specifies the input, the output and the MedCAT configuration.
   :type config_file: pathlib.Path

   :Returns: **medcat.cat.CAT** -- MedCAT pipeline containing the CDB, Vocab and Config.


.. py:function:: main(config_file)


.. py:data:: parser
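
The concept CSV read by :py:func:`create_cdb` holds concepts and their synonyms. The sketch below builds such a file in memory and groups the names by concept identifier, roughly the concept-to-synonyms mapping a CDB records. It assumes one synonym per row and the two columns ``cui`` and ``name``, which are assumed here to be the minimum that MedCAT's CSV-based CDB creation reads; the identifiers ``CUI001`` and ``CUI002`` and their names are made up for illustration.

.. code-block:: python

   import csv
   import io
   from typing import Dict, List

   # Hypothetical concepts: one synonym per row, assumed columns "cui" and "name".
   rows = [
       {"cui": "CUI001", "name": "kidney failure"},
       {"cui": "CUI001", "name": "renal failure"},
       {"cui": "CUI002", "name": "hypertension"},
   ]

   # Write the rows as CSV text, as they would appear in concept_csv_file.
   buffer = io.StringIO()
   writer = csv.DictWriter(buffer, fieldnames=["cui", "name"])
   writer.writeheader()
   writer.writerows(rows)
   concept_csv_text = buffer.getvalue()

   # Read the CSV back and collect every synonym under its concept identifier.
   synonyms: Dict[str, List[str]] = {}
   for row in csv.DictReader(io.StringIO(concept_csv_text)):
       synonyms.setdefault(row["cui"], []).append(row["name"])

Several rows sharing one ``cui`` is how this layout expresses multiple synonyms for the same concept.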
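
:py:func:`create_vocab` takes a ``unigram_table_size``, and the module defines :py:data:`DEFAULT_UNIGRAM_TABLE_SIZE` as 100000000. A unigram table is commonly used for word2vec-style negative sampling: each word fills a share of the table proportional to its count raised to the power 0.75, so drawing a random slot favours frequent words less than their raw frequency would. The standalone sketch below shows that scheme under the assumption that MedCAT's vocabulary follows it; ``make_unigram_table`` here is a hypothetical helper, not MedCAT's API.

.. code-block:: python

   from collections import Counter
   from typing import List, Mapping


   def make_unigram_table(counts: Mapping[str, int], table_size: int,
                          power: float = 0.75) -> List[str]:
       """Hypothetical helper: build a word2vec-style negative-sampling table.

       Each word gets about table_size * count**power / total slots (rounded
       down), where total is the sum of count**power over all words.
       """
       words = sorted(counts)
       weights = [counts[word] ** power for word in words]
       total = sum(weights)
       table: List[str] = []
       for word, weight in zip(words, weights):
           table.extend([word] * int(weight / total * table_size))
       return table


   counts = Counter({"patient": 16, "kidney": 1})
   table = make_unigram_table(counts, table_size=100)
   # 16**0.75 = 8 and 1**0.75 = 1, so "patient" gets 88 slots and "kidney" 11:
   # random draws favour "patient" about 8:1 instead of the raw 16:1.

In this sketch the table holds at most ``table_size`` entries, so its memory use grows linearly with the size; a table of 100000000 entries is correspondingly large.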
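
The module exposes a ``parser`` attribute, and :py:func:`main` takes a single ``config_file``. A command-line entry point of that shape can be sketched with ``argparse`` as follows; the positional argument name, the description and the file name ``model_creator_config.yml`` are assumptions for illustration, not the module's documented interface.

.. code-block:: python

   import argparse
   from pathlib import Path


   def build_parser() -> argparse.ArgumentParser:
       """Hypothetical CLI mirroring main(config_file); not MedCAT's actual parser."""
       parser = argparse.ArgumentParser(description="Create MedCAT CDB and Vocab models.")
       # One positional argument, converted to a Path since create_models
       # documents its config_file as pathlib.Path.
       parser.add_argument("config_file", type=Path,
                           help="Model creator configuration file.")
       return parser


   args = build_parser().parse_args(["model_creator_config.yml"])
   # In a script, the parsed path would then be handed on, e.g. main(args.config_file).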