medcat.utils.model_creator
Module Contents
Functions
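create_cdb(concept_csv_file, medcat_config) – Create concept database from CSV.
create_vocab(cdb, training_data_list, medcat_config, output_dir, unigram_table_size) – Create vocabulary for word embeddings and spell check from list of training documents and CDB.
train_unsupervised(cdb, vocab, config, output_dir, training_data_list) – Perform unsupervised training and save updated CDB.
create_models(config_file) – Create MedCAT CDB and Vocabulary models.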
Attributes
- medcat.utils.model_creator.DEFAULT_UNIGRAM_TABLE_SIZE = 100000000
- medcat.utils.model_creator.logger
- medcat.utils.model_creator.create_cdb(concept_csv_file, medcat_config)
Create concept database from CSV.
- Parameters:
concept_csv_file (Path) – Path to CSV file containing all concepts and synonyms.
medcat_config (Config) – MedCAT configuration file.
- Returns:
medcat.cdb.CDB – MedCAT concept database containing list of entities and synonyms, without context embeddings.
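As a minimal sketch of preparing input for create_cdb, the snippet below writes a small concept CSV with Python's csv module. The column names cui and name are an assumption about the concept table layout (this page does not state them), and the example rows and the helper name write_concept_csv are made up for illustration. The create_cdb call is shown only as a comment because it needs MedCAT installed and a Config object.

```python
import csv
import tempfile
from pathlib import Path


def write_concept_csv(path, concepts):
    """Write (cui, name) rows to a CSV file and return its path.

    Hypothetical helper: the 'cui' and 'name' column names are an assumption
    about the concept table layout, not taken from this page.
    """
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cui", "name"])
        writer.writerows(concepts)
    return path


# Made-up example concepts: one row per concept name or synonym.
example_concepts = [
    ("C001", "kidney failure"),
    ("C001", "renal failure"),
    ("C002", "diabetes"),
]

concept_csv_file = write_concept_csv(
    Path(tempfile.mkdtemp()) / "concepts.csv", example_concepts
)

# With MedCAT installed, the file could then be passed on, for example:
#   from medcat.config import Config
#   from medcat.utils.model_creator import create_cdb
#   cdb = create_cdb(concept_csv_file, Config())
```

Listing each synonym as its own row with a shared identifier keeps the file easy to generate from an existing terminology export.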
- medcat.utils.model_creator.create_vocab(cdb, training_data_list, medcat_config, output_dir, unigram_table_size)
Create vocabulary for word embeddings and spell check from list of training documents and CDB.
- Parameters:
cdb (medcat.cdb.CDB) – MedCAT concept database containing list of entities and synonyms.
training_data_list (list) – List of example documents.
medcat_config (medcat.config.Config) – MedCAT configuration file.
output_dir (pathlib.Path) – Output directory to write vocabulary and data.txt (required to create vocabulary) to.
unigram_table_size (int) – Size of unigram table to be initialized before creating vocabulary.
- Returns:
medcat.vocab.Vocab – MedCAT vocabulary created from CDB and training documents.
- medcat.utils.model_creator.train_unsupervised(cdb, vocab, config, output_dir, training_data_list)
Perform unsupervised training and save updated CDB.
Training updates the given CDB with context embeddings, so the cdb argument passed in is itself modified.
- Parameters:
cdb (medcat.cdb.CDB) – MedCAT concept database containing list of entities and synonyms.
vocab (medcat.vocab.Vocab) – MedCAT vocabulary created from CDB and training documents.
config (medcat.config.Config) – MedCAT configuration file.
output_dir (pathlib.Path) – Output directory to write updated CDB to.
training_data_list (list) – List of example documents.
- Returns:
medcat.cdb.CDB – MedCAT concept database containing list of entities and synonyms, as well as context embeddings.
- medcat.utils.model_creator.create_models(config_file)
Create MedCAT CDB and Vocabulary models.
- Parameters:
config_file (pathlib.Path) – Location of model creator configuration file to specify input, output and MedCAT configuration.
- Returns:
medcat.cat.CAT – Containing CDB, Vocab and Config.
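The functions above appear designed to be chained: create_cdb, then create_vocab, then train_unsupervised. The sketch below wires them together using only the documented signatures. To keep it runnable without MedCAT installed, the module is passed in as the creator argument; run_pipeline and the stand-in object are hypothetical names for illustration, and this is not a description of what create_models does internally.

```python
from pathlib import Path
from types import SimpleNamespace


def run_pipeline(creator, concept_csv_file, training_data_list, medcat_config, output_dir):
    """Chain the documented steps: CDB, then vocabulary, then unsupervised training.

    `creator` is expected to expose the functions documented for
    medcat.utils.model_creator (pass the real module when MedCAT is installed).
    """
    output_dir = Path(output_dir)
    cdb = creator.create_cdb(concept_csv_file, medcat_config)
    vocab = creator.create_vocab(
        cdb,
        training_data_list,
        medcat_config,
        output_dir,
        creator.DEFAULT_UNIGRAM_TABLE_SIZE,
    )
    trained_cdb = creator.train_unsupervised(
        cdb, vocab, medcat_config, output_dir, training_data_list
    )
    return trained_cdb, vocab


# Stand-in object so the sketch runs without MedCAT; it only records call order.
calls = []
fake_creator = SimpleNamespace(
    DEFAULT_UNIGRAM_TABLE_SIZE=100000000,
    create_cdb=lambda csv_file, config: calls.append("create_cdb") or "cdb",
    create_vocab=lambda cdb, docs, config, out, size: calls.append("create_vocab") or "vocab",
    train_unsupervised=lambda cdb, vocab, config, out, docs: calls.append("train_unsupervised") or cdb,
)

result = run_pipeline(fake_creator, Path("concepts.csv"), ["example document"], None, "out")
```

With MedCAT installed, the real module (imported as `import medcat.utils.model_creator as creator`) could be passed in place of the stand-in, together with a real Config object and output directory.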
- medcat.utils.model_creator.main(config_file)
- medcat.utils.model_creator.parser