medcat.linking.vector_context_model

Module Contents

Classes

ContextModel

Used to learn embeddings for concepts and calculate similarities in new documents.

Attributes

logger

medcat.linking.vector_context_model.logger
class medcat.linking.vector_context_model.ContextModel(cdb, vocab, config)

Bases: object

Used to learn embeddings for concepts and calculate similarities in new documents.

Parameters:
  • cdb (CDB) – The concept database

  • vocab (Vocab) – The vocabulary

  • config (Config) – The config to be used

__init__(cdb, vocab, config)
Return type:

None

get_context_tokens(entity, doc, size)

Get the context tokens for an entity. Tokens marked to be skipped (token._.to_skip) are excluded.

Parameters:
  • entity (Span) – The entity to look for.

  • doc (Doc) – The document to look in.

  • size (int) – The size of the context window, in tokens, on each side of the entity.

Returns:

Tuple – The tokens on the left, centre, and right.

Return type:

Tuple
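
The windowing described above can be illustrated without spaCy. The following is a minimal sketch, not the library's implementation: the Token dataclass, its to_skip field (standing in for token._.to_skip), and the context_tokens helper are all hypothetical names. It also assumes the window is measured before skipped tokens are dropped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a spaCy token with a skip flag."""
    text: str
    to_skip: bool = False


def context_tokens(
    tokens: List[Token], start: int, end: int, size: int
) -> Tuple[List[Token], List[Token], List[Token]]:
    """Split the context of an entity into left, centre, and right parts.

    start and end are the inclusive token indices of the entity;
    size is the number of tokens inspected on each side.
    """
    left = [t for t in tokens[max(0, start - size):start] if not t.to_skip]
    centre = tokens[start:end + 1]
    right = [t for t in tokens[end + 1:end + 1 + size] if not t.to_skip]
    return left, centre, right


words = ["the", "patient", "has", "chronic", "kidney", "disease", ",", "stage", "3"]
# Assumption for the example: "the" and "," are the tokens marked to be skipped.
tokens = [Token(w, to_skip=(w in {"the", ","})) for w in words]

# The entity is "kidney disease" (token indices 4 to 5), with a window of 2.
left, centre, right = context_tokens(tokens, start=4, end=5, size=2)
```

Here the right-hand window covers "," and "stage", and only "stage" survives because the comma is flagged to be skipped.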

get_context_vectors(entity, doc, cui=None)

Return the context representation of the given entity within the given document.

Parameters:
  • entity (Span) – The entity to look for.

  • doc (Doc) – The document to look in.

  • cui (Any) – The CUI.

Returns:

Dict – The context vector.

Return type:

Dict
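
One plausible shape for such a context representation is a dictionary that maps a context-type name to a vector. The sketch below is illustrative only and makes several assumptions that the reference above does not confirm: that each context type corresponds to a window size, that the vector is an unweighted mean of the word vectors in that window, and that words without a vector are ignored. The names context_vectors, mean_vector, word_vecs, and window_sizes are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def context_vectors(
    words: List[str],
    start: int,
    end: int,
    word_vecs: Dict[str, Vector],
    window_sizes: Dict[str, int],
) -> Dict[str, Vector]:
    """Build one averaged vector per context type.

    start and end are the inclusive word indices of the entity.
    """
    result: Dict[str, Vector] = {}
    for name, size in window_sizes.items():
        window = words[max(0, start - size):end + 1 + size]
        vecs = [word_vecs[w] for w in window if w in word_vecs]
        if vecs:  # skip context types with no known words
            result[name] = mean_vector(vecs)
    return result


words = ["severe", "kidney", "failure", "noted"]
# Toy two-dimensional word vectors, invented for the example.
word_vecs = {
    "severe": [1.0, 0.0],
    "kidney": [0.0, 1.0],
    "failure": [2.0, 2.0],
    "noted": [5.0, 1.0],
}
vectors = context_vectors(
    words, start=1, end=1, word_vecs=word_vecs,
    window_sizes={"short": 1, "long": 3},
)
```

With these toy values the "short" window covers the first three words and the "long" window covers all four, so the two context types yield different vectors for the same entity.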

similarity(cui, entity, doc)

Calculate the similarity between the learnt context for this CUI and the context in the given doc.

Parameters:
  • cui (str) – The CUI.

  • entity (Span) – The entity to look for.

  • doc (Doc) – The document to look in.

Returns:

float – The similarity.

Return type:

float

_similarity(cui, vectors)

Calculate the similarity for a CUI from context vectors that have already been computed.

Parameters:
  • cui (str) – The CUI.

  • vectors (Dict) – The vectors.

Returns:

float – The similarity.

Return type:

float
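
To make the idea of comparing a learnt context with an observed one concrete, here is a minimal sketch that assumes cosine similarity, combined across context types by a weighted average. Neither the metric nor the weighting is confirmed by the reference above. The names cosine, context_similarity, and weights are hypothetical, as is the choice to return -1.0 when no context type is shared.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_similarity(
    learnt: Dict[str, Vector],
    observed: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-context-type cosine similarities."""
    total = 0.0
    weight_sum = 0.0
    for name, vec in observed.items():
        if name in learnt:
            w = weights.get(name, 1.0)
            total += w * cosine(learnt[name], vec)
            weight_sum += w
    # Sketch choice: no shared context type gives the lowest cosine value.
    return total / weight_sum if weight_sum else -1.0


learnt = {"short": [1.0, 0.0], "long": [0.0, 1.0]}
observed = {"short": [1.0, 0.0], "long": [1.0, 0.0]}
score = context_similarity(learnt, observed, weights={"short": 1.0, "long": 1.0})
```

In this example the "short" vectors are identical (cosine 1) and the "long" vectors are orthogonal (cosine 0), so equal weights give a score of 0.5.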

disambiguate(cuis, entity, name, doc)
Parameters:
  • cuis (List) –

  • entity (spacy.tokens.Span) –

  • name (str) –

  • doc (spacy.tokens.Doc) –

Return type:

Tuple
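
The reference gives no description for disambiguate, so the following is only a guess at the general idea: choosing one candidate CUI by its similarity score. It assumes the returned tuple pairs the chosen CUI with its score, which the source does not state. The function pick_best_cui, the score table, and the CUI strings are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def pick_best_cui(
    cuis: List[str], score: Callable[[str], float]
) -> Tuple[Optional[str], float]:
    """Return the highest-scoring candidate and its score."""
    if not cuis:
        return None, 0.0
    best = max(cuis, key=score)
    return best, score(best)


# Invented similarity scores for three invented candidate CUIs.
hypothetical_scores = {"C0000001": 0.31, "C0000002": 0.78, "C0000003": 0.12}
best_cui, best_score = pick_best_cui(
    list(hypothetical_scores), lambda cui: hypothetical_scores[cui]
)
```

In practice the score callable would wrap a similarity computation for the entity's context rather than a lookup table.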

train(cui, entity, doc, negative=False, names=[])

Update the context representation for this CUI, given its correct location (entity) in a document (doc).

Parameters:
  • cui (str) – The CUI to train.

  • entity (Span) – The entity we’re at.

  • doc (Doc) – The document within which we’re working.

  • negative (bool) – Whether or not the example is negative. Defaults to False.

  • names (List[str]/Dict) – Optionally used to update the status of a name-cui pair in the CDB.

Return type:

None
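
The effect of the negative flag can be sketched as a simple vector update: a positive example pulls the learnt vector towards the observed context, and a negative example pushes it away. This is an illustration under assumptions, not the library's update rule. The function update_context, the fixed learning rate lr, and the additive form of the update are all hypothetical.

```python
from typing import List

Vector = List[float]


def update_context(
    learnt: Vector, observed: Vector, lr: float = 0.1, negative: bool = False
) -> Vector:
    """Move the learnt vector towards (or away from) the observed one."""
    sign = -1.0 if negative else 1.0
    return [l + sign * lr * o for l, o in zip(learnt, observed)]


start = [0.0, 0.0]
observed = [1.0, 2.0]

# A positive example adds a fraction of the observed context.
after_positive = update_context(start, observed, lr=0.5)

# A negative example subtracts the same fraction.
after_negative = update_context(start, observed, lr=0.5, negative=True)
```

A real update rule would typically also scale the step by how similar the two vectors already are, so that well-learnt concepts change less. That refinement is omitted here.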

train_using_negative_sampling(cui)
Parameters:

cui (str) –

Return type:

None