aac_metrics.functional.vocab module

class VocabScores

Bases: dict

vocab(candidates: list[str], mult_references: list[list[str]] | None, return_all_scores: bool = True, *, seed: None | int | torch.Generator = 1234, tokenizer: Callable[[str], list[str]] = str.split, dtype: torch.dtype = torch.float64, pop_strategy: Literal['max', 'min'] | int = 'max', verbose: int = 0) → tuple[VocabScores, VocabScores] | Tensor

Compute vocabulary statistics.

Returns the candidate corpus vocabulary length, the references vocabulary length, the average vocabulary length for single references, and the vocabulary ratios between candidates and references.
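To make these quantities concrete, the following is a minimal pure-Python sketch of a corpus vocabulary size and a candidate-to-reference ratio. It does not call this library and is not its implementation; the helper `vocab_size` and the sample sentences are hypothetical, and the only assumption borrowed from this page is the default `str.split` tokenizer.

```python
from typing import Callable


def vocab_size(
    sentences: list[str],
    tokenizer: Callable[[str], list[str]] = str.split,
) -> int:
    """Hypothetical helper: number of distinct tokens in a list of sentences."""
    return len({token for sentence in sentences for token in tokenizer(sentence)})


# Made-up example data, one list of references per candidate.
candidates = ["a man is speaking", "birds are singing"]
mult_references = [
    ["a man speaks", "someone is talking"],
    ["birds sing in a forest", "birds are chirping"],
]

# Flatten the references to measure the vocabulary of the whole reference corpus.
flat_references = [ref for refs in mult_references for ref in refs]

cands_vocab = vocab_size(candidates)       # 7 distinct tokens
refs_vocab = vocab_size(flat_references)   # 12 distinct tokens
ratio = cands_vocab / refs_vocab           # 7 / 12

print(cands_vocab, refs_vocab, round(ratio, 3))
```

A ratio below 1 means the candidates use fewer distinct words than the references do.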

Parameters:
  • candidates – The list of sentences to evaluate.

  • mult_references – The list of lists of sentences used as targets. Can also be None.

  • return_all_scores – If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.

  • seed – Random seed used to compute the average vocabulary length for multiple references. Defaults to 1234.

  • tokenizer – The function used to split a sentence into tokens. Defaults to str.split.

  • dtype – Torch floating-point dtype used for numerical precision. Defaults to torch.float64.

  • pop_strategy – Strategy used to compute the average reference vocabulary. Defaults to “max”.

  • verbose – The verbosity level. Defaults to 0.
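The choice of tokenizer directly changes the measured vocabulary. The sketch below, which again uses a hypothetical `vocab_size` helper rather than this library, compares the default `str.split` with a hypothetical `simple_tokenizer` that lowercases the text and drops punctuation. Any callable with the signature `Callable[[str], list[str]]` should fit the `tokenizer` parameter described above.

```python
import re
from typing import Callable


def vocab_size(
    sentences: list[str],
    tokenizer: Callable[[str], list[str]] = str.split,
) -> int:
    """Hypothetical helper: number of distinct tokens in a list of sentences."""
    return len({token for sentence in sentences for token in tokenizer(sentence)})


def simple_tokenizer(sentence: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, then keep runs of word characters only."""
    return re.findall(r"\w+", sentence.lower())


sentences = ["A dog barks.", "a dog barks"]

# str.split keeps case and punctuation, so "A"/"a" and "barks."/"barks" differ.
size_split = vocab_size(sentences, str.split)

# The normalizing tokenizer merges those variants.
size_simple = vocab_size(sentences, simple_tokenizer)

print(size_split, size_simple)  # 5 3
```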

Returns:

A tuple of global and local scores, or a scalar tensor with the main global score.
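The role of a seed in an averaged single-reference vocabulary can be illustrated with the sketch below. It shows one plausible reading only and is an assumption, not this library's algorithm: it draws one reference per item at random, measures the vocabulary of that sample, and averages over several draws. The names `avg_single_ref_vocab` and `n_draws` are hypothetical, and the sketch does not model `pop_strategy`.

```python
import random
from typing import Callable


def vocab_size(
    sentences: list[str],
    tokenizer: Callable[[str], list[str]] = str.split,
) -> int:
    """Hypothetical helper: number of distinct tokens in a list of sentences."""
    return len({token for sentence in sentences for token in tokenizer(sentence)})


def avg_single_ref_vocab(
    mult_references: list[list[str]],
    n_draws: int = 10,
    seed: int = 1234,
) -> float:
    """Assumed procedure: average vocabulary size over random one-reference-per-item samples."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    sizes = []
    for _ in range(n_draws):
        sample = [rng.choice(refs) for refs in mult_references]
        sizes.append(vocab_size(sample))
    return sum(sizes) / len(sizes)


mult_references = [
    ["a man speaks", "someone is talking"],
    ["birds sing in a forest", "birds are chirping"],
]

avg_vocab = avg_single_ref_vocab(mult_references, seed=1234)
avg_vocab_again = avg_single_ref_vocab(mult_references, seed=1234)

# The same seed reproduces the same draws, hence the same average.
print(avg_vocab == avg_vocab_again)
```

Fixing the seed makes such an estimate reproducible across runs, which is the usual reason for exposing it as a parameter.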