aac_metrics.functional.vocab module

class VocabScores
Bases: dict

vocab(candidates: list[str], mult_references: list[list[str]] | None, return_all_scores: bool = True, *, seed: None | int | torch.Generator = 1234, tokenizer: Callable[[str], list[str]] = str.split, dtype: torch.dtype = torch.float64, pop_strategy: Literal['max', 'min'] | int = 'max', verbose: int = 0) → tuple[VocabScores, VocabScores] | Tensor

Compute vocabulary statistics.

Returns the vocabulary length of the candidate corpus, the vocabulary length of the references, the average reference vocabulary length when single references are used, and the vocabulary ratios between candidates and references.
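
As a rough illustration of the statistics described above, the following pure-Python sketch counts the distinct tokens in the candidates and in the flattened references, then takes their ratio. The helper names `corpus_vocab_size` and `vocab_stats_sketch` and the dictionary keys are hypothetical and are not part of aac_metrics; the library itself returns tensors and may differ in detail.

```python
from typing import Callable


def corpus_vocab_size(
    sentences: list[str],
    tokenizer: Callable[[str], list[str]] = str.split,
) -> int:
    """Count the distinct tokens over a list of sentences."""
    tokens: set[str] = set()
    for sentence in sentences:
        tokens.update(tokenizer(sentence))
    return len(tokens)


def vocab_stats_sketch(
    candidates: list[str],
    mult_references: list[list[str]],
) -> dict[str, float]:
    """Hypothetical helper: vocabulary sizes and their ratio.

    Assumes the references contain at least one token, otherwise the
    ratio would divide by zero.
    """
    cands_size = corpus_vocab_size(candidates)
    # Flatten all references of all items into a single corpus.
    flat_refs = [ref for refs in mult_references for ref in refs]
    refs_size = corpus_vocab_size(flat_refs)
    return {
        "cands": cands_size,
        "refs_full": refs_size,
        "ratio_full": cands_size / refs_size,
    }


candidates = ["a man is speaking", "rain falls on a roof"]
mult_references = [
    ["a man speaks", "someone is talking loudly"],
    ["rain is falling on a roof", "heavy rain hits the roof"],
]
stats = vocab_stats_sketch(candidates, mult_references)
print(stats)
```

A ratio below 1 in this sketch means the candidates use fewer distinct words than the references do.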

Parameters:

candidates – The list of sentences to evaluate.
mult_references – The list of lists of sentences used as targets. Can also be None.
return_all_scores – If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.
seed – Random seed used to compute the average vocabulary length for multiple references. Defaults to 1234.
tokenizer – The function used to split a sentence into tokens. Defaults to str.split.
dtype – Torch floating-point dtype used for numerical precision. Defaults to torch.float64.
pop_strategy – Strategy used to compute the average reference vocabulary. Defaults to “max”.
verbose – The verbosity level. Defaults to 0.
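
The `seed` and `pop_strategy` parameters suggest that the average reference vocabulary is estimated by repeatedly drawing one reference per item. The sketch below is one possible reading of that idea, not the library's implementation. It assumes that `'max'` and `'min'` refer to the largest and smallest number of references per item, and that an integer fixes the number of rounds directly. The function name `avg_single_ref_vocab_sketch` is hypothetical, and Python's `random.Random` stands in for a torch generator.

```python
import random
from typing import Union


def avg_single_ref_vocab_sketch(
    mult_references: list[list[str]],
    seed: int = 1234,
    pop_strategy: Union[str, int] = "max",
) -> float:
    """Hypothetical helper: mean vocabulary size over single-reference draws."""
    rng = random.Random(seed)
    # Tokenize into fresh lists so the caller's data is never mutated.
    pools = [[ref.split() for ref in refs] for refs in mult_references]

    # Assumed meaning of pop_strategy: the number of sampling rounds.
    if pop_strategy == "max":
        n_rounds = max((len(pool) for pool in pools), default=0)
    elif pop_strategy == "min":
        n_rounds = min((len(pool) for pool in pools), default=0)
    else:
        n_rounds = int(pop_strategy)

    sizes = []
    for _ in range(n_rounds):
        tokens: set[str] = set()
        for pool in pools:
            if pool:
                # Draw one remaining reference for this item, without replacement.
                picked = pool.pop(rng.randrange(len(pool)))
                tokens.update(picked)
        sizes.append(len(tokens))

    # With no rounds there is nothing to average.
    return sum(sizes) / len(sizes) if sizes else 0.0


mult_references = [
    ["a man speaks", "someone is talking loudly"],
    ["rain is falling on a roof", "heavy rain hits the roof"],
]
print(avg_single_ref_vocab_sketch(mult_references, seed=1234, pop_strategy="max"))
```

Fixing the seed makes the draw order, and therefore the averaged value, reproducible across runs.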

Returns:

A tuple of global and local scores, or a scalar tensor with the main global score.