aac_metrics.functional.vocab module

class VocabScores

Bases: dict

vocab(candidates: list[str], mult_references: list[list[str]] | None, return_all_scores: bool = True, *, seed: None | int | torch.Generator = 1234, tokenizer: Callable[[str], list[str]] = str.split, dtype: torch.dtype = torch.float64, pop_strategy: Literal['max', 'min'] | int = 'max', verbose: int = 0) -> tuple[VocabScores, VocabScores] | Tensor

Compute vocabulary statistics.

Returns the candidate corpus vocabulary length, the references vocabulary length, the average vocabulary length for single references, and the vocabulary ratios between candidates and references.
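A minimal pure-Python sketch of the corpus-level statistics, assuming that "vocabulary length" means the number of distinct tokens across a whole corpus and that a ratio is the candidate vocabulary length divided by a reference vocabulary length. The helper names corpus_vocab_len and vocab_ratio are hypothetical and are not part of aac_metrics, which works on torch tensors of the configured dtype rather than Python floats.

```python
from __future__ import annotations

from typing import Callable


# Hypothetical helper: illustrates the assumed definition, not the aac_metrics code.
def corpus_vocab_len(sentences: list[str], tokenizer: Callable[[str], list[str]] = str.split) -> int:
    """Count distinct tokens over all sentences of a corpus."""
    vocab: set[str] = set()
    for sentence in sentences:
        vocab.update(tokenizer(sentence))
    return len(vocab)


# Hypothetical helper: ratio against the vocabulary of all references pooled together.
def vocab_ratio(candidates: list[str], mult_references: list[list[str]]) -> float:
    """Divide the candidate vocabulary length by the pooled reference vocabulary length."""
    all_refs = [ref for refs in mult_references for ref in refs]  # flatten list of lists
    return corpus_vocab_len(candidates) / corpus_vocab_len(all_refs)


candidates = ["a dog barks", "a cat meows"]
mult_references = [["a dog is barking", "dog barks loudly"], ["a cat meows softly"]]
print(corpus_vocab_len(candidates))  # 5: a, dog, barks, cat, meows
print(vocab_ratio(candidates, mult_references))  # 5 / 9
```

A ratio below 1 indicates that the candidates use a smaller vocabulary than the references.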

Parameters:
candidates

The list of sentences to evaluate.

mult_references

The list of lists of reference sentences used as targets, one list per candidate. Can also be None.

return_all_scores

If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.

seed

Random seed used to compute the average vocabulary length when there are multiple references. Defaults to 1234.
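The seed parameter accepts None, an int, or a torch.Generator. The sketch below shows one plausible way to normalize such an argument into a generator and then draw one reference per candidate reproducibly. It uses the standard library's random.Random as a stand-in for torch.Generator so it runs without PyTorch; the helper names make_rng and pick_one_reference, and the treatment of None as an unseeded generator, are assumptions rather than documented aac_metrics behavior.

```python
from __future__ import annotations

import random


# Hypothetical helper: random.Random stands in for torch.Generator here.
def make_rng(seed: None | int | random.Random) -> random.Random:
    """Normalize a seed argument into a generator."""
    if isinstance(seed, random.Random):
        return seed  # reuse the caller's generator and its current state
    return random.Random(seed)  # None: seeded from system entropy; int: deterministic


# Hypothetical helper: one uniformly chosen reference per candidate.
def pick_one_reference(mult_references: list[list[str]], rng: random.Random) -> list[str]:
    """Draw one reference sentence from each candidate's reference list."""
    return [rng.choice(refs) for refs in mult_references]


refs = [["a dog barks", "a dog is barking"], ["a cat meows", "the cat meows", "cat sounds"]]
draw_a = pick_one_reference(refs, make_rng(1234))
draw_b = pick_one_reference(refs, make_rng(1234))
print(draw_a == draw_b)  # True: the same int seed reproduces the same draws
```

Passing a generator object instead of an int lets successive calls continue from a shared random state rather than restarting from the same seed.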

tokenizer

The function used to split a sentence into tokens. Defaults to str.split.
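Any callable that maps a string to a list of strings fits the tokenizer type in the signature. As an example, here is a hypothetical tokenizer (not shipped with aac_metrics) that lowercases the sentence and drops punctuation using the standard re module:

```python
from __future__ import annotations

import re

# Runs of lowercase letters, digits and apostrophes count as tokens.
_WORD_RE = re.compile(r"[a-z0-9']+")


def lowercase_word_tokenizer(sentence: str) -> list[str]:
    """Lowercase the sentence, then keep only word-like runs of characters."""
    return _WORD_RE.findall(sentence.lower())


print(lowercase_word_tokenizer("A dog barks, then the Dog's owner calls."))
# ['a', 'dog', 'barks', 'then', 'the', "dog's", 'owner', 'calls']
```

It would be passed as vocab(candidates, mult_references, tokenizer=lowercase_word_tokenizer). Unlike the default str.split, it counts "Dog" and "dog", or "barks," and "barks", as the same vocabulary entry, which lowers the measured vocabulary lengths.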

dtype

Torch floating-point dtype used for numerical precision. Defaults to torch.float64.

pop_strategy

Strategy used to compute the average reference vocabulary length: “max”, “min”, or an int. Defaults to “max”.
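The signature only states that pop_strategy is “max”, “min”, or an int. One plausible reading, sketched here as an assumption rather than documented behavior, is that it sets how many sampling rounds are averaged: “max” and “min” use the largest or smallest number of references any candidate has, and an int fixes the count directly. Each round draws one reference per candidate, measures that corpus's vocabulary length, and the result is the mean over rounds. The names n_rounds and avg_single_ref_vocab_len are hypothetical, and random.Random stands in for torch.Generator.

```python
from __future__ import annotations

import random
from typing import Literal


# Hypothetical helper: assumed meaning of pop_strategy, not confirmed by the docs.
def n_rounds(mult_references: list[list[str]], pop_strategy: Literal["max", "min"] | int) -> int:
    """Resolve the strategy into a number of sampling rounds."""
    if pop_strategy == "max":
        return max(len(refs) for refs in mult_references)
    if pop_strategy == "min":
        return min(len(refs) for refs in mult_references)
    if isinstance(pop_strategy, int) and pop_strategy > 0:
        return pop_strategy
    raise ValueError(f"Invalid pop_strategy: {pop_strategy!r}")


# Hypothetical helper: average vocabulary length of single-reference corpora.
def avg_single_ref_vocab_len(
    mult_references: list[list[str]],
    pop_strategy: Literal["max", "min"] | int = "max",
    seed: int = 1234,
) -> float:
    """Mean vocabulary length over rounds that keep one random reference per candidate."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_rounds(mult_references, pop_strategy)):
        sampled = [rng.choice(refs) for refs in mult_references]
        vocab = {token for sentence in sampled for token in sentence.split()}
        lengths.append(len(vocab))
    return sum(lengths) / len(lengths)


refs = [["a dog barks", "a dog is barking"], ["a cat meows", "the cat meows", "cat sounds"]]
print(n_rounds(refs, "max"), n_rounds(refs, "min"))  # 3 2
print(avg_single_ref_vocab_len(refs))
```

Under this reading, “max” trades extra computation for a lower-variance estimate, while “min” runs fewer rounds.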

verbose

The verbosity level. Defaults to 0.

Returns:

A tuple of global and local scores if return_all_scores is True, otherwise a scalar tensor with the main global score.