aac_metrics.functional.bleu module

bleu(candidates: list[str], mult_references: list[list[str]], return_all_scores: bool = True, *, n: int = 4, option: typing.Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: typing.Callable[[str], list[str]] = str.split, return_1_to_n: bool = False) → tuple[dict[str, Tensor], dict[str, Tensor]] | Tensor

BiLingual Evaluation Understudy function.

Paper: https://www.aclweb.org/anthology/P02-1040.pdf

Note: this version of the BLEU metric applies a penalty that depends on the total length of all candidates and on the length of the references. As a result, the average of the per-candidate scores is generally not equal to the corpus score.
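The effect described in the note can be illustrated with the standard brevity penalty from the paper cited above. This is a minimal sketch, not the library's code: the helper name `brevity_penalty` and the token lengths are hypothetical, and only the penalty term is shown, not the full BLEU score.

```python
import math


def brevity_penalty(cand_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty (hypothetical helper, not the library's API)."""
    if cand_len == 0:
        return 0.0
    if cand_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)


# Hypothetical (candidate length, reference length) pairs, counted in tokens.
lengths = [(4, 8), (12, 8)]

# Sentence level: one penalty per candidate, then averaged.
mean_of_penalties = sum(brevity_penalty(c, r) for c, r in lengths) / len(lengths)

# Corpus level: lengths are summed first, then a single penalty is applied.
total_cand = sum(c for c, _ in lengths)
total_ref = sum(r for _, r in lengths)
corpus_penalty = brevity_penalty(total_cand, total_ref)

print(f"mean of sentence penalties: {mean_of_penalties:.4f}")
print(f"corpus penalty:             {corpus_penalty:.4f}")
```

Here the short first candidate is penalized on its own, but at corpus level the long second candidate compensates for it, so the two values differ.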

Parameters:

candidates – The list of sentences to evaluate.
mult_references – The list of lists of reference sentences used as targets, one list per candidate.
return_all_scores – If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.
n – Maximal n-gram order taken into account. Defaults to 4.
option – Corpus reference length mode. Can be “shortest”, “average” or “closest”. Defaults to “closest”.
verbose – The verbosity level. Defaults to 0.
tokenizer – The fast tokenizer used to split sentences into words. Defaults to str.split.
return_1_to_n – If True, returns the n-gram results for every order from 1 to n. Otherwise, returns only the score for order n. Defaults to False.

Returns:

A tuple of global and local scores if return_all_scores is True, otherwise a scalar tensor with the main global score.
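The three values of the option parameter can be read as three ways of picking one reference length per candidate. The sketch below only illustrates that idea. It is not the library's implementation: the function `effective_ref_length`, the example sentences, and the tie-breaking rule for “closest” (prefer the shorter reference) are all assumptions.

```python
from typing import Literal


def effective_ref_length(
    cand_len: int,
    ref_lens: list[int],
    option: Literal["shortest", "average", "closest"] = "closest",
) -> float:
    """Pick one reference length for a candidate (hypothetical helper)."""
    if option == "shortest":
        return float(min(ref_lens))
    if option == "average":
        return sum(ref_lens) / len(ref_lens)
    if option == "closest":
        # Assumption: ties are broken in favor of the shorter reference.
        return float(min(ref_lens, key=lambda r: (abs(r - cand_len), r)))
    raise ValueError(f"Unknown option: {option!r}")


# Same default tokenizer as in the signature above.
tokenizer = str.split

candidate = "a man is speaking loudly"
references = [
    "a man speaks",
    "a man is talking loudly outside",
    "someone is speaking to a crowd",
]

cand_len = len(tokenizer(candidate))
ref_lens = [len(tokenizer(ref)) for ref in references]

for option in ("shortest", "average", "closest"):
    print(option, effective_ref_length(cand_len, ref_lens, option))
```

With a 5-token candidate and references of 3, 6 and 6 tokens, the three modes select different lengths, which in turn changes the length penalty applied to the score.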

bleu_1(candidates: list[str], mult_references: list[list[str]], return_all_scores: bool = True, *, option: typing.Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: typing.Callable[[str], list[str]] = str.split, return_1_to_n: bool = False) → tuple[dict[str, Tensor], dict[str, Tensor]] | Tensor

bleu_2(candidates: list[str], mult_references: list[list[str]], return_all_scores: bool = True, *, option: typing.Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: typing.Callable[[str], list[str]] = str.split, return_1_to_n: bool = False) → tuple[dict[str, Tensor], dict[str, Tensor]] | Tensor

bleu_3(candidates: list[str], mult_references: list[list[str]], return_all_scores: bool = True, *, option: typing.Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: typing.Callable[[str], list[str]] = str.split, return_1_to_n: bool = False) → tuple[dict[str, Tensor], dict[str, Tensor]] | Tensor

bleu_4(candidates: list[str], mult_references: list[list[str]], return_all_scores: bool = True, *, option: typing.Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: typing.Callable[[str], list[str]] = str.split, return_1_to_n: bool = False) → tuple[dict[str, Tensor], dict[str, Tensor]] | Tensor