aac_metrics.functional.bert_score_mrefs module

class BERTScoreMRefsScores: Bases: dict

bert_score_mrefs( candidates: list[str], mult_references: list[list[str]], return_all_scores: True = True, *, model: str | Module = DEFAULT_BERT_SCORE_MODEL, tokenizer: Callable | None = None, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, num_threads: int = 0, max_length: int = 64, reset_state: bool = True, idf: bool = False, reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max', filter_nan: bool = True, verbose: int = 0, ) → tuple[BERTScoreMRefsScores, BERTScoreMRefsScores]

bert_score_mrefs( candidates: list[str], mult_references: list[list[str]], return_all_scores: False, *, model: str | Module = DEFAULT_BERT_SCORE_MODEL, tokenizer: Callable | None = None, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, num_threads: int = 0, max_length: int = 64, reset_state: bool = True, idf: bool = False, reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max', filter_nan: bool = True, verbose: int = 0, ) → Tensor

BERTScore metric which supports multiple references.

The implementation is based on the bert_score implementation of torchmetrics.

Paper: https://arxiv.org/pdf/1904.09675.pdf
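For readers unfamiliar with the metric, the greedy token matching described in the paper can be sketched as follows. This is a minimal illustration in plain Python, not the code used by this module or by torchmetrics: the helper names `cosine` and `greedy_bert_score` are hypothetical, the toy 2-D vectors stand in for real contextual embeddings, and IDF weighting and baseline rescaling are left out.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_bert_score(
    cand_emb: list[list[float]], ref_emb: list[list[float]]
) -> tuple[float, float, float]:
    """Return (precision, recall, f1) from greedy token matching."""
    # Precision: match each candidate token to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_emb) for c in cand_emb) / len(cand_emb)
    # Recall: match each reference token to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_emb) for r in ref_emb) / len(ref_emb)
    # F1: harmonic mean of precision and recall (0.0 if both are zero).
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy 2-D "embeddings" (hypothetical values, not produced by a real model).
cand = [[1.0, 0.0], [0.0, 1.0]]  # two candidate tokens
ref = [[1.0, 0.0]]               # one reference token
precision, recall, f1 = greedy_bert_score(cand, ref)
```

Here the single reference token is matched perfectly (recall 1.0), while only one of the two candidate tokens has a close match (precision 0.5).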

Parameters:

candidates: list[str]: The list of sentences to evaluate.
mult_references: list[list[str]]: The list of reference sentences used as targets, with one list of references per candidate.
return_all_scores: bool = True: If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.
model: str | Module = DEFAULT_BERT_SCORE_MODEL: The model name or the instantiated model used to compute token embeddings. Defaults to “roberta-large”.
tokenizer: Callable | None = None: The fast tokenizer used to split sentences into words. If None, the tokenizer corresponding to the model argument is used. Defaults to None.
device: str | device | None = 'cuda_if_available': The PyTorch device used to run the BERT model. Defaults to “cuda_if_available”.
batch_size: int | None = 32: The batch size used in the model forward pass. Defaults to 32.
num_threads: int = 0: The number of threads used by the dataloader. Defaults to 0.
max_length: int = 64: The maximum sequence length when encoding sentences to tensor ids. Defaults to 64.
idf: bool = False: Whether to weight the BERTScores by inverse document frequency (IDF). Defaults to False.
reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max': The reduction function applied across the multiple references of each audio. Defaults to “max”.
filter_nan: bool = True: If True, replaces NaN scores with 0.0. Defaults to True.
verbose: int = 0: The verbosity level. Defaults to 0.
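To illustrate how the reduction and filter_nan parameters interact with multiple references, here is a rough sketch in plain Python. It is not the library's implementation, which operates on tensors: the helper `reduce_over_references` and the score values are hypothetical, and the sketch assumes NaN values are replaced before the reduction, an ordering the documentation above does not specify.

```python
import math
from statistics import mean

# Named reductions mirroring the 'mean' | 'max' | 'min' options.
REDUCTIONS = {"mean": mean, "max": max, "min": min}


def reduce_over_references(
    per_ref_scores: list[list[float]],
    reduction: str = "max",
    filter_nan: bool = True,
) -> list[float]:
    """Collapse the per-reference scores of each candidate into one score."""
    reduced = []
    for scores in per_ref_scores:
        if filter_nan:
            # Assumption: NaN scores are replaced by 0.0 before reducing.
            scores = [0.0 if math.isnan(s) else s for s in scores]
        reduced.append(REDUCTIONS[reduction](scores))
    return reduced


# Hypothetical F1 scores: two candidates, two references each.
per_ref_f1 = [[0.5, 0.75], [float("nan"), 0.25]]
best = reduce_over_references(per_ref_f1, "max")
avg = reduce_over_references(per_ref_f1, "mean")
```

With “max”, each candidate keeps the score of its closest reference, so one poorly matching reference does not penalize a caption that agrees well with another.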

Returns:

A tuple of global and local scores if return_all_scores is True, otherwise a scalar tensor containing the main global score.
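A call might look like the sketch below, which uses only the signature documented above. The sentences are made-up examples, `run_example` is a hypothetical wrapper, and the import is placed inside the function so that nothing is loaded until it is called. Running it requires aac_metrics to be installed and will most likely download the default model weights on first use.

```python
# Made-up example data: one list of references per candidate.
candidates = ["a man is speaking", "rain falls on a roof"]
mult_references = [
    ["a man speaks", "someone is talking"],
    ["rain is falling on a roof", "heavy rain hits a metal roof"],
]


def run_example():
    """Compute BERTScore with multiple references (hypothetical wrapper)."""
    # Imported lazily: requires the aac_metrics package and its dependencies.
    from aac_metrics.functional.bert_score_mrefs import bert_score_mrefs

    # With return_all_scores=True (the default), a tuple of two dicts is returned.
    global_scores, local_scores = bert_score_mrefs(
        candidates,
        mult_references,
        device="cpu",     # run on CPU instead of the 'cuda_if_available' default
        batch_size=2,
        reduction="max",  # keep the best-matching reference per candidate
    )
    return global_scores, local_scores
```

Calling `run_example()` returns the two score dictionaries. Passing `return_all_scores=False` instead yields a single scalar tensor.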