aac_metrics.functional.bert_score_mrefs module

class BERTScoreMRefsScores

Bases: dict

bert_score_mrefs(
candidates: list[str],
mult_references: list[list[str]],
return_all_scores: True = True,
*,
model: str | Module = DEFAULT_BERT_SCORE_MODEL,
tokenizer: Callable | None = None,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
num_threads: int = 0,
max_length: int = 64,
reset_state: bool = True,
idf: bool = False,
reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max',
filter_nan: bool = True,
verbose: int = 0,
) -> tuple[BERTScoreMRefsScores, BERTScoreMRefsScores]
bert_score_mrefs(
candidates: list[str],
mult_references: list[list[str]],
return_all_scores: False,
*,
model: str | Module = DEFAULT_BERT_SCORE_MODEL,
tokenizer: Callable | None = None,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
num_threads: int = 0,
max_length: int = 64,
reset_state: bool = True,
idf: bool = False,
reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max',
filter_nan: bool = True,
verbose: int = 0,
) -> Tensor

BERTScore metric which supports multiple references.

The implementation is based on the bert_score implementation of torchmetrics.

Parameters:
candidates: list[str]

The list of sentences to evaluate.

mult_references: list[list[str]]

The list of reference sentence lists used as targets, one list per candidate.

return_all_scores: True = True
return_all_scores: False

If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.

model: str | Module = DEFAULT_BERT_SCORE_MODEL

The model name or the instantiated model used to compute token embeddings. Defaults to “roberta-large”.

tokenizer: Callable | None = None

The fast tokenizer used to split sentences into words. If None, the tokenizer corresponding to the model argument is used. Defaults to None.

device: str | device | None = 'cuda_if_available'

The PyTorch device used to run the BERT model. Defaults to “cuda_if_available”.

batch_size: int | None = 32

The batch size used in the model's forward pass. Defaults to 32.

num_threads: int = 0

The number of threads used by the dataloader. Defaults to 0.

max_length: int = 64

The maximum length used when encoding sentences into tensors of token ids. Defaults to 64.

idf: bool = False

Whether to weight the BERTScores by inverse document frequency (IDF). Defaults to False.

reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max'

The reduction function applied across the multiple references of each audio sample. Defaults to “max”.

filter_nan: bool = True

If True, replaces NaN scores with 0.0. Defaults to True.

verbose: int = 0

The verbosity level. Defaults to 0.
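
To illustrate how the reduction and filter_nan parameters could interact, the sketch below collapses per-reference scores into one score per candidate using plain Python. The helper name reduce_mrefs_scores and the score values are hypothetical, plain floats stand in for tensors, and applying the NaN filter after the reduction is an assumption; the library operates on Tensor objects and may order these steps differently.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


# String values accepted by `reduction`, mapped to plain-Python reductions.
REDUCTIONS = {"mean": _mean, "max": max, "min": min}


def reduce_mrefs_scores(per_ref_scores, reduction="max", filter_nan=True):
    """Hypothetical helper: reduce one list of per-reference scores per candidate.

    per_ref_scores[i][j] is the score of candidate i against its j-th reference.
    """
    # Accept either one of the documented strings or a custom callable.
    reduce_fn = REDUCTIONS[reduction] if isinstance(reduction, str) else reduction
    reduced = [reduce_fn(scores) for scores in per_ref_scores]
    if filter_nan:
        # Assumption: NaN scores are replaced after the reduction step.
        reduced = [0.0 if math.isnan(score) else score for score in reduced]
    return reduced


# Made-up scores: three candidates with 3, 2, and 1 references respectively.
per_ref_scores = [
    [0.91, 0.84, 0.88],
    [0.42, 0.67],
    [float("nan")],
]

best = reduce_mrefs_scores(per_ref_scores, reduction="max")
avg = reduce_mrefs_scores(per_ref_scores, reduction="mean")
print(best)
print(avg)
```

With “max”, each candidate is credited with its best-matching reference, whereas “mean” rewards candidates that agree with all of their references.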

Returns:

A tuple of global and local scores if return_all_scores is True; otherwise, a scalar tensor containing the main global score.
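
For the idf option described above, the following sketch shows one common plus-one-smoothed formulation of inverse document frequency weights computed over reference sentences. It is an illustration only: the function name idf_weights is hypothetical, whitespace-separated words stand in for the model's subword tokens, and the weights actually computed by the underlying torchmetrics implementation may differ.

```python
import math
from collections import Counter


def idf_weights(reference_sentences):
    """Hypothetical helper: plus-one-smoothed IDF weight for each token."""
    num_sentences = len(reference_sentences)
    doc_freq = Counter()
    for sentence in reference_sentences:
        # Count each token at most once per sentence (document frequency).
        doc_freq.update(set(sentence.lower().split()))
    return {
        token: math.log((num_sentences + 1) / (freq + 1))
        for token, freq in doc_freq.items()
    }


# Made-up reference captions.
refs = ["a dog barks", "a dog barks loudly", "a car passes by"]
weights = idf_weights(refs)

# Rare tokens receive larger weights than frequent ones.
for token in ("a", "dog", "loudly"):
    print(token, round(weights[token], 3))
```

Under this weighting, a token that appears in every reference, such as “a”, contributes nothing, while rare and more informative tokens dominate the score.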