aac_metrics.functional.bert_score_mrefs module
- bert_score_mrefs(
- candidates: list[str],
- mult_references: list[list[str]],
- return_all_scores: bool = True,
- *,
- model: str | Module = 'roberta-large',
- tokenizer: Callable | None = None,
- device: str | device | None = 'cuda_if_available',
- batch_size: int | None = 32,
- num_threads: int = 0,
- max_length: int = 64,
- reset_state: bool = True,
- idf: bool = False,
- reduction: Literal['mean', 'max', 'min'] | Callable[[...], Tensor] = 'max',
- filter_nan: bool = True,
- verbose: int = 0,
- )
BERTScore metric which supports multiple references.
The implementation is based on the bert_score implementation of torchmetrics.
- Parameters:
candidates – The list of sentences to evaluate.
mult_references – The list of lists of reference sentences used as targets, one list per candidate.
return_all_scores – If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.
model – The model name or the instantiated model used to compute token embeddings. Defaults to “roberta-large”.
tokenizer – The fast tokenizer used to split sentences into words. If None, uses the tokenizer corresponding to the model argument. Defaults to None.
device – The PyTorch device used to run the BERT model. Defaults to “cuda_if_available”.
batch_size – The batch size used in the model forward pass. Defaults to 32.
num_threads – The number of threads used by the dataloader. Defaults to 0.
max_length – The maximum length used when encoding sentences to tensor ids. Defaults to 64.
idf – Whether to use inverse document frequency (IDF) to weight the BERTScores. Defaults to False.
reduction – The reduction function applied across the multiple references of each audio. Defaults to “max”.
filter_nan – If True, replaces NaN scores with 0.0. Defaults to True.
verbose – The verbosity level. Defaults to 0.
- Returns:
A tuple of global and local scores if return_all_scores is True, otherwise a scalar tensor with the main global score.
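The reduction and filter_nan parameters described above can be pictured with a minimal pure-Python sketch. It is an illustration only, not the aac_metrics implementation: the helper name reduce_over_references, the REDUCTIONS table, and the score values are all hypothetical, and applying the NaN replacement after the reduction is an assumption made for this example.

```python
import math
from statistics import mean
from typing import Callable, Sequence

# Hypothetical lookup table mirroring the documented string options.
REDUCTIONS: dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
}


def reduce_over_references(
    scores_per_reference: Sequence[Sequence[float]],
    reduction: str = "max",
    filter_nan: bool = True,
) -> list[float]:
    """Collapse the per-reference scores of each candidate into one score.

    Hypothetical helper for illustration; the library's internals may differ.
    """
    reduce_fn = REDUCTIONS[reduction]
    reduced = [float(reduce_fn(scores)) for scores in scores_per_reference]
    if filter_nan:
        # Assumption: NaN values are replaced after the reduction step.
        reduced = [0.0 if math.isnan(score) else score for score in reduced]
    return reduced


# Made-up scores: two candidates, with three and two references respectively.
example_scores = [[0.5, 0.75, 0.25], [0.5, 1.0]]
best = reduce_over_references(example_scores, reduction="max")
average = reduce_over_references(example_scores, reduction="mean")
print(best, average)
```

With reduction="max", each candidate keeps the score of its closest reference, so a caption is not penalized for differing from the other valid references. With "mean", it is scored against all of them.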