aac_metrics.classes package¶

class BERTScoreMRefs( return_all_scores: bool = True, *, model: str | Module = 'roberta-large', device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, num_threads: int = 0, max_length: int = 64, reset_state: bool = True, idf: bool = False, reduction: Literal['mean', 'max', 'min'] | Callable[[...], Tensor] = 'max', filter_nan: bool = True, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[BERTScoreMRefsScores, BERTScoreMRefsScores] | Tensor]

BERTScore metric which supports multiple references.

The implementation is based on the bert_score implementation of torchmetrics.

Paper: https://arxiv.org/pdf/1904.09675.pdf

For more information, see bert_score_mrefs().
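The `reduction` parameter controls how the scores obtained against each reference are collapsed into a single score per candidate. The following is a minimal sketch of that idea in plain Python, not the library's implementation: the helper `reduce_over_references` and the example scores are hypothetical, and only the three string options from the signature are covered.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical mapping from the string options in the signature to reducers.
REDUCTIONS: dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
}


def reduce_over_references(scores_per_ref: Sequence[float], reduction: str = "max") -> float:
    """Collapse the scores of one candidate against each of its references."""
    return float(REDUCTIONS[reduction](scores_per_ref))


# Made-up F1 scores of one candidate against three references.
f1_per_ref = [0.5, 0.75, 0.25]
best = reduce_over_references(f1_per_ref, "max")
average = reduce_over_references(f1_per_ref, "mean")
worst = reduce_over_references(f1_per_ref, "min")
```

With the default `'max'`, a candidate is credited with its best-matching reference.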

compute() → tuple[BERTScoreMRefsScores, BERTScoreMRefsScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

class BLEU(return_all_scores: bool = True, *, n: int = 4, option: Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: Callable[[str], list[str]] = str.split)[source]¶

Bases: AACMetric[tuple[dict[str, Tensor], dict[str, Tensor]] | Tensor]

BiLingual Evaluation Understudy metric class.

Paper: https://www.aclweb.org/anthology/P02-1040.pdf

For more information, see bleu().
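As a rough illustration of what `n` and `option` refer to, the sketch below computes the clipped n-gram precision from the BLEU paper and the effective reference length used by the brevity penalty. It is a simplified, assumption-laden example: the function names are hypothetical, tokenization is a plain `str.split`, and the tie-breaking rule for `'closest'` (prefer the shorter reference) is an assumption rather than documented library behaviour.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, references: list[str], n: int) -> float:
    """Modified n-gram precision: candidate counts are clipped by the
    maximum count of each n-gram over all references."""
    cand_counts = ngrams(candidate.split(), n)
    max_ref_counts: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total > 0 else 0.0


def effective_ref_length(cand_len: int, ref_lens: list[int], option: str = "closest") -> float:
    """Reference length used by the brevity penalty."""
    if option == "shortest":
        return float(min(ref_lens))
    if option == "average":
        return sum(ref_lens) / len(ref_lens)
    if option == "closest":
        # Assumption: ties are broken in favour of the shorter reference.
        return float(min(ref_lens, key=lambda ref_len: (abs(ref_len - cand_len), ref_len)))
    raise ValueError(f"Unknown option: {option!r}")


candidate = "a dog barks loudly"
references = ["a dog barks", "a dog is barking loudly outside"]
p1 = clipped_precision(candidate, references, n=1)
p2 = clipped_precision(candidate, references, n=2)
closest_len = effective_ref_length(4, [3, 6], "closest")
average_len = effective_ref_length(4, [3, 6], "average")
```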

compute() → tuple[dict[str, Tensor], dict[str, Tensor]] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

class BLEU1(return_all_scores: bool = True, option: Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: Callable[[str], list[str]] = str.split)[source]¶

Bases: BLEU

class BLEU2(return_all_scores: bool = True, option: Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: Callable[[str], list[str]] = str.split)[source]¶

Bases: BLEU

class BLEU3(return_all_scores: bool = True, option: Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: Callable[[str], list[str]] = str.split)[source]¶

Bases: BLEU

class BLEU4(return_all_scores: bool = True, option: Literal['shortest', 'average', 'closest'] = 'closest', verbose: int = 0, tokenizer: Callable[[str], list[str]] = str.split)[source]¶

Bases: BLEU

class CIDErD(return_all_scores: bool = True, *, n: int = 4, sigma: float = 6.0, tokenizer: Callable[[str], list[str]] = str.split, return_tfidf: bool = False, scale: float = 10.0)[source]¶

Bases: AACMetric[tuple[CIDErDScores, CIDErDScores] | Tensor]

Consensus-based Image Description Evaluation metric class.

Paper: https://arxiv.org/pdf/1411.5726.pdf

For more information, see cider_d().
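In the CIDEr-D formulation from the paper, each candidate/reference similarity is weighted by a Gaussian penalty on the difference between the two lengths, and the result is multiplied by 10. The sketch below shows only that penalty term; it assumes that the `sigma` and `scale` parameters above play these roles, and the function name is hypothetical.

```python
import math


def length_penalty(cand_len: int, ref_len: int, sigma: float = 6.0) -> float:
    """Gaussian penalty on the length difference between candidate and reference."""
    return math.exp(-((cand_len - ref_len) ** 2) / (2.0 * sigma ** 2))


same_length = length_penalty(5, 5)     # no penalty
six_words_off = length_penalty(5, 11)  # exp(-36 / 72) = exp(-0.5)

# With a perfect TF-IDF cosine similarity of 1.0 and scale = 10.0,
# the upper bound matches the max_value reported for this class.
upper_bound = 10.0 * same_length * 1.0
```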

compute() → tuple[CIDErDScores, CIDErDScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 10.0¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

class CLAPSim( return_all_scores: bool = True, *, clap_method: Literal['audio', 'text'] = 'text', clap_model: str | CLAPWrapper = 'MS-CLAP-2023', device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, seed: int | None = 42, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[CLAPScores, CLAPScores] | Tensor]

Cosine-similarity of the Contrastive Language-Audio Pretraining (CLAP) embeddings.

The implementation is based on the msclap PyPI package. Note: Instances of this class are not picklable.

Paper: https://arxiv.org/pdf/2411.00321
msclap package: https://pypi.org/project/msclap/

For more information, see clap_sim().
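The score itself is a cosine similarity between two embedding vectors, which is why its range is [-1, 1]. Below is a minimal plain-Python sketch of that computation; the toy vectors stand in for CLAP embeddings and are purely illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-D "embeddings" (real embeddings have many more dimensions).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```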

compute() → tuple[CLAPScores, CLAPScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = -1.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references_or_audio_paths: list[list[str]] | list[str], ) → None[source]¶

Bases: Evaluate

Evaluate candidates with multiple references with DCASE2023 Audio Captioning metrics.

For more information, see dcase2023_evaluate().

Bases: Evaluate

Evaluate candidates with multiple references with DCASE2024 Audio Captioning metrics.

For more information, see dcase2024_evaluate().

Bases: list[AACMetric], AACMetric[tuple[dict[str, Tensor], dict[str, Tensor]]]

Evaluate candidates with multiple references with custom metrics.

For more information, see evaluate().

compute() → tuple[dict[str, Tensor], dict[str, Tensor]][source]¶

full_state_update: ClassVar[bool | None] = False¶

higher_is_better: ClassVar[bool | None] = None¶

is_differentiable: ClassVar[bool | None] = False¶

reset() → None[source]¶

tolist() → list[AACMetric][source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶
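The container above is both a list of metrics and a metric itself: `update()` forwards data to every metric and `compute()` merges their outputs into two dictionaries. The following toy sketch mimics that pattern without depending on the library. Every class and function in it (`ToyMetric`, `ToyEvaluate`, `exact_match`, `first_word_match`) is hypothetical, and the assumption that the returned tuple holds corpus-level scores first and sentence-level scores second is not stated on this page.

```python
from typing import Callable

ScoreFn = Callable[[str, list[str]], float]


class ToyMetric:
    """Hypothetical stand-in for a metric with update/compute/reset."""

    def __init__(self, name: str, score_fn: ScoreFn) -> None:
        self.name = name
        self.score_fn = score_fn
        self.reset()

    def reset(self) -> None:
        self._candidates: list[str] = []
        self._mult_references: list[list[str]] = []

    def update(self, candidates: list[str], mult_references: list[list[str]]) -> None:
        if len(candidates) != len(mult_references):
            raise ValueError("Expected one list of references per candidate.")
        self._candidates += candidates
        self._mult_references += mult_references

    def compute(self) -> tuple[dict[str, float], dict[str, list[float]]]:
        sentence_scores = [
            self.score_fn(cand, refs)
            for cand, refs in zip(self._candidates, self._mult_references)
        ]
        corpus_score = sum(sentence_scores) / len(sentence_scores)
        return {self.name: corpus_score}, {self.name: sentence_scores}


class ToyEvaluate(list):
    """A list of metrics that also behaves like a single metric."""

    def update(self, candidates: list[str], mult_references: list[list[str]]) -> None:
        for metric in self:
            metric.update(candidates, mult_references)

    def compute(self) -> tuple[dict[str, float], dict[str, list[float]]]:
        corpus_scores: dict[str, float] = {}
        sentence_scores: dict[str, list[float]] = {}
        for metric in self:
            metric_corpus, metric_sentences = metric.compute()
            corpus_scores.update(metric_corpus)
            sentence_scores.update(metric_sentences)
        return corpus_scores, sentence_scores

    def reset(self) -> None:
        for metric in self:
            metric.reset()


def exact_match(candidate: str, references: list[str]) -> float:
    return float(candidate in references)


def first_word_match(candidate: str, references: list[str]) -> float:
    first = candidate.split()[0]
    return float(any(ref.split()[0] == first for ref in references))


evaluator = ToyEvaluate([
    ToyMetric("exact_match", exact_match),
    ToyMetric("first_word_match", first_word_match),
])
# Data can be accumulated over several batches before computing.
evaluator.update(["a dog barks"], [["a dog barks", "dog barking"]])
evaluator.update(["rain falls"], [["heavy rain", "rain is falling"]])
corpus_scores, sentence_scores = evaluator.compute()
```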

class FENSE( return_all_scores: bool = True, *, sbert_model: str | SentenceTransformer = 'paraphrase-TinyBERT-L6-v2', echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base', echecker_tokenizer: AutoTokenizer | None = None, error_threshold: float = 0.9, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, return_probs: bool = False, penalty: float = 0.9, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[FENSEScores, FENSEScores] | Tensor]

Fluency ENhanced Sentence-bert Evaluation (FENSE) metric class.

Paper: https://arxiv.org/abs/2110.04684
Original implementation: https://github.com/blmoistawinde/fense

For more information, see fense().
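FENSE combines a Sentence-BERT similarity with a fluency error detector: when the detector flags a caption, its similarity score is scaled down. The sketch below illustrates that rule as described in the paper, assuming that `error_threshold` is the probability above which a caption is flagged and that `penalty` is the fraction of the score removed. The function name and input values are hypothetical.

```python
def fense_score(
    sbert_sim: float,
    error_prob: float,
    error_threshold: float = 0.9,
    penalty: float = 0.9,
) -> float:
    """Penalize the similarity score when a fluency error is detected."""
    has_error = error_prob > error_threshold
    return sbert_sim * (1.0 - penalty) if has_error else sbert_sim


fluent = fense_score(sbert_sim=0.8, error_prob=0.2)       # left unchanged
disfluent = fense_score(sbert_sim=0.8, error_prob=0.95)   # reduced by 90%
```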

compute() → tuple[FENSEScores, FENSEScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = -1.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

class FER( return_all_scores: bool = True, *, echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base', error_threshold: float = 0.9, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, return_probs: bool = False, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[FERScores, FERScores] | Tensor]

Return Fluency Error Rate (FER) detected by a pre-trained BERT model.

Paper: https://arxiv.org/abs/2110.04684
Original implementation: https://github.com/blmoistawinde/fense

For more information, see fer().
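Since `higher_is_better` is False for this class, a lower value indicates more fluent captions. As a rough sketch, the rate can be seen as the fraction of candidates whose predicted error probability exceeds `error_threshold`. The function name and the probabilities below are made up for illustration, and the thresholding rule is an assumption about how the detector output is used.

```python
def fluency_error_rate(error_probs: list[float], error_threshold: float = 0.9) -> float:
    """Fraction of candidates flagged as containing a fluency error."""
    flags = [1.0 if prob > error_threshold else 0.0 for prob in error_probs]
    return sum(flags) / len(flags)


# Hypothetical error probabilities for four candidates.
rate = fluency_error_rate([0.95, 0.10, 0.99, 0.50])
```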

compute() → tuple[FERScores, FERScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = False¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = -1.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], *args, **kwargs, ) → None[source]¶

class MACE( return_all_scores: bool = True, *, mace_method: Literal['text', 'audio', 'combined'] = 'text', penalty: float = 0.3, clap_model: str | CLAPWrapper = 'MS-CLAP-2023', seed: int | None = 42, echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base', echecker_tokenizer: AutoTokenizer | None = None, error_threshold: float = 0.97, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, return_probs: bool = False, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[MACEScores, MACEScores] | Tensor]

Multimodal Audio-Caption Evaluation class (MACE).

MACE is a metric designed for evaluating automated audio captioning (AAC) systems. Unlike metrics that compare machine-generated captions solely to human references, MACE uses both audio and text to improve evaluation. By integrating both audio and text, it produces assessments that align better with human judgments.

The implementation is based on the original MACE implementation (the original author has agreed to the inclusion of their code in aac-metrics under the MIT license). Note: Instances of this class are not picklable.

Paper: https://arxiv.org/pdf/2411.00321
Original author: Satvik Dixit
Original implementation: https://github.com/satvik-dixit/mace/tree/main

For more information, see mace().

compute() → tuple[MACEScores, MACEScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = -1.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]] | None = None, audio_paths: list[str] | None = None, ) → None[source]¶

class METEOR( return_all_scores: bool = True, *, cache_path: str | Path | None = None, java_path: str | Path | None = None, java_max_memory: str = '2G', language: Literal['en', 'cz', 'de', 'es', 'fr'] = 'en', use_shell: bool | None = None, params: Iterable[float] | None = None, weights: Iterable[float] | None = None, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[METEORScores, METEORScores] | Tensor]

Metric for Evaluation of Translation with Explicit ORdering (METEOR) class.

Paper: https://dl.acm.org/doi/pdf/10.5555/1626355.1626389
Documentation: https://www.cs.cmu.edu/~alavie/METEOR/README.html

For more information, see meteor().

compute() → tuple[METEORScores, METEORScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

class ROUGEL(return_all_scores: bool = True, *, beta: float = 1.2, tokenizer: Callable[[str], list[str]] = str.split)[source]¶

Bases: AACMetric[tuple[ROUGELScores, ROUGELScores] | Tensor]

Recall-Oriented Understudy for Gisting Evaluation class.

Paper: https://aclanthology.org/W04-1013.pdf

For more information, see rouge_l().
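ROUGE-L is built on the longest common subsequence (LCS) between candidate and reference, and `beta` weights recall against precision in the final F-score. The sketch below is a simplified illustration with hypothetical function names: it assumes whitespace tokenization, a non-empty candidate, and the common captioning convention of taking the best precision and the best recall over all references.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, references: list[str], beta: float = 1.2) -> float:
    """LCS-based F-score; beta > 1 gives more weight to recall."""
    cand_tokens = candidate.split()
    precisions, recalls = [], []
    for ref in references:
        ref_tokens = ref.split()
        lcs = lcs_length(cand_tokens, ref_tokens)
        precisions.append(lcs / len(cand_tokens))
        recalls.append(lcs / len(ref_tokens))
    precision, recall = max(precisions), max(recalls)
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


identical = rouge_l("a dog barks", ["a dog barks"])
partial = rouge_l("a dog barks loudly", ["a big dog barks"])
```

In the second call the LCS is "a dog barks" (3 tokens), so precision and recall are both 3/4 and the F-score equals 0.75 regardless of `beta`.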

compute() → tuple[ROUGELScores, ROUGELScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

class SBERTSim( return_all_scores: bool = True, *, sbert_model: str | SentenceTransformer = 'paraphrase-TinyBERT-L6-v2', device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[SBERTSimScores, SBERTSimScores] | Tensor]

Cosine-similarity of the Sentence-BERT embeddings.

Paper: https://arxiv.org/abs/1908.10084
Original implementation: https://github.com/blmoistawinde/fense

For more information, see sbert_sim().

compute() → tuple[SBERTSimScores, SBERTSimScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = -1.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

Bases: AACMetric[tuple[SPICEScores, SPICEScores] | Tensor]

Semantic Propositional Image Caption Evaluation class.

Paper: https://arxiv.org/pdf/1607.08822.pdf

For more information, see spice().

compute() → tuple[SPICEScores, SPICEScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 1.0¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

Bases: AACMetric[tuple[SPIDErScores, SPIDErScores] | Tensor]

SPIDEr class.

Paper: https://arxiv.org/pdf/1612.00370.pdf

For more information, see spider().
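SPIDEr is defined in the paper as the average of the CIDEr-D and SPICE scores. A minimal sketch (the function name is illustrative) also shows where the `max_value` of 5.5 comes from, given the upper bounds of 10.0 for CIDEr-D and 1.0 for SPICE listed on this page:

```python
def spider_score(cider_d: float, spice: float) -> float:
    """Average of the CIDEr-D and SPICE scores."""
    return (cider_d + spice) / 2.0


# Upper bound: CIDEr-D at most 10.0 and SPICE at most 1.0.
upper_bound = spider_score(10.0, 1.0)
example = spider_score(0.5, 0.25)
```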

compute() → tuple[SPIDErScores, SPIDErScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 5.5¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

class SPIDErFL( return_all_scores: bool = True, *, n: int = 4, sigma: float = 6.0, cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, n_threads: int | None = None, java_max_memory: str = '8G', timeout: None | int | Iterable[int] = None, echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base', echecker_tokenizer: AutoTokenizer | None = None, error_threshold: float = 0.9, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, return_probs: bool = True, penalty: float = 0.9, verbose: int = 0, )[source]¶

Bases: AACMetric[tuple[SPIDErFLScores, SPIDErFLScores] | Tensor]

SPIDErFL class.

For more information, see spider_fl().

compute() → tuple[SPIDErFLScores, SPIDErFLScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 5.5¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]], ) → None[source]¶

Bases: AACMetric[tuple[SPIDErMaxScores, SPIDErMaxScores] | Tensor]

SPIDEr-max class.

Paper: https://hal.archives-ouvertes.fr/hal-03810396/file/Labbe_DCASE2022.pdf

For more information, see spider_max().

compute() → tuple[SPIDErMaxScores, SPIDErMaxScores] | Tensor[source]¶

extra_repr() → str[source]¶

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = True¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = 5.5¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( mult_candidates: list[list[str]], mult_references: list[list[str]], ) → None[source]¶
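Unlike the other metrics, SPIDEr-max takes several candidates per audio file (`mult_candidates`) and, following the paper, keeps the best SPIDEr score among them. The sketch below illustrates only that selection step on made-up scores; the function name is hypothetical, and averaging the per-item maxima into a corpus score is an assumption.

```python
def spider_max_scores(spider_scores_per_item: list[list[float]]) -> list[float]:
    """Keep the best candidate score for each audio file."""
    return [max(scores) for scores in spider_scores_per_item]


# Hypothetical SPIDEr scores: 2 audio files, 3 candidates each.
scores = [
    [0.25, 0.5, 0.125],
    [1.0, 0.75, 0.5],
]
per_item = spider_max_scores(scores)
corpus = sum(per_item) / len(per_item)
```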

class Vocab(return_all_scores: bool = True, *, seed: None | int | torch.Generator = 1234, tokenizer: Callable[[str], list[str]] = str.split, dtype: torch.dtype = torch.float64, pop_strategy: Literal['max', 'min'] | int = 'max', verbose: int = 0)[source]¶

Bases: AACMetric[tuple[VocabScores, VocabScores] | Tensor]

VocabStats class.

For more information, see vocab().

compute() → tuple[VocabScores, VocabScores] | Tensor[source]¶

full_state_update: ClassVar[bool | None] = False¶

get_output_names() → tuple[str, ...][source]¶

higher_is_better: ClassVar[bool | None] = None¶

is_differentiable: ClassVar[bool | None] = False¶

max_value: ClassVar[float] = inf¶

min_value: ClassVar[float] = 0.0¶

reset() → None[source]¶

training: bool¶

update( candidates: list[str], mult_references: list[list[str]] | None = None, ) → None[source]¶
