aac_metrics package¶
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
- class AACMetric(**kwargs: Any)[source]¶
Bases: Module, Generic[OutType]
Base Metric module for AAC metrics. Similar to torchmetrics.Metric.
- forward() → OutType [source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
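The forward/update/compute contract described above can be sketched in plain Python. This is an illustrative sketch of the pattern (similar in spirit to torchmetrics.Metric), not the aac_metrics implementation; the class name `MiniMetric` is made up.

```python
# Minimal sketch of a Metric-like module: calling the instance runs the
# computation for one batch while update()/compute() accumulate state
# across batches. Illustration only, not the aac_metrics implementation.
class MiniMetric:
    def __init__(self):
        self.reset()

    def reset(self):
        self._total = 0.0
        self._count = 0

    def update(self, scores):
        self._total += sum(scores)
        self._count += len(scores)

    def compute(self):
        return self._total / max(self._count, 1)

    def __call__(self, scores):
        # Calling the instance (rather than forward() directly) is where
        # a real Module would also run its registered hooks.
        self.update(scores)
        return self.compute()

metric = MiniMetric()
metric([0.5, 1.0])
print(metric([0.0, 0.5]))  # running mean over both batches: 0.5
```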
- class BERTScoreMRefs(return_all_scores: bool = True, model: str | Module = 'roberta-large', device: str | device | None = 'cuda_if_available', batch_size: int = 32, num_threads: int = 0, max_length: int = 64, reset_state: bool = True, idf: bool = False, reduction: str | Callable[[Tensor, ...], Tensor] = 'max', filter_nan: bool = True, verbose: int = 0)[source]¶
Bases: AACMetric
BERTScore metric which supports multiple references.
The implementation is based on the bert_score implementation of torchmetrics.
For more information, see bert_score_mrefs().
- class BLEU(return_all_scores: bool = True, n: int = 4, option: str = 'closest', verbose: int = 0, tokenizer: ~typing.Callable[[str], list[str]] = <method 'split' of 'str' objects>)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
BiLingual Evaluation Understudy metric class.
For more information, see bleu().
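BLEU is built on modified n-gram precision: each candidate n-gram's count is clipped by its maximum count across the references. A minimal unigram sketch of that core idea (illustration only; the library's tokenizer, higher-order n-grams, and brevity penalty are omitted):

```python
from collections import Counter

def clipped_unigram_precision(candidate: str, references: list[str]) -> float:
    """Modified unigram precision: candidate token counts are clipped
    by the maximum count of that token over all references."""
    cand_counts = Counter(candidate.split())
    max_ref_counts: Counter = Counter()
    for ref in references:
        for tok, cnt in Counter(ref.split()).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], cnt)
    clipped = sum(min(cnt, max_ref_counts[tok]) for tok, cnt in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total > 0 else 0.0

print(clipped_unigram_precision(
    "a dog barks", ["a dog barks loudly", "the dog is barking"]
))  # all 3 candidate tokens appear in a reference: 1.0
```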
- class CIDErD(return_all_scores: bool = True, n: int = 4, sigma: float = 6.0, tokenizer: ~typing.Callable[[str], list[str]] = <method 'split' of 'str' objects>, return_tfidf: bool = False, scale: float = 10.0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Any]], Tensor]]
Consensus-based Image Description Evaluation metric class.
For more information, see cider_d().
- class DCASE2023Evaluate(preprocess: bool = True, cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, device: str | device | None = 'cuda_if_available', verbose: int = 0)[source]¶
Bases: Evaluate
Evaluate candidates against multiple references with the DCASE2023 Audio Captioning metrics.
For more information, see dcase2023_evaluate().
- class Evaluate(preprocess: bool = True, metrics: str | Iterable[str] | Iterable[AACMetric] = 'default', cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, device: str | device | None = 'cuda_if_available', verbose: int = 0)[source]¶
Bases: list[AACMetric], AACMetric[tuple[dict[str, Tensor], dict[str, Tensor]]]
Evaluate candidates against multiple references with custom metrics.
For more information, see evaluate().
- class FENSE(return_all_scores: bool = True, sbert_model: str | SentenceTransformer = 'paraphrase-TinyBERT-L6-v2', echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base', error_threshold: float = 0.9, device: str | device | None = 'cuda_if_available', batch_size: int = 32, reset_state: bool = True, return_probs: bool = False, penalty: float = 0.9, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
Fluency ENhanced Sentence-bert Evaluation (FENSE)
Original implementation: https://github.com/blmoistawinde/fense
For more information, see fense().
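FENSE combines a Sentence-BERT similarity score with a fluency-error detector: when the error checker flags a candidate as disfluent, the similarity score is penalized. A sketch of one plausible combination using the error_threshold and penalty parameters listed above (an assumption about the exact formula, not the library's implementation; the real metric runs the listed sbert_model and echecker models):

```python
def fense_like_score(sbert_sim: float, error_prob: float,
                     error_threshold: float = 0.9, penalty: float = 0.9) -> float:
    """Sketch: if the fluency-error probability exceeds error_threshold,
    scale the similarity score down by (1 - penalty). Hypothetical
    combination mirroring the parameters above, not the exact formula."""
    if error_prob > error_threshold:
        return sbert_sim * (1.0 - penalty)
    return sbert_sim

print(fense_like_score(0.8, 0.95))  # flagged as disfluent: heavily penalized
print(fense_like_score(0.8, 0.10))  # fluent: similarity kept as-is
```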
- class FER(return_all_scores: bool = True, echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base', error_threshold: float = 0.9, device: str | device | None = 'cuda_if_available', batch_size: int = 32, reset_state: bool = True, return_probs: bool = False, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
Returns the Fluency Error Rate (FER) detected by a pre-trained BERT model.
Original implementation: https://github.com/blmoistawinde/fense
For more information, see fer().
- class METEOR(return_all_scores: bool = True, cache_path: str | Path | None = None, java_path: str | Path | None = None, java_max_memory: str = '2G', language: str = 'en', use_shell: bool | None = None, params: Iterable[float] | None = None, weights: Iterable[float] | None = None, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
Metric for Evaluation of Translation with Explicit ORdering metric class.
Documentation: https://www.cs.cmu.edu/~alavie/METEOR/README.html
For more information, see meteor().
- class ROUGEL(return_all_scores: bool = True, beta: float = 1.2, tokenizer: ~typing.Callable[[str], list[str]] = <method 'split' of 'str' objects>)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
Recall-Oriented Understudy for Gisting Evaluation class.
For more information, see rouge_l().
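ROUGE-L scores the longest common subsequence (LCS) between the candidate and reference token sequences, combined into an F-measure where beta weights recall (beta = 1.2 above). A self-contained sketch of that computation (illustration only; the library's tokenization and multi-reference handling are omitted):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f(candidate: str, reference: str, beta: float = 1.2) -> float:
    """ROUGE-L F-measure: F = (1 + beta^2) * P * R / (R + beta^2 * P),
    with P and R the LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(cand), lcs / len(ref)
    return (1 + beta**2) * prec * rec / (rec + beta**2 * prec)

print(rouge_l_f("a dog barks", "a dog barks loudly"))  # high: LCS covers the candidate
```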
- class SBERTSim(return_all_scores: bool = True, sbert_model: str | SentenceTransformer = 'paraphrase-TinyBERT-L6-v2', device: str | device | None = 'cuda_if_available', batch_size: int = 32, reset_state: bool = True, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
Cosine similarity of the Sentence-BERT embeddings.
Original implementation: https://github.com/blmoistawinde/fense
For more information, see sbert_sim().
- class SPICE(return_all_scores: bool = True, cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, n_threads: int | None = None, java_max_memory: str = '8G', timeout: None | int | Iterable[int] = None, separate_cache_dir: bool = True, use_shell: bool | None = None, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
Semantic Propositional Image Caption Evaluation class.
For more information, see spice().
- class SPIDEr(return_all_scores: bool = True, n: int = 4, sigma: float = 6.0, cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, n_threads: int | None = None, java_max_memory: str = '8G', timeout: None | int | Iterable[int] = None, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
SPIDEr class.
For more information, see spider().
- class SPIDErFL(return_all_scores: bool = True, n: int = 4, sigma: float = 6.0, cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, n_threads: int | None = None, java_max_memory: str = '8G', timeout: None | int | Iterable[int] = None, echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base', echecker_tokenizer: AutoTokenizer | None = None, error_threshold: float = 0.9, device: str | device | None = 'cuda_if_available', batch_size: int = 32, reset_state: bool = True, return_probs: bool = True, penalty: float = 0.9, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
SPIDErFL class.
For more information, see spider_fl().
- class SPIDErMax(return_all_scores: bool = True, return_all_cands_scores: bool = False, n: int = 4, sigma: float = 6.0, cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, n_threads: int | None = None, java_max_memory: str = '8G', timeout: None | int | Iterable[int] = None, verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
SPIDEr-max class.
For more information, see spider_max().
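SPIDEr-max scores several candidate captions per audio clip and keeps, for each clip, the maximum SPIDEr score over its candidates. The reduction step can be sketched as follows (the score values are made up; in the real metric each inner list would hold SPIDEr scores for one clip's candidates):

```python
def spider_max_reduce(scores_per_clip: list[list[float]]) -> list[float]:
    """For each audio clip, keep the best score over its candidate captions."""
    return [max(cand_scores) for cand_scores in scores_per_clip]

# Two clips, each with three candidate captions (made-up SPIDEr values).
print(spider_max_reduce([[0.2, 0.5, 0.3], [0.1, 0.05, 0.4]]))  # [0.5, 0.4]
```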
- class Vocab(return_all_scores: bool = True, seed: None | int | ~torch._C.Generator = 1234, tokenizer: ~typing.Callable[[str], list[str]] = <method 'split' of 'str' objects>, dtype: ~torch.dtype = torch.float64, pop_strategy: str = 'max', verbose: int = 0)[source]¶
Bases: AACMetric[Union[tuple[dict[str, Tensor], dict[str, Tensor]], Tensor]]
VocabStats class.
For more information, see vocab().
- dcase2023_evaluate(candidates: list[str], mult_references: list[list[str]], preprocess: bool = True, cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, device: str | device | None = 'cuda_if_available', verbose: int = 0)[source]¶
Evaluate candidates against multiple references with the DCASE2023 Audio Captioning metrics.
- Parameters:
candidates – The list of sentences to evaluate.
mult_references – The list of lists of sentences used as targets.
preprocess – If True, the candidates and references will be passed through the PTB Stanford tokenizer before computing metrics. Defaults to True.
cache_path – The path to the external code directory. Defaults to the value returned by get_default_cache_path().
java_path – The path to the Java executable. Defaults to the value returned by get_default_java_path().
tmp_path – Temporary directory path. Defaults to the value returned by get_default_tmp_path().
device – The PyTorch device used to run the FENSE and SPIDErFL models. If None, CUDA is used if available. Defaults to "cuda_if_available".
verbose – The verbose level. Defaults to 0.
- Returns:
A tuple containing the corpus-level and sentence-level scores.
- evaluate(candidates: list[str], mult_references: list[list[str]], preprocess: bool = True, metrics: str | Iterable[str] | Iterable[Callable[[list, list], tuple]] = 'default', cache_path: str | Path | None = None, java_path: str | Path | None = None, tmp_path: str | Path | None = None, device: str | device | None = 'cuda_if_available', verbose: int = 0)[source]¶
Evaluate candidates against multiple references with custom metrics.
- Parameters:
candidates – The list of sentences to evaluate.
mult_references – The list of lists of sentences used as targets.
preprocess – If True, the candidates and references will be passed through the PTB Stanford tokenizer before computing metrics. Defaults to True.
metrics – The name of the metric list or the explicit list of metrics to compute. Defaults to "default".
cache_path – The path to the external code directory. Defaults to the value returned by get_default_cache_path().
java_path – The path to the Java executable. Defaults to the value returned by get_default_java_path().
tmp_path – Temporary directory path. Defaults to the value returned by get_default_tmp_path().
device – The PyTorch device used to run the FENSE and SPIDErFL models. If None, CUDA is used if available. Defaults to "cuda_if_available".
verbose – The verbose level. Defaults to 0.
- Returns:
A tuple containing the corpus-level and sentence-level scores.
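The returned pair consists of corpus-level scores (one value per metric) and sentence-level scores (one value per candidate per metric). The relationship between the two can be sketched in plain Python, using a simple mean as the aggregation (an illustration of the shape only; real metrics may aggregate differently, e.g. BLEU is corpus-level by design):

```python
def to_corpus_scores(sents_scores: dict[str, list[float]]) -> dict[str, float]:
    """Sketch: aggregate per-sentence scores into corpus-level scores by
    averaging. Illustrates the (corpus_scores, sents_scores) structure,
    not the exact aggregation each metric uses."""
    return {name: sum(vals) / len(vals) for name, vals in sents_scores.items()}

# Per-candidate scores for two candidates and two metrics (made-up values).
sents_scores = {"spider": [0.2, 0.4], "fer": [0.0, 1.0]}
corpus_scores = to_corpus_scores(sents_scores)
print(corpus_scores["fer"])  # 0.5
```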
- get_default_cache_path() → str [source]¶
Returns the default cache directory path.
If set_default_cache_path() has been used before with a string argument, it will return the value given to that function. Otherwise, if the environment variable AAC_METRICS_CACHE_PATH has been set to a string, it will return its value. Otherwise, it defaults to "~/.cache".
- get_default_java_path() → str [source]¶
Returns the default Java executable path.
If set_default_java_path() has been used before with a string argument, it will return the value given to that function. Otherwise, if the environment variable AAC_METRICS_JAVA_PATH has been set to a string, it will return its value. Otherwise, it defaults to "java".
- get_default_tmp_path() → str [source]¶
Returns the default temporary directory path.
If set_default_tmp_path() has been used before with a string argument, it will return the value given to that function. Otherwise, if the environment variable AAC_METRICS_TMP_PATH has been set to a string, it will return its value. Otherwise, it defaults to the value returned by gettempdir().
- load_metric(name: str, **kwargs)[source]¶
Load a metric class by name.
- Parameters:
name – The name of the metric.
**kwargs – The optional keyword arguments passed to the metric factory.
- Returns:
The built metric object.
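load_metric follows the familiar name-to-factory registry pattern. A sketch of that pattern (the class `FakeBLEU`, the registry, and `load_metric_sketch` are all placeholders, not the aac_metrics internals; see the class list above for the real metric names):

```python
# Sketch of a name -> factory registry, the pattern behind load_metric().
class FakeBLEU:
    """Placeholder metric class, not the aac_metrics BLEU implementation."""
    def __init__(self, n: int = 4):
        self.n = n

_REGISTRY = {"bleu": FakeBLEU}

def load_metric_sketch(name: str, **kwargs):
    """Look up a metric class by name and build it with the given kwargs."""
    name = name.lower()
    if name not in _REGISTRY:
        raise ValueError(f"Unknown metric {name!r}. Available: {sorted(_REGISTRY)}")
    return _REGISTRY[name](**kwargs)

metric = load_metric_sketch("bleu", n=2)
print(metric.n)  # 2
```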
Subpackages¶
- aac_metrics.classes package
BERTScoreMRefs
BERTScoreMRefs.compute()
BERTScoreMRefs.extra_repr()
BERTScoreMRefs.full_state_update
BERTScoreMRefs.get_output_names()
BERTScoreMRefs.higher_is_better
BERTScoreMRefs.is_differentiable
BERTScoreMRefs.max_value
BERTScoreMRefs.min_value
BERTScoreMRefs.reset()
BERTScoreMRefs.training
BERTScoreMRefs.update()
BLEU
BLEU1
BLEU2
BLEU3
BLEU4
CIDErD
DCASE2023Evaluate
Evaluate
FENSE
FER
METEOR
ROUGEL
SBERTSim
SPICE
SPIDEr
SPIDErFL
SPIDErMax
Vocab
- Submodules
- aac_metrics.classes.base module
- aac_metrics.classes.bert_score_mrefs module
- aac_metrics.classes.bleu module
- aac_metrics.classes.cider_d module
- aac_metrics.classes.evaluate module
- aac_metrics.classes.fense module
- aac_metrics.classes.fer module
- aac_metrics.classes.meteor module
- aac_metrics.classes.rouge_l module
- aac_metrics.classes.sbert_sim module
- aac_metrics.classes.spice module
- aac_metrics.classes.spider module
- aac_metrics.classes.spider_fl module
- aac_metrics.classes.spider_max module
- aac_metrics.classes.vocab module
- aac_metrics.functional package
bert_score_mrefs()
bleu()
bleu_1()
bleu_2()
bleu_3()
bleu_4()
cider_d()
dcase2023_evaluate()
evaluate()
fense()
fer()
meteor()
rouge_l()
sbert_sim()
spice()
spider()
spider_fl()
spider_max()
vocab()
- Submodules
- aac_metrics.functional.bert_score_mrefs module
- aac_metrics.functional.bleu module
- aac_metrics.functional.cider_d module
- aac_metrics.functional.evaluate module
- aac_metrics.functional.fense module
- aac_metrics.functional.fer module
- aac_metrics.functional.meteor module
- aac_metrics.functional.mult_cands module
- aac_metrics.functional.rouge_l module
- aac_metrics.functional.sbert_sim module
- aac_metrics.functional.spice module
- aac_metrics.functional.spider module
- aac_metrics.functional.spider_fl module
- aac_metrics.functional.spider_max module
- aac_metrics.functional.vocab module
- aac_metrics.utils package