aac_metrics.classes.mace module
class MACE(
    return_all_scores: bool = True,
    *,
    mace_method: Literal['text', 'audio', 'combined'] = 'text',
    penalty: float = 0.3,
    clap_model: str | CLAPWrapper = 'MS-CLAP-2023',
    seed: int | None = 42,
    echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base',
    echecker_tokenizer: AutoTokenizer | None = None,
    error_threshold: float = 0.97,
    device: str | device | None = 'cuda_if_available',
    batch_size: int | None = 32,
    reset_state: bool = True,
    return_probs: bool = False,
    verbose: int = 0,
)
Bases: AACMetric[tuple[MACEScores, MACEScores] | Tensor]
Multimodal Audio-Caption Evaluation class (MACE).
MACE is a metric for evaluating automated audio captioning (AAC) systems. Unlike metrics that compare machine-generated captions only with human reference captions, MACE also takes the audio clip itself into account, which yields scores that align better with human judgments.
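To make the idea of scoring against both text and audio concrete, here is a minimal, self-contained sketch of a MACE-style per-caption score computed from precomputed embeddings. It is not the aac-metrics implementation. It assumes that the 'text' method averages the cosine similarity between the candidate and each reference, that 'audio' uses the similarity between the candidate and the audio clip, that 'combined' averages the two, and that a FENSE-style fluency penalty (scaling by 1 - penalty) applies when the error checker's probability exceeds error_threshold. The functions cosine and mace_sentence_score are hypothetical names for illustration only.

```python
import math
from typing import Literal, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mace_sentence_score(
    cand_emb: Sequence[float],
    refs_embs: Sequence[Sequence[float]],
    audio_emb: Sequence[float],
    error_prob: float,
    mace_method: Literal["text", "audio", "combined"] = "text",
    penalty: float = 0.3,
    error_threshold: float = 0.97,
) -> float:
    """Hypothetical per-caption MACE-style score from precomputed embeddings."""
    if mace_method not in ("text", "audio", "combined"):
        raise ValueError(f"Invalid mace_method: {mace_method!r}")

    # Text branch (assumption): mean similarity between candidate and each reference.
    def text_score() -> float:
        return sum(cosine(cand_emb, r) for r in refs_embs) / len(refs_embs)

    # Audio branch (assumption): similarity between candidate caption and audio clip.
    def audio_score() -> float:
        return cosine(cand_emb, audio_emb)

    if mace_method == "text":
        score = text_score()
    elif mace_method == "audio":
        score = audio_score()
    else:
        # Assumption: "combined" is the plain average of both branches.
        score = (text_score() + audio_score()) / 2

    # Assumption: FENSE-style fluency penalty, applied when the error
    # checker's probability exceeds error_threshold.
    if error_prob > error_threshold:
        score *= 1.0 - penalty
    return score
```

In the real class, the embeddings would come from the model selected by clap_model and error_prob from the echecker model; both steps are left out here so the sketch runs on its own.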
The implementation is based on the original MACE implementation (the original author agreed to include their code in aac-metrics under the MIT license). Note: instances of this class are not picklable.
Original author: Satvik Dixit
Original implementation: https://github.com/satvik-dixit/mace/tree/main
For more information, see mace().
compute() -> tuple[MACEScores, MACEScores] | Tensor
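The return type of compute() is a union: either a pair of score dictionaries or a single tensor. The following minimal sketch shows how return_all_scores plausibly selects between the two. It assumes the usual aac-metrics convention (corpus-level and sentence-level dictionaries when True, only the main corpus-level score otherwise), assumes the score key is "mace" and that the corpus score is the mean of the per-caption scores, and uses plain floats in place of torch Tensors. The function mace_outputs and the type aliases are hypothetical.

```python
from __future__ import annotations

from typing import Union

# Hypothetical stand-ins for MACEScores: aac-metrics uses torch Tensors, plain floats here.
CorpusScores = dict[str, float]
SentenceScores = dict[str, list[float]]


def mace_outputs(
    sent_scores: list[float],
    return_all_scores: bool = True,
) -> Union[tuple[CorpusScores, SentenceScores], float]:
    """Package per-caption scores in the shape compute() is documented to return."""
    # Assumption: the corpus-level score is the mean of the per-caption scores.
    corpus_scores: CorpusScores = {"mace": sum(sent_scores) / len(sent_scores)}
    sents_scores: SentenceScores = {"mace": list(sent_scores)}
    if return_all_scores:
        # Corpus-level and sentence-level scores, as a tuple of two dicts.
        return corpus_scores, sents_scores
    # Otherwise only the main corpus-level score is returned.
    return corpus_scores["mace"]
```

Keeping both granularities by default lets callers inspect individual captions, while return_all_scores=False suits quick comparisons that need a single number.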