aac_metrics.classes.mace module

class MACE(
return_all_scores: bool = True,
*,
mace_method: Literal['text', 'audio', 'combined'] = 'text',
penalty: float = 0.3,
clap_model: str | CLAPWrapper = 'MS-CLAP-2023',
seed: int | None = 42,
echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base',
echecker_tokenizer: AutoTokenizer | None = None,
error_threshold: float = 0.97,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
return_probs: bool = False,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[MACEScores, MACEScores] | Tensor]

Multimodal Audio-Caption Evaluation class (MACE).

MACE is a metric for evaluating automated audio captioning (AAC) systems. Unlike metrics that compare machine-generated captions only to human references, MACE draws on both the audio and the text, which yields assessments that align better with human judgments.

The implementation is based on the original MACE implementation (the original authors have agreed to the inclusion of their code in aac-metrics under the MIT license). Note: Instances of this class are not picklable.

For more information, see mace().
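A minimal usage sketch follows. It assumes aac-metrics is installed together with the pretrained CLAP and echecker checkpoints that MACE loads. The constructor arguments are taken from the signature above; the helpers `make_mace_kwargs` and `evaluate_with_mace` are hypothetical names introduced for illustration and are not part of aac-metrics. The import is deferred into the function so that the file can be loaded without the library.

```python
from typing import Any

VALID_METHODS = ("text", "audio", "combined")


def make_mace_kwargs(
    mace_method: str = "text",
    penalty: float = 0.3,
    error_threshold: float = 0.97,
) -> dict[str, Any]:
    """Collect keyword arguments for MACE (hypothetical helper, not part of aac-metrics)."""
    if mace_method not in VALID_METHODS:
        raise ValueError(
            f"mace_method must be one of {VALID_METHODS}, got {mace_method!r}"
        )
    return {
        "mace_method": mace_method,
        "penalty": penalty,
        "error_threshold": error_threshold,
    }


def evaluate_with_mace(candidates, mult_references=None, audio_paths=None, **kwargs):
    """Score captions with MACE (illustrative wrapper, not part of aac-metrics)."""
    # Deferred import: needs aac-metrics and its pretrained checkpoints.
    from aac_metrics.classes.mace import MACE

    metric = MACE(return_all_scores=True, **make_mace_kwargs(**kwargs))
    metric.update(candidates, mult_references, audio_paths)
    # With return_all_scores=True the documented return type is
    # tuple[MACEScores, MACEScores].
    return metric.compute()
```

A call could look like `evaluate_with_mace(["a dog barks"], [["a dog is barking", "a dog barks loudly"]], mace_method="text")`. Which of `mult_references` and `audio_paths` is required for each `mace_method` is not stated on this page, so the sketch forwards both unchanged.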

compute() → tuple[MACEScores, MACEScores] | Tensor[source]
extra_repr() → str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update: ClassVar[bool | None] = False
get_output_names() → tuple[str, ...][source]
higher_is_better: ClassVar[bool | None] = True
is_differentiable: ClassVar[bool | None] = False
max_value: ClassVar[float] = 1.0
min_value: ClassVar[float] = -1.0
reset() → None[source]
training: bool
update(
candidates: list[str],
mult_references: list[list[str]] | None = None,
audio_paths: list[str] | None = None,
) → None[source]
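Because the class exposes separate update(), compute() and reset() methods, scores can be accumulated over several chunks of data before being computed once. The sketch below illustrates that pattern under stated assumptions: `iter_chunks` and `accumulate` are hypothetical helpers, `chunk_size` is unrelated to the constructor's `batch_size` argument, and `metric` is any object with the three methods listed above (for example a MACE instance).

```python
from typing import Iterator, Optional


def iter_chunks(n_items: int, chunk_size: int) -> Iterator[slice]:
    """Yield consecutive slices that cover range(n_items)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_items, chunk_size):
        yield slice(start, min(start + chunk_size, n_items))


def accumulate(
    metric,
    candidates: list[str],
    mult_references: Optional[list[list[str]]] = None,
    audio_paths: Optional[list[str]] = None,
    chunk_size: int = 32,
):
    """Feed the data to metric.update() chunk by chunk, then compute once."""
    metric.reset()  # start from an empty state
    for sl in iter_chunks(len(candidates), chunk_size):
        metric.update(
            candidates[sl],
            None if mult_references is None else mult_references[sl],
            None if audio_paths is None else audio_paths[sl],
        )
    return metric.compute()
```

Keeping the optional arguments as `None` when they are absent mirrors the defaults in the update() signature, so the same loop works whichever inputs are supplied.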