aac_metrics.classes.clap_sim module

class CLAPSim(return_all_scores: True = True, *, clap_method: 'audio' | 'text' = 'text', clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, seed: int | None = 42, verbose: int = 0)

class CLAPSim(return_all_scores: False, *, clap_method: 'audio' | 'text' = 'text', clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL, device: str | device | None = 'cuda_if_available', batch_size: int | None = 32, reset_state: bool = True, seed: int | None = 42, verbose: int = 0)

Bases: Generic[T_CLAPOut], AACMetric[T_CLAPOut]

Cosine similarity between Contrastive Language-Audio Pretraining (CLAP) embeddings.

The implementation is based on the msclap PyPI package. Note: instances of this class are not picklable.

Paper: https://arxiv.org/pdf/2411.00321
msclap package: https://pypi.org/project/msclap/

For more information, see clap_sim().
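As a rough illustration of the score itself, the sketch below computes the cosine similarity of two plain Python vectors. It is not the aac_metrics or msclap code: the function name cosine_similarity and the toy vectors are assumptions made for this example, and real CLAP embeddings would come from the model rather than from hand-written lists. It also shows why the metric is bounded by -1.0 and 1.0.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy stand-ins for embeddings (hypothetical values, not CLAP outputs).
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # 1.0: same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0: unrelated
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])   # -1.0: opposite direction
```

Because the dot product is divided by both norms, the result depends only on the direction of the vectors and not on their magnitude.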

compute() → T_CLAPOut

extra_repr() → str

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False

get_output_names() → tuple[str, ...]

higher_is_better : ClassVar[bool | None] = True

is_differentiable : ClassVar[bool | None] = False

max_value : ClassVar[float] = 1.0

min_value : ClassVar[float] = -1.0

reset() → None

training : bool

update(candidates: list[str], mult_references_or_audio_paths: list[list[str]] | list[str]) → None
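The update(), compute() and reset() methods suggest a stateful metric that accumulates inputs over several batches before producing a score. The sketch below imitates that pattern with a self-contained toy class. Everything in it is hypothetical: ToySimMetric, toy_embed and the bag-of-words "embedding" are stand-ins invented for this example, and averaging over the references of each candidate is an assumption, not a description of how CLAPSim aggregates its scores.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for a CLAP text encoder: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySimMetric:
    """Toy metric that only mirrors the update/compute/reset method names."""

    def __init__(self) -> None:
        self._scores: list[float] = []

    def update(self, candidates: list[str], mult_references: list[list[str]]) -> None:
        if len(candidates) != len(mult_references):
            raise ValueError("expected one list of references per candidate")
        for cand, refs in zip(candidates, mult_references):
            sims = [cosine(toy_embed(cand), toy_embed(ref)) for ref in refs]
            # Assumption: average the similarities over each candidate's references.
            self._scores.append(sum(sims) / len(sims))

    def compute(self) -> float:
        if not self._scores:
            raise RuntimeError("call update() before compute()")
        return sum(self._scores) / len(self._scores)

    def reset(self) -> None:
        self._scores.clear()


metric = ToySimMetric()
metric.update(["a dog barks"], [["a dog barks", "a cat meows"]])
score = metric.compute()  # mean similarity over everything seen so far
metric.reset()            # clear the accumulated state before the next run
```

Keeping accumulation in update() and aggregation in compute() lets a caller feed a dataset batch by batch and read a single corpus-level score at the end.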