aac_metrics.functional.clap_sim module

class CLAPScores

Bases: dict

clap_sim : Tensor
clap_sim(
candidates: list[str],
mult_references: list[list[str]] | None = None,
audio_paths: list[str] | None = None,
return_all_scores: True = True,
*,
clap_method: 'audio' | 'text' = 'text',
clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
seed: int | None = 42,
verbose: int = 0,
) -> tuple[CLAPScores, CLAPScores]
clap_sim(
candidates: list[str],
mult_references: list[list[str]] | None = None,
audio_paths: list[str] | None = None,
*,
return_all_scores: False,
clap_method: 'audio' | 'text' = 'text',
clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
seed: int | None = 42,
verbose: int = 0,
) -> Tensor

Cosine-similarity of the Contrastive Language-Audio Pretraining (CLAP) embeddings.

The implementation is based on the msclap PyPI package.
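To illustrate the quantity being computed, here is a minimal sketch of cosine similarity between one candidate embedding and several reference embeddings, written in plain Python. The toy vectors, the helper names `cosine_similarity` and `mean_similarity`, and the choice to average over references are assumptions made for illustration. In practice the embeddings come from the CLAP model, and the library's aggregation may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_similarity(candidate_emb, reference_embs):
    """Average similarity of one candidate against its references.

    Averaging is an assumed aggregation, not necessarily the library's.
    """
    if not reference_embs:
        raise ValueError("at least one reference embedding is required")
    sims = [cosine_similarity(candidate_emb, ref) for ref in reference_embs]
    return sum(sims) / len(sims)


# Toy 2-D embeddings standing in for CLAP outputs.
candidate = [1.0, 0.0]
references = [[1.0, 0.0], [0.0, 1.0]]
score = mean_similarity(candidate, references)
```

The score depends only on the direction of the vectors, so rescaling an embedding leaves the similarity unchanged.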

Parameters:
candidates: list[str]

The list of sentences to evaluate.

mult_references: list[list[str]] | None = None

The list of lists of reference sentences used as targets when clap_method is “text”. Defaults to None.

audio_paths: list[str] | None = None

Audio file paths, required when clap_method is “audio”. Defaults to None.

return_all_scores: True = True
return_all_scores: False

If True, returns a tuple containing the global (corpus-level) and local (per-sample) scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.

clap_method: 'audio' | 'text' = 'text'

Selects what the candidates are compared against: the reference sentences (“text”) or the audio files (“audio”). Defaults to “text”.

clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL

The CLAP model used to extract the embeddings compared by cosine similarity. Defaults to “2023”.

device: str | device | None = 'cuda_if_available'

The PyTorch device used to run the CLAP model. If “cuda_if_available”, CUDA is used when available. Defaults to “cuda_if_available”.

batch_size: int | None = 32

The batch size used by the CLAP model. Defaults to 32.

reset_state: bool = True

If True, resets the state of the PyTorch global generator after the initialization of the pre-trained models. Defaults to True.

seed: int | None = 42

Optional seed to make CLAP-sim scores deterministic when using clap_method=“audio” on large audio files. Defaults to 42.

verbose: int = 0

The verbosity level. Defaults to 0.
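Since mult_references and audio_paths are each needed only for one value of clap_method, it can help to check the inputs before running the model. The sketch below shows one way to do that. The helper `build_clap_sim_kwargs` is hypothetical and not part of aac_metrics, and the requirement of exactly one target entry per candidate is an assumption. The library's own validation may differ.

```python
def build_clap_sim_kwargs(
    candidates,
    mult_references=None,
    audio_paths=None,
    clap_method="text",
):
    """Hypothetical helper: check inputs and collect keyword arguments."""
    if clap_method == "text":
        if mult_references is None:
            raise ValueError('mult_references is required when clap_method="text"')
        targets = mult_references
    elif clap_method == "audio":
        if audio_paths is None:
            raise ValueError('audio_paths is required when clap_method="audio"')
        targets = audio_paths
    else:
        raise ValueError(f"unknown clap_method: {clap_method!r}")

    # Assumption: one target entry (reference list or audio file) per candidate.
    if len(targets) != len(candidates):
        raise ValueError(
            f"expected {len(candidates)} targets, found {len(targets)}"
        )

    return {
        "candidates": candidates,
        "mult_references": mult_references,
        "audio_paths": audio_paths,
        "clap_method": clap_method,
    }


kwargs = build_clap_sim_kwargs(
    ["a dog barks"],
    mult_references=[["a dog is barking", "barking dog in a yard"]],
)
```

The returned dictionary uses the parameter names from the signature above, so it could then be passed on as `clap_sim(**kwargs)`.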

Returns:

A tuple of global and local scores, or a scalar tensor with the main global score.
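The two return shapes can be pictured with a small stand-in that uses plain floats and dicts in place of tensors and CLAPScores. The function `summarize_scores` is hypothetical, the per-sample values are made up, and taking the mean of the per-sample scores as the global score is an assumption. Only the `clap_sim` key is taken from the CLAPScores listing above.

```python
def summarize_scores(sample_scores, return_all_scores=True):
    """Stand-in mirroring the two return shapes (not library code)."""
    if not sample_scores:
        raise ValueError("at least one per-sample score is required")

    # Assumed aggregation: the global score is the mean of the local scores.
    global_score = sum(sample_scores) / len(sample_scores)

    if not return_all_scores:
        # Single value, analogous to the scalar tensor.
        return global_score

    global_scores = {"clap_sim": global_score}
    local_scores = {"clap_sim": list(sample_scores)}
    return global_scores, local_scores


per_sample = [0.5, 0.25, 0.75]  # made-up similarities, one per candidate
global_scores, local_scores = summarize_scores(per_sample)
main_score = summarize_scores(per_sample, return_all_scores=False)
```

Unpacking the tuple gives access to both levels at once, while the scalar form is convenient when only a single number is needed, for example as a validation metric.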