aac_metrics.functional.clap_sim module
clap_sim(
    candidates: list[str],
    mult_references: list[list[str]] | None = None,
    audio_paths: list[str] | None = None,
    return_all_scores: True = True,
    *,
    clap_method: 'audio' | 'text' = 'text',
    clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL,
    device: str | device | None = 'cuda_if_available',
    batch_size: int | None = 32,
    reset_state: bool = True,
    seed: int | None = 42,
    verbose: int = 0,
)

clap_sim(
    candidates: list[str],
    mult_references: list[list[str]] | None = None,
    audio_paths: list[str] | None = None,
    *,
    return_all_scores: False,
    clap_method: 'audio' | 'text' = 'text',
    clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL,
    device: str | device | None = 'cuda_if_available',
    batch_size: int | None = 32,
    reset_state: bool = True,
    seed: int | None = 42,
    verbose: int = 0,
)

Cosine similarity of the Contrastive Language-Audio Pretraining (CLAP) embeddings.
The implementation is based on the msclap PyPI package.
msclap package: https://pypi.org/project/msclap/
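As an illustration of what "cosine similarity of embeddings" means, here is a minimal pure-Python sketch. It does not call CLAP: the embeddings are toy vectors, and `cosine_similarity` and `toy_clap_sim` are hypothetical names used only for this example. How the library aggregates scores over several references is not stated on this page, so averaging over references and then over samples is an assumption.

```python
import math
from statistics import fmean


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)


def toy_clap_sim(cand_embs, mult_ref_embs):
    """Return (global score, list of local scores) for toy embeddings.

    Assumption: the local score of a sample is the mean similarity to its
    references, and the global score is the mean of the local scores.
    """
    local_scores = [
        fmean([cosine_similarity(cand, ref) for ref in refs])
        for cand, refs in zip(cand_embs, mult_ref_embs)
    ]
    return fmean(local_scores), local_scores


# Two candidates; the first has two references, the second has one.
cand_embs = [[1.0, 0.0], [0.0, 2.0]]
mult_ref_embs = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
global_score, local_scores = toy_clap_sim(cand_embs, mult_ref_embs)
```

In the real metric, the vectors would come from the CLAP text or audio encoder rather than being written by hand.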
- Parameters:
- candidates: list[str]
The list of sentences to evaluate.
- mult_references: list[list[str]] | None = None
The list of lists of reference sentences used as targets when clap_method is “text”. Defaults to None.
- audio_paths: list[str] | None = None
The audio file paths, required when clap_method is “audio”. Defaults to None.
- return_all_scores: True = True (first overload), or False (second overload)
If True, returns a tuple containing the global and local scores. Otherwise, returns a scalar tensor containing the main global score. Defaults to True.
- clap_method: 'audio' | 'text' = 'text'
The method used to encode the sentences. Can be “text” or “audio”. Defaults to “text”.
- clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL
The CLAP model used to extract the embeddings compared by cosine similarity. Defaults to “2023”.
- device: str | device | None = 'cuda_if_available'
The PyTorch device used to run the CLAP models. If “cuda_if_available”, CUDA is used when available. Defaults to “cuda_if_available”.
- batch_size: int | None = 32
The batch size of the CLAP models. Defaults to 32.
- reset_state: bool = True
If True, resets the state of the PyTorch global generator after the pre-trained models are initialized. Defaults to True.
- seed: int | None = 42
Optional seed to make CLAP-sim scores deterministic when using clap_method="audio" on large audio files. Defaults to 42.
- verbose: int = 0
The verbosity level. Defaults to 0.
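The parameter descriptions above imply that the target argument depends on clap_method: mult_references for “text” and audio_paths for “audio”. The following sketch shows one way a caller might check this before invoking the metric. The helper `check_clap_sim_inputs` is hypothetical and not part of aac_metrics, and the requirement of exactly one target entry per candidate is an assumption, not something this page states.

```python
def check_clap_sim_inputs(candidates, mult_references=None, audio_paths=None,
                          clap_method="text"):
    """Hypothetical pre-flight check; returns the targets selected by clap_method."""
    if clap_method == "text":
        name, targets = "mult_references", mult_references
    elif clap_method == "audio":
        name, targets = "audio_paths", audio_paths
    else:
        raise ValueError(f"clap_method must be 'text' or 'audio', got {clap_method!r}")

    if targets is None:
        raise ValueError(f"{name} is required when clap_method={clap_method!r}")

    # Assumption: one target entry (reference list or audio path) per candidate.
    if len(targets) != len(candidates):
        raise ValueError(
            f"expected {len(candidates)} entries in {name}, got {len(targets)}"
        )
    return targets


candidates = ["a dog barks", "rain falls on a roof"]
mult_references = [["a dog is barking"], ["rain hitting a roof", "heavy rain"]]
targets = check_clap_sim_inputs(candidates, mult_references=mult_references)
```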
- Returns:
A tuple of global and local scores if return_all_scores is True; otherwise, a scalar tensor with the main global score.
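A possible way to call the function in “text” mode, using only the parameters documented above, is sketched below. The wrapper `score_captions` and its `metric` argument are hypothetical and exist only so the example can run without the library: when `metric` is None, the real clap_sim is imported lazily, which assumes aac-metrics and msclap are installed and may trigger a download of pretrained weights. The exact structure of the returned global and local scores is not described on this page, so the sketch only unpacks the tuple.

```python
def score_captions(candidates, mult_references, metric=None):
    """Run a CLAP-sim style metric in 'text' mode and return (globals, locals)."""
    if metric is None:
        # Lazy import: needs the aac-metrics and msclap packages at call time.
        from aac_metrics.functional.clap_sim import clap_sim
        metric = clap_sim

    # return_all_scores=True selects the overload that returns a tuple.
    global_scores, local_scores = metric(
        candidates,
        mult_references,
        return_all_scores=True,
        clap_method="text",
        device="cuda_if_available",
        batch_size=32,
    )
    return global_scores, local_scores


def fake_metric(candidates, mult_references, **kwargs):
    """Stand-in used here so the sketch runs without CLAP; values are made up."""
    per_sample = [0.5 for _ in candidates]
    return {"score": sum(per_sample) / len(per_sample)}, {"score": per_sample}


candidates = ["a dog barks", "rain falls on a roof"]
mult_references = [["a dog is barking"], ["rain hitting a roof", "heavy rain"]]
global_scores, local_scores = score_captions(candidates, mult_references, metric=fake_metric)
```

Passing return_all_scores=False instead would, per the description above, yield a single scalar tensor rather than a tuple, so the unpacking step would need to be removed.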