aac_metrics package

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

class AACMetric(
**kwargs: Any,
)[source]

Bases: Module, Generic[T_OutType]

Base Metric module for AAC metrics. Similar to torchmetrics.Metric.

compute() T_OutType[source]
forward(
*args: Any,
**kwargs: Any,
) T_OutType[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

full_state_update : ClassVar[bool | None] = False
higher_is_better : ClassVar[bool | None] = None
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = inf
min_value : ClassVar[float] = -inf
reset() None[source]
update(
*args,
**kwargs,
) None[source]
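
The update(), compute() and reset() methods follow the accumulate-then-reduce lifecycle familiar from torchmetrics.Metric. The sketch below illustrates that lifecycle in plain Python so that it runs without aac_metrics or torch installed. The class ExactMatchRate and its attributes are hypothetical and are not part of the package.

```python
class ExactMatchRate:
    """Hypothetical stand-in metric: share of candidates equal to one of their references."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Clear the accumulated state.
        self._n_hits = 0
        self._n_total = 0

    def update(self, candidates: list[str], mult_references: list[list[str]]) -> None:
        # Accumulate statistics for one batch; nothing is returned.
        if len(candidates) != len(mult_references):
            raise ValueError("candidates and mult_references must have the same length")
        for cand, refs in zip(candidates, mult_references):
            self._n_hits += int(cand in refs)
            self._n_total += 1

    def compute(self) -> float:
        # Reduce the accumulated state to a final score.
        return self._n_hits / max(self._n_total, 1)


metric = ExactMatchRate()
metric.update(["rain falls"], [["rain falls", "it is raining"]])
metric.update(["a dog barks"], [["a cat meows"]])
score = metric.compute()  # statistics from both batches are pooled
metric.reset()            # the state is empty again
```

A real subclass would derive from AACMetric and typically return tensors; only the call order (update, then compute, then reset) is the point here.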
class BERTScoreMRefs(
return_all_scores: True = True,
*,
model: str | Module = DEFAULT_BERT_SCORE_MODEL,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
num_threads: int = 0,
max_length: int = 64,
reset_state: bool = True,
idf: bool = False,
reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max',
filter_nan: bool = True,
verbose: int = 0,
)[source]
class BERTScoreMRefs(
return_all_scores: False,
*,
model: str | Module = DEFAULT_BERT_SCORE_MODEL,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
num_threads: int = 0,
max_length: int = 64,
reset_state: bool = True,
idf: bool = False,
reduction: 'mean' | 'max' | 'min' | Callable[[...], Tensor] = 'max',
filter_nan: bool = True,
verbose: int = 0,
)

Bases: Generic[T_BERTScoreMRefsOut], AACMetric[T_BERTScoreMRefsOut]

BERTScore metric which supports multiple references.

The implementation is based on the bert_score implementation of torchmetrics.

For more information, see bert_score_mrefs().

compute() T_BERTScoreMRefsOut[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
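
The reduction parameter accepts 'mean', 'max', 'min' or a callable. A minimal sketch of what such a reduction over references could look like, assuming one score has already been computed per (candidate, reference) pair. The helper reduce_over_references is hypothetical and works on plain floats rather than tensors; how the package applies the reduction internally is not shown here.

```python
from statistics import mean
from typing import Callable, Sequence, Union

# Named reductions, mirroring the string values accepted by the parameter.
REDUCTIONS = {"mean": mean, "max": max, "min": min}


def reduce_over_references(
    scores_per_reference: Sequence[Sequence[float]],
    reduction: Union[str, Callable[[Sequence[float]], float]] = "max",
) -> list:
    # Resolve a string to its function, or use the callable as given.
    fn = REDUCTIONS[reduction] if isinstance(reduction, str) else reduction
    # Collapse the scores of each candidate's references to a single value.
    return [fn(scores) for scores in scores_per_reference]


# Two candidates: the first has three references, the second has one.
per_ref_scores = [[0.25, 0.75, 0.5], [0.5]]
best = reduce_over_references(per_ref_scores, "max")
average = reduce_over_references(per_ref_scores, "mean")
```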
class BLEU(
return_all_scores: True = True,
*,
n: int = 4,
option: 'shortest' | 'average' | 'closest' = 'closest',
verbose: int = 0,
tokenizer: Callable[[str], list[str]] = str.split,
)[source]
class BLEU(
return_all_scores: False,
*,
n: int = 4,
option: 'shortest' | 'average' | 'closest' = 'closest',
verbose: int = 0,
tokenizer: Callable[[str], list[str]] = str.split,
)

Bases: Generic[T_BLEUOut], AACMetric[T_BLEUOut]

BiLingual Evaluation Understudy metric class.

For more information, see bleu().

compute() T_BLEUOut[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
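
The option parameter takes 'shortest', 'average' or 'closest'. In the usual BLEU formulation these names select the effective reference length used by the brevity penalty, and the sketch below assumes that meaning. Both helper functions are hypothetical illustrations, not the package's code.

```python
import math


def effective_ref_length(cand_len: int, ref_lens: list, option: str = "closest") -> float:
    if option == "shortest":
        return min(ref_lens)
    if option == "average":
        return sum(ref_lens) / len(ref_lens)
    if option == "closest":
        # Closest to the candidate length; ties go to the shorter reference.
        return min(ref_lens, key=lambda r: (abs(r - cand_len), r))
    raise ValueError(f"unknown option: {option!r}")


def brevity_penalty(cand_len: int, ref_len: float) -> float:
    # Candidates at least as long as the reference are not penalised.
    if cand_len >= ref_len:
        return 1.0
    if cand_len == 0:
        return 0.0
    return math.exp(1.0 - ref_len / cand_len)


ref_lens = [3, 6, 9]
closest = effective_ref_length(4, ref_lens, "closest")
average = effective_ref_length(4, ref_lens, "average")
```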
class BLEU1(
return_all_scores: bool = True,
option: 'shortest' | 'average' | 'closest' = 'closest',
verbose: int = 0,
tokenizer: Callable[[str], list[str]] = str.split,
)[source]

Bases: BLEU

class BLEU2(
return_all_scores: bool = True,
option: 'shortest' | 'average' | 'closest' = 'closest',
verbose: int = 0,
tokenizer: Callable[[str], list[str]] = str.split,
)[source]

Bases: BLEU

class BLEU3(
return_all_scores: bool = True,
option: 'shortest' | 'average' | 'closest' = 'closest',
verbose: int = 0,
tokenizer: Callable[[str], list[str]] = str.split,
)[source]

Bases: BLEU

class BLEU4(
return_all_scores: bool = True,
option: 'shortest' | 'average' | 'closest' = 'closest',
verbose: int = 0,
tokenizer: Callable[[str], list[str]] = str.split,
)[source]

Bases: BLEU

class CIDErD(
return_all_scores: True = True,
*,
n: int = 4,
sigma: float = 6.0,
tokenizer: Callable[[str], list[str]] = str.split,
return_tfidf: bool = False,
scale: float = 10.0,
)[source]
class CIDErD(
return_all_scores: False,
*,
n: int = 4,
sigma: float = 6.0,
tokenizer: Callable[[str], list[str]] = str.split,
return_tfidf: bool = False,
scale: float = 10.0,
)

Bases: Generic[T_CIDErDOut], AACMetric[T_CIDErDOut]

Consensus-based Image Description Evaluation metric class.

For more information, see cider_d().

compute() T_CIDErDOut[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 10.0
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
class CLAPSim(
return_all_scores: True = True,
*,
clap_method: 'audio' | 'text' = 'text',
clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
seed: int | None = 42,
verbose: int = 0,
)[source]
class CLAPSim(
return_all_scores: False,
*,
clap_method: 'audio' | 'text' = 'text',
clap_model: str | CLAPWrapper = DEFAULT_CLAP_SIM_MODEL,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
seed: int | None = 42,
verbose: int = 0,
)

Bases: Generic[T_CLAPOut], AACMetric[T_CLAPOut]

Cosine-similarity of the Contrastive Language-Audio Pretraining (CLAP) embeddings.

The implementation is based on the msclap PyPI package. Note: instances of this class are not picklable.

For more information, see clap_sim().

compute() T_CLAPOut[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = -1.0
reset() None[source]
update(
candidates: list[str],
mult_references_or_audio_paths: list[list[str]] | list[str],
) None[source]
class DCASE2023Evaluate(
preprocess: bool = True,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
device: str | device | None = 'cuda_if_available',
verbose: int = 0,
)[source]

Bases: Evaluate

Evaluate candidates with multiple references with DCASE2023 Audio Captioning metrics.

For more information, see dcase2023_evaluate().

class DCASE2024Evaluate(
preprocess: bool = True,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
device: str | device | None = 'cuda_if_available',
verbose: int = 0,
)[source]

Bases: Evaluate

Evaluate candidates with multiple references with DCASE2024 Audio Captioning metrics.

For more information, see dcase2024_evaluate().

class Evaluate(
preprocess: bool | Callable[[list[str]], list[str]] = True,
metrics: str | Iterable[str] | Iterable[AACMetric] = 'default',
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
device: str | device | None = 'cuda_if_available',
verbose: int = 0,
)[source]

Bases: list[AACMetric], AACMetric[tuple[dict[str, Tensor], dict[str, Tensor]]]

Evaluate candidates with multiple references with custom metrics.

For more information, see evaluate().

compute() tuple[dict[str, Tensor], dict[str, Tensor]][source]
full_state_update : ClassVar[bool | None] = False
higher_is_better : ClassVar[bool | None] = None
is_differentiable : ClassVar[bool | None] = False
reset() None[source]
tolist() list[AACMetric][source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
class FENSE(
return_all_scores: bool = True,
*,
sbert_model: str | SentenceTransformer = 'paraphrase-TinyBERT-L6-v2',
echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base',
echecker_tokenizer: AutoTokenizer | None = None,
error_threshold: float = 0.9,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
return_probs: bool = False,
penalty: float = 0.9,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[FENSEScores, FENSEScores] | Tensor]

Fluency ENhanced Sentence-bert Evaluation (FENSE)

For more information, see fense().

compute() tuple[FENSEScores, FENSEScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = -1.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
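
The error_threshold and penalty parameters control how a detected fluency error lowers the similarity score. A minimal sketch, assuming the formulation in which the sentence similarity is scaled by (1 - penalty) whenever the error probability exceeds the threshold. The function name and the strict comparison are assumptions made for illustration.

```python
def penalized_score(
    similarity: float,
    error_prob: float,
    error_threshold: float = 0.9,
    penalty: float = 0.9,
) -> float:
    # A sentence counts as erroneous when the detector is confident enough.
    has_error = error_prob > error_threshold
    # Erroneous sentences keep only (1 - penalty) of their similarity.
    return similarity * (1.0 - penalty) if has_error else similarity


fluent = penalized_score(0.8, error_prob=0.5)
disfluent = penalized_score(0.8, error_prob=0.95)
```

With the default penalty of 0.9, a caption flagged as erroneous keeps one tenth of its similarity score.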
class FER(
return_all_scores: bool = True,
*,
echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base',
error_threshold: float = 0.9,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
return_probs: bool = False,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[FERScores, FERScores] | Tensor]

Return Fluency Error Rate (FER) detected by a pre-trained BERT model.

For more information, see fer().

compute() tuple[FERScores, FERScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = False
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = -1.0
reset() None[source]
update(
candidates: list[str],
*args,
**kwargs,
) None[source]
class MACE(
return_all_scores: bool = True,
*,
mace_method: 'text' | 'audio' | 'combined' = 'text',
penalty: float = 0.3,
clap_model: str | CLAPWrapper = 'MS-CLAP-2023',
seed: int | None = 42,
echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base',
echecker_tokenizer: AutoTokenizer | None = None,
error_threshold: float = 0.97,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
return_probs: bool = False,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[MACEScores, MACEScores] | Tensor]

Multimodal Audio-Caption Evaluation class (MACE).

MACE is a metric designed for evaluating automated audio captioning (AAC) systems. Unlike metrics that compare machine-generated captions only to human references, MACE uses both audio and text, which produces assessments that align better with human judgments.

The implementation is based on the original MACE implementation (the original authors have agreed to the inclusion of their code in aac-metrics under the MIT license). Note: instances of this class are not picklable.

For more information, see mace().

compute() tuple[MACEScores, MACEScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = -1.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]] | None = None,
audio_paths: list[str] | None = None,
) None[source]
class METEOR(
return_all_scores: bool = True,
*,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
java_max_memory: str = '2G',
language: 'en' | 'cz' | 'de' | 'es' | 'fr' = 'en',
use_shell: bool | None = None,
params: Iterable[float] | None = None,
weights: Iterable[float] | None = None,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[METEORScores, METEORScores] | Tensor]

Metric for Evaluation of Translation with Explicit ORdering metric class.

For more information, see meteor().

compute() tuple[METEORScores, METEORScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
class ROUGEL(
return_all_scores: bool = True,
*,
beta: float = 1.2,
tokenizer: Callable[[str], list[str]] = str.split,
)[source]

Bases: AACMetric[tuple[ROUGELScores, ROUGELScores] | Tensor]

Recall-Oriented Understudy for Gisting Evaluation class.

For more information, see rouge_l().

compute() tuple[ROUGELScores, ROUGELScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
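
The beta parameter weights recall against precision in the ROUGE-L F-score, which is built on the longest common subsequence (LCS) between candidate and reference tokens. The sketch below is a plain-Python illustration, not the package's code. Taking the best precision and the best recall separately across references is an assumption borrowed from common caption-evaluation scorers.

```python
from typing import Callable


def lcs_length(a: list, b: list) -> int:
    # Dynamic programme over one row at a time.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(
    candidate: str,
    references: list,
    beta: float = 1.2,
    tokenizer: Callable[[str], list] = str.split,
) -> float:
    cand = tokenizer(candidate)
    pairs = []
    for ref in references:
        ref_toks = tokenizer(ref)
        if cand and ref_toks:
            lcs = lcs_length(cand, ref_toks)
            pairs.append((lcs / len(cand), lcs / len(ref_toks)))
    if not pairs:
        return 0.0
    precision = max(p for p, _ in pairs)
    recall = max(r for _, r in pairs)
    if precision == 0.0 or recall == 0.0:
        return 0.0
    # F-score with recall weighted by beta squared.
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


score = rouge_l("a man is speaking", ["a man speaks", "someone talks"])
```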
class SBERTSim(
return_all_scores: bool = True,
*,
sbert_model: str | SentenceTransformer = 'paraphrase-TinyBERT-L6-v2',
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[SBERTSimScores, SBERTSimScores] | Tensor]

Cosine-similarity of the Sentence-BERT embeddings.

For more information, see sbert().

compute() tuple[SBERTSimScores, SBERTSimScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = -1.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
class SPICE(
return_all_scores: bool = True,
*,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
n_threads: int | None = None,
java_max_memory: str = '8G',
timeout: None | int | Iterable[int] = None,
separate_cache_dir: bool = True,
use_shell: bool | None = None,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[SPICEScores, SPICEScores] | Tensor]

Semantic Propositional Image Caption Evaluation class.

For more information, see spice().

compute() tuple[SPICEScores, SPICEScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 1.0
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
class SPIDEr(
return_all_scores: bool = True,
*,
n: int = 4,
sigma: float = 6.0,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
n_threads: int | None = None,
java_max_memory: str = '8G',
timeout: None | int | Iterable[int] = None,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[SPIDErScores, SPIDErScores] | Tensor]

SPIDEr class.

For more information, see spider().

compute() tuple[SPIDErScores, SPIDErScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 5.5
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
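
SPIDEr is commonly defined as the average of CIDEr-D and SPICE, which accounts for the max_value of 5.5 listed above: CIDErD has a max_value of 10.0 and SPICE a max_value of 1.0. A small worked sketch under that definition, with hypothetical function and variable names:

```python
def spider_score(cider_d: float, spice: float) -> float:
    # Arithmetic mean of the two component scores.
    return (cider_d + spice) / 2.0


CIDER_D_MAX = 10.0  # max_value listed for CIDErD
SPICE_MAX = 1.0     # max_value listed for SPICE

spider_upper_bound = spider_score(CIDER_D_MAX, SPICE_MAX)
```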
class SPIDErFL(
return_all_scores: bool = True,
*,
n: int = 4,
sigma: float = 6.0,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
n_threads: int | None = None,
java_max_memory: str = '8G',
timeout: None | int | Iterable[int] = None,
echecker: str | BERTFlatClassifier = 'echecker_clotho_audiocaps_base',
echecker_tokenizer: AutoTokenizer | None = None,
error_threshold: float = 0.9,
device: str | device | None = 'cuda_if_available',
batch_size: int | None = 32,
reset_state: bool = True,
return_probs: bool = True,
penalty: float = 0.9,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[SPIDErFLScores, SPIDErFLScores] | Tensor]

SPIDErFL class.

For more information, see spider_fl().

compute() tuple[SPIDErFLScores, SPIDErFLScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 5.5
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]],
) None[source]
class SPIDErMax(
return_all_scores: bool = True,
*,
return_all_cands_scores: bool = False,
n: int = 4,
sigma: float = 6.0,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
n_threads: int | None = None,
java_max_memory: str = '8G',
timeout: None | int | Iterable[int] = None,
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[SPIDErMaxScores, SPIDErMaxScores] | Tensor]

SPIDEr-max class.

For more information, see spider_max().

compute() tuple[SPIDErMaxScores, SPIDErMaxScores] | Tensor[source]
extra_repr() str[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = True
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = 5.5
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
mult_candidates: list[list[str]],
mult_references: list[list[str]],
) None[source]
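
Unlike the other metrics, update() here takes mult_candidates, a list of several candidates per item. A minimal sketch of the idea behind SPIDEr-max, assuming a SPIDEr score has already been computed for every candidate: keep the best candidate score per item, then average over items. The helper name and the averaging step are assumptions for illustration.

```python
def spider_max_from_scores(scores_per_candidate: list) -> float:
    # Best score among the candidates of each item.
    best_per_item = [max(scores) for scores in scores_per_candidate]
    # Corpus-level score: mean of the per-item maxima.
    return sum(best_per_item) / len(best_per_item)


# Two items: the first has two candidates, the second has three.
candidate_scores = [[0.25, 0.75], [0.5, 0.25, 0.0]]
corpus_score = spider_max_from_scores(candidate_scores)
```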
class Vocab(
return_all_scores: bool = True,
*,
seed: None | int | Generator = 1234,
tokenizer: Callable[[str], list[str]] = str.split,
dtype: dtype = torch.float64,
pop_strategy: 'max' | 'min' | int = 'max',
verbose: int = 0,
)[source]

Bases: AACMetric[tuple[VocabScores, VocabScores] | Tensor]

VocabStats class.

For more information, see vocab().

compute() tuple[VocabScores, VocabScores] | Tensor[source]
full_state_update : ClassVar[bool | None] = False
get_output_names() tuple[str, ...][source]
higher_is_better : ClassVar[bool | None] = None
is_differentiable : ClassVar[bool | None] = False
max_value : ClassVar[float] = inf
min_value : ClassVar[float] = 0.0
reset() None[source]
update(
candidates: list[str],
mult_references: list[list[str]] | None = None,
) None[source]
dcase2023_evaluate(
candidates: list[str],
mult_references: list[list[str]],
preprocess: bool | Callable[[list[str]], list[str]] = True,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
device: str | device | None = 'cuda_if_available',
verbose: int = 0,
) tuple[dict[str, Tensor], dict[str, Tensor]][source]

Evaluate candidates with multiple references with the DCASE2023 Audio Captioning metrics.

Parameters:
candidates: list[str]

The list of sentences to evaluate.

mult_references: list[list[str]]

The list of lists of sentences used as targets.

preprocess: bool | Callable[[list[str]], list[str]] = True

If True, the candidates and references are passed through the Stanford PTB tokenizer before the metrics are computed. Defaults to True.

cache_path: str | Path | None = None

The path to the external code directory. Defaults to the value returned by get_default_cache_path().

java_path: str | Path | None = None

The path to the Java executable. Defaults to the value returned by get_default_java_path().

tmp_path: str | Path | None = None

The temporary directory path. Defaults to the value returned by get_default_tmp_path().

device: str | device | None = 'cuda_if_available'

The PyTorch device used to run the FENSE and SPIDErFL models. If None, it will try to use CUDA if available. Defaults to “cuda_if_available”.

verbose: int = 0

The verbosity level. Defaults to 0.

Returns:

A tuple containing the corpus scores and the sentence scores.

dcase2024_evaluate(
candidates: list[str],
mult_references: list[list[str]],
preprocess: bool | Callable[[list[str]], list[str]] = True,
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
device: str | device | None = 'cuda_if_available',
verbose: int = 0,
) tuple[dict[str, Tensor], dict[str, Tensor]][source]

Evaluate candidates with multiple references with the DCASE2024 Audio Captioning metrics.

Parameters:
candidates: list[str]

The list of sentences to evaluate.

mult_references: list[list[str]]

The list of lists of sentences used as targets.

preprocess: bool | Callable[[list[str]], list[str]] = True

If True, the candidates and references are passed through the Stanford PTB tokenizer before the metrics are computed. Defaults to True.

cache_path: str | Path | None = None

The path to the external code directory. Defaults to the value returned by get_default_cache_path().

java_path: str | Path | None = None

The path to the Java executable. Defaults to the value returned by get_default_java_path().

tmp_path: str | Path | None = None

The temporary directory path. Defaults to the value returned by get_default_tmp_path().

device: str | device | None = 'cuda_if_available'

The PyTorch device used to run the FENSE and SPIDErFL models. If None, it will try to use CUDA if available. Defaults to “cuda_if_available”.

verbose: int = 0

The verbosity level. Defaults to 0.

Returns:

A tuple containing the corpus scores and the sentence scores.

evaluate(
candidates: list[str],
mult_references: list[list[str]],
preprocess: bool | Callable[[list[str]], list[str]] = True,
metrics: str | Iterable[str] | Iterable[Callable[[list, list], tuple]] = 'default',
cache_path: str | Path | None = None,
java_path: str | Path | None = None,
tmp_path: str | Path | None = None,
device: str | device | None = 'cuda_if_available',
verbose: int = 0,
) tuple[dict[str, Tensor], dict[str, Tensor]][source]

Evaluate candidates with multiple references with custom metrics.

Parameters:
candidates: list[str]

The list of sentences to evaluate.

mult_references: list[list[str]]

The list of lists of sentences used as targets.

preprocess: bool | Callable[[list[str]], list[str]] = True

If True, the candidates and references are passed through the Stanford PTB tokenizer before the metrics are computed. Defaults to True.

metrics: str | Iterable[str] | Iterable[Callable[[list, list], tuple]] = 'default'

The name of the metric list or the explicit list of metrics to compute. Defaults to “default”.

cache_path: str | Path | None = None

The path to the external code directory. Defaults to the value returned by get_default_cache_path().

java_path: str | Path | None = None

The path to the Java executable. Defaults to the value returned by get_default_java_path().

tmp_path: str | Path | None = None

The temporary directory path. Defaults to the value returned by get_default_tmp_path().

device: str | device | None = 'cuda_if_available'

The PyTorch device used to run the FENSE and SPIDErFL models. If None, it will try to use CUDA if available. Defaults to “cuda_if_available”.

verbose: int = 0

The verbosity level. Defaults to 0.

Returns:

A tuple containing the corpus scores and the sentence scores.
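
A short usage sketch for evaluate(), based only on the signature documented here. The helpers check_inputs and run_evaluate are hypothetical wrappers. The package import is kept inside run_evaluate so that the snippet can be loaded without aac_metrics installed, and the default metric list may additionally need the Java executable and external code referred to by java_path and cache_path.

```python
def check_inputs(candidates: list, mult_references: list) -> None:
    # One list of references is expected per candidate.
    if len(candidates) != len(mult_references):
        raise ValueError(
            f"got {len(candidates)} candidates but {len(mult_references)} reference lists"
        )


def run_evaluate(candidates: list, mult_references: list):
    # Imported lazily: requires the aac_metrics package and its dependencies.
    from aac_metrics import evaluate

    check_inputs(candidates, mult_references)
    # Two dicts are returned: corpus scores and sentence scores.
    corpus_scores, sents_scores = evaluate(candidates, mult_references)
    return corpus_scores, sents_scores


candidates = ["a man is speaking", "rain falls"]
mult_references = [
    ["a man speaks.", "someone speaks."],
    ["rain is falling hard on a surface"],
]
check_inputs(candidates, mult_references)
```

Calling run_evaluate(candidates, mult_references) then returns the pair of dictionaries described under Returns.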

get_default_cache_path() str[source]

Returns the default cache directory path.

If set_default_cache_path() was previously called with a string argument, that value is returned. Otherwise, if the environment variable AAC_METRICS_CACHE_PATH is set to a string, its value is returned. Otherwise, the default is “~/.cache”.

get_default_java_path() str[source]

Returns the default java executable path.

If set_default_java_path() was previously called with a string argument, that value is returned. Otherwise, if the environment variable AAC_METRICS_JAVA_PATH is set to a string, its value is returned. Otherwise, the default is “java”.

get_default_tmp_path() str[source]

Returns the default temporary directory path.

If set_default_tmp_path() was previously called with a string argument, that value is returned. Otherwise, if the environment variable AAC_METRICS_TMP_PATH is set to a string, its value is returned. Otherwise, the default is the value returned by gettempdir().
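
The three getters above share the same fallback order: explicit override, then environment variable, then built-in default. A minimal sketch of that order for the temporary directory, using hypothetical names (set_tmp_path, get_tmp_path, _user_tmp_path) rather than the package's own functions. Treating None as "clear the override" is an assumption.

```python
import os
import tempfile
from typing import Optional

_user_tmp_path: Optional[str] = None  # hypothetical module-level override


def set_tmp_path(path: Optional[str]) -> None:
    # Store the override; None is assumed to clear it.
    global _user_tmp_path
    _user_tmp_path = path


def get_tmp_path() -> str:
    # 1. An explicit override wins.
    if _user_tmp_path is not None:
        return _user_tmp_path
    # 2. Then the environment variable named in the documentation.
    env_value = os.environ.get("AAC_METRICS_TMP_PATH")
    if env_value is not None:
        return env_value
    # 3. Finally the system default.
    return tempfile.gettempdir()
```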

list_metrics_available() list[str]

Returns the names of the metrics that can be loaded by name.

load_metric(
name: str,
**kwargs,
) AACMetric

Load a metric class by name.

Parameters:
name: str

The name of the metric.

**kwargs

The optional keyword arguments passed to the metric factory.

Returns:

The Metric object built.

set_default_cache_path(
cache_path: str | Path | None,
) None[source]

Override default cache directory path.

set_default_java_path(
java_path: str | Path | None,
) None[source]

Override default java executable path.

set_default_tmp_path(
tmp_path: str | Path | None,
) None[source]

Override default temporary directory path.
