Metrics

The binary_metrics functions compute the 10 most commonly used metrics:

  • Negative precision aka Negative Predictive Value (NPV)

  • Positive precision aka Positive Predictive Value (PPV)

  • Negative recall aka True Negative Rate (TNR) aka Specificity

  • Positive recall aka True Positive Rate (TPR) aka Sensitivity

  • Negative f1 score

  • Positive f1 score

  • False Positive Rate (FPR)

  • False Negative Rate (FNR)

  • Accuracy

  • Matthews Correlation Coefficient (MCC)

Most other metrics should be computable from these.

mmu.auto_thresholds(scores, max_steps=None, epsilon=None, seed=None)

Determine the thresholds such that each threshold results in a different confusion matrix.

The thresholds can be subsampled by setting max_steps. When subsampling, regions where the scores lie close together are oversampled.

Parameters:
  • scores (np.ndarray[float32/float64]) – the classifier scores

  • max_steps (int, default=None) – the maximum number of thresholds. Default is None, which does not limit the number of thresholds

  • epsilon (float, default=None) – the minimum difference between two scores for them to be considered distinct. If None, the machine precision of the scores' dtype is used

  • seed (int, default=None) – seed to use when subsampling; ignored when max_steps is None or when the number of unique thresholds is smaller than max_steps.

Returns:

thresholds – the thresholds that result in different confusion matrices. Its length is at most the number of elements in scores, or max_steps when set

Return type:

np.ndarray[float64]
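To illustrate what auto_thresholds returns, the following pure-NumPy sketch (not mmu's implementation) selects the sorted unique scores, which are exactly the thresholds at which the confusion matrix can change; the epsilon handling here is a simplified assumption.

```python
import numpy as np

def unique_thresholds(scores, epsilon=None):
    """Thresholds that each yield a distinct confusion matrix (sketch).

    Two thresholds produce different confusion matrices only if at least
    one score lies between them, so the sorted unique scores suffice.
    """
    scores = np.asarray(scores, dtype=np.float64)
    eps = np.finfo(scores.dtype).eps if epsilon is None else epsilon
    uniq = np.unique(scores)  # sorted unique values
    # drop values closer to their predecessor than eps
    keep = np.concatenate(([True], np.diff(uniq) > eps))
    return uniq[keep]

thresholds = unique_thresholds(np.array([0.1, 0.4, 0.4, 0.8]))
# three distinct thresholds: 0.1, 0.4, 0.8
```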

mmu.binary_metrics(y, yhat=None, scores=None, threshold=None, fill=1.0, return_df=False)

Compute binary classification metrics.

bmetrics is an alias for this function.

Computes the following metrics where [i] indicates the i’th value in the array.

  • [0] neg.precision aka Negative Predictive Value (NPV)

  • [1] pos.precision aka Positive Predictive Value (PPV)

  • [2] neg.recall aka True Negative Rate (TNR) aka Specificity

  • [3] pos.recall aka True Positive Rate (TPR) aka Sensitivity

  • [4] neg.f1 score

  • [5] pos.f1 score

  • [6] False Positive Rate (FPR)

  • [7] False Negative Rate (FNR)

  • [8] Accuracy

  • [9] MCC

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations

  • yhat (np.ndarray[bool, int32, int64, float32, float64], default=None) – the predicted labels; the same dtypes are supported as for y. Can be None if scores is not None; if both are provided, scores is ignored.

  • scores (np.ndarray[float32, float64], default=None) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold. Can be None if yhat is not None; if both are provided, this parameter is ignored.

  • threshold (float, default=0.5) – the classification threshold against which the classifier scores are evaluated; the threshold is inclusive.

  • fill (float, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • return_df (bool, default=False) – return confusion matrix as pd.DataFrame

Returns:

  • confusion_matrix (np.ndarray, pd.DataFrame) – the confusion_matrix with layout [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP

  • metrics (np.ndarray, pd.DataFrame) – the computed metrics
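The confusion-matrix layout and fill behavior described above can be reproduced with plain NumPy; this is a sketch of the documented semantics on hypothetical data, not mmu's implementation.

```python
import numpy as np

# hypothetical labels and scores; yhat is derived from scores with the
# inclusive threshold, i.e. yhat = scores >= threshold
y = np.array([0, 0, 1, 1, 1])
scores = np.array([0.2, 0.7, 0.4, 0.6, 0.9])
yhat = (scores >= 0.5).astype(int)

# 2x2 layout documented above: [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP
conf_mat = np.zeros((2, 2), dtype=np.int64)
np.add.at(conf_mat, (y, yhat), 1)  # count each (label, prediction) pair

tn, fp, fn, tp = conf_mat.ravel()
precision = tp / (tp + fp) if (tp + fp) else 1.0  # fill=1.0 when undefined
recall = tp / (tp + fn) if (tp + fn) else 1.0
```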

mmu.binary_metrics_thresholds(y, scores, thresholds, fill=1.0, return_df=False)

Compute binary classification metrics over multiple thresholds.

bmetrics_thresh is an alias for this function.

Computes the following metrics where [i] indicates the i’th column in the array.

  • [0] neg.precision aka Negative Predictive Value (NPV)

  • [1] pos.precision aka Positive Predictive Value (PPV)

  • [2] neg.recall aka True Negative Rate (TNR) aka Specificity

  • [3] pos.recall aka True Positive Rate (TPR) aka Sensitivity

  • [4] neg.f1 score

  • [5] pos.f1 score

  • [6] False Positive Rate (FPR)

  • [7] False Negative Rate (FNR)

  • [8] Accuracy

  • [9] MCC

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations

  • scores (np.ndarray[float32, float64]) – the classifier scores to be evaluated against the thresholds, i.e. yhat = scores >= threshold.

  • thresholds (np.ndarray[float32, float64]) – the classification thresholds against which the classifier scores are evaluated; the thresholds are inclusive.

  • fill (float, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • return_df (bool, default=False) – return the confusion matrices and metrics as pd.DataFrames

Returns:

  • conf_mat (np.ndarray, pd.DataFrame) – the confusion matrices where row i contains the counts for threshold i: [i, 0] = TN, [i, 1] = FP, [i, 2] = FN, [i, 3] = TP

  • metrics (np.ndarray, pd.DataFrame) – the computed metrics where each row contains the metrics for a single threshold

mmu.binary_metrics_confusion_matrix(conf_mat, fill=1.0, return_df=False)

Compute binary classification metrics.

bmetrics_conf_mat is an alias for this function.

Computes the following metrics where [i] indicates the i’th value in the array.

  • [0] neg.precision aka Negative Predictive Value (NPV)

  • [1] pos.precision aka Positive Predictive Value (PPV)

  • [2] neg.recall aka True Negative Rate (TNR) aka Specificity

  • [3] pos.recall aka True Positive Rate (TPR) aka Sensitivity

  • [4] neg.f1 score

  • [5] pos.f1 score

  • [6] False Positive Rate (FPR)

  • [7] False Negative Rate (FNR)

  • [8] Accuracy

  • [9] MCC

Parameters:
  • conf_mat (np.ndarray[int32, int64]) – confusion matrix as returned by mmu.confusion_matrix

  • fill (float, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • return_df (bool, default=False) – return the metrics as a pd.DataFrame

Returns:

metrics – the computed metrics

Return type:

np.ndarray, pd.DataFrame
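The metric definitions behind the returned array can be written out explicitly. The following pure-NumPy sketch uses the standard formulas together with the documented fill behavior; it is an interpretation of the docs, not mmu's code.

```python
import numpy as np

def metrics_from_conf_mat(tn, fp, fn, tp, fill=1.0):
    """Sketch of the 10 documented metrics from confusion-matrix counts.

    `fill` replaces any metric whose denominator is zero, mirroring the
    documented behavior.
    """
    def div(a, b):
        return a / b if b else fill

    npv = div(tn, tn + fn)                      # [0] neg.precision
    ppv = div(tp, tp + fp)                      # [1] pos.precision
    tnr = div(tn, tn + fp)                      # [2] neg.recall / specificity
    tpr = div(tp, tp + fn)                      # [3] pos.recall / sensitivity
    neg_f1 = div(2 * tn, 2 * tn + fn + fp)      # [4] neg.f1
    pos_f1 = div(2 * tp, 2 * tp + fp + fn)      # [5] pos.f1
    fpr = div(fp, fp + tn)                      # [6] FPR
    fnr = div(fn, fn + tp)                      # [7] FNR
    acc = div(tp + tn, tn + fp + fn + tp)       # [8] accuracy
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else fill  # [9] MCC
    return np.array([npv, ppv, tnr, tpr, neg_f1, pos_f1, fpr, fnr, acc, mcc])
```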

mmu.binary_metrics_confusion_matrices(conf_mat, fill=1.0, return_df=False)

Compute binary classification metrics.

bmetrics_conf_mats is an alias for this function.

Computes the following metrics where [i] indicates the i’th value in the array.

  • [0] neg.precision aka Negative Predictive Value (NPV)

  • [1] pos.precision aka Positive Predictive Value (PPV)

  • [2] neg.recall aka True Negative Rate (TNR) aka Specificity

  • [3] pos.recall aka True Positive Rate (TPR) aka Sensitivity

  • [4] neg.f1 score

  • [5] pos.f1 score

  • [6] False Positive Rate (FPR)

  • [7] False Negative Rate (FNR)

  • [8] Accuracy

  • [9] MCC

Parameters:
  • conf_mat (np.ndarray[int32, int64]) – confusion matrices as returned by mmu.confusion_matrices; should have shape (N, 4) and be C-contiguous

  • fill (float, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • return_df (bool, default=False) – return the metrics as a pd.DataFrame

Returns:

metrics – the computed metrics

Return type:

np.ndarray, pd.DataFrame

mmu.binary_metrics_runs(y, yhat=None, scores=None, threshold=None, obs_axis=0, fill=1.0, return_df=False)

Compute binary classification metrics over multiple runs.

bmetrics_runs is an alias for this function.

Computes the following metrics where [i] indicates the i’th column in the array.

  • [0] neg.precision aka Negative Predictive Value (NPV)

  • [1] pos.precision aka Positive Predictive Value (PPV)

  • [2] neg.recall aka True Negative Rate (TNR) aka Specificity

  • [3] pos.recall aka True Positive Rate (TPR) aka Sensitivity

  • [4] neg.f1 score

  • [5] pos.f1 score

  • [6] False Positive Rate (FPR)

  • [7] False Negative Rate (FNR)

  • [8] Accuracy

  • [9] MCC

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations; should have shape (N, K) for K runs each consisting of N observations when obs_axis is 0

  • yhat (np.ndarray[bool, int32, int64, float32, float64], default=None) – the predicted labels; the same dtypes are supported as for y. Can be None if scores is not None; if both are provided, scores is ignored. yhat's shape must be compatible with y.

  • scores (np.ndarray[float32, float64], default=None) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold. Can be None if yhat is not None; if both are provided, this parameter is ignored. scores' shape must be compatible with y.

  • threshold (float, default=0.5) – the classification threshold against which the classifier scores are evaluated; the threshold is inclusive.

  • obs_axis (int, default=0) – the axis containing the observations for a single run, e.g. 0 when the labels and scores are stored as columns

  • fill (float, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • return_df (bool, default=False) – return the metrics confusion matrix and metrics as a DataFrame

Returns:

  • conf_mat (np.ndarray, pd.DataFrame) – the confusion matrices where row i contains the counts for run i: [i, 0] = TN, [i, 1] = FP, [i, 2] = FN, [i, 3] = TP

  • metrics (np.ndarray, pd.DataFrame) – the computed metrics where each row contains the metrics for a single run
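The runs-as-columns layout (obs_axis=0) and the flattened (K, 4) confusion-matrix output can be sketched in plain NumPy; the data below is hypothetical and this is not mmu's implementation.

```python
import numpy as np

# two runs stored as columns (obs_axis=0): y[:, k] are run k's labels
y = np.array([[0, 1],
              [1, 1],
              [1, 0]])
scores = np.array([[0.6, 0.8],
                   [0.3, 0.9],
                   [0.7, 0.2]])
threshold = 0.5

conf_mats = np.empty((y.shape[1], 4), dtype=np.int64)  # one row per run
for k in range(y.shape[1]):
    yk = y[:, k].astype(bool)
    yhat = scores[:, k] >= threshold   # threshold is inclusive
    # row layout documented above: [TN, FP, FN, TP]
    conf_mats[k] = [np.sum(~yk & ~yhat), np.sum(~yk & yhat),
                    np.sum(yk & ~yhat), np.sum(yk & yhat)]
```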

mmu.binary_metrics_runs_thresholds(y, scores, thresholds, n_obs=None, fill=1.0, obs_axis=0)

Compute binary classification metrics over runs and thresholds.

bmetrics_runs_thresh is an alias for this function.

Computes the following metrics where [i] indicates the i’th column in the array.

  • [0] neg.precision aka Negative Predictive Value (NPV)

  • [1] pos.precision aka Positive Predictive Value (PPV)

  • [2] neg.recall aka True Negative Rate (TNR) aka Specificity

  • [3] pos.recall aka True Positive Rate (TPR) aka Sensitivity

  • [4] neg.f1 score

  • [5] pos.f1 score

  • [6] False Positive Rate (FPR)

  • [7] False Negative Rate (FNR)

  • [8] Accuracy

  • [9] MCC

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – the ground truth labels; if different runs have a different number of observations, the n_obs parameter must be set to avoid computing metrics over the filled values. If y is one-dimensional and scores is not, the y values are assumed to be the same for each run.

  • scores (np.array[float32, float64]) – the classifier scores; if different runs have a different number of observations, the n_obs parameter must be set to avoid computing metrics over the filled values.

  • thresholds (np.array[float32, float64]) – classification thresholds

  • n_obs (np.array[int64], default=None) – the number of observations per run; if None, the same number of observations is assumed for each run.

  • fill (double, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • obs_axis ({0, 1}, default=0) – 0 if the observations for a single run form a column (e.g. from a pd.DataFrame) and 1 otherwise

Returns:

  • conf_mat (np.ndarray[int64]) – 3D array where the rows contain the counts for a threshold, the columns the confusion matrix entries and the slices the counts for a run

  • metrics (np.ndarray[float64]) – 3D array where the first axis is the threshold, the second the metrics and the third the run
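One way to read the documented 3D layout is axis 0 = thresholds, axis 1 = the four confusion-matrix entries, axis 2 = runs; that axis ordering is an assumption based on the description above, and the sketch below is pure NumPy, not mmu's implementation.

```python
import numpy as np

y = np.array([[0, 1], [1, 0], [1, 1]])            # runs as columns (obs_axis=0)
scores = np.array([[0.2, 0.9], [0.6, 0.3], [0.8, 0.7]])
thresholds = np.array([0.25, 0.5, 0.75])

n_thresh, n_runs = thresholds.size, y.shape[1]
conf_mat = np.empty((n_thresh, 4, n_runs), dtype=np.int64)
for t, tau in enumerate(thresholds):
    for k in range(n_runs):
        yk = y[:, k].astype(bool)
        yhat = scores[:, k] >= tau                # threshold is inclusive
        conf_mat[t, :, k] = [np.sum(~yk & ~yhat), np.sum(~yk & yhat),
                             np.sum(yk & ~yhat), np.sum(yk & yhat)]
```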

mmu.confusion_matrix(y, yhat=None, scores=None, threshold=0.5, return_df=False)

Compute binary confusion matrix.

conf_mat is an alias for this function.

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations

  • yhat (np.ndarray[bool, int32, int64, float32, float64], default=None) – the predicted labels; the same dtypes are supported as for y. Can be None if scores is not None; if both are provided, scores is ignored.

  • scores (np.ndarray[float32, float64], default=None) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold. Can be None if yhat is not None; if both are provided, this parameter is ignored.

  • threshold (float, default=0.5) – the classification threshold against which the classifier scores are evaluated; the threshold is inclusive.

  • return_df (bool, default=False) – return confusion matrix as pd.DataFrame

Raises:
  • TypeError – if both scores and yhat are None

  • TypeError – if scores is not None and threshold is not a float

Returns:

conf_mat – the confusion_matrix with layout [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP. Returned as a DataFrame when return_df is True

Return type:

np.ndarray, pd.DataFrame

mmu.confusion_matrices(y, yhat=None, scores=None, threshold=0.5, obs_axis=0, return_df=False)

Compute binary confusion matrices over multiple runs.

conf_mats is an alias for this function.

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations

  • yhat (np.ndarray[bool, int32, int64, float32, float64], default=None) – the predicted labels; the same dtypes are supported as for y. Can be None if scores is not None; if both are provided, scores is ignored.

  • scores (np.ndarray[float32, float64], default=None) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold. Can be None if yhat is not None; if both are provided, this parameter is ignored.

  • threshold (float, default=0.5) – the classification threshold against which the classifier scores are evaluated; the threshold is inclusive.

  • obs_axis (int, default=0) – the axis containing the observations for a single run, e.g. 0 when the labels and scores are stored as columns

  • return_df (bool, default=False) – return confusion matrix as pd.DataFrame

Raises:
  • TypeError – if both scores and yhat are None

  • TypeError – if scores is not None and threshold is not a float

Returns:

confusion_matrices – the confusion matrices where row i contains the counts for run i: [i, 0] = TN, [i, 1] = FP, [i, 2] = FN, [i, 3] = TP

Return type:

np.ndarray, pd.DataFrame

mmu.confusion_matrices_thresholds(y, scores, thresholds, return_df=False)

Compute binary confusion matrix over a range of thresholds.

conf_mats_thresh is an alias for this function.

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations

  • scores (np.ndarray[float32, float64]) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold.

  • thresholds (np.ndarray[float32, float64]) – the classification thresholds against which the classifier scores are evaluated; the thresholds are inclusive.

  • return_df (bool, default=False) – return the confusion matrices as a pd.DataFrame

Returns:

confusion_matrices – the confusion matrices where row i contains the counts for threshold i: [i, 0] = TN, [i, 1] = FP, [i, 2] = FN, [i, 3] = TP

Return type:

np.ndarray, pd.DataFrame
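The (N, 4) output over thresholds can be computed in a vectorized way with broadcasting; again a pure-NumPy sketch of the documented layout on hypothetical data, not mmu's implementation.

```python
import numpy as np

y = np.array([0, 1, 1, 0, 1]).astype(bool)
scores = np.array([0.1, 0.6, 0.35, 0.8, 0.9])
thresholds = np.array([0.3, 0.5, 0.7])

# broadcast to shape (n_thresholds, n_obs); threshold is inclusive
yhat = scores[None, :] >= thresholds[:, None]

# one row of [TN, FP, FN, TP] per threshold, matching the documented layout
conf_mats = np.stack([(~y & ~yhat).sum(1), (~y & yhat).sum(1),
                      (y & ~yhat).sum(1), (y & yhat).sum(1)], axis=1)
```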

mmu.confusion_matrices_runs_thresholds(y, scores, thresholds, n_obs=None, fill=0.0, obs_axis=0)

Compute confusion matrices over runs and thresholds.

conf_mats_runs_thresh is an alias for this function.

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – the ground truth labels; if different runs have a different number of observations, the n_obs parameter must be set to avoid computing metrics over the filled values. If y is one-dimensional and scores is not, the y values are assumed to be the same for each run.

  • scores (np.array[float32, float64]) – the classifier scores; if different runs have a different number of observations, the n_obs parameter must be set to avoid computing metrics over the filled values.

  • thresholds (np.array[float32, float64]) – classification thresholds

  • n_obs (np.array[int64], default=None) – the number of observations per run; if None, the same number of observations is assumed for each run.

  • fill (double, default=0.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • obs_axis ({0, 1}, default=0) – 0 if the observations for a single run form a column (e.g. from a pd.DataFrame) and 1 otherwise

Returns:

conf_mat – 3D array where the rows contain the counts for a threshold, the columns the confusion matrix entries and the slices the counts for a run

Return type:

np.ndarray[int64]

mmu.precision_recall(y, yhat=None, scores=None, threshold=None, fill=1.0, return_df=False)

Compute precision and recall.

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations

  • yhat (np.ndarray[bool, int32, int64, float32, float64], default=None) – the predicted labels; the same dtypes are supported as for y. Can be None if scores is not None; if both are provided, scores is ignored.

  • scores (np.ndarray[float32, float64], default=None) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold. Can be None if yhat is not None; if both are provided, this parameter is ignored.

  • threshold (float, default=0.5) – the classification threshold against which the classifier scores are evaluated; the threshold is inclusive.

  • fill (float, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • return_df (bool, default=False) – return the confusion matrix and precision/recall as pd.DataFrames

Returns:

  • confusion_matrix (np.ndarray, pd.DataFrame) – the confusion_matrix with layout [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP

  • prec_rec (np.ndarray, pd.DataFrame) – precision and recall

mmu.precision_recall_curve(y, scores, thresholds=None, fill=1.0, return_df=False)

Compute precision and recall over the thresholds.

pr_curve is an alias for this function.

Parameters:
  • y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations

  • scores (np.ndarray[float32, float64]) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold

  • thresholds (np.ndarray[float32, float64]) – the classification thresholds against which the classifier scores are evaluated; the thresholds are inclusive.

  • fill (float, default=1.0) – value to fill when a metric is not defined, e.g. divide by zero.

  • return_df (bool, default=False) – return precision and recall as a pd.DataFrame

Returns:

  • precision (np.ndarray[float64]) – the precision for each threshold

  • recall (np.ndarray[float64]) – the recall for each threshold
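The per-threshold precision and recall described above, including the fill behavior, can be sketched with NumPy broadcasting; this is an illustration of the documented semantics, not mmu's implementation.

```python
import numpy as np

y = np.array([0, 0, 1, 1, 1]).astype(bool)
scores = np.array([0.1, 0.6, 0.35, 0.8, 0.9])
thresholds = np.array([0.3, 0.5, 0.7])
fill = 1.0

# (n_thresholds, n_obs) predictions; threshold is inclusive
yhat = scores[None, :] >= thresholds[:, None]
tp = (y & yhat).sum(1)
fp = (~y & yhat).sum(1)
fn = (y & ~yhat).sum(1)

# fall back to `fill` where the denominator is zero
precision = np.where(tp + fp > 0, tp / np.maximum(tp + fp, 1), fill)
recall = np.where(tp + fn > 0, tp / np.maximum(tp + fn, 1), fill)
```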

mmu.metrics.confusion_matrix_to_dataframe(conf_mat)

Create dataframe with confusion matrix.

Parameters:

conf_mat (np.ndarray) – array containing a single confusion matrix

Returns:

the confusion matrix

Return type:

pd.DataFrame

mmu.metrics.confusion_matrices_to_dataframe(conf_mat)

Create dataframe with confusion matrices.

Parameters:

conf_mat (np.ndarray) – array containing multiple confusion matrices as an (N, 4) array

Returns:

the confusion matrices

Return type:

pd.DataFrame

mmu.metrics.metrics_to_dataframe(metrics, metric_names=None)

Return DataFrame with metrics.

Parameters:
  • metrics (np.ndarray) – metrics where the rows are the metrics for various runs or classification thresholds and the columns are the metrics.

  • metric_names (str, list[str], default=None) – if you computed a subset of the metrics, set the column names here

Returns:

the metrics as a DataFrame

Return type:

pd.DataFrame
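A minimal usage sketch of wrapping a metrics array in a DataFrame; the metric array is hypothetical and the column names are illustrative assumptions, not necessarily mmu's defaults.

```python
import numpy as np
import pandas as pd

# hypothetical metrics array: one row per threshold, ten metric columns
metrics = np.arange(30, dtype=np.float64).reshape(3, 10)
names = ['neg.precision', 'pos.precision', 'neg.recall', 'pos.recall',
         'neg.f1', 'pos.f1', 'FPR', 'FNR', 'accuracy', 'MCC']
df = pd.DataFrame(metrics, columns=names)
```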