ROC¶
y-axis: True Positive Rate (TPR) i.e. Recall
x-axis: False Positive Rate (FPR)
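As a minimal sketch of the two axes, the snippet below computes TPR and FPR from a single confusion matrix. It assumes the [[TN, FP], [FN, TP]] layout described for conf_mat on this page; the helper name tpr_fpr is hypothetical and not part of mmu.

```python
def tpr_fpr(conf_mat):
    """Return (TPR, FPR) for a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = conf_mat
    tpr = tp / (tp + fn)  # y-axis: recall over the actual positives
    fpr = fp / (fp + tn)  # x-axis: false alarms over the actual negatives
    return tpr, fpr


tpr, fpr = tpr_fpr([[90, 10], [20, 80]])
print(tpr, fpr)
```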
- class mmu.ROCU¶
- class mmu.ROCUncertainty¶
Bases:
mmu.methods.pointbase.BaseUncertainty
Compute joint uncertainty for a point like Precision-Recall or TPR-FPR.
The joint statistical uncertainty can be computed using:
Multinomial method:
Models the uncertainty using profile log-likelihoods between the observed and most conservative confusion matrix for that point. Unlike the Bivariate-Normal/Elliptical approach, this approach is valid for relatively low-statistics samples and at the edges of the curve. However, it does not allow one to add the training sample uncertainty to it.
Bivariate Normal / Elliptical method:
Models the linearly propagated errors of the confusion matrix as a bivariate Normal distribution. Note that this method is not valid for low-statistics sets or for points close to (y=1.0, x=0.0). In these scenarios the Multinomial method should be used.
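To make the idea of linear error propagation concrete, here is a first-order sketch for the ROC case. It is not necessarily the formula mmu uses internally: it assumes the confusion matrix counts are multinomial and applies the delta method, under which the TPR and FPR variances reduce to binomial-proportion variances and their covariance vanishes to first order. The function name roc_cov_mat is hypothetical.

```python
def roc_cov_mat(conf_mat):
    """First-order covariance matrix of (TPR, FPR) for [[TN, FP], [FN, TP]].

    Output layout: [0][0] = V[TPR], [1][1] = V[FPR], off-diagonals = COV.
    Assumption: delta-method approximation, not mmu's actual code.
    """
    (tn, fp), (fn, tp) = conf_mat
    n_pos = tp + fn  # number of actual positives
    n_neg = tn + fp  # number of actual negatives
    tpr = tp / n_pos
    fpr = fp / n_neg
    var_tpr = tpr * (1.0 - tpr) / n_pos
    var_fpr = fpr * (1.0 - fpr) / n_neg
    cov = 0.0  # vanishes to first order for TPR versus FPR
    return [[var_tpr, cov], [cov, var_fpr]]


cov_mat = roc_cov_mat([[90, 10], [20, 80]])
print(cov_mat)
```

The sketch also shows why the approximation degrades near (y=1.0, x=0.0): both variances shrink to zero there, so the Normal approximation no longer describes the sampling distribution.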
- conf_mat¶
the confusion matrix with layout [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP. A DataFrame can be obtained by calling get_conf_mat.
- Type:
np.ndarray[int64]
- y¶
Precision in Precision-Recall
TPR in ROC
- Type:
float
- x¶
Recall in Precision-Recall
FPR in ROC
- Type:
float
- threshold¶
the inclusive threshold used to determine the confusion matrix. Is None when the class is instantiated with from_predictions or from_confusion_matrix.
- Type:
float, optional
- cov_mat¶
the covariance matrix with layout [0, 0] = V[y], [0, 1] = COV[y, x], [1, 0] = COV[y, x], [1, 1] = V[x]. For example, for Precision-Recall: [0, 0] = V[P], [0, 1] = COV[P, R], [1, 0] = COV[P, R], [1, 1] = V[R]. A DataFrame can be obtained by calling get_cov_mat. Only set when the Bivariate/Elliptical method is used.
- Type:
np.ndarray[float64], optional
- chi2_scores¶
the chi2 scores for the grid with shape (n_bins, n_bins) and bounds y_bounds on the y-axis, x_bounds on the x-axis. Only set when the Multinomial method is used.
- Type:
np.ndarray[float64], optional
- y_bounds¶
the lower and upper bound for which y was evaluated, equal to y +- n_sigmas * sigma(y). Only set when the Multinomial method is used.
- Type:
np.ndarray[float64], optional
- x_bounds¶
the lower and upper bound for which x was evaluated, equal to x +- n_sigmas * sigma(x). Only set when the Multinomial method is used.
- Type:
np.ndarray[float64], optional
- n_sigmas¶
the number of marginal standard deviations used to determine the bounds of the grid which is evaluated for each observed y and x. Only set when Multinomial method is used.
- Type:
int, float, optional
- epsilon¶
the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Only set when Multinomial method is used.
- Type:
float, optional
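The relationship between y_bounds, x_bounds, n_sigmas and epsilon can be sketched as follows. This is an illustration under assumptions: the bounds are taken as value +- n_sigmas * sigma and clipped to [epsilon, 1 - epsilon], and the grid is assumed to be n_bins evenly spaced points including both bounds. The helper names grid_bounds and make_grid are hypothetical.

```python
def grid_bounds(value, sigma, n_sigmas=6.0, epsilon=1e-12):
    """Bounds value +- n_sigmas * sigma, kept strictly inside (0, 1)."""
    lower = max(value - n_sigmas * sigma, epsilon)
    upper = min(value + n_sigmas * sigma, 1.0 - epsilon)
    return lower, upper


def make_grid(lower, upper, n_bins=100):
    """Evenly spaced grid of n_bins points, including both bounds."""
    step = (upper - lower) / (n_bins - 1)
    return [lower + i * step for i in range(n_bins)]


# Observed TPR of 0.8 with a marginal standard deviation of 0.04
y_lower, y_upper = grid_bounds(0.8, 0.04)
y_grid = make_grid(y_lower, y_upper, n_bins=100)
print(y_lower, y_upper, len(y_grid))
```

Here the upper bound 0.8 + 6 * 0.04 would exceed 1.0, so it is clipped to 1 - epsilon, which is the role epsilon plays in the attribute description.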
- y_label¶
the label of the y-axis.
- Type:
str
- x_label¶
the label of the x-axis.
- Type:
str
- property FPR¶
Alias of the x coordinate
- Type:
np.ndarray[float64]
- property TPR¶
Alias of the y coordinate
- Type:
np.ndarray[float64]
- compute_pvalue_for(y: float | numpy.ndarray, x: float | numpy.ndarray, epsilon: float = 1e-12)¶
Compute p-value(s) for given y(s) and x(s). If method is ‘bvn’ the sum of squared Z scores is computed; if method is ‘mult’ the profile log-likelihood is computed. Both follow a chi2 distribution with 2 degrees of freedom.
- Parameters:
y (float, np.ndarray[float64, float32]) –
value(s) to evaluate
Precision in Precision-Recall
TPR in ROC
x (float, np.ndarray[float64, float32]) –
value(s) to evaluate
Recall in Precision-Recall
FPR in ROC
level (int, float) –
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
- Returns:
chi2_score – the chi2_score(s) for the given y(s) and x(s).
- Return type:
float, np.ndarray[float64]
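As a hedged illustration of the bvn branch, the sketch below computes the sum of squared Z scores for a candidate point as a Mahalanobis distance with respect to a 2x2 covariance matrix, and converts it to a p-value. For a chi2 distribution with 2 degrees of freedom the survival function has the closed form exp(-score / 2). The function names and the example numbers are assumptions for illustration, not mmu's implementation.

```python
import math


def bvn_score(y, x, y_obs, x_obs, cov_mat):
    """Squared Mahalanobis distance of (y, x) from the observed point.

    cov_mat layout: [0][0] = V[y], [0][1] = [1][0] = COV[y, x], [1][1] = V[x].
    With zero covariance this reduces to z_y**2 + z_x**2.
    """
    dy = y - y_obs
    dx = x - x_obs
    var_y, cov_yx = cov_mat[0]
    _, var_x = cov_mat[1]
    det = var_y * var_x - cov_yx ** 2
    return (var_x * dy ** 2 - 2.0 * cov_yx * dy * dx + var_y * dx ** 2) / det


def chi2_pvalue_2dof(score):
    """Survival function of a chi2 distribution with 2 degrees of freedom."""
    return math.exp(-0.5 * score)


# Hypothetical observed point (TPR=0.8, FPR=0.1) and covariance matrix
cov_mat = [[0.0016, 0.0], [0.0, 0.0009]]
score = bvn_score(0.84, 0.1, 0.8, 0.1, cov_mat)
p_value = chi2_pvalue_2dof(score)
print(score, p_value)
```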
- compute_score_for(y: float | numpy.ndarray, x: float | numpy.ndarray, epsilon: float = 1e-12)¶
Compute score for a given y(s) and x(s). If method is bvn the sum of squared Z scores is computed, if method is ‘mult’ the profile loglikelihood is computed. Both follow a chi2 distribution with 2 degrees of freedom.
- Parameters:
y (float, np.ndarray[float64, float32]) –
value(s) to evaluate
Precision in Precision-Recall
TPR in ROC
x (float, np.ndarray[float64, float32]) –
value(s) to evaluate
Recall in Precision-Recall
FPR in ROC
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs.
- Returns:
chi2_score – the chi2_score(s) for the given y(s) and x(s).
- Return type:
float, np.ndarray[float64]
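For the ‘mult’ branch, a profile log-likelihood score can be sketched for the ROC case as follows. This is a derivation under stated assumptions rather than mmu's code: the four counts are treated as multinomial, the candidate point fixes TPR = y and FPR = x, and the remaining nuisance parameter (the fraction of positives) is maximised out, which for ROC gives the observed fraction of positives regardless of y and x. The names below are hypothetical, and the sketch requires at least one actual positive and one actual negative.

```python
import math


def _clip(p, epsilon):
    """Keep a probability strictly inside (0, 1) to avoid log(0)."""
    return min(max(p, epsilon), 1.0 - epsilon)


def _term(count, p, p_hat):
    """count * log(p / p_hat), with the convention 0 * log(...) = 0."""
    if count == 0:
        return 0.0
    return count * math.log(p / p_hat)


def roc_profile_score(conf_mat, y, x, epsilon=1e-12):
    """-2 * profile log-likelihood ratio for (TPR=y, FPR=x).

    conf_mat layout: [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = conf_mat
    tpr_hat = tp / (tp + fn)
    fpr_hat = fp / (fp + tn)
    y = _clip(y, epsilon)
    x = _clip(x, epsilon)
    log_ratio = (
        _term(tp, y, tpr_hat)
        + _term(fn, 1.0 - y, 1.0 - tpr_hat)
        + _term(fp, x, fpr_hat)
        + _term(tn, 1.0 - x, 1.0 - fpr_hat)
    )
    return -2.0 * log_ratio


conf_mat = [[90, 10], [20, 80]]
score_at_observed = roc_profile_score(conf_mat, 0.8, 0.1)
score_elsewhere = roc_profile_score(conf_mat, 0.7, 0.1)
print(score_at_observed, score_elsewhere)
```

The score is zero at the observed point and grows as the candidate point moves away from it; the clipping with epsilon is what keeps the logarithms finite at the corner (y=1.0, x=0.0).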
- classmethod from_classifier(clf, X: numpy.ndarray, y: numpy.ndarray, threshold: float = 0.5, method: str = 'multinomial', n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
clf (sklearn.Predictor) – a trained model with method predict_proba, used to compute the classifier scores
X (np.ndarray) – the feature array to be used to compute the classifier scores
y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations
threshold (float, default=0.5) – the classification threshold to which the classifier score is evaluated, is inclusive.
method (str, default='multinomial',) – which method to use, options are the Multinomial approach {‘multinomial’, ‘mult’} or the bivariate-normal/elliptical approach {‘bvn’, ‘bivariate’, ‘elliptical’}. Default is ‘multinomial’.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the multinomial computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
- classmethod from_confusion_matrix(conf_mat: numpy.ndarray, method: str = 'multinomial', n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
conf_mat (np.ndarray[int64],) – confusion matrix as returned by mmu.confusion_matrix, i.e. with layout [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP or the flattened equivalent. Supported dtypes are int32, int64
method (str, default='multinomial',) – which method to use, options are the Multinomial approach {‘multinomial’, ‘mult’} or the bivariate-normal/elliptical approach {‘bvn’, ‘bivariate’, ‘elliptical’}. Default is ‘multinomial’.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the multinomial computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
- classmethod from_predictions(y: numpy.ndarray, yhat: numpy.ndarray, method: str = 'multinomial', n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations; supported dtypes are bool, int32, int64, float32 and float64.
yhat (np.ndarray[bool, int32, int64, float32, float64]) – the predicted labels, the same dtypes are supported as y.
method (str, default='multinomial',) – which method to use, options are the Multinomial approach {‘multinomial’, ‘mult’} or the bivariate-normal/elliptical approach {‘bvn’, ‘bivariate’, ‘elliptical’}. Default is ‘multinomial’.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the multinomial computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
- classmethod from_scores(y: numpy.ndarray, scores: numpy.ndarray, threshold: float = 0.5, method: str = 'multinomial', n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
y (np.ndarray) – true labels for observations, supported dtypes are [bool, int32, int64, float32, float64]
scores (np.ndarray, default=None) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold. Supported dtypes are float32 and float64.
threshold (float, default=0.5) – the classification threshold to which the classifier scores are evaluated, is inclusive.
method (str, default='multinomial',) – which method to use, options are the Multinomial approach {‘multinomial’, ‘mult’} or the bivariate-normal/elliptical approach {‘bvn’, ‘bivariate’, ‘elliptical’}. Default is ‘multinomial’.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the multinomial computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
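The inclusive threshold rule yhat = scores >= threshold can be illustrated with a small pure-Python sketch that builds the confusion matrix in the [[TN, FP], [FN, TP]] layout. The function name is hypothetical; it only mirrors the rule stated in the parameter descriptions above.

```python
def confusion_matrix_from_scores(y, scores, threshold=0.5):
    """Build [[TN, FP], [FN, TP]] using the inclusive rule score >= threshold."""
    tn = fp = fn = tp = 0
    for label, score in zip(y, scores):
        predicted_positive = score >= threshold  # inclusive threshold
        if label and predicted_positive:
            tp += 1
        elif label:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


y_true = [0, 0, 1, 1, 1]
y_scores = [0.2, 0.5, 0.5, 0.9, 0.4]
conf_mat = confusion_matrix_from_scores(y_true, y_scores, threshold=0.5)
print(conf_mat)
```

Note that a score exactly equal to the threshold counts as a positive prediction, which is what "inclusive" means here.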
- get_conf_mat()¶
Obtain confusion matrix as a DataFrame.
- Returns:
the confusion matrix of the test set
- Return type:
pd.DataFrame
- get_cov_mat()¶
Obtain covariance matrix of the test set.
- Returns:
the covariance matrix
- Return type:
pd.DataFrame
- Raises:
NotImplementedError – when method is not Bivariate-Normal/Elliptical
- plot(levels: int | float | numpy.ndarray | None = None, ax=None, cmap: str = 'Blues', equal_aspect: bool = True, limit_axis: bool = True, legend_loc: str | None = None, alpha: float = 0.8, other: BaseUncertainty | BaseSimulatedUncertainty | List[BaseUncertainty] | List[BaseSimulatedUncertainty] | None = None, other_kwargs: Dict | List[Dict] | None = None)¶
Plot confidence interval(s).
- Parameters:
levels (int, float, np.ndarray, default=np.array((1, 2, 3,))) – if int(s), levels is treated as the number of standard deviations for the confidence interval. If float(s), it is taken to be the density to be contained in the confidence interval. By default we plot 1, 2 and 3 standard deviations.
ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot
cmap (str, default='Blues') – matplotlib cmap name to use for CIs
equal_aspect (bool, default=True) – ensure the same scaling for x and y axis
limit_axis (bool, default=True) – allow ax to be limited for optimal CI plot
legend_loc (str, default=None) – location of the legend, default is lower left
alpha (float, default=0.8) – opacity value of the contours
other (BaseUncertainty, BaseSimulatedUncertainty, List, default=None) – Add other point uncertainty(ies) plot to the plot, by default the Reds cmap is used for the other plot(s).
other_kwargs (dict, list[dict], default=None) – Keyword arguments passed to other.plot(), ignored if other is None. If other is a list and other_kwargs is a dict, the kwargs are used for all point others.
- Returns:
ax – the axis with the ellipse added to it
- Return type:
matplotlib.axes.Axes
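The two interpretations of levels can be related to chi2 score thresholds as in the sketch below. It assumes one common convention, which may or may not be what mmu does: an integer level n is mapped to the two-sided coverage of n standard deviations of a 1D Normal distribution, and a coverage is mapped to a threshold via the closed-form quantile of a chi2 distribution with 2 degrees of freedom. All function names are hypothetical.

```python
import math


def sigma_to_coverage(n_sigmas):
    """Two-sided coverage of n standard deviations of a 1D Normal."""
    return math.erf(n_sigmas / math.sqrt(2.0))


def coverage_to_chi2_threshold(coverage):
    """Quantile of a chi2 distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - coverage)


def level_to_chi2_threshold(level):
    """Ints are numbers of standard deviations, floats are densities."""
    if isinstance(level, int):
        level = sigma_to_coverage(level)
    return coverage_to_chi2_threshold(level)


thresholds = [level_to_chi2_threshold(level) for level in (1, 2, 3)]
print(thresholds)
```

Grid points whose chi2 score lies below a threshold fall inside the corresponding contour.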
- class mmu.ROCCU¶
- class mmu.ROCCurveUncertainty¶
Bases:
mmu.methods.curvebase.BaseCurveUncertainty
Compute joint uncertainty for a curve like Precision-Recall or ROC.
The joint statistical uncertainty can be computed using:
Multinomial method:
Models the uncertainty using profile log-likelihoods between the observed and most conservative confusion matrix for a point. Unlike the Bivariate-Normal/Elliptical approach, this approach is valid for relatively low-statistics samples and at the edges of the curve. However, it does not allow one to add the training sample uncertainty to it.
Bivariate Normal / Elliptical method:
Models the linearly propagated errors of the confusion matrix as a bivariate Normal distribution. Note that this method is not valid for low-statistics sets or for points close to 1.0/0.0. In these scenarios the Multinomial method should be used.
- conf_mat¶
the confusion_matrices over the thresholds with columns [TN, FP, FN, TP]. A DataFrame can be obtained by calling get_conf_mat.
- Type:
np.ndarray[int64]
- y¶
y coordinates of the curve
Precision in Precision-Recall
TPR in ROC
- Type:
np.ndarray[float64]
- x¶
x coordinates of the curve
Recall in Precision-Recall
FPR in ROC
- Type:
np.ndarray[float64]
- chi2_scores¶
the sum of squared z scores which follow a chi2 distribution with two degrees of freedom. Has shape (n_bins, n_bins) with bounds y_bounds on the y-axis and x_bounds on the x-axis.
- Type:
np.ndarray[float64]
- thresholds¶
the inclusive classification/discrimination thresholds used to compute the confusion matrices. Is None when the class is instantiated with from_confusion_matrices.
- Type:
np.ndarray[float64], Optional
- y_grid¶
the y values that were evaluated.
- Type:
np.ndarray[float64]
- x_grid¶
the x values that were evaluated.
- Type:
np.ndarray[float64]
- n_sigmas¶
the number of marginal standard deviations used to determine the bounds of the grid which is evaluated for each observed y and x.
- Type:
int, float
- epsilon¶
the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs.
- Type:
float
- cov_mats¶
flattened covariance matrices for each threshold. Only set when method is bivariate/elliptical.
- Type:
np.ndarray[float64], optional
- y_label¶
the label of the y-axis.
- Type:
str
- x_label¶
the label of the x-axis.
- Type:
str
- property FPR¶
Alias of the x coordinate of the curve
- Type:
np.ndarray[float64]
- property FPR_grid¶
Alias of the x values that were evaluated.
- Type:
np.ndarray[float64]
- property TPR¶
Alias of the y coordinate of the curve
- Type:
np.ndarray[float64]
- property TPR_grid¶
Alias of the y values that were evaluated.
- Type:
np.ndarray[float64]
- classmethod from_classifier(clf, X: numpy.ndarray, y: numpy.ndarray, thresholds: numpy.ndarray | None = None, method: str = 'multinomial', n_bins: int | Tuple[int] | List[int] | numpy.ndarray | None = 1000, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, auto_max_steps: int | None = None, auto_seed: int | None = None, n_threads: int | None = None)¶
Compute the curve uncertainty from a trained classifier.
- Parameters:
clf (sklearn.Predictor) – a trained model with method predict_proba, used to compute the classifier scores
X (np.ndarray) – the feature array to be used to compute the classifier scores
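y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations
thresholds (np.ndarray[float64], default=None) – the inclusive classification thresholds against which the classifier scores are evaluated. If None, the classification thresholds are determined such that each threshold results in a different confusion matrix.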
method (str, default='multinomial',) – which method to use, options are the Multinomial approach {‘multinomial’, ‘mult’} or the bivariate-normal/elliptical approach {‘bvn’, ‘bivariate’, ‘elliptical’}. Default is ‘multinomial’.
n_bins (int, array-like[int], default=1000) – the number of bins in the y/x grid for which the uncertainty is computed. If an int the chi2_scores will be a n_bins by n_bins array. If list-like it must be of length two where the first values determines the number of bins for y-axis and the second the x-axis
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs.
auto_max_steps (int, default=None) – the maximum number of thresholds for auto_thresholds, is ignored if thresholds is not None.
auto_seed (int, default=None) – the seed/random_state used by auto_thresholds when auto_max_steps is not None. Ignored when thresholds is not None.
n_threads (int, default=None) – the number of threads to use when computing the scores. By default we use 4 threads if OpenMP was found, otherwise the computation is single threaded. As is common, -1 indicates that all threads but one should be used.
- classmethod from_confusion_matrices(conf_mats: numpy.ndarray, method: str = 'multinomial', n_bins: int | Tuple[int] | List[int] | numpy.ndarray | None = 1000, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, obs_axis: int = 0, n_threads: int | None = None)¶
Compute a curve uncertainty from confusion matrices.
- Parameters:
conf_mats (np.ndarray[int64]) – the confusion matrices, where each matrix follows the layout returned by mmu.confusion_matrix, i.e. [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP, or the flattened equivalent.
method (str, default='multinomial',) – which method to use, options are the Multinomial approach {‘multinomial’, ‘mult’} or the bivariate-normal/elliptical approach {‘bvn’, ‘bivariate’, ‘elliptical’}. Default is ‘multinomial’.
n_bins (int, array-like[int], default=1000) – the number of bins in the y/x grid for which the uncertainty is computed. If an int the chi2_scores will be a n_bins by n_bins array. If list-like it must be of length two where the first values determines the number of bins for y-axis and the second the x-axis
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs.
n_threads (int, default=None) – the number of threads to use when computing the scores. By default we use 4 threads if OpenMP was found, otherwise the computation is single threaded. As is common, -1 indicates that all threads but one should be used.
- classmethod from_scores(y: numpy.ndarray, scores: numpy.ndarray, thresholds: numpy.ndarray | None = None, method: str = 'multinomial', n_bins: int | Tuple[int] | List[int] | numpy.ndarray | None = 1000, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, auto_max_steps: int | None = None, auto_seed: int | None = None, n_threads: int | None = None)¶
Compute the curve uncertainty from classifier scores.
- Parameters:
y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for the observations
scores (np.ndarray[float32, float64], default=None) – the classifier score to be evaluated against the thresholds, i.e. yhat = score >= threshold.
thresholds (np.ndarray[float64], default=None) – the inclusive classification thresholds against which the classifier scores are evaluated. If None, the classification thresholds are determined such that each threshold results in a different confusion matrix. Note that the maximum number of thresholds can be set using auto_max_steps.
method (str, default='multinomial',) – which method to use, options are the Multinomial approach {‘multinomial’, ‘mult’} or the bivariate-normal/elliptical approach {‘bvn’, ‘bivariate’, ‘elliptical’}. Default is ‘multinomial’.
n_bins (int, array-like[int], default=1000) – the number of bins in the y/x grid for which the uncertainty is computed. If an int the chi2_scores will be a n_bins by n_bins array. If list-like it must be of length two where the first values determines the number of bins for y-axis and the second the x-axis
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs.
auto_max_steps (int, default=None) – the maximum number of thresholds for auto_thresholds, is ignored if thresholds is not None.
auto_seed (int, default=None) – the seed/random_state used by auto_thresholds when auto_max_steps is not None. Ignored when thresholds is not None.
n_threads (int, default=None) – the number of threads to use when computing the scores. By default we use 4 threads if OpenMP was found, otherwise the computation is single threaded. As is common, -1 indicates that all threads but one should be used.
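When thresholds is None, the thresholds are chosen so that each one yields a different confusion matrix. One simple way to obtain such a set, sketched here as an assumption rather than as mmu's actual auto_thresholds routine, is to use the sorted unique score values: with the inclusive rule score >= threshold, each unique score moves at least one observation across the decision boundary. The function names are hypothetical, and the rows follow the [TN, FP, FN, TP] column order used by conf_mat.

```python
def unique_score_thresholds(scores):
    """Sorted unique scores; each gives a distinct confusion matrix."""
    return sorted(set(scores))


def confusion_matrices_over_thresholds(y, scores, thresholds):
    """One flattened [TN, FP, FN, TP] row per threshold."""
    rows = []
    for threshold in thresholds:
        tn = fp = fn = tp = 0
        for label, score in zip(y, scores):
            predicted_positive = score >= threshold  # inclusive threshold
            if label and predicted_positive:
                tp += 1
            elif label:
                fn += 1
            elif predicted_positive:
                fp += 1
            else:
                tn += 1
        rows.append([tn, fp, fn, tp])
    return rows


y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.4, 0.8]
thresholds = unique_score_thresholds(y_scores)
conf_mats = confusion_matrices_over_thresholds(y_true, y_scores, thresholds)
print(thresholds)
print(conf_mats)
```

For large score arrays this produces one threshold per unique score, which is presumably why a cap such as auto_max_steps is offered.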
- get_conf_mats()¶
Obtain confusion matrix as a DataFrame.
- Returns:
the confusion matrix of the test set
- Return type:
pd.DataFrame
- get_cov_mats()¶
Get the covariance matrices over the thresholds.
- Returns:
the flattened covariance matrix and the thresholds
- Return type:
pd.DataFrame
- Raises:
NotImplementedError – when method is not Bivariate-Normal/Elliptical
- plot(levels: int | float | numpy.ndarray | None = None, ax=None, cmap: str = 'Blues', equal_aspect: bool = False, limit_axis: bool = True, legend_loc: str | None = None, alpha: float = 0.8, point_uncertainty: mmu.methods.pointbase.BaseUncertainty | List[mmu.methods.pointbase.BaseUncertainty] | None = None, point_kwargs: Dict | List[Dict] | None = None)¶
Plot confidence interval(s)
- Parameters:
levels (int, float, np.ndarray, default=np.array((1, 2, 3,))) – if int(s), levels is treated as the number of standard deviations for the confidence interval. If float(s), it is taken to be the density to be contained in the confidence interval. By default we plot 1, 2 and 3 standard deviations.
ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot
cmap (str, default='Blues') – matplotlib cmap name to use for CIs
equal_aspect (bool, default=False) – enforce square axis
limit_axis (bool, default=True) – allow ax to be limited for optimal CI plot
legend_loc (str, default=None) – location of the legend, default is lower center
alpha (float, default=0.8) – opacity value of the contours
point_uncertainty (BaseUncertainty, List, default=None) – Add a point uncertainty(ies) plot to the curve plot, by default the Reds cmap is used for the point plot(s).
point_kwargs (dict, list[dict], default=None) – Keyword arguments passed to point_uncertainty.plot(), ignored if point_uncertainty is None. If point_uncertainty is a list and point_kwargs is a dict the kwargs are used for all point uncertainties.
- Returns:
ax – the axis with the ellipse added to it
- Return type:
matplotlib.axes.Axes
In some very specific cases you may want to compute the uncertainty through simulation of the profile likelihoods rather than through the Chi2 distribution.
Note though that the simulation is very compute-intensive: each grid point is simulated n_simulations times. Hence, you will perform n_bins * n_bins * n_simulations simulations in total.
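To give a sense of scale, the arithmetic below evaluates the total number of simulations using the default values that appear in the ROCSimulatedUncertainty signatures on this page (n_bins=100, n_simulations=10000).

```python
# Defaults taken from the ROCSimulatedUncertainty signatures on this page
n_bins = 100
n_simulations = 10_000

# Every cell of the n_bins x n_bins grid is simulated n_simulations times
total_simulations = n_bins * n_bins * n_simulations
print(f"{total_simulations:,}")  # 100,000,000
```

Halving n_bins cuts the cost by a factor of four, whereas halving n_simulations only halves it.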
- class mmu.ROCSimulatedUncertainty¶
Bases:
mmu.methods.pointbase.BaseSimulatedUncertainty
Compute joint uncertainty through simulation.
Models the uncertainty using profile log-likelihoods between the observed and most conservative confusion matrix for that point, and checks how often random multinomial samples drawn from the observed probabilities of the confusion matrix result in lower profile log-likelihoods.
This approach is much slower than the BaseUncertainty with Multinomial method, and is likely to give less well-defined contours unless the number of simulations is high enough.
- conf_mat¶
the confusion matrix with layout [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP. A DataFrame can be obtained by calling get_conf_mat.
- Type:
np.ndarray[int64]
- y¶
Precision in Precision-Recall
TPR in ROC
- Type:
float
- x¶
Recall in Precision-Recall
FPR in ROC
- Type:
float
- threshold¶
the inclusive threshold used to determine the confusion matrix. Is None when the class is instantiated with from_predictions or from_confusion_matrix.
- Type:
float, optional
- coverage¶
the percentage of simulations with a lower profile loglikelihood for the grid with shape (n_bins, n_bins) and bounds y_bounds on the y-axis, x_bounds on the x-axis
- Type:
np.ndarray[float64]
- y_bounds¶
the lower and upper bound for which y was evaluated, equal to y +- n_sigmas * sigma(y)
- Type:
np.ndarray[float64]
- x_bounds¶
the lower and upper bound for which x was evaluated, equal to x +- n_sigmas * sigma(x)
- Type:
np.ndarray[float64]
- n_sigmas¶
the number of marginal standard deviations used to determine the bounds of the grid which is evaluated for each observed point.
- Type:
int, float
- epsilon¶
the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs.
- Type:
float
- n_simulations¶
the number of simulations performed per grid point
- Type:
int
- property FPR¶
Alias of the x coordinate
- Type:
np.ndarray[float64]
- property TPR¶
Alias of the y coordinate
- Type:
np.ndarray[float64]
- classmethod from_classifier(clf, X: numpy.ndarray, y: numpy.ndarray, threshold: float = 0.5, n_simulations: int = 10000, n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
clf (sklearn.Predictor) – a trained model with method predict_proba, used to compute the classifier scores
X (np.ndarray) – the feature array to be used to compute the classifier scores
y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations
threshold (float, default=0.5) – the classification threshold to which the classifier score is evaluated, is inclusive.
n_simulations (int, default=10000) – the number of simulations to perform per grid point. Note that the total number of simulations is n_bins ** 2 * n_simulations. It is advised to use n_simulations >= 10000.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
- classmethod from_confusion_matrix(conf_mat: numpy.ndarray, n_simulations: int = 10000, n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
conf_mat (np.ndarray[int64],) – confusion matrix as returned by mmu.confusion_matrix, i.e. with layout [0, 0] = TN, [0, 1] = FP, [1, 0] = FN, [1, 1] = TP or the flattened equivalent. Supported dtypes are int32, int64
n_simulations (int, default=10000) – the number of simulations to perform per grid point. Note that the total number of simulations is n_bins ** 2 * n_simulations. It is advised to use n_simulations >= 10000.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
- classmethod from_predictions(y: numpy.ndarray, yhat: numpy.ndarray, n_simulations: int = 10000, n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
y (np.ndarray[bool, int32, int64, float32, float64]) – true labels for observations; supported dtypes are bool, int32, int64, float32 and float64.
yhat (np.ndarray[bool, int32, int64, float32, float64]) – the predicted labels, the same dtypes are supported as y.
n_simulations (int, default=10000) – the number of simulations to perform per grid point. Note that the total number of simulations is n_bins ** 2 * n_simulations. It is advised to use n_simulations >= 10000.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
- classmethod from_scores(y: numpy.ndarray, scores: numpy.ndarray, threshold: float = 0.5, n_simulations: int = 10000, n_bins: int = 100, n_sigmas: int | float = 6.0, epsilon: float = 1e-12, n_threads: int | None = None)¶
Compute joint-uncertainty for a point.
- Parameters:
y (np.ndarray) – true labels for observations, supported dtypes are [bool, int32, int64, float32, float64]
scores (np.ndarray, default=None) – the classifier scores to be evaluated against the threshold, i.e. yhat = scores >= threshold. Supported dtypes are float32 and float64.
threshold (float, default=0.5) – the classification threshold to which the classifier scores are evaluated, is inclusive.
n_simulations (int, default=10000) – the number of simulations to perform per grid point. Note that the total number of simulations is n_bins ** 2 * n_simulations. It is advised to use n_simulations >= 10000.
n_bins (int, default=100) – the number of bins in the y/x grid for which the uncertainty is computed. scores will be a n_bins by n_bins array. Ignored when method is not the Multinomial approach.
n_sigmas (int, float, default=6.0) – the number of marginal standard deviations used to determine the bounds of the grid which is evaluated. Ignored when method is not the Multinomial approach.
epsilon (float, default=1e-12) – the value used to prevent the bounds from reaching the point (y=1.0, x=0.0) which would result in NaNs. Ignored when method is not the Multinomial approach.
n_threads (int, default=None) – number of threads to use in the computation. If mmu was installed from a wheel it won’t have multithreading support. If it was compiled with OpenMP support the default is 4, otherwise 1.
- get_conf_mat()¶
Obtain confusion matrix as a DataFrame.
- Returns:
the confusion matrix of the test set
- Return type:
pd.DataFrame
- plot(levels: int | float | numpy.ndarray | None = None, ax=None, cmap: str = 'Blues', equal_aspect: bool = True, limit_axis: bool = True, legend_loc: str | None = None, alpha: float = 0.8, other: BaseUncertainty | BaseSimulatedUncertainty | List[BaseUncertainty] | List[BaseSimulatedUncertainty] | None = None, other_kwargs: Dict | List[Dict] | None = None)¶
Plot confidence interval(s) for a point.
- Parameters:
levels (int, float, np.ndarray, default=np.array((1, 2, 3,))) – if int(s), levels is treated as the number of standard deviations for the confidence interval. If float(s), it is taken to be the density to be contained in the confidence interval. By default we plot 1, 2 and 3 standard deviations.
ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot
cmap (str, default='Blues') – matplotlib cmap name to use for CIs
equal_aspect (bool, default=True) – enforce square axis
limit_axis (bool, default=True) – allow ax to be limited for optimal CI plot
legend_loc (str, default=None) – location of the legend, default is lower left
alpha (float, default=0.8) – opacity value of the contours
other (BaseUncertainty, BaseSimulatedUncertainty, List, default=None) – Add other point uncertainty(ies) plot to the plot, by default the Reds cmap is used for the other plot(s).
other_kwargs (dict, list[dict], default=None) – Keyword arguments passed to other.plot(), ignored if other is None. If other is a list and other_kwargs is a dict, the kwargs are used for all point others.
- Returns:
ax – the axis with the ellipse added to it
- Return type:
matplotlib.axes.Axes