Title: | Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks |
---|---|
Description: | Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>. |
Authors: | Christian Thiele [cre, aut] |
Maintainer: | Christian Thiele <[email protected]> |
License: | GPL-3 |
Version: | 1.1.2 |
Built: | 2024-10-31 18:38:23 UTC |
Source: | https://github.com/thie1e/cutpointr |
Calculate the absolute difference of positive predictive value (PPV) and
negative predictive value (NPV) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
abs_d_ppv_npv = |ppv - npv|
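As a worked illustration of the formulas above, the following base R sketch recomputes the metric by hand for the counts used in the package examples. The helper name abs_d_ppv_npv_manual is hypothetical and is not part of cutpointr.

```r
# Hypothetical helper that mirrors the formulas above (not the package function).
abs_d_ppv_npv_manual <- function(tp, fp, tn, fn) {
  ppv <- tp / (tp + fp)  # positive predictive value
  npv <- tn / (tn + fn)  # negative predictive value
  abs(ppv - npv)
}

# Vectorized: each position describes one confusion matrix.
res <- abs_d_ppv_npv_manual(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
print(res)
```

In the first confusion matrix PPV and NPV are both 2/3, so the difference is zero.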
abs_d_ppv_npv(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
abs_d_ppv_npv(10, 5, 20, 10)
abs_d_ppv_npv(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
Calculate the absolute difference of sensitivity and specificity
from true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
abs_d_sens_spec = |sensitivity - specificity|
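A small difference means that sensitivity and specificity are balanced, so this metric would typically be minimized. The base R sketch below applies the formulas to three candidate cutpoints; all counts are made up for illustration.

```r
# Made-up counts for three candidate cutpoints.
tp <- c(18, 15, 10); fn <- c(2, 5, 10)
fp <- c(15, 8, 5);   tn <- c(10, 17, 20)

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
abs_d <- abs(sensitivity - specificity)

print(abs_d)
best <- which.min(abs_d)  # index of the most balanced candidate
print(best)
```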
abs_d_sens_spec(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
abs_d_sens_spec(10, 5, 20, 10)
abs_d_sens_spec(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
Calculate accuracy from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
accuracy = (tp + tn) / (tp + fp + tn + fn)
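To show where the four counts come from, the following base R sketch classifies a toy data set at a fixed cutpoint and then applies the accuracy formula. The data and the cutpoint are made up for illustration.

```r
# Toy data: predictor values and true classes.
x     <- c(1, 2, 3, 4, 5, 6, 7, 8)
truth <- c("no", "no", "no", "yes", "no", "yes", "yes", "yes")

# Classify as positive when x >= cutpoint (direction ">=").
cutpoint  <- 4
predicted <- ifelse(x >= cutpoint, "yes", "no")

tp <- sum(predicted == "yes" & truth == "yes")
fp <- sum(predicted == "yes" & truth == "no")
tn <- sum(predicted == "no"  & truth == "no")
fn <- sum(predicted == "no"  & truth == "yes")

acc <- (tp + tn) / (tp + fp + tn + fn)
print(c(tp = tp, fp = fp, tn = tn, fn = fn, accuracy = acc))
```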
accuracy(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
accuracy(10, 5, 20, 10)
accuracy(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
By default, the output of cutpointr includes the optimized metric and several other metrics. This function adds further metrics. Suitable metric functions are all metric functions that are included in the package or that comply with those standards.
add_metric(object, metric)
## S3 method for class 'cutpointr'
add_metric(object, metric)
## S3 method for class 'multi_cutpointr'
add_metric(object, metric)
## S3 method for class 'roc_cutpointr'
add_metric(object, metric)
object | A cutpointr or roc_cutpointr object. |
metric | (list) A list of metric functions to be added. |
A cutpointr or roc_cutpointr object (a data.frame) with one or more added columns.
Other main cutpointr functions: boot_ci(), boot_test(), cutpointr(), multi_cutpointr(), predict.cutpointr(), roc()
library(dplyr)
library(cutpointr)
cutpointr(suicide, dsi, suicide, gender) %>%
  add_metric(list(ppv, npv)) %>%
  select(optimal_cutpoint, subgroup, AUC, sum_sens_spec, ppv, npv)
Calculate the area under the ROC curve using the trapezoidal rule.
auc(x)
## S3 method for class 'roc_cutpointr'
auc(x)
## S3 method for class 'cutpointr'
auc(x)
x | Data frame resulting from the roc() or cutpointr() function. |
Numeric vector of AUC values
Forked from the AUC package
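The trapezoidal rule mentioned above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function name trapezoid_auc is hypothetical, and the ROC points are assumed to include the end points (0, 0) and (1, 1).

```r
# Trapezoidal rule over ROC points ordered by increasing false positive rate.
trapezoid_auc <- function(fpr, tpr) {
  ord <- order(fpr, tpr)
  fpr <- fpr[ord]
  tpr <- tpr[ord]
  # Width of each segment times the average height of its two end points.
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Made-up ROC points.
fpr <- c(0, 0, 0.25, 0.5, 1)
tpr <- c(0, 0.5, 0.75, 1, 1)
auc_value <- trapezoid_auc(fpr, tpr)
print(auc_value)
```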
Given a cutpointr object that includes bootstrap results this function calculates a bootstrap confidence interval for a selected variable. Missing values are removed before calculating the quantiles. In the case of multiple optimal cutpoints all cutpoints / metric values are included in the calculation.
Values of the selected variable are returned for the percentiles alpha / 2 and 1 - alpha / 2. The metrics in the bootstrap data frames of cutpointr are suffixed with _b and _oob to indicate in-bag and out-of-bag, respectively. For example, to calculate quantiles of the in-bag AUC variable = AUC_b should be set.
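The percentile calculation described above can be sketched in base R. The bootstrapped values here are simulated stand-ins for a column such as AUC_b, and the NA mimics a missing value that is removed before the quantiles are computed.

```r
set.seed(1)
# Simulated stand-in for bootstrapped metric values (made-up numbers).
boot_values <- c(rnorm(1000, mean = 0.9, sd = 0.02), NA)

alpha <- 0.05
probs <- c(alpha / 2, 1 - alpha / 2)
ci <- quantile(boot_values, probs = probs, na.rm = TRUE)

# Same shape as the documented return value: columns quantile and value.
ci_df <- data.frame(quantile = probs, value = unname(ci))
print(ci_df)
```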
boot_ci(x, variable, in_bag = TRUE, alpha = 0.05)
x | A cutpointr object with bootstrap results |
variable | Variable to calculate CI for |
in_bag | Whether the in-bag or out-of-bag results should be used for the confidence interval |
alpha | Alpha level. Quantiles of the bootstrapped values are returned for (alpha / 2) and 1 - (alpha / 2). |
A data frame with the columns quantile and value
Other main cutpointr functions: add_metric(), boot_test(), cutpointr(), multi_cutpointr(), predict.cutpointr(), roc()
## Not run:
opt_cut <- cutpointr(suicide, dsi, suicide, gender, metric = youden, boot_runs = 1000)
boot_ci(opt_cut, optimal_cutpoint, in_bag = FALSE, alpha = 0.05)
boot_ci(opt_cut, acc, in_bag = FALSE, alpha = 0.05)
boot_ci(opt_cut, cohens_kappa, in_bag = FALSE, alpha = 0.05)
boot_ci(opt_cut, AUC, in_bag = TRUE, alpha = 0.05)
## End(Not run)
This function performs a significance test based on the bootstrap results
of cutpointr to test whether a chosen metric is equal between subgroups
or between two cutpointr objects. The test statistic is calculated as
the standardized difference of the metric between groups. If x
contains subgroups, the test is run on all possible pairings of subgroups.
An additional adjusted p-value is returned in that case.
boot_test(x, y = NULL, variable = "AUC", in_bag = TRUE, correction = "holm")
x | A cutpointr object with bootstrap results |
y | If x does not contain subgroups another cutpointr object |
variable | The variable for testing |
in_bag | Whether the in-bag or out-of-bag results should be used for testing |
correction | The type of correction for multiple testing. Possible values are as in p.adjust.methods |
The variable name is looked up in the columns of the bootstrap results where the suffixes _b and _oob indicate in-bag and out-of-bag estimates, respectively (controlled via the in_bag argument). Possible values are optimal_cutpoint, AUC, acc, sensitivity, specificity, and the metric that was selected in cutpointr. Note that there is no "out-of-bag optimal cutpoint", so when selecting variable = optimal_cutpoint the test will be based on the in-bag data.
The test statistic is calculated as z = (t1 - t2) / sd(t1 - t2) where t1 and t2 are the metric values on the full sample and sd(t1 - t2) is the standard deviation of the differences of the metric values per bootstrap repetition. The test is two-sided.
If two cutpointr objects are compared and the numbers of bootstrap repetitions differ, the smaller number will be used.
Since pairwise differences are calculated for this test, the test function does not support multiple optimal cutpoints, because it is unclear how the differences should be calculated in that case.
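The test statistic described above can be sketched in base R. All numbers are simulated stand-ins for the full-sample metric values and the per-repetition bootstrap values. The p-value line assumes that z is compared against a standard normal distribution, which is an assumption of this sketch and is not stated in the text above.

```r
set.seed(42)
# Made-up full-sample metric values for two groups.
t1 <- 0.90
t2 <- 0.85
# Simulated bootstrap values, one per repetition and group.
boot1 <- rnorm(500, mean = t1, sd = 0.03)
boot2 <- rnorm(500, mean = t2, sd = 0.03)

d    <- t1 - t2             # difference on the full sample
sd_d <- sd(boot1 - boot2)   # SD of the per-repetition differences
z    <- d / sd_d
p    <- 2 * pnorm(-abs(z))  # two-sided, assuming a standard normal reference

print(data.frame(d = d, sd_d = sd_d, z = z, p = p))

# With several subgroup pairings the p-values can be adjusted, e.g. Holm:
p_adj <- p.adjust(c(p, 0.20, 0.01), method = "holm")
print(p_adj)
```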
A data.frame (a tibble) with the columns test_var, p, d, sd_d, z and in_bag. If a grouped cutpointr object was tested, the additional columns subgroup1, subgroup2 and p_adj are returned.
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 77. https://doi.org/10.1186/1471-2105-12-77
Other main cutpointr functions: add_metric(), boot_ci(), cutpointr(), multi_cutpointr(), predict.cutpointr(), roc()
## Not run:
library(cutpointr)
library(dplyr)
set.seed(734)
cp_f <- cutpointr(suicide %>% filter(gender == "female"), dsi, suicide,
  boot_runs = 1000, boot_stratify = TRUE)
set.seed(928)
cp_m <- cutpointr(suicide %>% filter(gender == "male"), dsi, suicide,
  boot_runs = 1000, boot_stratify = TRUE)
# No significant differences:
boot_test(cp_f, cp_m, AUC, in_bag = TRUE)
boot_test(cp_f, cp_m, sum_sens_spec, in_bag = FALSE)
set.seed(135)
cp <- cutpointr(suicide, dsi, suicide, gender, boot_runs = 1000,
  boot_stratify = TRUE)
# Roughly same result as above:
boot_test(cp, variable = AUC, in_bag = TRUE)
boot_test(cp, variable = sum_sens_spec, in_bag = FALSE)
## End(Not run)
Calculate the Kappa metric from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
mrg_a = ((tp + fn) * (tp + fp)) / (tp + fn + fp + tn)
mrg_b = ((fp + tn) * (fn + tn)) / (tp + fn + fp + tn)
expec_agree = (mrg_a + mrg_b) / (tp + fn + fp + tn)
obs_agree = (tp + tn) / (tp + fn + fp + tn)
cohens_kappa = (obs_agree - expec_agree) / (1 - expec_agree)
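The steps above can be followed with concrete numbers. This base R sketch uses the counts from the package example (tp = 10, fp = 5, tn = 20, fn = 10) and reproduces each intermediate quantity by hand.

```r
tp <- 10; fp <- 5; tn <- 20; fn <- 10
n <- tp + fn + fp + tn

mrg_a <- ((tp + fn) * (tp + fp)) / n   # expected count of agreeing positives
mrg_b <- ((fp + tn) * (fn + tn)) / n   # expected count of agreeing negatives
expec_agree <- (mrg_a + mrg_b) / n     # agreement expected by chance
obs_agree   <- (tp + tn) / n           # observed agreement

kappa <- (obs_agree - expec_agree) / (1 - expec_agree)
print(kappa)
```

Here the observed agreement is 2/3 and the chance agreement is 14/27, which gives a kappa of 4/13.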
cohens_kappa(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
A numeric matrix with the column name "cohens_kappa".
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
cohens_kappa(10, 5, 20, 10)
cohens_kappa(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
This is a utility function for extracting the cutpoints from a roc_cutpointr object. Mainly useful in conjunction with the plot_cutpointr function if cutpoints are to be plotted on the x-axis.
cutpoint(x, ...)
cutpoints(x, ...)
x | A roc_cutpointr object. |
... | Further arguments. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
oc <- cutpointr(suicide, dsi, suicide, gender)
plot_cutpointr(oc, cutpoint, accuracy)
This function calculates the number of knots when using smoothing splines for smoothing a function of metric values per cutpoint value. The function for calculating the number of knots is equal to stats::.nknots.smspl but uses the number of unique cutpoints in the data as n.
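The idea can be sketched with base R alone: count the unique predictor values and hand that count to the knot heuristic that ships with the stats package, .nknots.smspl. The data are made up, and this is an illustration of the description above rather than the package's own code.

```r
# Toy predictor with many ties (made-up data).
x <- c(1, 1, 2, 2, 2, 3, 5, 5, 8, 13)

# Use the number of unique values, not the number of observations, as n.
n_unique <- length(unique(x))
n_knots  <- stats::.nknots.smspl(n_unique)
print(c(n_unique = n_unique, n_knots = n_knots))
```

For fewer than 50 unique values the heuristic returns n itself, so ties in the predictor directly reduce the number of knots.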
cutpoint_knots(data, x)
data | A data frame |
x | (character) The name of the predictor variable |
cutpoint_knots(suicide, "dsi")
Using predictions (or e.g. biological marker values) and binary class labels, this function will determine "optimal" cutpoints using various selectable methods. The methods for cutpoint determination can be evaluated using bootstrapping. An estimate of the cutpoint variability and the out-of-sample performance can then be returned with summary or plot. For an introduction to the package please see vignette("cutpointr", package = "cutpointr").
cutpointr(...)
## Default S3 method:
cutpointr(
  data,
  x,
  class,
  subgroup = NULL,
  method = maximize_metric,
  metric = sum_sens_spec,
  pos_class = NULL,
  neg_class = NULL,
  direction = NULL,
  boot_runs = 0,
  boot_stratify = FALSE,
  use_midpoints = FALSE,
  break_ties = median,
  na.rm = FALSE,
  allowParallel = FALSE,
  silent = FALSE,
  tol_metric = 1e-06,
  ...
)
## S3 method for class 'numeric'
cutpointr(
  x,
  class,
  subgroup = NULL,
  method = maximize_metric,
  metric = sum_sens_spec,
  pos_class = NULL,
  neg_class = NULL,
  direction = NULL,
  boot_runs = 0,
  boot_stratify = FALSE,
  use_midpoints = FALSE,
  break_ties = median,
  na.rm = FALSE,
  allowParallel = FALSE,
  silent = FALSE,
  tol_metric = 1e-06,
  ...
)
... | Further optional arguments that will be passed to method. minimize_metric and maximize_metric pass ... to metric. |
data | A data.frame with the data needed for x, class and optionally subgroup. |
x | The variable name to be used for classification, e.g. predictions. The raw vector of values if the data argument is unused. |
class | The variable name indicating class membership. If the data argument is unused, the vector of raw numeric values. |
subgroup | An additional covariate that identifies subgroups or the raw data if data = NULL. Separate optimal cutpoints will be determined per group. Numeric, character and factor are allowed. |
method | (function) A function for determining cutpoints. Can be user supplied or use some of the built in methods. See details. |
metric | (function) The function for computing a metric when using maximize_metric or minimize_metric as method and for the out-of-bag values during bootstrapping. A way of internally validating the performance. User defined functions can be supplied, see details. |
pos_class | (optional) The value of class that indicates the positive class. |
neg_class | (optional) The value of class that indicates the negative class. |
direction | (character, optional) Use ">=" or "<=" to indicate whether x is supposed to be larger or smaller for the positive class. |
boot_runs | (numerical) If positive, this number of bootstrap samples will be used to assess the variability and the out-of-sample performance. |
boot_stratify | (logical) If the bootstrap is stratified, bootstrap samples are drawn separately in both classes and then combined, keeping the proportion of positives and negatives constant in every resample. |
use_midpoints | (logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. |
break_ties | If multiple cutpoints are found, they can be summarized using this function, e.g. mean or median. To return all cutpoints use c as the function. |
na.rm | (logical) Set to TRUE (default FALSE) to keep only complete cases of x, class and subgroup (if specified). Missing values with na.rm = FALSE will raise an error. |
allowParallel | (logical) If TRUE, the bootstrapping will be parallelized using foreach. A local cluster, for example, should be started manually beforehand. |
silent | (logical) If TRUE suppresses all messages. |
tol_metric | All cutpoints will be returned that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric] where m_max is the maximum achievable metric value. This can be used to return multiple decent cutpoints and to avoid floating-point problems. Not supported by all method functions, see details. |
If direction and/or pos_class and neg_class are not given, the function will assume that higher values indicate the positive class and use the class with a higher median as the positive class.
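The default described above can be sketched in base R: compute the median of the predictor per class and treat the class with the higher median as positive. The data are made up, and the snippet illustrates the rule only; it is not the package's internal code.

```r
# Toy data (made up): predictor values and class labels.
x     <- c(1, 2, 2, 3, 4, 5, 6, 7)
class <- c("no", "no", "no", "no", "yes", "no", "yes", "yes")

medians   <- tapply(x, class, median)           # median of x per class
pos_class <- names(medians)[which.max(medians)]
neg_class <- setdiff(names(medians), pos_class)
direction <- ">="                               # higher values indicate positives

print(medians)
print(c(pos_class = pos_class, neg_class = neg_class, direction = direction))
```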
This function uses tidyeval to support unquoted arguments. For programming with cutpointr the operator !! can be used to unquote an argument, see the examples.
Different methods can be selected for determining the optimal cutpoint via the method argument. The package includes the following method functions:
maximize_metric: Maximize the metric function
minimize_metric: Minimize the metric function
maximize_loess_metric: Maximize the metric function after LOESS smoothing
minimize_loess_metric: Minimize the metric function after LOESS smoothing
maximize_spline_metric: Maximize the metric function after spline smoothing
minimize_spline_metric: Minimize the metric function after spline smoothing
maximize_boot_metric: Maximize the metric function as a summary of the optimal cutpoints in bootstrapped samples
minimize_boot_metric: Minimize the metric function as a summary of the optimal cutpoints in bootstrapped samples
oc_youden_kernel: Maximize the Youden-Index after kernel smoothing the distributions of the two classes
oc_youden_normal: Maximize the Youden-Index parametrically assuming normally distributed data in both classes
oc_manual: Specify the cutpoint manually
User-defined functions can be supplied to method, too. As a reference, the code of all included method functions can be accessed by simply typing their name. To define a new method function, create a function that may take as input(s):
data: A data.frame or tbl_df
x: (character) The name of the predictor or independent variable
class: (character) The name of the class or dependent variable
metric_func: A function for calculating a metric, e.g. accuracy
pos_class: The positive class
neg_class: The negative class
direction: ">=" if the positive class has higher x values, "<=" otherwise
tol_metric: (numeric) In the built-in methods a tolerance around the optimal metric value
use_midpoints: (logical) In the built-in methods whether to use midpoints instead of exact optimal cutpoints
...: Further arguments
The ... argument can be used to avoid an error if not all of the above arguments are needed or in order to pass additional arguments to method. The function should return a data.frame or tbl_df with one row, the column "optimal_cutpoint", and an optional column with an arbitrary name with the metric value at the optimal cutpoint.
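A minimal method function that satisfies the contract above might look as follows. The name mean_cut and the toy data are illustrative; the function simply uses the mean of the predictor as the cutpoint and relies on ... to absorb the inputs it does not need.

```r
# Illustrative method function: use the mean of x as the cutpoint.
# `...` absorbs class, direction, metric_func etc. so they cause no error.
mean_cut <- function(data, x, ...) {
  stopifnot(is.character(x))
  data.frame(optimal_cutpoint = mean(data[[x]]))
}

# Toy data (made up) to call the function directly.
toy <- data.frame(marker = c(1, 2, 3, 6),
                  outcome = c("no", "no", "yes", "yes"))
res <- mean_cut(toy, x = "marker", class = "outcome", direction = ">=")
print(res)
```

Such a function could then be supplied through the method argument, e.g. method = mean_cut.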
Built-in metric functions include:
accuracy: Fraction correctly classified
youden: Youden- or J-Index = sensitivity + specificity - 1
sum_sens_spec: sensitivity + specificity
sum_ppv_npv: The sum of positive predictive value (PPV) and negative predictive value (NPV)
prod_sens_spec: sensitivity * specificity
prod_ppv_npv: The product of positive predictive value (PPV) and negative predictive value (NPV)
cohens_kappa: Cohen's Kappa
abs_d_sens_spec: The absolute difference between sensitivity and specificity
roc01: Distance to the point (0,1) on ROC space
abs_d_ppv_npv: The absolute difference between positive predictive value (PPV) and negative predictive value (NPV)
p_chisquared: The p-value of a chi-squared test on the confusion matrix of predictions and observations
odds_ratio: The odds ratio calculated as (TP / FP) / (FN / TN)
risk_ratio: The risk ratio (relative risk) calculated as (TP / (TP + FN)) / (FP / (FP + TN))
plr, nlr: Positive and negative likelihood ratio calculated as plr = true positive rate / false positive rate and nlr = false negative rate / true negative rate
misclassification_cost: The sum of the misclassification cost of false positives and false negatives fp * cost_fp + fn * cost_fn. Additional arguments to cutpointr: cost_fp, cost_fn
total_utility: The total utility of true / false positives / negatives calculated as utility_tp * TP + utility_tn * TN - cost_fp * FP - cost_fn * FN. Additional arguments to cutpointr: utility_tp, utility_tn, cost_fp, cost_fn
F1_score: The F1-score (2 * TP) / (2 * TP + FP + FN)
sens_constrain: Maximize sensitivity given a minimal value of specificity
spec_constrain: Maximize specificity given a minimal value of sensitivity
metric_constrain: Maximize a selected metric given a minimal value of another selected metric
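Several of the formulas in the list above can be checked by hand. The base R sketch below evaluates them for one confusion matrix; the counts and the cost and utility weights are made up for illustration.

```r
tp <- 10; fp <- 5; tn <- 20; fn <- 10

odds_ratio <- (tp / fp) / (fn / tn)

tpr <- tp / (tp + fn); fpr <- fp / (fp + tn)
fnr <- fn / (tp + fn); tnr <- tn / (fp + tn)
plr <- tpr / fpr   # positive likelihood ratio
nlr <- fnr / tnr   # negative likelihood ratio

# Made-up weights for the cost- and utility-based metrics.
cost_fp <- 1; cost_fn <- 2
utility_tp <- 2; utility_tn <- 1
misclassification_cost <- fp * cost_fp + fn * cost_fn
total_utility <- utility_tp * tp + utility_tn * tn - cost_fp * fp - cost_fn * fn

print(c(odds_ratio = odds_ratio, plr = plr, nlr = nlr,
        misclassification_cost = misclassification_cost,
        total_utility = total_utility))
```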
Furthermore, the following functions are included which can be used as metric functions but are more useful for plotting purposes, for example in plot_cutpointr, or for defining new metric functions: tp, fp, tn, fn, tpr, fpr, tnr, fnr, false_omission_rate, false_discovery_rate, ppv, npv, precision, recall, sensitivity, and specificity.
User defined metric functions can be created as well which can accept the following inputs as vectors:
tp: Vector of true positives
fp: Vector of false positives
tn: Vector of true negatives
fn: Vector of false negatives
...: If the metric function is used in conjunction with any of the maximize / minimize methods, further arguments can be passed
The function should return a numeric vector or a matrix or a data.frame with one column. If the column is named, the name will be included in the output and plots. Avoid using names that are identical to the column names that are by default returned by cutpointr.
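A user-defined metric that follows these conventions could be sketched as below. Balanced accuracy is chosen only as an example and the function name is hypothetical; the result is returned as a one-column matrix with a column name that does not clash with the default output columns.

```r
# Hypothetical user-defined metric: balanced accuracy.
balanced_accuracy <- function(tp, fp, tn, fn, ...) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  res <- matrix((sens + spec) / 2, ncol = 1)
  colnames(res) <- "balanced_accuracy"  # name shown in output and plots
  res
}

ba <- balanced_accuracy(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
print(ba)
```

Such a function could then be supplied through the metric argument, e.g. metric = balanced_accuracy.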
If boot_runs is positive, that number of bootstrap samples will be drawn and the optimal cutpoint using method will be determined. Additionally, as a way of internal validation, the function in metric will be used to score the out-of-bag predictions using the cutpoints determined by method. Various default metrics are always included in the bootstrap results.
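The in-bag and out-of-bag logic described above can be sketched in base R. This is a simplified illustration on simulated data, not the package's implementation: the helper names sum_sens_spec_at and best_cut are hypothetical, the bootstrap is not stratified, and only one metric is recorded.

```r
set.seed(123)
# Simulated data (made up): higher x values indicate the positive class.
n <- 200
class <- rep(c("no", "yes"), each = n / 2)
x <- c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 1.5))

# Sum of sensitivity and specificity when classifying x >= cut as "yes".
sum_sens_spec_at <- function(cut, x, class) {
  sens <- mean(x[class == "yes"] >= cut)
  spec <- mean(x[class == "no"] < cut)
  sens + spec
}

# "Method": pick the observed value that maximizes the metric.
best_cut <- function(x, class) {
  cuts <- sort(unique(x))
  vals <- vapply(cuts, sum_sens_spec_at, numeric(1), x = x, class = class)
  cuts[which.max(vals)]
}

boot_runs <- 50
boot_res <- t(vapply(seq_len(boot_runs), function(i) {
  in_bag <- sample(n, replace = TRUE)        # resample with replacement
  oob    <- setdiff(seq_len(n), in_bag)      # observations never drawn
  cut_b  <- best_cut(x[in_bag], class[in_bag])
  c(optimal_cutpoint = cut_b,
    metric_b   = sum_sens_spec_at(cut_b, x[in_bag], class[in_bag]),
    metric_oob = sum_sens_spec_at(cut_b, x[oob], class[oob]))
}, numeric(3)))

print(summary(boot_res[, "metric_oob"]))
```

The spread of optimal_cutpoint across repetitions indicates cutpoint variability, while metric_oob serves as the out-of-sample performance estimate.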
If multiple optimal cutpoints are found, the column optimal_cutpoint becomes a list that contains the vector(s) of the optimal cutpoints.
If use_midpoints = TRUE the mean of the optimal cutpoint and the next highest or lowest possible cutpoint is returned, depending on direction.
The tol_metric argument can be used to avoid floating-point problems that may lead to exclusion of cutpoints that achieve the optimally achievable metric value. Additionally, by selecting a large tolerance multiple cutpoints can be returned that lead to decent metric values in the vicinity of the optimal metric value. tol_metric is passed to method and is only supported by the maximization and minimization functions, i.e. maximize_metric, minimize_metric, maximize_loess_metric, minimize_loess_metric, maximize_spline_metric, and minimize_spline_metric. In maximize_boot_metric and minimize_boot_metric multiple optimal cutpoints will be passed to the summary_func of these two functions.
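The effect of the tolerance can be sketched in base R. The metric values below are made up; one of them misses the maximum only by a tiny amount, as can happen through floating-point error, and is still selected. The last line shows how a break_ties function such as median would summarize multiple cutpoints.

```r
# Made-up candidate cutpoints and their metric values.
cutpoints <- c(1, 2, 3, 4, 5)
metric    <- c(1.20, 1.45, 1.50, 1.50 - 1e-9, 1.30)

tol_metric <- 1e-6
m_max <- max(metric)

# Keep every cutpoint whose metric lies within the tolerance of the maximum.
selected <- cutpoints[metric >= m_max - tol_metric]
print(selected)

# A larger tolerance also returns "decent" cutpoints near the optimum.
selected_wide <- cutpoints[metric >= m_max - 0.06]
print(selected_wide)

# Summarizing ties, as break_ties = median would.
print(median(selected))
```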
A cutpointr object which is also a data.frame and tbl_df.
Other main cutpointr functions: add_metric(), boot_ci(), boot_test(), multi_cutpointr(), predict.cutpointr(), roc()
library(cutpointr)
## Optimal cutpoint for dsi
data(suicide)
opt_cut <- cutpointr(suicide, dsi, suicide)
opt_cut
s_opt_cut <- summary(opt_cut)
plot(opt_cut)
## Not run:
## Predict class for new observations
predict(opt_cut, newdata = data.frame(dsi = 0:5))
## Supplying raw data, same result
cutpointr(x = suicide$dsi, class = suicide$suicide)
## direction, class labels, method and metric can be defined manually
## Again, same result
cutpointr(suicide, dsi, suicide, direction = ">=", pos_class = "yes",
  method = maximize_metric, metric = youden)
## Optimal cutpoint for dsi, as before, but for the separate subgroups
opt_cut <- cutpointr(suicide, dsi, suicide, gender)
opt_cut
(s_opt_cut <- summary(opt_cut))
tibble:::print.tbl(s_opt_cut)
## Bootstrapping also works on individual subgroups
set.seed(30)
opt_cut <- cutpointr(suicide, dsi, suicide, gender, boot_runs = 1000,
  boot_stratify = TRUE)
opt_cut
summary(opt_cut)
plot(opt_cut)
## Parallelized bootstrapping
library(doParallel)
library(doRNG)
cl <- makeCluster(2) # 2 cores
registerDoParallel(cl)
registerDoRNG(12) # Reproducible parallel loops using doRNG
opt_cut <- cutpointr(suicide, dsi, suicide, gender, boot_runs = 1000,
  allowParallel = TRUE)
stopCluster(cl)
opt_cut
plot(opt_cut)
## Robust cutpoint method using kernel smoothing for optimizing Youden-Index
opt_cut <- cutpointr(suicide, dsi, suicide, gender,
  method = oc_youden_kernel)
opt_cut
## End(Not run)
This function is equivalent to cutpointr but takes only quoted arguments for x, class and subgroup. This was useful before cutpointr supported tidyeval.
cutpointr_( data, x, class, subgroup = NULL, method = maximize_metric, metric = sum_sens_spec, pos_class = NULL, neg_class = NULL, direction = NULL, boot_runs = 0, boot_stratify = FALSE, use_midpoints = FALSE, break_ties = median, na.rm = FALSE, allowParallel = FALSE, silent = FALSE, tol_metric = 1e-06, ... )
cutpointr_( data, x, class, subgroup = NULL, method = maximize_metric, metric = sum_sens_spec, pos_class = NULL, neg_class = NULL, direction = NULL, boot_runs = 0, boot_stratify = FALSE, use_midpoints = FALSE, break_ties = median, na.rm = FALSE, allowParallel = FALSE, silent = FALSE, tol_metric = 1e-06, ... )
data |
A data.frame with the data needed for x, class and optionally subgroup. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
subgroup |
(character) The variable name of an additional covariate that identifies subgroups. Separate optimal cutpoints will be determined per group. |
method |
(function) A function for determining cutpoints. Can be user supplied or use some of the built in methods. See details. |
metric |
(function) The function for computing a metric when using maximize_metric or minimize_metric as method and and for the out-of-bag values during bootstrapping. A way of internally validating the performance. User defined functions can be supplied, see details. |
pos_class |
(optional) The value of class that indicates the positive class. |
neg_class |
(optional) The value of class that indicates the negative class. |
direction |
(character, optional) Use ">=" or "<=" to indicate whether x is supposed to be larger or smaller for the positive class. |
boot_runs |
(numerical) If positive, this number of bootstrap samples will be used to assess the variability and the out-of-sample performance. |
boot_stratify |
(logical) If the bootstrap is stratified, bootstrap samples are drawn separately in both classes and then combined, keeping the proportion of positives and negatives constant in every resample. |
use_midpoints |
(logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. |
break_ties |
If multiple cutpoints are found, they can be summarized using this function, e.g. mean or median. To return all cutpoints use c as the function. |
na.rm |
(logical) Set to TRUE (default FALSE) to keep only complete cases of x, class and subgroup (if specified). Missing values with na.rm = FALSE will raise an error. |
allowParallel |
(logical) If TRUE, the bootstrapping will be parallelized using foreach. A local cluster, for example, should be started manually beforehand. |
silent |
(logical) If TRUE suppresses all messages. |
tol_metric |
All cutpoints will be returned that lead to a metric
value in the interval [m_max - tol_metric, m_max + tol_metric] where
m_max is the maximum achievable metric value. This can be used to return
multiple decent cutpoints and to avoid floating-point problems. Not supported
by all |
... |
Further optional arguments that will be passed to method. minimize_metric and maximize_metric pass ... to metric. |
library(cutpointr)
## Optimal cutpoint for dsi
data(suicide)
opt_cut <- cutpointr_(suicide, "dsi", "suicide")
opt_cut
summary(opt_cut)
plot(opt_cut)
predict(opt_cut, newdata = data.frame(dsi = 0:5))
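As a rough illustration of boot_stratify and of the out-of-bag idea behind boot_runs, the following base R sketch draws one stratified resample from toy labels. The helper stratified_boot_idx and the toy data are assumptions made for this example; they are not part of cutpointr and do not claim to mirror its internals.

```r
# Hypothetical helper, not part of cutpointr: draw one stratified bootstrap
# resample by sampling row indices with replacement within each class.
stratified_boot_idx <- function(class) {
  idx_by_class <- split(seq_along(class), class)
  unlist(lapply(idx_by_class,
                function(i) i[sample.int(length(i), replace = TRUE)]),
         use.names = FALSE)
}

set.seed(1)
class <- rep(c("no", "yes"), times = c(8, 4))  # toy labels
idx <- stratified_boot_idx(class)

# Class counts in the resample equal those of the original data
table(class[idx])

# Rows never drawn are "out-of-bag" and can serve for validation
oob <- setdiff(seq_along(class), idx)
oob
```

Without stratification, a plain sample(seq_along(class), replace = TRUE) could by chance draw very few or no positives, which is why stratifying can help with small or unbalanced data.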
Calculate the F1-score from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
F1_score = (2 * tp) / (2 * tp + fp + fn)
F1_score(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions: Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
F1_score(10, 5, 20, 10)
F1_score(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
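To make the F1 formula concrete, here is a small base R sketch that evaluates it by hand for the same counts. The function f1_manual is a hypothetical stand-in written for this example; the package's F1_score() may return its result in a different shape.

```r
# Hypothetical re-implementation of the documented formula, for illustration.
# tn is accepted but unused, matching the metric function interface.
f1_manual <- function(tp, fp, tn, fn) (2 * tp) / (2 * tp + fp + fn)

f1_manual(10, 5, 20, 10)                            # 20 / 35
f1_manual(c(10, 8), c(5, 7), c(20, 12), c(10, 18))  # 20 / 35 and 16 / 41
```

Note that the number of true negatives does not enter the F1-score at all.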
Calculate the false omission rate or false discovery rate
from true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
false_omission_rate = fn / (tn + fn) = 1 - npv
false_discovery_rate = fp / (tp + fp) = 1 - ppv
false_omission_rate(tp, fp, tn, fn, ...)
false_discovery_rate(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
false_omission_rate(10, 5, 20, 10)
false_omission_rate(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
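The two identities stated above (false omission rate = 1 - npv, false discovery rate = 1 - ppv) can be checked numerically with a short base R sketch. All four helper functions are hypothetical re-implementations of the documented formulas, not the package functions themselves.

```r
# Hypothetical helpers that follow the documented formulas.
fom_rate   <- function(tp, fp, tn, fn) fn / (tn + fn)  # false omission rate
fdr_rate   <- function(tp, fp, tn, fn) fp / (tp + fp)  # false discovery rate
npv_manual <- function(tp, fp, tn, fn) tn / (tn + fn)
ppv_manual <- function(tp, fp, tn, fn) tp / (tp + fp)

tp <- c(10, 8); fp <- c(5, 7); tn <- c(20, 12); fn <- c(10, 18)

fom_rate(tp, fp, tn, fn)  # 10 / 30 and 18 / 30
fdr_rate(tp, fp, tn, fn)  # 5 / 15 and 7 / 15

# Both rates are the complements of the predictive values
isTRUE(all.equal(fom_rate(tp, fp, tn, fn), 1 - npv_manual(tp, fp, tn, fn)))
isTRUE(all.equal(fdr_rate(tp, fp, tn, fn), 1 - ppv_manual(tp, fp, tn, fn)))
```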
Calculate the Jaccard Index from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
Jaccard = (tp) / (tp + fp + fn)
Jaccard(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions: F1_score(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
Jaccard(10, 5, 20, 10)
Jaccard(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
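As a worked check of the Jaccard formula, the base R sketch below computes the index by hand and relates it to the F1-score through the algebraic identity Jaccard = F1 / (2 - F1). Both functions are hypothetical re-implementations of the documented formulas, written only for this example.

```r
# Hypothetical re-implementations of the documented formulas.
jaccard_manual <- function(tp, fp, tn, fn) tp / (tp + fp + fn)
f1_manual      <- function(tp, fp, tn, fn) (2 * tp) / (2 * tp + fp + fn)

j  <- jaccard_manual(10, 5, 20, 10)  # 10 / 25
f1 <- f1_manual(10, 5, 20, 10)       # 20 / 35

# The two metrics are monotone transformations of each other
isTRUE(all.equal(j, f1 / (2 - f1)))
```

Because the transformation is monotone, maximizing either metric over cutpoints should select the same cutpoint, barring ties and floating-point effects.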
Given a function for computing a metric in metric_func, these functions bootstrap the data boot_cut times and maximize or minimize the metric by selecting an optimal cutpoint. The returned optimal cutpoint is the result of applying summary_func, e.g. the mean, to all optimal cutpoints that were determined in the bootstrap samples. The metric function should accept the following inputs:
tp: vector of number of true positives
fp: vector of number of false positives
tn: vector of number of true negatives
fn: vector of number of false negatives
maximize_boot_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, summary_func = mean, boot_cut = 50, boot_stratify, inf_rm = TRUE, tol_metric, use_midpoints, ...)
minimize_boot_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, summary_func = mean, boot_cut = 50, boot_stratify, inf_rm = TRUE, tol_metric, use_midpoints, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
metric_func |
(function) A function that computes a single number metric to be maximized or minimized. See description. |
pos_class |
The value of class that indicates the positive class. |
neg_class |
The value of class that indicates the negative class. |
direction |
(character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class. |
summary_func |
(function) After obtaining the bootstrapped optimal cutpoints this function, e.g. mean or median, is applied to arrive at a single cutpoint. |
boot_cut |
(numeric) Number of bootstrap repetitions over which the mean optimal cutpoint is calculated. |
boot_stratify |
(logical) If the bootstrap is stratified, bootstrap samples are drawn in both classes and then combined, keeping the number of positives and negatives constant in every resample. |
inf_rm |
(logical) whether to remove infinite cutpoints before calculating the summary. |
tol_metric |
All cutpoints will be passed to |
use_midpoints |
(logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. |
... |
To capture further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function. |
The above inputs are arrived at by using all unique values in x, Inf, and -Inf as possible cutpoints for classifying the variable in class. The reported metric represents the usual in-sample performance of the determined cutpoint.
A tibble with the column optimal_cutpoint
Other method functions: maximize_gam_metric(), maximize_loess_metric(), maximize_metric(), maximize_spline_metric(), oc_manual(), oc_mean(), oc_median(), oc_youden_kernel(), oc_youden_normal()
set.seed(100)
cutpointr(suicide, dsi, suicide, method = maximize_boot_metric, metric = accuracy, boot_cut = 30)
set.seed(100)
cutpointr(suicide, dsi, suicide, method = minimize_boot_metric, metric = abs_d_sens_spec, boot_cut = 30)
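The idea of a bootstrapped cutpoint can be sketched in base R as follows: find the metric-maximizing cutpoint in each stratified resample, then summarize those cutpoints with the mean. The helpers youden_at and best_cut and the simulated data are assumptions made for this example; this is a simplified illustration and not the package's implementation.

```r
# Hypothetical helpers for illustration only.
# Youden index for the rule "x >= cut predicts the positive class".
youden_at <- function(x, y, cut) {
  pred <- x >= cut
  sens <- sum(pred & y) / sum(y)
  spec <- sum(!pred & !y) / sum(!y)
  sens + spec - 1
}

# Cutpoint among the observed values that maximizes the Youden index.
best_cut <- function(x, y) {
  cuts <- sort(unique(x))
  m <- vapply(cuts, function(cc) youden_at(x, y, cc), numeric(1))
  cuts[which.max(m)]
}

set.seed(100)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))  # toy predictor
y <- rep(c(FALSE, TRUE), each = 50)              # toy class labels

boot_cuts <- replicate(30, {
  # Stratified resample: draw within each class, then combine
  i <- c(sample(which(!y), replace = TRUE), sample(which(y), replace = TRUE))
  best_cut(x[i], y[i])
})

mean(boot_cuts)  # plays the role of summary_func = mean
```

Averaging over resamples tends to give a less variable cutpoint than a single search over the full sample, at the cost of extra computation.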
Given a function for computing a metric in metric_func, these functions smooth the function of metric value per cutpoint using generalized additive models (as implemented in mgcv), then maximize or minimize the metric by selecting an optimal cutpoint. For further details on the GAM smoothing see ?mgcv::gam. The metric function should accept the following inputs:
tp: vector of number of true positives
fp: vector of number of false positives
tn: vector of number of true negatives
fn: vector of number of false negatives
maximize_gam_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, formula = m ~ s(x.sorted), optimizer = c("outer", "newton"), tol_metric, use_midpoints, ...)
minimize_gam_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, formula = m ~ s(x.sorted), optimizer = c("outer", "newton"), tol_metric, use_midpoints, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
metric_func |
(function) A function that computes a metric to be maximized or minimized. See description. |
pos_class |
The value of class that indicates the positive class. |
neg_class |
The value of class that indicates the negative class. |
direction |
(character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class. |
formula |
A GAM formula. See |
optimizer |
An array specifying the numerical optimization method to
use to optimize the smoothing parameter estimation criterion (given by method).
See |
tol_metric |
All cutpoints will be returned that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric] where m_max is the maximum achievable metric value. This can be used to return multiple decent cutpoints and to avoid floating-point problems. |
use_midpoints |
(logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. |
... |
Further arguments that will be passed to metric_func or the GAM smoother. |
The above inputs are arrived at by using all unique values in x, Inf, and -Inf as possible cutpoints for classifying the variable in class.
A tibble with the columns optimal_cutpoint, the corresponding metric value and roc_curve, a nested tibble that includes all possible cutoffs and the corresponding numbers of true and false positives / negatives and all corresponding metric values.
Other method functions: maximize_boot_metric(), maximize_loess_metric(), maximize_metric(), maximize_spline_metric(), oc_manual(), oc_mean(), oc_median(), oc_youden_kernel(), oc_youden_normal()
oc <- cutpointr(suicide, dsi, suicide, gender, method = maximize_gam_metric, metric = accuracy)
plot_metric(oc)
oc <- cutpointr(suicide, dsi, suicide, gender, method = minimize_gam_metric, metric = abs_d_sens_spec)
plot_metric(oc)
Given a function for computing a metric in metric_func, these functions smooth the function of metric value per cutpoint using LOESS, then maximize or minimize the metric by selecting an optimal cutpoint. For further details on the LOESS smoothing see ?fANCOVA::loess.as. The metric function should accept the following inputs:
tp: vector of number of true positives
fp: vector of number of false positives
tn: vector of number of true negatives
fn: vector of number of false negatives
maximize_loess_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, criterion = "aicc", degree = 1, family = "symmetric", user.span = NULL, tol_metric, use_midpoints, ...)
minimize_loess_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, criterion = "aicc", degree = 1, family = "symmetric", user.span = NULL, tol_metric, use_midpoints, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
metric_func |
(function) A function that computes a metric to be maximized or minimized. See description. |
pos_class |
The value of class that indicates the positive class. |
neg_class |
The value of class that indicates the negative class. |
direction |
(character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class. |
criterion |
the criterion for automatic smoothing parameter selection: "aicc" denotes bias-corrected AIC criterion, "gcv" denotes generalized cross-validation. |
degree |
the degree of the local polynomials to be used. It can be 0, 1 or 2. |
family |
if "gaussian" fitting is by least-squares, and if "symmetric" a re-descending M estimator is used with Tukey's biweight function. |
user.span |
The user-defined parameter which controls the degree of smoothing |
tol_metric |
All cutpoints will be returned that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric] where m_max is the maximum achievable metric value. This can be used to return multiple decent cutpoints and to avoid floating-point problems. |
use_midpoints |
(logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. |
... |
Further arguments that will be passed to metric_func or the loess smoother. |
The above inputs are arrived at by using all unique values in x, Inf, and -Inf as possible cutpoints for classifying the variable in class.
A tibble with the columns optimal_cutpoint, the corresponding metric value and roc_curve, a nested tibble that includes all possible cutoffs and the corresponding numbers of true and false positives / negatives and all corresponding metric values.
Xiao-Feng Wang (2010). fANCOVA: Nonparametric Analysis of Covariance. https://CRAN.R-project.org/package=fANCOVA
Leeflang, M. M., Moons, K. G., Reitsma, J. B., & Zwinderman, A. H. (2008). Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clinical Chemistry, 54(4), 729–738.
Other method functions: maximize_boot_metric(), maximize_gam_metric(), maximize_metric(), maximize_spline_metric(), oc_manual(), oc_mean(), oc_median(), oc_youden_kernel(), oc_youden_normal()
oc <- cutpointr(suicide, dsi, suicide, gender, method = maximize_loess_metric, criterion = "aicc", family = "symmetric", degree = 2, user.span = 0.7, metric = accuracy)
plot_metric(oc)
oc <- cutpointr(suicide, dsi, suicide, gender, method = minimize_loess_metric, criterion = "aicc", family = "symmetric", degree = 2, user.span = 0.7, metric = misclassification_cost, cost_fp = 1, cost_fn = 10)
plot_metric(oc)
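The effect of smoothing the metric curve before picking a cutpoint can be sketched with base R alone. This example deliberately swaps in stats::loess with a hand-picked span instead of fANCOVA::loess.as, so there is no automatic span selection; the simulated data and all variable names are assumptions made for illustration.

```r
set.seed(42)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 1.5))  # toy predictor
y <- rep(c(FALSE, TRUE), each = 100)                  # toy class labels

# Accuracy at every observed candidate cutpoint (rule: x >= cut is positive)
cuts <- sort(unique(x))
acc  <- vapply(cuts, function(cc) mean((x >= cc) == y), numeric(1))

# stats::loess stands in for fANCOVA::loess.as; the span is fixed by hand
fit      <- loess(acc ~ cuts, span = 0.7, degree = 2, family = "symmetric")
smoothed <- predict(fit)

# Cutpoint from the raw curve versus the smoothed curve
c(raw = cuts[which.max(acc)], smoothed = cuts[which.max(smoothed)])
```

The raw curve is a step function whose maximum can sit on a chance spike; the smoothed curve trades a little bias for a more stable location of the optimum.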
Given a function for computing a metric in metric_func, these functions maximize or minimize that metric by selecting an optimal cutpoint. The metric function should accept the following inputs:
tp: vector of number of true positives
fp: vector of number of false positives
tn: vector of number of true negatives
fn: vector of number of false negatives
maximize_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, tol_metric, use_midpoints, ...)
minimize_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, tol_metric, use_midpoints, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
metric_func |
(function) A function that computes a metric to be maximized or minimized. See description. |
pos_class |
The value of class that indicates the positive class. |
neg_class |
The value of class that indicates the negative class. |
direction |
(character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class. |
tol_metric |
All cutpoints will be returned that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric] where m_max is the maximum achievable metric value. This can be used to return multiple decent cutpoints and to avoid floating-point problems. |
use_midpoints |
(logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. |
... |
Further arguments that will be passed to |
The above inputs are arrived at by using all unique values in x, Inf, and -Inf as possible cutpoints for classifying the variable in class.
A tibble with the columns optimal_cutpoint, the corresponding metric value and roc_curve, a nested tibble that includes all possible cutoffs and the corresponding numbers of true and false positives / negatives and all corresponding metric values.
Other method functions: maximize_boot_metric(), maximize_gam_metric(), maximize_loess_metric(), maximize_spline_metric(), oc_manual(), oc_mean(), oc_median(), oc_youden_kernel(), oc_youden_normal()
cutpointr(suicide, dsi, suicide, method = maximize_metric, metric = accuracy)
cutpointr(suicide, dsi, suicide, method = minimize_metric, metric = abs_d_sens_spec)
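The exhaustive search described above, including the role of tol_metric when several cutpoints tie, can be sketched in base R on a tiny data set. The function youden_manual and the toy data are assumptions made for this example; the sketch is not the package's implementation.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

# Candidate cutpoints: all unique values of x plus -Inf and Inf
cuts <- c(-Inf, sort(unique(x)), Inf)

# Confusion-matrix counts per cutpoint for direction ">="
counts <- t(vapply(cuts, function(cc) {
  pred <- x >= cc
  c(tp = sum(pred & y), fp = sum(pred & !y),
    tn = sum(!pred & !y), fn = sum(!pred & y))
}, c(tp = 0, fp = 0, tn = 0, fn = 0)))

# Hypothetical metric function taking vectors tp, fp, tn, fn
youden_manual <- function(tp, fp, tn, fn) tp / (tp + fn) + tn / (tn + fp) - 1
m <- youden_manual(counts[, "tp"], counts[, "fp"], counts[, "tn"], counts[, "fn"])

# Keep every cutpoint within tol_metric of the best metric value
tol_metric <- 1e-6
optimal <- cuts[m >= max(m) - tol_metric]
optimal        # 4 and 6 tie at a Youden index of 0.75
mean(optimal)  # 5, one possible way to break the tie
```

Here the cutpoints 4 and 6 both reach the maximum, which is the situation in which a tie-breaking function such as mean or median decides the reported cutpoint.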
Given a function for computing a metric in metric_func, these functions smooth the function of metric value per cutpoint using smoothing splines. Then they optimize the metric by selecting an optimal cutpoint. For further details on the smoothing spline see ?stats::smooth.spline. The metric function should accept the following inputs:
tp: vector of number of true positives
fp: vector of number of false positives
tn: vector of number of true negatives
fn: vector of number of false negatives
maximize_spline_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, w = NULL, df = NULL, spar = 1, nknots = cutpoint_knots, df_offset = NULL, penalty = 1, control_spar = list(), tol_metric, use_midpoints, ...)
minimize_spline_metric(data, x, class, metric_func = youden, pos_class = NULL, neg_class = NULL, direction, w = NULL, df = NULL, spar = 1, nknots = cutpoint_knots, df_offset = NULL, penalty = 1, control_spar = list(), tol_metric, use_midpoints, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
metric_func |
(function) A function that computes a metric to be optimized. See description. |
pos_class |
The value of class that indicates the positive class. |
neg_class |
The value of class that indicates the negative class. |
direction |
(character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class. |
w |
Optional vector of weights of the same length as x; defaults to all 1. |
df |
The desired equivalent number of degrees of freedom (trace of the smoother matrix). Must be in (1,nx], nx the number of unique x values. |
spar |
Smoothing parameter, typically (but not necessarily) in (0,1]. When spar is specified, the coefficient lambda of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar. |
nknots |
Integer or function giving the number of knots. The function should accept data and x (the name of the predictor variable) as inputs. By default nknots = 0.1 * log(n_dat / n_cut) * n_cut where n_dat is the number of observations and n_cut the number of unique predictor values. |
df_offset |
Allows the degrees of freedom to be increased by df_offset in the GCV criterion. |
penalty |
The coefficient of the penalty for degrees of freedom in the GCV criterion. |
control_spar |
Optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., when spar is missing or NULL. See help("smooth.spline") for further information. |
tol_metric |
All cutpoints will be returned that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric] where m_max is the maximum achievable metric value. This can be used to return multiple decent cutpoints and to avoid floating-point problems. |
use_midpoints |
(logical) If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<=") which avoids biasing the optimal cutpoint. |
... |
Further arguments that will be passed to metric_func. |
The above inputs are arrived at by using all unique values in x, Inf, and -Inf as possible cutpoints for classifying the variable in class.
A tibble with the columns optimal_cutpoint, the corresponding metric value and roc_curve, a nested tibble that includes all possible cutoffs and the corresponding numbers of true and false positives / negatives and all corresponding metric values.
Other method functions: maximize_boot_metric(), maximize_gam_metric(), maximize_loess_metric(), maximize_metric(), oc_manual(), oc_mean(), oc_median(), oc_youden_kernel(), oc_youden_normal()
oc <- cutpointr(suicide, dsi, suicide, gender, method = maximize_spline_metric, df = 5, metric = accuracy)
plot_metric(oc)
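To show how the df argument shapes the result, the following base R sketch fits stats::smooth.spline to a simulated accuracy curve with two different degrees of freedom and reads off the maximizing cutpoint for each. The data, the helper cut_at_df and the chosen df values are assumptions made for this example; knot selection and other details of the package functions are not reproduced.

```r
set.seed(7)
x <- c(rnorm(150, mean = 0), rnorm(150, mean = 1.5))  # toy predictor
y <- rep(c(FALSE, TRUE), each = 150)                  # toy class labels

# Accuracy at every observed candidate cutpoint (rule: x >= cut is positive)
cuts <- sort(unique(x))
acc  <- vapply(cuts, function(cc) mean((x >= cc) == y), numeric(1))

# Hypothetical helper: smooth with a given df and return the best cutpoint
cut_at_df <- function(df) {
  fit <- smooth.spline(cuts, acc, df = df)
  smoothed <- predict(fit, cuts)$y
  cuts[which.max(smoothed)]
}

# A small df gives a stiff curve, a larger df follows the raw values closely
c(df_5 = cut_at_df(5), df_20 = cut_at_df(20))
```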
For example, calculate sensitivity where a lower bound (minimal desired value) for specificity can be defined. All returned metric values for cutpoints that lead to values of the constraining metric below the specified minimum will be zero. The inputs must be vectors of equal length.
metric_constrain(tp, fp, tn, fn, main_metric = sensitivity, constrain_metric = specificity, min_constrain = 0.5, suffix = "_constrain", ...)
sens_constrain(tp, fp, tn, fn, constrain_metric = specificity, min_constrain = 0.5, ...)
spec_constrain(tp, fp, tn, fn, constrain_metric = sensitivity, min_constrain = 0.5, ...)
acc_constrain(tp, fp, tn, fn, constrain_metric = sensitivity, min_constrain = 0.5, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
main_metric |
Metric to be optimized. |
constrain_metric |
Metric for constraint. |
min_constrain |
Minimum desired value of constrain_metric. |
suffix |
Character string to be added to the name of main_metric. |
... |
for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
## Maximum sensitivity when Positive Predictive Value (PPV) is at least 75%
library(dplyr)
library(purrr)
library(cutpointr)
cp <- cutpointr(data = suicide, x = dsi, class = suicide, method = maximize_metric, metric = sens_constrain, constrain_metric = ppv, min_constrain = 0.75)
## All metric values (m) where PPV < 0.75 are zero
plot_metric(cp)
cp$roc_curve
## We can confirm that PPV is indeed >= 0.75
cp %>% add_metric(list(ppv))
## We can also do so for the complete ROC curve(s)
cp %>% pull(roc_curve) %>% map(~ add_metric(., list(sensitivity, ppv)))
## Use the metric_constrain function for a combination of any two metrics
## Estimate optimal cutpoint for precision given a recall of at least 70%
cp <- cutpointr(data = suicide, x = dsi, class = suicide, subgroup = gender, method = maximize_metric, metric = metric_constrain, main_metric = precision, suffix = "_constrained", constrain_metric = recall, min_constrain = 0.70)
## All metric values (m) where recall < 0.7 are zero
plot_metric(cp)
## We can confirm that recall is indeed >= 0.70 and that precision_constrained
## is identical to precision for the estimated cutpoint
cp %>% add_metric(list(recall, precision))
## We can also do so for the complete ROC curve(s)
cp %>% pull(roc_curve) %>% map(~ add_metric(., list(recall, precision)))
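The constraint logic itself is simple and can be sketched in base R: compute the main metric, compute the constraining metric, and set the main metric to zero wherever the constraint is violated. The function sens_constrain_manual and its min_spec argument are hypothetical names chosen for this example; the sketch does not reproduce the package functions.

```r
# Hypothetical helper: sensitivity, zeroed where specificity < min_spec.
sens_constrain_manual <- function(tp, fp, tn, fn, min_spec = 0.5) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  ifelse(spec >= min_spec, sens, 0)
}

# First cutpoint:  sens = 10 / 20, spec = 20 / 25 = 0.80 (constraint met)
# Second cutpoint: sens = 8 / 26,  spec = 12 / 19 < 0.70 (constraint violated)
sens_constrain_manual(tp = c(10, 8), fp = c(5, 7), tn = c(20, 12),
                      fn = c(10, 18), min_spec = 0.7)
```

Because violated cutpoints receive a value of zero, maximizing the constrained metric never prefers them over any cutpoint that satisfies the constraint with a positive main metric.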
Calculate the misclassification cost from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
misclassification_cost = cost_fp * fp + cost_fn * fn
misclassification_cost(tp, fp, tn, fn, cost_fp = 1, cost_fn = 1, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
cost_fp |
(numeric) the cost of a false positive |
cost_fn |
(numeric) the cost of a false negative |
... |
for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
misclassification_cost(10, 5, 20, 10, cost_fp = 1, cost_fn = 5)
misclassification_cost(c(10, 8), c(5, 7), c(20, 12), c(10, 18), cost_fp = 1, cost_fn = 5)
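For the same inputs, the cost formula can be evaluated by hand in base R. The function cost_manual is a hypothetical re-implementation of the documented formula; the package function may return its result in a different shape.

```r
# Hypothetical re-implementation: weighted count of the two error types.
cost_manual <- function(tp, fp, tn, fn, cost_fp = 1, cost_fn = 1) {
  cost_fp * fp + cost_fn * fn
}

cost_manual(10, 5, 20, 10, cost_fp = 1, cost_fn = 5)  # 1 * 5 + 5 * 10 = 55
cost_manual(c(10, 8), c(5, 7), c(20, 12), c(10, 18),
            cost_fp = 1, cost_fn = 5)                 # 55 and 1 * 7 + 5 * 18 = 97
```

Since lower values are better, this metric is a natural partner for the minimize_* method functions rather than the maximize_* ones.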
Runs cutpointr over multiple predictor variables. Tidyeval via !! is supported for class and subgroup. If x = NULL, cutpointr will be run using all numeric columns in the data set as predictors except for the variable in class and, if given, subgroup.
multi_cutpointr(data, x = NULL, class, subgroup = NULL, silent = FALSE, ...)
data |
A data frame. |
x |
Character vector of predictor variables. If NULL all numeric columns. |
class |
The name of the outcome / dependent variable. |
subgroup |
An additional covariate that identifies subgroups. Separate optimal cutpoints will be determined per group. |
silent |
Whether to suppress messages. |
... |
Further arguments to be passed to cutpointr, e.g., boot_runs |
The automatic determination of positive / negative classes and direction will be carried out separately for every predictor variable. That way, if direction and the classes are not specified, the reported AUC for every variable will be >= 0.5. AUC may be < 0.5 if subgroups are specified, because the same direction is used in every subgroup.
A data frame.
Other main cutpointr functions: add_metric(), boot_ci(), boot_test(), cutpointr(), predict.cutpointr(), roc()
library(cutpointr)

multi_cutpointr(suicide, x = c("age", "dsi"), class = suicide,
                pos_class = "yes")

mcp <- multi_cutpointr(suicide, x = c("age", "dsi"), class = suicide,
                       subgroup = gender, pos_class = "yes")
mcp

(scp <- summary(mcp))

## Not run: 
## The result is a data frame
tibble:::print.tbl(scp)
## End(Not run)
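The default predictor selection for x = NULL can be pictured in base R. This is only a sketch of the rule described above (all numeric columns except class and, if given, subgroup), not the package's internal code; the data frame df and the helper default_predictors are hypothetical names introduced for illustration.

```r
# Hypothetical helper: all numeric columns except class and subgroup
default_predictors <- function(data, class, subgroup = NULL) {
  numeric_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  setdiff(numeric_cols, c(class, subgroup))
}

df <- data.frame(
  age     = c(25, 40, 31, 58),
  score   = c(1.5, 3.2, 0.4, 2.8),
  outcome = c(0, 1, 0, 1),          # numeric, but excluded because it is the class
  sex     = c("f", "m", "m", "f")   # not numeric, so never a predictor
)

default_predictors(df, class = "outcome")
default_predictors(df, class = "outcome", subgroup = "sex")
```

Both calls select age and score, because the class column is dropped explicitly and sex is not numeric in the first place.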
Calculate the negative predictive value (NPV)
from true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
npv = tn / (tn + fn)
npv(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
odds_ratio()
,
p_chisquared()
,
plr()
,
ppv()
,
precision()
,
prod_ppv_npv()
,
prod_sens_spec()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
npv(10, 5, 20, 10)
npv(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
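To check the formula by hand, the same counts can be evaluated in base R. The helper npv_by_hand is a hypothetical name used only for this illustration; it restates npv = tn / (tn + fn) and is not the package function.

```r
# Hypothetical re-statement of the formula, for illustration only
npv_by_hand <- function(tn, fn) tn / (tn + fn)

npv_by_hand(tn = 20, fn = 10)                # 20 / 30
npv_by_hand(tn = c(20, 12), fn = c(10, 18))  # 20 / 30 and 12 / 30
```

Only tn and fn enter the calculation; tp and fp are accepted by npv() so that all metric functions share one signature.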
This function simply returns cutpoint
as the optimal cutpoint.
Mainly useful if bootstrap estimates of the out-of-bag performance of a
given cutpoint are desired, e.g. taking a cutpoint value from the literature.
oc_manual(cutpoint, ...)
cutpoint |
(numeric) The fixed cutpoint. |
... |
To capture further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function. |
Other method functions:
maximize_boot_metric()
,
maximize_gam_metric()
,
maximize_loess_metric()
,
maximize_metric()
,
maximize_spline_metric()
,
oc_mean()
,
oc_median()
,
oc_youden_kernel()
,
oc_youden_normal()
cutpointr(suicide, dsi, suicide, method = oc_manual, cutpoint = 4)
The sample mean is calculated and returned as the optimal cutpoint.
oc_mean(data, x, trim = 0, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
trim |
The fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint. |
... |
To capture further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function. |
Other method functions:
maximize_boot_metric()
,
maximize_gam_metric()
,
maximize_loess_metric()
,
maximize_metric()
,
maximize_spline_metric()
,
oc_manual()
,
oc_median()
,
oc_youden_kernel()
,
oc_youden_normal()
data(suicide)
oc_mean(suicide, "dsi")
cutpointr(suicide, dsi, suicide, method = oc_mean)
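The description of trim matches the trim argument of base R's mean(), so its effect can be sketched with base R alone. The vector x below is made-up data, and the snippet illustrates trimming in general rather than the package's code.

```r
x <- c(1, 2, 3, 4, 100)      # made-up predictor values with one outlier

mean(x)                      # 22: the outlier pulls the mean upwards
mean(x, trim = 0.2)          # 3: lowest and highest value dropped first
mean(x, trim = 0.7)          # 3: trim above 0.5 is treated as 0.5, i.e. the median
```

A trimmed mean may be the more stable choice when the predictor has a few extreme values.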
The sample median is calculated and returned as the optimal cutpoint.
oc_median(data, x, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
... |
To capture further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function. |
Other method functions:
maximize_boot_metric()
,
maximize_gam_metric()
,
maximize_loess_metric()
,
maximize_metric()
,
maximize_spline_metric()
,
oc_manual()
,
oc_mean()
,
oc_youden_kernel()
,
oc_youden_normal()
data(suicide)
oc_median(suicide, "dsi")
cutpointr(suicide, dsi, suicide, method = oc_median)
Instead of searching for an optimal cutpoint to maximize (sensitivity +
specificity - 1) on the ROC curve, this function first smooths the empirical
distributions of x
per class. The smoothing is done using a binned kernel
density estimate. The bandwidth is automatically selected using the direct
plug-in method.
oc_youden_kernel(data, x, class, pos_class, neg_class, direction, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
pos_class |
The value of class that indicates the positive class. |
neg_class |
The value of class that indicates the negative class. |
direction |
(character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class. |
... |
To capture further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function. |
The functions for calculating the kernel density estimate and the bandwidth are both from KernSmooth with default parameters, except for the bandwidth selection, which uses the standard deviation as scale estimate.
The cutpoint is estimated as the cutpoint that maximizes the Youden-Index
given by J = max_c {F_N(c) - G_P(c)}, where F_N and G_P
are the smoothed distribution functions of x in the negative and positive class, respectively.
Fluss, R., Faraggi, D., & Reiser, B. (2005). Estimation of the Youden Index and its associated cutoff point. Biometrical Journal, 47(4), 458–472.
Matt Wand (2015). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23-15. https://CRAN.R-project.org/package=KernSmooth
Other method functions:
maximize_boot_metric()
,
maximize_gam_metric()
,
maximize_loess_metric()
,
maximize_metric()
,
maximize_spline_metric()
,
oc_manual()
,
oc_mean()
,
oc_median()
,
oc_youden_normal()
data(suicide)
if (require(KernSmooth)) {
  oc_youden_kernel(suicide, "dsi", "suicide", oc_metric = "Youden",
                   pos_class = "yes", neg_class = "no", direction = ">=")
  ## Within cutpointr
  cutpointr(suicide, dsi, suicide, method = oc_youden_kernel)
}
An optimal cutpoint maximizing the Youden- or J-Index (sensitivity + specificity - 1) is calculated parametrically assuming normal distributions per class.
oc_youden_normal(data, x, class, pos_class = NULL, neg_class = NULL, direction, ...)
data |
A data frame or tibble in which the columns that are given in x and class can be found. |
x |
(character) The variable name to be used for classification, e.g. predictions or test values. |
class |
(character) The variable name indicating class membership. |
pos_class |
The value of class that indicates the positive class. |
neg_class |
The value of class that indicates the negative class. |
direction |
(character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class. |
... |
To capture further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function. |
Other method functions:
maximize_boot_metric()
,
maximize_gam_metric()
,
maximize_loess_metric()
,
maximize_metric()
,
maximize_spline_metric()
,
oc_manual()
,
oc_mean()
,
oc_median()
,
oc_youden_kernel()
data(suicide)
oc_youden_normal(suicide, "dsi", "suicide",
                 pos_class = "yes", neg_class = "no", direction = ">=")
cutpointr(suicide, dsi, suicide, method = oc_youden_normal)
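The idea of a parametric Youden cutpoint can be sketched numerically in base R. The class means and standard deviations below are assumed values, not estimates from the suicide data, and the numerical search with optimize() is chosen for illustration; oc_youden_normal may well compute its result differently, for example in closed form.

```r
# Assumed class-wise normal parameters (hypothetical values)
mean_neg <- 0; sd_neg <- 1
mean_pos <- 2; sd_pos <- 1

# Youden index at cutpoint cp for direction ">=":
# specificity = P(X < cp | negative), sensitivity = P(X >= cp | positive)
youden_normal <- function(cp) {
  spec <- pnorm(cp, mean_neg, sd_neg)
  sens <- 1 - pnorm(cp, mean_pos, sd_pos)
  sens + spec - 1
}

# Search between the two class means for the cutpoint with maximal Youden index
opt <- optimize(youden_normal, interval = c(mean_neg, mean_pos), maximum = TRUE)
opt$maximum    # close to 1, the midpoint of the means when both SDs are equal
opt$objective  # Youden index at that cutpoint
```

With equal standard deviations the two normal densities cross halfway between the means, which is why the search ends up near 1 here.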
Calculate the (diagnostic) odds ratio from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
odds_ratio = (tp / fp) / (fn / tn)
odds_ratio(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
p_chisquared()
,
plr()
,
ppv()
,
precision()
,
prod_ppv_npv()
,
prod_sens_spec()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
odds_ratio(10, 5, 20, 10)
odds_ratio(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
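The formula can be traced with the counts from the example. The helper odds_ratio_by_hand is a hypothetical name that only restates odds_ratio = (tp / fp) / (fn / tn); it is not the package function.

```r
# Hypothetical re-statement of the formula, for illustration only
odds_ratio_by_hand <- function(tp, fp, tn, fn) (tp / fp) / (fn / tn)

odds_ratio_by_hand(10, 5, 20, 10)   # (10 / 5) / (10 / 20) = 2 / 0.5 = 4
odds_ratio_by_hand(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
```

Note that the ratio is undefined or infinite when fp or fn is zero, which can happen at extreme cutpoints.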
Calculate the p-value of a chi-squared test from true positives, false positives, true negatives and false negatives. The inputs must be vectors of equal length.
p_chisquared(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
plr()
,
ppv()
,
precision()
,
prod_ppv_npv()
,
prod_sens_spec()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
p_chisquared(10, 5, 20, 10)
p_chisquared(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
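A chi-squared test on a single confusion matrix can be reproduced with stats::chisq.test from base R. This section does not state whether p_chisquared applies a continuity correction, so both variants are shown; the layout of the confusion matrix is an assumption made for this sketch.

```r
# 2 x 2 table filled column by column: rows = predicted class, columns = true class
confusion <- matrix(c(10, 10,    # true positives column: tp, fn
                      5, 20),    # true negatives column: fp, tn
                    nrow = 2,
                    dimnames = list(predicted = c("pos", "neg"),
                                    truth = c("pos", "neg")))

p_corrected   <- chisq.test(confusion)$p.value                   # Yates correction (default)
p_uncorrected <- chisq.test(confusion, correct = FALSE)$p.value  # no correction

c(corrected = p_corrected, uncorrected = p_uncorrected)
```

Unlike most other metrics, a smaller p-value indicates a stronger association, so this metric is typically minimized rather than maximized.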
Given a cutpointr object this function plots the bootstrapped distribution
of optimal cutpoints. cutpointr
has to be run with boot_runs
> 0
to enable bootstrapping.
plot_cut_boot(x, ...)
x |
A cutpointr object. |
... |
Additional arguments (unused). |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_metric()
,
plot_precision_recall()
,
plot_roc()
,
plot_sensitivity_specificity()
,
plot_x()
set.seed(100)
opt_cut <- cutpointr(suicide, dsi, suicide, boot_runs = 10)
plot_cut_boot(opt_cut)
Flexibly plot various metrics against all cutpoints or any other metric.
The function can plot any metric based on a cutpointr
or roc_cutpointr
object. If cutpointr
was run with bootstrapping, bootstrapped confidence
intervals can be plotted. These represent the quantiles of the distribution
of the y-variable grouped by x-variable over all bootstrap repetitions.
plot_cutpointr(x, xvar = cutpoint, yvar = sum_sens_spec, conf_lvl = 0.95, aspect_ratio = NULL)
x |
A cutpointr or roc_cutpointr object. |
xvar |
A function, typically cutpoint or a metric function. |
yvar |
A function, typically a metric function. |
conf_lvl |
(numeric) If bootstrapping was run and x is a cutpointr object, a confidence interval at the level of conf_lvl can be plotted. To plot no confidence interval set conf_lvl = 0. |
aspect_ratio |
(numeric) Set to 1 to obtain a square plot, e.g. for plotting a ROC curve. |
The arguments to xvar
and yvar
should be metric functions. Any metric
function that is suitable for cutpointr
can also be used in plot_cutpointr
.
Anonymous functions are also allowed.
To plot all possible cutpoints, the utility function cutpoint
can be used.
The functions for xvar
and yvar
may accept any or all of the arguments
tp
, fp
, tn
, or fn
and return a numeric vector,
a matrix or a data.frame
.
For more details on metric functions see vignette("cutpointr")
.
Note that confidence intervals can only be correctly plotted if the values of xvar
are constant across bootstrap samples. For example, confidence intervals for
tpr
by fpr
(a ROC curve) cannot be plotted, as the values of the false positive
rate vary per bootstrap sample.
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_metric_boot()
,
plot_metric()
,
plot_precision_recall()
,
plot_roc()
,
plot_sensitivity_specificity()
,
plot_x()
set.seed(1)
oc <- cutpointr(suicide, dsi, suicide, boot_runs = 10)

plot_cutpointr(oc, cutpoint, F1_score)

## ROC curve
plot_cutpointr(oc, fpr, tpr, aspect_ratio = 1)

## Custom function
plot_cutpointr(oc, cutpoint, function(tp, tn, fp, fn, ...) tp / fp) +
  ggplot2::ggtitle("Custom metric") +
  ggplot2::ylab("value")
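The bootstrapped confidence intervals are described as quantiles of the y-variable grouped by x-variable over all bootstrap repetitions. A base R sketch of that grouping follows; the data frame boot_values holds made-up numbers and its layout is an assumption, not the structure stored in a cutpointr object.

```r
# Hypothetical bootstrap output: one metric value per cutpoint and repetition
boot_values <- data.frame(
  cutpoint = rep(c(1, 2, 3), times = 4),
  value    = c(0.50, 0.70, 0.60,
               0.55, 0.75, 0.58,
               0.45, 0.72, 0.62,
               0.52, 0.68, 0.65)
)

conf_lvl <- 0.95
probs <- c((1 - conf_lvl) / 2, 1 - (1 - conf_lvl) / 2)

# One column per cutpoint; rows hold the lower and upper quantile
ci <- sapply(split(boot_values$value, boot_values$cutpoint), quantile, probs = probs)
ci
```

The grouping only makes sense when every repetition shares the same x values, which is the reason given above for why intervals for tpr by fpr cannot be drawn.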
If maximize_metric
is used as method
function in cutpointr, the computed
metric values over all possible cutoffs can be plotted. Generally, this
works for method functions that return a ROC-curve including the metric
value for every cutpoint along with the optimal cutpoint.
plot_metric(x, conf_lvl = 0.95, add_unsmoothed = TRUE)
x |
A cutpointr object. |
conf_lvl |
The confidence level of the bootstrap confidence interval. Set to 0 to draw no bootstrap confidence interval. |
add_unsmoothed |
Add the line of unsmoothed metric values to the plot. Applicable for some smoothing methods, e.g. maximize_gam_metric. |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_precision_recall()
,
plot_roc()
,
plot_sensitivity_specificity()
,
plot_x()
opt_cut <- cutpointr(suicide, dsi, suicide)
plot_metric(opt_cut)
Given a cutpointr
object this function plots the bootstrapped metric distribution,
i.e. the distribution of out-of-bag metric values.
The metric depends on the function that was supplied to metric
in the
call to cutpointr
.
The cutpointr
function has to be run with boot_runs
> 0 to enable bootstrapping.
plot_metric_boot(x, ...)
x |
A cutpointr object. |
... |
Additional arguments (unused). |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric()
,
plot_precision_recall()
,
plot_roc()
,
plot_sensitivity_specificity()
,
plot_x()
set.seed(300)
opt_cut <- cutpointr(suicide, dsi, suicide, boot_runs = 10)
plot_metric_boot(opt_cut)
Given a cutpointr
object this function plots the precision recall curve(s)
per subgroup, if given.
plot_precision_recall(x, display_cutpoint = TRUE, ...)
x |
A cutpointr object. |
display_cutpoint |
(logical) Whether or not to display the optimal cutpoint as a dot on the precision recall curve. |
... |
Additional arguments (unused). |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_metric()
,
plot_roc()
,
plot_sensitivity_specificity()
,
plot_x()
library(cutpointr)

## Optimal cutpoint for dsi
data(suicide)
opt_cut <- cutpointr(suicide, dsi, suicide)
plot_precision_recall(opt_cut)
Given a cutpointr
object this function plots the ROC curve(s)
per subgroup, if given. Also plots a ROC curve from the output of roc()
.
plot_roc(x, ...)

## S3 method for class 'cutpointr'
plot_roc(x, display_cutpoint = TRUE, type = "line", ...)

## S3 method for class 'roc_cutpointr'
plot_roc(x, type = "line", ...)
x |
A cutpointr or roc_cutpointr object. |
... |
Additional arguments (unused). |
display_cutpoint |
(logical) Whether or not to display the optimal cutpoint as a dot on the ROC curve for cutpointr objects. |
type |
"line" for line plot (default) or "step" for step plot. |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_metric()
,
plot_precision_recall()
,
plot_sensitivity_specificity()
,
plot_x()
opt_cut <- cutpointr(suicide, dsi, suicide)
plot_roc(opt_cut, display_cutpoint = FALSE)

opt_cut_2groups <- cutpointr(suicide, dsi, suicide, gender)
plot_roc(opt_cut_2groups, display_cutpoint = TRUE)

roc_curve <- roc(suicide, x = dsi, class = suicide, pos_class = "yes",
                 neg_class = "no", direction = ">=")
plot(roc_curve)
auc(roc_curve)
Given a cutpointr
object this function plots the sensitivity and specificity
curve(s) per subgroup, if the latter is given.
plot_sensitivity_specificity(x, display_cutpoint = TRUE, ...)
x |
A cutpointr object. |
display_cutpoint |
(logical) Whether or not to mark the optimal cutpoint in the sensitivity and specificity plot. |
... |
Additional arguments (unused). |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_metric()
,
plot_precision_recall()
,
plot_roc()
,
plot_x()
library(cutpointr)

## Optimal cutpoint for dsi
data(suicide)
opt_cut <- cutpointr(suicide, dsi, suicide)
plot_sensitivity_specificity(opt_cut)
Given a cutpointr
object this function plots the distribution(s) of the
independent variable(s) and the respective cutpoints per class.
plot_x(x, display_cutpoint = TRUE, ...)
x |
A cutpointr object. |
display_cutpoint |
(logical) Whether or not to display the optimal cutpoint as a vertical line. |
... |
Additional arguments (unused). |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_metric()
,
plot_precision_recall()
,
plot_roc()
,
plot_sensitivity_specificity()
opt_cut <- cutpointr(suicide, dsi, suicide)
plot_x(opt_cut)

## With subgroup
opt_cut_2groups <- cutpointr(suicide, dsi, suicide, gender)
plot_x(opt_cut_2groups)
The plot layout depends on whether subgroups were defined and whether bootstrapping was run.
## S3 method for class 'cutpointr'
plot(x, ...)
x |
A cutpointr object. |
... |
Further arguments. |
The ...
argument can be used to apply ggplot2 functions to every individual
plot, for example for changing the theme.
Other cutpointr plotting functions:
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_metric()
,
plot_precision_recall()
,
plot_roc()
,
plot_sensitivity_specificity()
,
plot_x()
opt_cut <- cutpointr(suicide, dsi, suicide, gender)
plot(opt_cut)
plot(opt_cut, ggplot2::theme_bw())
You can try plotting the data manually instead.
## S3 method for class 'multi_cutpointr'
plot(x, ...)
x |
A multi_cutpointr object. |
... |
Further arguments. |
Given a cutpointr
object this function plots the ROC curve(s)
per subgroup, if given. Also plots a ROC curve from the output of roc()
.
## S3 method for class 'roc_cutpointr'
plot(x, type = "line", ...)
x |
A cutpointr or roc_cutpointr object. |
type |
"line" for line plot (default) or "step" for step plot. |
... |
Additional arguments (unused). |
Other cutpointr plotting functions:
plot.cutpointr()
,
plot_cut_boot()
,
plot_cutpointr()
,
plot_metric_boot()
,
plot_metric()
,
plot_precision_recall()
,
plot_sensitivity_specificity()
,
plot_x()
opt_cut <- cutpointr(suicide, dsi, suicide)
plot_roc(opt_cut, display_cutpoint = FALSE)

opt_cut_2groups <- cutpointr(suicide, dsi, suicide, gender)
plot_roc(opt_cut_2groups, display_cutpoint = TRUE)

roc_curve <- roc(suicide, x = dsi, class = suicide, pos_class = "yes",
                 neg_class = "no", direction = ">=")
plot(roc_curve)
auc(roc_curve)
Calculate the positive or negative likelihood ratio
from true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
plr = tpr / fpr
nlr = fnr / tnr
plr(tp, fp, tn, fn, ...)
nlr(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
p_chisquared()
,
ppv()
,
precision()
,
prod_ppv_npv()
,
prod_sens_spec()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
plr(10, 5, 20, 10)
plr(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
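Both likelihood ratios can be traced step by step for the first example. The snippet uses the usual rate definitions (tpr = tp / (tp + fn), fpr = fp / (fp + tn), and their complements); the variable names plr_by_hand and nlr_by_hand are hypothetical and only restate the formulas above.

```r
tp <- 10; fp <- 5; tn <- 20; fn <- 10

tpr <- tp / (tp + fn)   # 10 / 20 = 0.5
fpr <- fp / (fp + tn)   #  5 / 25 = 0.2
fnr <- fn / (fn + tp)   # 10 / 20 = 0.5
tnr <- tn / (tn + fp)   # 20 / 25 = 0.8

plr_by_hand <- tpr / fpr   # 2.5
nlr_by_hand <- fnr / tnr   # 0.625
```

A positive likelihood ratio above 1 together with a negative likelihood ratio below 1 indicates that the test result shifts the odds in the expected direction.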
Calculate the positive predictive value (PPV) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
ppv = tp / (tp + fp)
ppv(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
p_chisquared()
,
plr()
,
precision()
,
prod_ppv_npv()
,
prod_sens_spec()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
ppv(10, 5, 20, 10)
ppv(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
Calculate precision (equal to the positive predictive value)
from true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
precision = tp / (tp + fp)
precision(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
p_chisquared()
,
plr()
,
ppv()
,
prod_ppv_npv()
,
prod_sens_spec()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
precision(10, 5, 20, 10)
precision(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
Predictions are made on the data.frame
in newdata
using either the variable name or by applying the same transformation to
the data as in cutpointr
. The class of the output will be identical to the class
of the predictor.
## S3 method for class 'cutpointr'
predict(object, newdata, cutpoint_nr = 1, ...)
object |
a cutpointr object. |
newdata |
a data.frame with a column that contains the predictor variable. |
cutpoint_nr |
if multiple optimal cutpoints were found this parameter defines which one should be used for predictions. Can be a vector if different cutpoint numbers are desired for different subgroups. |
... |
further arguments. |
Other main cutpointr functions:
add_metric()
,
boot_ci()
,
boot_test()
,
cutpointr()
,
multi_cutpointr()
,
roc()
oc <- cutpointr(suicide, dsi, suicide)

## Return in-sample predictions
predict(oc, newdata = data.frame(dsi = oc$data[[1]]$dsi))
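Conceptually, a prediction compares the predictor column in newdata with a cutpoint in the chosen direction. The helper predict_by_cutpoint below is a hypothetical base R sketch of that comparison with an assumed cutpoint of 2; it does not reproduce the method's handling of subgroups, transformations or multiple optimal cutpoints.

```r
# Hypothetical helper, not part of cutpointr
predict_by_cutpoint <- function(x, cutpoint, direction = ">=",
                                pos_class = "yes", neg_class = "no") {
  is_pos <- if (direction == ">=") x >= cutpoint else x <= cutpoint
  ifelse(is_pos, pos_class, neg_class)
}

newdata <- data.frame(dsi = c(0, 1, 2, 5))
predict_by_cutpoint(newdata$dsi, cutpoint = 2)
# "no"  "no"  "yes" "yes"
```

Values equal to the cutpoint are assigned to the positive class for both directions, since both comparisons are inclusive.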
Prints the cutpointr
object with full width like a tbl_df
.
## S3 method for class 'cutpointr'
print(x, width = 1000, n = 50, sigfig = 6, ...)
x |
a cutpointr object. |
width |
width of output. |
n |
number of rows to print. |
sigfig |
Number of significant digits to print. Temporarily overrides options("pillar.sigfig"). |
... |
further arguments. |
Kirill Müller and Hadley Wickham (2017). tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble
Prints the multi_cutpointr
object with infinite width like a tbl_df
.
## S3 method for class 'multi_cutpointr'
print(x, n = Inf, ...)
x |
a multi_cutpointr object. |
n |
number of rows to print. |
... |
further arguments. |
Kirill Müller and Hadley Wickham (2017). tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble
Calculate the product of positive predictive value (PPV) and
negative predictive value (NPV) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
prod_ppv_npv = ppv * npv
prod_ppv_npv(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
p_chisquared()
,
plr()
,
ppv()
,
precision()
,
prod_sens_spec()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
prod_ppv_npv(10, 5, 20, 10)
prod_ppv_npv(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
Calculate the product of sensitivity and specificity from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
prod_sens_spec = sensitivity * specificity
prod_sens_spec(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
p_chisquared()
,
plr()
,
ppv()
,
precision()
,
prod_ppv_npv()
,
recall()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
prod_sens_spec(10, 5, 20, 10)
prod_sens_spec(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
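The three formulas above can be chained in base R to see how the product arises. The helper prod_sens_spec_by_hand is a hypothetical name for this illustration and only restates the definitions; it is not the package function.

```r
# Hypothetical re-statement of the formulas, for illustration only
prod_sens_spec_by_hand <- function(tp, fp, tn, fn) {
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  sensitivity * specificity
}

prod_sens_spec_by_hand(10, 5, 20, 10)   # (10 / 20) * (20 / 25) = 0.5 * 0.8 = 0.4
prod_sens_spec_by_hand(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
```

Because the two rates are multiplied, the product is high only when both sensitivity and specificity are high.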
Prostatic acid phosphatase (PAP) emerged as the first clinically useful tumor marker in the 1940s and 1950s. This data set contains the serum levels of acid phosphatase of 53 patients that were confirmed to have prostate cancer and whether the neighboring lymph nodes were involved.
prostate_nodal
A data frame with 53 rows and 2 variables:
(numeric) Blood serum level of acid phosphatase
(logical) Whether neighboring lymph nodes were involved
Le CT (2006). A solution for the most basic optimization problem associated with an ROC curve. Statistical Methods in Medical Research, 15, 571–584.
Calculate recall (equal to sensitivity) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
recall = tp / (tp + fn)
recall(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
p_chisquared()
,
plr()
,
ppv()
,
precision()
,
prod_ppv_npv()
,
prod_sens_spec()
,
risk_ratio()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
recall(10, 5, 20, 10)
recall(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
Calculate the risk ratio (or relative risk) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
risk_ratio = (tp / (tp + fn)) / (fp / (fp + tn))
risk_ratio(tp, fp, tn, fn, ...)
tp |
(numeric) number of true positives. |
fp |
(numeric) number of false positives. |
tn |
(numeric) number of true negatives. |
fn |
(numeric) number of false negatives. |
... |
for capturing additional arguments passed by method. |
Other metric functions:
F1_score()
,
Jaccard()
,
abs_d_ppv_npv()
,
abs_d_sens_spec()
,
accuracy()
,
cohens_kappa()
,
cutpoint()
,
false_omission_rate()
,
metric_constrain()
,
misclassification_cost()
,
npv()
,
odds_ratio()
,
p_chisquared()
,
plr()
,
ppv()
,
precision()
,
prod_ppv_npv()
,
prod_sens_spec()
,
recall()
,
roc01()
,
sensitivity()
,
specificity()
,
sum_ppv_npv()
,
sum_sens_spec()
,
total_utility()
,
tpr()
,
tp()
,
youden()
risk_ratio(10, 5, 20, 10) risk_ratio(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
risk_ratio(10, 5, 20, 10) risk_ratio(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
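As a rough illustration of the risk ratio formula, the following base-R sketch reproduces the arithmetic by hand without calling the package. The helper name risk_ratio_by_hand is hypothetical and not part of cutpointr.

```r
# Hypothetical helper (not part of cutpointr): applies the documented
# formula element-wise to vectors of equal length.
risk_ratio_by_hand <- function(tp, fp, tn, fn) {
  (tp / (tp + fn)) / (fp / (fp + tn))
}

# tp = 10, fp = 5, tn = 20, fn = 10: (10 / 20) / (5 / 25) = 0.5 / 0.2
rr_single <- risk_ratio_by_hand(10, 5, 20, 10)

# Vector input yields one value per position.
rr_vector <- risk_ratio_by_hand(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
```

For the first set of counts the result is 2.5. If fp is 0, plain R division returns Inf or NaN; how the package treats that case is not covered by this sketch.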
Given a data.frame with a numeric predictor variable and a binary outcome
variable, this function returns a data.frame that includes all elements of
the confusion matrix (true positives, false positives, true negatives,
and false negatives) for every unique value of the predictor variable.
Additionally, the true positive rate (tpr), false positive rate (fpr),
true negative rate (tnr) and false negative rate (fnr) are returned.
roc(data, x, class, pos_class, neg_class, direction = ">=", silent = FALSE)
data | A data.frame or matrix. Will be converted to a data.frame. |
x | The name of the numeric predictor variable. |
class | The name of the binary outcome variable. |
pos_class | The value of 'class' that represents the positive cases. |
neg_class | The value of 'class' that represents the negative cases. |
direction | (character) One of ">=" or "<=". Specifies whether the positive class is associated with higher (">=", the default) or lower ("<=") values of x. |
silent | If FALSE and the ROC curve contains no positives or negatives, a warning is generated. |
To enable classifying all observations as belonging to only one class, the predictor values are augmented by Inf or -Inf. The returned object can be plotted with plot_roc.
This function uses tidyeval to support unquoted arguments. For programming
with roc, the operator !! can be used to unquote an argument;
see the examples.
A data frame with the columns x.sorted, tp, fp, tn, fn, tpr, tnr, fpr, and fnr.
Forked from the ROCR package
Other main cutpointr functions: add_metric(), boot_ci(), boot_test(), cutpointr(), multi_cutpointr(), predict.cutpointr()
roc_curve <- roc(data = suicide, x = dsi, class = suicide,
  pos_class = "yes", neg_class = "no", direction = ">=")
roc_curve
plot_roc(roc_curve)
auc(roc_curve)
## Unquoting an argument
myvar <- "dsi"
roc(suicide, x = !!myvar, suicide, pos_class = "yes", neg_class = "no")
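To make the structure of the returned data frame concrete, here is a minimal base-R sketch of the idea for direction = ">=": every unique predictor value, plus Inf, is treated as a cutpoint, and observations with x >= cutpoint are classified as positive. The helper roc_sketch and the toy data are assumptions for illustration; the package's own implementation may differ in details such as ordering and tie handling.

```r
# Hypothetical helper (not part of cutpointr): confusion matrix counts
# and rates per candidate cutpoint, assuming direction = ">=".
roc_sketch <- function(x, class, pos_class, neg_class) {
  # Inf is added so that "classify nothing as positive" is a valid cutpoint.
  cutpoints <- sort(unique(c(x, Inf)), decreasing = TRUE)
  is_pos <- class == pos_class
  is_neg <- class == neg_class
  tp <- vapply(cutpoints, function(cp) sum(x >= cp & is_pos), integer(1))
  fp <- vapply(cutpoints, function(cp) sum(x >= cp & is_neg), integer(1))
  fn <- sum(is_pos) - tp
  tn <- sum(is_neg) - fp
  data.frame(
    x.sorted = cutpoints,
    tp = tp, fp = fp, tn = tn, fn = fn,
    tpr = tp / (tp + fn), tnr = tn / (tn + fp),
    fpr = fp / (fp + tn), fnr = fn / (fn + tp)
  )
}

# Toy data: three positives ("yes") and two negatives ("no").
x <- c(1, 2, 2, 3, 4)
class <- c("no", "no", "yes", "yes", "yes")
res <- roc_sketch(x, class, pos_class = "yes", neg_class = "no")
```

Each row of res corresponds to one cutpoint, from the strictest (Inf, nothing classified as positive) to the most lenient (the smallest observed value, everything classified as positive).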
Calculate the distance in ROC space between points on the ROC curve
and the point of perfect discrimination (sensitivity = 1, specificity = 1)
from true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length. To be used with
method = minimize_metric.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
roc01 = sqrt((1 - sensitivity)^2 + (1 - specificity)^2)
roc01(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
roc01(10, 5, 20, 10)
roc01(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
oc <- cutpointr(suicide, dsi, suicide, method = minimize_metric, metric = roc01)
plot_roc(oc)
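The distance formula can be checked by hand with a few lines of base R. This is only a sketch of the arithmetic; roc01_by_hand is a hypothetical name and not a package function.

```r
# Hypothetical helper (not part of cutpointr): Euclidean distance between
# the point (sensitivity, specificity) and the ideal point (1, 1).
roc01_by_hand <- function(tp, fp, tn, fn) {
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  sqrt((1 - sensitivity)^2 + (1 - specificity)^2)
}

# sensitivity = 10 / 20 = 0.5, specificity = 20 / 25 = 0.8,
# so the distance is sqrt(0.5^2 + 0.2^2) = sqrt(0.29).
d <- roc01_by_hand(10, 5, 20, 10)

# A classifier without any errors sits exactly on the ideal point.
d_perfect <- roc01_by_hand(tp = 10, fp = 0, tn = 20, fn = 0)
```

Because smaller distances are better, the metric is meant to be minimized, which is why it is paired with minimize_metric.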
Calculate sensitivity from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
sensitivity = tp / (tp + fn)
sensitivity(tp, fn, ...)
tp | (numeric) number of true positives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
## Arguments are named because the signature is sensitivity(tp, fn, ...)
sensitivity(tp = 10, fp = 5, tn = 20, fn = 10)
sensitivity(tp = c(10, 8), fp = c(5, 7), tn = c(20, 12), fn = c(10, 18))
Calculate specificity from true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
specificity = tn / (tn + fp)
specificity(fp, tn, ...)
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
## Arguments are named because the signature is specificity(fp, tn, ...)
specificity(tp = 10, fp = 5, tn = 20, fn = 10)
specificity(tp = c(10, 8), fp = c(5, 7), tn = c(20, 12), fn = c(10, 18))
Various demographic, personality and clinical psychological characteristics were assessed as part of an online study on suicide prevention, with the aim of identifying persons at risk of attempting suicide. Depressive Symptom Inventory - Suicidality Subscale (DSI-SS) sum scores and past suicide attempts from 532 subjects are included as a demonstration set for calculating optimal cutpoints. Two additional demographic variables (age, gender) are included to test for group differences.
suicide
A data frame with 532 rows and 4 variables:
(numeric) Age of participants in years
(factor) Gender
(numeric) Sum-score (0 = low suicidality, 12 = high suicidality)
(factor) Past suicide attempt (no = no attempt, yes = at least one attempt)
von Glischinski, M., Teisman, T., Prinz, S., Gebauer, J., and Hirschfeld, G. (2017). Depressive Symptom Inventory - Suicidality Subscale: Optimal cut points for clinical and non-clinical samples. Clinical Psychology & Psychotherapy.
Calculate the sum of positive predictive value (PPV) and
negative predictive value (NPV) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
sum_ppv_npv = ppv + npv
sum_ppv_npv(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_sens_spec(), total_utility(), tpr(), tp(), youden()
sum_ppv_npv(10, 5, 20, 10)
sum_ppv_npv(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
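The following base-R sketch walks through the two intermediate values for the counts used in the examples. The helper sum_ppv_npv_by_hand is a hypothetical name introduced only for this illustration.

```r
# Hypothetical helper (not part of cutpointr): PPV + NPV from raw counts.
sum_ppv_npv_by_hand <- function(tp, fp, tn, fn) {
  ppv <- tp / (tp + fp)  # share of predicted positives that are correct
  npv <- tn / (tn + fn)  # share of predicted negatives that are correct
  ppv + npv
}

# ppv = 10 / 15 and npv = 20 / 30, so both equal 2/3 and the sum is 4/3.
s <- sum_ppv_npv_by_hand(10, 5, 20, 10)
```

Since each predictive value lies between 0 and 1, the sum lies between 0 and 2.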
Calculate the sum of sensitivity and specificity from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
sum_sens_spec = sensitivity + specificity
sum_sens_spec(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), total_utility(), tpr(), tp(), youden()
sum_sens_spec(10, 5, 20, 10)
sum_sens_spec(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
Calculate the total utility from
true positives, false positives, true negatives and false negatives.
total_utility = utility_tp * tp + utility_tn * tn - cost_fp * fp - cost_fn * fn
The inputs must be vectors of equal length.
total_utility(tp, fp, tn, fn, utility_tp = 1, utility_tn = 1, cost_fp = 1, cost_fn = 1, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
utility_tp | (numeric) the utility of a true positive |
utility_tn | (numeric) the utility of a true negative |
cost_fp | (numeric) the cost of a false positive |
cost_fn | (numeric) the cost of a false negative |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), tpr(), tp(), youden()
total_utility(10, 5, 20, 10, utility_tp = 3, utility_tn = 3, cost_fp = 1, cost_fn = 5)
total_utility(c(10, 8), c(5, 7), c(20, 12), c(10, 18), utility_tp = 3, utility_tn = 3, cost_fp = 1, cost_fn = 5)
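To see how the weights enter the result, the total utility formula can be evaluated by hand. The sketch below uses plain R arithmetic; total_utility_by_hand is a hypothetical name that mirrors the documented signature but is not the package function.

```r
# Hypothetical helper (not part of cutpointr): weighted sum of the four
# confusion matrix cells, with costs entering negatively.
total_utility_by_hand <- function(tp, fp, tn, fn,
                                  utility_tp = 1, utility_tn = 1,
                                  cost_fp = 1, cost_fn = 1) {
  utility_tp * tp + utility_tn * tn - cost_fp * fp - cost_fn * fn
}

# 3 * 10 + 3 * 20 - 1 * 5 - 5 * 10 = 35
u_weighted <- total_utility_by_hand(10, 5, 20, 10,
                                    utility_tp = 3, utility_tn = 3,
                                    cost_fp = 1, cost_fn = 5)

# With all weights left at 1 the value is simply
# correct minus incorrect classifications: 10 + 20 - 5 - 10 = 15
u_default <- total_utility_by_hand(10, 5, 20, 10)
```

Raising cost_fn relative to cost_fp, as in the first call, penalizes missed positives more heavily than false alarms.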
Extract the number of true positives (tp), false positives (fp),
true negatives (tn), or false negatives (fn).
The inputs must be vectors of equal length. Mainly useful for plot_cutpointr.
tp(tp, ...)
tn(tn, ...)
fp(fp, ...)
fn(fn, ...)
tp | (numeric) number of true positives. |
... | for capturing additional arguments passed by method. |
tn | (numeric) number of true negatives. |
fp | (numeric) number of false positives. |
fn | (numeric) number of false negatives. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), youden()
## Arguments are named because each function takes only its own count plus ...
tp(tp = 10, fp = 5, tn = 20, fn = 10)
tp(tp = c(10, 8), fp = c(5, 7), tn = c(20, 12), fn = c(10, 18))
fp(tp = 10, fp = 5, tn = 20, fn = 10)
tn(tp = 10, fp = 5, tn = 20, fn = 10)
fn(tp = 10, fp = 5, tn = 20, fn = 10)
Calculate the true positive rate (tpr, equal to sensitivity and recall),
the false positive rate (fpr, equal to fall-out),
the true negative rate (tnr, equal to specificity),
or the false negative rate (fnr) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
tnr = tn / (tn + fp)
fnr = fn / (fn + tp)
tpr(tp, fn, ...)
fpr(fp, tn, ...)
tnr(fp, tn, ...)
fnr(tp, fn, ...)
tp | (numeric) number of true positives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tp(), youden()
## Arguments are named because the signature is tpr(tp, fn, ...)
tpr(tp = 10, fp = 5, tn = 20, fn = 10)
tpr(tp = c(10, 8), fp = c(5, 7), tn = c(20, 12), fn = c(10, 18))
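The four rates come in two complementary pairs, which the following base-R arithmetic makes explicit. It uses the counts from the examples and does not call the package; the variable names ending in _value are chosen only for this sketch.

```r
# Counts used throughout the examples.
tp <- 10; fp <- 5; tn <- 20; fn <- 10

# Rates among the actual positives (denominator tp + fn = 20).
tpr_value <- tp / (tp + fn)
fnr_value <- fn / (fn + tp)

# Rates among the actual negatives (denominator fp + tn = 25).
fpr_value <- fp / (fp + tn)
tnr_value <- tn / (tn + fp)

# Rates that share a denominator are complements of each other.
pair_sums <- c(positives = tpr_value + fnr_value,
               negatives = fpr_value + tnr_value)
```

Here tpr and fnr are both 0.5, fpr is 0.2 and tnr is 0.8, so each pair sums to 1.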
This function implements a rule of thumb for selecting the bandwidth when
smoothing a function of metric values per cutpoint value, particularly
in maximize_loess_metric and minimize_loess_metric.
user_span_cutpointr(data, x)
data | A data frame |
x | The predictor variable |
The function used for calculating the bandwidth is 0.1 * xsd / sqrt(xn), where xsd is the standard deviation of the unique values of the predictor variable (i.e. all cutpoints) and xn is the number of unique predictor values.
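As a small worked example of this rule of thumb, the sketch below applies the stated formula to a plain numeric vector. The helper span_by_hand is hypothetical; unlike the documented function it takes the predictor values directly rather than a data frame and a column, and it assumes there are no missing values.

```r
# Hypothetical helper (not part of cutpointr): 0.1 * xsd / sqrt(xn),
# computed on the unique predictor values only.
span_by_hand <- function(x) {
  ux <- unique(x)
  xsd <- stats::sd(ux)  # standard deviation of the unique values
  xn <- length(ux)      # number of unique values
  0.1 * xsd / sqrt(xn)
}

# The duplicate 2 is dropped first, leaving the values 1 to 5:
# xsd = sqrt(2.5), xn = 5, so the result is 0.1 * sqrt(2.5 / 5).
x <- c(1, 2, 2, 3, 4, 5)
span <- span_by_hand(x)
```

Because duplicates are removed before the calculation, repeating an observed value does not change the resulting bandwidth.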
Calculate the Youden-Index (J-Index) from
true positives, false positives, true negatives and false negatives.
The inputs must be vectors of equal length.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
youden_index = sensitivity + specificity - 1
youden(tp, fp, tn, fn, ...)
tp | (numeric) number of true positives. |
fp | (numeric) number of false positives. |
tn | (numeric) number of true negatives. |
fn | (numeric) number of false negatives. |
... | for capturing additional arguments passed by method. |
Other metric functions: F1_score(), Jaccard(), abs_d_ppv_npv(), abs_d_sens_spec(), accuracy(), cohens_kappa(), cutpoint(), false_omission_rate(), metric_constrain(), misclassification_cost(), npv(), odds_ratio(), p_chisquared(), plr(), ppv(), precision(), prod_ppv_npv(), prod_sens_spec(), recall(), risk_ratio(), roc01(), sensitivity(), specificity(), sum_ppv_npv(), sum_sens_spec(), total_utility(), tpr(), tp()
youden(10, 5, 20, 10)
youden(c(10, 8), c(5, 7), c(20, 12), c(10, 18))
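Since the Youden index is the sum of sensitivity and specificity shifted by a constant, maximizing either quantity picks the same cutpoint. The base-R sketch below illustrates this with the two sets of counts from the examples, treated here as two hypothetical candidate cutpoints; youden_by_hand is not a package function.

```r
# Hypothetical helper (not part of cutpointr).
youden_by_hand <- function(tp, fp, tn, fn) {
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  sensitivity + specificity - 1
}

# Two candidate cutpoints, one per vector position.
tp <- c(10, 8); fp <- c(5, 7); tn <- c(20, 12); fn <- c(10, 18)

j <- youden_by_hand(tp, fp, tn, fn)
sum_sens_spec_value <- tp / (tp + fn) + tn / (tn + fp)

# First candidate: 0.5 + 0.8 - 1 = 0.3. Both criteria rank the
# candidates identically because they differ only by the constant 1.
best_by_youden <- which.max(j)
best_by_sum <- which.max(sum_sens_spec_value)
```

The index ranges from -1 to 1, with 0 corresponding to a cutpoint that performs no better than chance.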