Title: | Surrogate Residuals for Ordinal and General Regression Models |
---|---|
Description: | An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017, <doi:https://doi.org/10.1080/01621459.2017.1292915>) and Greenwell et al. (2017, <https://journal.r-project.org/archive/2018/RJ-2018-004/index.html>). These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available. |
Authors: | Brandon Greenwell [aut, cre] |
Maintainer: | Brandon Greenwell <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.2.9000 |
Built: | 2025-03-11 05:02:06 UTC |
Source: | https://github.com/koalaverse/sure |
Residual-based diagnostic plots for cumulative link and general regression models using ggplot2 graphics.
autoplot.resid(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

The methods autoplot.clm(), autoplot.glm(), autoplot.lrm(), autoplot.orm(), autoplot.polr(), and autoplot.vglm() take exactly the same arguments, with the same defaults, as autoplot.resid().
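object |
A fitted model of class clm, glm, lrm, orm, polr, or vglm, or (for autoplot.resid) an object of class resid. |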
what |
Character string specifying what to plot. Default is "qq"; the other options are "fitted" and "covariate". |
x |
A vector giving the covariate values to use for residual-by-covariate plots (i.e., when what = "covariate"). |
fit |
The fitted model from which the residuals were extracted. (Only
required if |
distribution |
Function that computes the quantiles for the reference distribution to use in the quantile-quantile plot. Default is qnorm. |
ncol |
Integer specifying the number of columns to use for the plot layout (if requesting multiple plots). Default is NULL. |
alpha |
A single value in the interval [0, 1] controlling the opacity (alpha) of the plotted points. Only used when |
xlab |
Character string giving the text to use for the x-axis label in residual-by-covariate plots. Default is NULL. |
color |
Character string or integer specifying what color to use for the points in the residual vs fitted value/covariate plot. Default is "#444444". |
shape |
Integer or single character specifying a symbol to be used for plotting the points in the residual vs fitted value/covariate plot. |
size |
Numeric value specifying the size to use for the points in the residual vs fitted value/covariate plot. |
qqpoint.color |
Character string or integer specifying what color to use for the points in the quantile-quantile plot. |
qqpoint.shape |
Integer or single character specifying a symbol to be used for plotting the points in the quantile-quantile plot. |
qqpoint.size |
Numeric value specifying the size to use for the points in the quantile-quantile plot. |
qqline.color |
Character string or integer specifying what color to use for the reference line in the quantile-quantile plot. |
qqline.linetype |
Integer or character string (e.g., |
qqline.size |
Numeric value specifying the thickness of the line in the quantile-quantile plot. |
smooth |
Logical indicating whether or not to add a nonparametric smooth to certain plots. Default is TRUE. |
smooth.color |
Character string or integer specifying what color to use for the nonparametric smooth. |
smooth.linetype |
Integer or character string (e.g., |
smooth.size |
Numeric value specifying the thickness of the line for the nonparametric smooth. |
fill |
Character string or integer specifying the color to use to fill
the boxplots for residual-by-covariate plots when |
... |
Additional optional arguments to be passed onto
|
A "ggplot" object.
# See ?resids for an example
?resids
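As a rough illustration of how the distribution argument feeds into a quantile-quantile plot, the base R sketch below pairs sorted residuals with theoretical quantiles computed by a quantile function such as qnorm. It does not call autoplot: the simulated stand-in residuals, the ppoints() plotting positions, and the quartile-based reference line (the convention used by stats::qqline) are all assumptions made for this example, not a description of the package internals.

```r
# Stand-in residuals (assumption); in practice these would come from resids()
set.seed(101)
r <- rnorm(500)

# Any quantile function can play the role of the reference distribution
distribution <- qnorm

# Pair the sorted residuals with theoretical quantiles at ppoints() positions
theoretical <- distribution(ppoints(length(r)))
sample_q <- sort(r)

# Reference line through the first and third quartiles (as in stats::qqline)
probs <- c(0.25, 0.75)
y_q <- quantile(r, probs, names = FALSE)
x_q <- distribution(probs)
slope <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

# Draw the plot only in an interactive session
if (interactive()) {
  plot(theoretical, sample_q,
       xlab = "Theoretical quantile", ylab = "Sample quantile")
  abline(a = intercept, b = slope, lty = 2)
}
```

Swapping qnorm for another quantile function (for example qlogis) changes only the theoretical quantiles and the reference line; the sorted residuals stay the same.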
Data simulated from a probit model with a quadratic trend. The data are described in Example 2 of Liu and Zhang (2017).
data(df1)
A data frame with 2000 rows and 2 variables.
x
The predictor variable.
y
The response variable; an ordered factor.
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).
head(df1)
Data simulated from a probit model with heteroscedasticity. The data are described in Example 4 of Liu and Zhang (2017).
data(df2)
A data frame with 2000 rows and 2 variables.
x
The predictor variable.
y
The response variable; an ordered factor.
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).
head(df2)
Data simulated from a log-log model with a quadratic trend. The data are described in Example 3 of Liu and Zhang (2017).
data(df3)
A data frame with 2000 rows and 2 variables.
x
The predictor variable.
y
The response variable; an ordered factor.
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).
head(df3)
Data simulated from two separate probit models. The data are described in Example 5 of Liu and Zhang (2017).
data(df4)
A data frame with 4000 rows and 2 variables.
x
The predictor variable.
y
The response variable; an ordered factor.
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).
head(df4)
Data simulated from an ordered probit model with an interaction effect.
data(df5)
A data frame with 2000 rows and 3 variables.
x1
A continuous predictor variable.
x2
A factor with two levels: "Control" and "Treatment".
y
The response variable; an ordered factor.
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).
head(df5)
Simulate p-values from a goodness-of-fit test.
gof(object, nsim = 10, test = c("ks", "ad", "cvm"), ...)

## Default S3 method:
gof(object, nsim = 10, test = c("ks", "ad", "cvm"), ...)

## S3 method for class 'gof'
plot(x, ...)
object |
A fitted model of class clm, glm, lrm, orm, polr, or vglm. |
nsim |
Integer specifying the number of bootstrap replicates to use. |
test |
Character string specifying which goodness-of-fit test to use. Current options include: "ks" (Kolmogorov-Smirnov), "ad" (Anderson-Darling), and "cvm" (Cramer-von Mises). |
... |
Additional optional arguments. (Currently ignored.) |
x |
An object of class "gof". |
Under the null hypothesis, the p-values should appear uniformly distributed on the interval [0, 1]. This can be investigated visually using the plot method: a plot that follows the 45 degree line is indicative of a "good" fit.
A numeric vector of class c("gof", "numeric") containing the simulated p-values.
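To see why uniform p-values are expected under the null hypothesis, the base R sketch below repeatedly tests samples that truly follow the reference distribution and collects the p-values. It does not call gof(): the standard normal stand-in residuals, the replicate count, and the use of stats::ks.test are assumptions chosen for illustration.

```r
set.seed(101)
nsim <- 200  # number of simulated p-values
n <- 100     # number of residuals per replicate

# Each replicate draws residuals that truly follow N(0, 1) (the null
# hypothesis) and tests them against the standard normal distribution
pvals <- replicate(nsim, ks.test(rnorm(n), "pnorm")$p.value)

# Under the null, the empirical CDF of the p-values should track the
# 45 degree line; record the largest gap at the observed p-values
gap <- max(abs(ecdf(pvals)(pvals) - pvals))

# Draw the plot only in an interactive session
if (interactive()) {
  plot(ecdf(pvals), main = "Simulated p-values", xlab = "p-value")
  abline(a = 0, b = 1, lty = 2)
}
```

If the residuals were instead drawn from a distribution other than the reference one, the p-values would pile up near zero and the empirical CDF would bend away from the 45 degree line.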
# See ?resids for an example
?resids
Simulate surrogate response values for cumulative link regression models using the latent method described in Liu and Zhang (2017).
resids(
  object,
  nsim = 1L,
  method = c("latent", "jitter"),
  jitter.scale = c("response", "probability"),
  ...
)
object |
A fitted model of class clm, glm, lrm, orm, polr, or vglm. |
nsim |
Integer specifying the number of bootstrap replicates to use. Default is 1L. |
method |
Character string specifying which method to use to generate the surrogate response values. Current options are "latent" and "jitter". |
jitter.scale |
Character string specifying the scale on which to perform the jittering whenever method = "jitter". Current options are "response" and "probability". |
... |
Additional optional arguments. (Currently ignored.) |
A numeric vector of class c("numeric", "surrogate") containing the simulated surrogate response values. Additionally, if nsim > 1, then the result will contain the attributes:
boot_reps
A matrix with nsim columns, one for each bootstrap replicate of the surrogate values. Note, these are random and do not correspond to the original ordering of the data;
boot_id
A matrix with nsim columns. Each column contains the observation number each surrogate value corresponds to in boot_reps. (This is used for plotting purposes.)
Surrogate response values require sampling from a continuous distribution; consequently, the result will be different with every call to surrogate. The internal functions used for sampling from truncated distributions are based on modified versions of rtrunc and qtrunc.
For "glm" objects, only the binomial() family is supported.
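The idea of sampling a surrogate from a truncated continuous distribution can be sketched in base R for a binary probit model. This is not the package's internal code. It assumes a latent-variable formulation in which y = 1 whenever the latent value exceeds 0, draws the surrogate from a normal distribution centred at the linear predictor and truncated to the side of 0 implied by y, and uses plain inverse-CDF sampling in place of the rtrunc and qtrunc helpers. The data-generating model is likewise made up for the example.

```r
# Simulate from a simple probit model (assumed for illustration)
set.seed(101)
n <- 2000
x <- runif(n, min = -2, max = 2)
z <- -0.5 + x + rnorm(n)              # latent continuous variable
y <- ifelse(z <= 0, yes = 0, no = 1)  # observed binary response

# Fit a correctly specified probit model
fit <- glm(y ~ x, family = binomial(link = "probit"))
eta <- fit$linear.predictors

# Truncation bounds implied by the observed response:
# y = 1 keeps the surrogate above 0, y = 0 keeps it at or below 0
lower <- ifelse(y == 1, 0, -Inf)
upper <- ifelse(y == 1, Inf, 0)

# Inverse-CDF sampling from N(eta, 1) truncated to (lower, upper)
p_lo <- pnorm(lower, mean = eta)
p_hi <- pnorm(upper, mean = eta)
u <- runif(n)
s <- qnorm(p_lo + u * (p_hi - p_lo), mean = eta)

# Surrogate residual: surrogate minus linear predictor
r <- s - eta
```

Because the model here is correctly specified, the residuals r should look roughly like a standard normal sample. Rerunning the last three assignments without resetting the seed gives different values, which is the randomness the note above refers to.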
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). URL http://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1292915?journalCode=uasa20
Nadarajah, Saralees and Kotz, Samuel. R Programs for Truncated Distributions. Journal of Statistical Software, Code Snippet, 16(2), 1-8, 2006. URL https://www.jstatsoft.org/v016/c02.
# Generate data from a quadratic probit model
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3 * x - 1 * x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)

# Scatterplot matrix
pairs(~ x + y + z)

# Setup for side-by-side plots
par(mfrow = c(1, 2))

# Misspecified mean structure
fm1 <- glm(y ~ x, family = binomial(link = "probit"))
scatter.smooth(x, y = resids(fm1),
               main = "Misspecified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")

# Correctly specified mean structure
fm2 <- glm(y ~ x + I(x ^ 2), family = binomial(link = "probit"))
scatter.smooth(x, y = resids(fm2),
               main = "Correctly specified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")
The sure package provides surrogate-based residuals for fitted ordinal and general (e.g., binary) regression models of class clm, glm, lrm, orm, polr, or vglm.
The development version can be found on GitHub: https://github.com/AFIT-R/sure. As of right now, sure exports the following functions:
resids - construct (surrogate-based) residuals;
autoplot - plot diagnostics using ggplot2-based graphics;
gof - simulate p-values from a goodness-of-fit test.
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).
Simulate surrogate response values for cumulative link regression models using the latent method described in Liu and Zhang (2017).
surrogate(
  object,
  nsim = 1L,
  method = c("latent", "jitter"),
  jitter.scale = c("response", "probability"),
  ...
)
object |
A fitted model of class clm, glm, lrm, orm, polr, or vglm. |
nsim |
Integer specifying the number of bootstrap replicates to use. Default is 1L. |
method |
Character string specifying which method to use to generate the surrogate response values. Current options are "latent" and "jitter". |
jitter.scale |
Character string specifying the scale on which to perform the jittering whenever method = "jitter". Current options are "response" and "probability". |
... |
Additional optional arguments. (Currently ignored.) |
A numeric vector of class c("numeric", "surrogate") containing the simulated surrogate response values. Additionally, if nsim > 1, then the result will contain the attributes:
boot_reps
A matrix with nsim columns, one for each bootstrap replicate of the surrogate values. Note, these are random and do not correspond to the original ordering of the data;
boot_id
A matrix with nsim columns. Each column contains the observation number each surrogate value corresponds to in boot_reps. (This is used for plotting purposes.)
Surrogate response values require sampling from a continuous distribution; consequently, the result will be different with every call to surrogate. The internal functions used for sampling from truncated distributions are based on modified versions of rtrunc and qtrunc.
For "glm" objects, only the binomial() family is supported.
Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). URL http://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1292915?journalCode=uasa20
Nadarajah, Saralees and Kotz, Samuel. R Programs for Truncated Distributions. Journal of Statistical Software, Code Snippet, 16(2), 1-8, 2006. URL https://www.jstatsoft.org/v016/c02.
# Generate data from a quadratic probit model
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3*x - 1*x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)

# Scatterplot matrix
pairs(~ x + y + z)

# Setup for side-by-side plots
par(mfrow = c(1, 2))

# Misspecified mean structure
fm1 <- glm(y ~ x, family = binomial(link = "probit"))
s1 <- surrogate(fm1)
scatter.smooth(x, s1 - fm1$linear.predictors,
               main = "Misspecified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")

# Correctly specified mean structure
fm2 <- glm(y ~ x + I(x ^ 2), family = binomial(link = "probit"))
s2 <- surrogate(fm2)
scatter.smooth(x, s2 - fm2$linear.predictors,
               main = "Correctly specified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")