Package 'sure'

Title: Surrogate Residuals for Ordinal and General Regression Models
Description: An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017, <doi:https://doi.org/10.1080/01621459.2017.1292915>) and Greenwell et al. (2017, <https://journal.r-project.org/archive/2018/RJ-2018-004/index.html>). These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available.
Authors: Brandon Greenwell [aut, cre], Brad Boehmke [aut], Andrew McCarthy [aut], Dungang Liu [ctb]
Maintainer: Brandon Greenwell <[email protected]>
License: GPL (>= 2)
Version: 0.2.2.9000
Built: 2025-03-11 05:02:06 UTC
Source: https://github.com/koalaverse/sure

Residual plots

Description

Residual-based diagnostic plots for cumulative link and general regression models using ggplot2 graphics.

Usage

autoplot.resid(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

autoplot.clm(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

autoplot.glm(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

autoplot.lrm(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

autoplot.orm(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

autoplot.polr(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

autoplot.vglm(
  object,
  what = c("qq", "fitted", "covariate"),
  x = NULL,
  fit = NULL,
  distribution = qnorm,
  ncol = NULL,
  alpha = 1,
  xlab = NULL,
  color = "#444444",
  shape = 19,
  size = 2,
  qqpoint.color = "#444444",
  qqpoint.shape = 19,
  qqpoint.size = 2,
  qqline.color = "#888888",
  qqline.linetype = "dashed",
  qqline.size = 1,
  smooth = TRUE,
  smooth.color = "red",
  smooth.linetype = 1,
  smooth.size = 1,
  fill = NULL,
  ...
)

Arguments

object

An object of class clm, glm, lrm, orm, polr, or vglm, or an object that inherits from class "resid".

what

Character string specifying what to plot: "qq", "fitted", or "covariate". Default is "qq", which produces a quantile-quantile plot of the residuals.

x

A vector giving the covariate values to use for residual-by-covariate plots (i.e., when what = "covariate").

fit

The fitted model from which the residuals were extracted. (Only required if what = "fitted" and object inherits from class "resid".)

distribution

Function that computes the quantiles for the reference distribution to use in the quantile-quantile plot. Default is qnorm, which is only appropriate for models using a probit link function. When jitter.scale = "probability", the reference distribution is always U(-0.5, 0.5). (Only required if object inherits from class "resid".)

ncol

Integer specifying the number of columns to use for the plot layout (if requesting multiple plots). Default is NULL.

alpha

A single value in the interval [0, 1] controlling the opacity of the plotted points. Only used when nsim > 1.

xlab

Character string giving the text to use for the x-axis label in residual-by-covariate plots. Default is NULL.

color

Character string or integer specifying what color to use for the points in the residual vs fitted value/covariate plot. Default is "#444444" (a dark gray).

shape

Integer or single character specifying a symbol to be used for plotting the points in the residual vs fitted value/covariate plot.

size

Numeric value specifying the size to use for the points in the residual vs fitted value/covariate plot.

qqpoint.color

Character string or integer specifying what color to use for the points in the quantile-quantile plot.

qqpoint.shape

Integer or single character specifying a symbol to be used for plotting the points in the quantile-quantile plot.

qqpoint.size

Numeric value specifying the size to use for the points in the quantile-quantile plot.

qqline.color

Character string or integer specifying what color to use for the reference line in the quantile-quantile plot.

qqline.linetype

Integer or character string (e.g., "dashed") specifying the type of line to use in the quantile-quantile plot.

qqline.size

Numeric value specifying the thickness of the line in the quantile-quantile plot.

smooth

Logical indicating whether or not to add a nonparametric smooth to certain plots. Default is TRUE.

smooth.color

Character string or integer specifying what color to use for the nonparametric smooth.

smooth.linetype

Integer or character string (e.g., "dashed") specifying the type of line to use for the nonparametric smooth.

smooth.size

Numeric value specifying the thickness of the line for the nonparametric smooth.

fill

Character string or integer specifying the color to use to fill the boxplots for residual-by-covariate plots when x is of class "factor". Default is NULL which colors the boxplots according to the factor levels.

...

Additional optional arguments to be passed on to resids.
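
The remark about the distribution argument can be illustrated without the package: a quantile-quantile plot is only close to a straight line when the reference quantile function matches the distribution of the values being plotted. The following base R sketch is not taken from the package. qq_cor is a hypothetical helper, and the simulated vectors are stand-ins for residuals rather than output from resids. It assumes that residuals from a logit-link model would be compared against qlogis.

```r
# Hypothetical helper (not part of 'sure'): correlation between the sorted
# values and the matching reference quantiles. Values near 1 mean the
# Q-Q plot is close to a straight line.
qq_cor <- function(r, qfun, ...) {
  cor(sort(r), qfun(ppoints(length(r)), ...))
}

set.seed(101)

# Stand-in for residuals on the probability scale, whose reference
# distribution is U(-0.5, 0.5)
r_unif <- runif(2000, min = -0.5, max = 0.5)
cor_unif_ref <- qq_cor(r_unif, qunif, min = -0.5, max = 0.5)  # matching reference
cor_norm_ref <- qq_cor(r_unif, qnorm)                         # mismatched reference

# Stand-in for residuals with logistic errors (assumed logit-link case)
r_logis <- rlogis(2000)
cor_logis_ref <- qq_cor(r_logis, qlogis)

# Compare how straight each Q-Q plot would be
c(uniform_vs_qunif   = cor_unif_ref,
  uniform_vs_qnorm   = cor_norm_ref,
  logistic_vs_qlogis = cor_logis_ref)
```

The mismatched pairing (uniform values against qnorm) should give a visibly lower correlation, which is the curvature one would see in the corresponding Q-Q plot.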

Value

A "ggplot" object.

Examples

# See ?resids for an example
?resids
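
As a rough sketch of how the arguments above fit together, the following fits a binary probit model to simulated data (the same recipe used in the ?resids example) and requests two of the plot types. The requireNamespace guard is only a convenience so the snippet does not fail when the packages are not installed, and the object names (fit, p_qq, p_cov) are arbitrary.

```r
# Simulated binary data with a quadratic trend
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3 * x - 1 * x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)

# Correctly specified probit model
fit <- glm(y ~ x + I(x^2), family = binomial(link = "probit"))

# Only build the plots when the required packages are available
if (requireNamespace("sure", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(sure)

  # Quantile-quantile plot of the surrogate residuals (the default)
  p_qq <- autoplot(fit, what = "qq")

  # Residual-by-covariate plot; x supplies the covariate values
  p_cov <- autoplot(fit, what = "covariate", x = x, xlab = "x")

  # Each result is a ggplot object; print(p_qq) or print(p_cov) draws it
}
```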

Simulated quadratic data

Description

Data simulated from a probit model with a quadratic trend. The data are described in Example 2 of Liu and Zhang (2017).

Usage

data(df1)

Format

A data frame with 2000 rows and 2 variables.

  • x The predictor variable.

  • y The response variable; an ordered factor.

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).

Examples

head(df1)
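
For readers who want data with the same general shape, the following base R sketch generates an ordered response from a latent probit model with a quadratic trend. The coefficients, the range of x, and the cut points are illustrative assumptions and are not claimed to be the values used to create df1.

```r
set.seed(101)
n <- 2000

# Predictor (assumed range)
x <- runif(n, min = 1, max = 7)

# Latent continuous response: quadratic trend plus standard normal noise,
# which corresponds to a probit link (assumed coefficients)
z <- 16 - 8 * x + x^2 + rnorm(n)

# Cut the latent variable into four ordered categories (assumed cut points)
cuts <- c(-Inf, 0, 4, 8, Inf)
y <- cut(z, breaks = cuts, labels = 1:4, ordered_result = TRUE)

sim <- data.frame(x = x, y = y)
head(sim)
table(sim$y)
```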

Simulated heteroscedastic data

Description

Data simulated from a probit model with heteroscedasticity. The data are described in Example 4 of Liu and Zhang (2017).

Usage

data(df2)

Format

A data frame with 2000 rows and 2 variables.

  • x The predictor variable.

  • y The response variable; an ordered factor.

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).

Examples

head(df2)

Simulated Gumbel data

Description

Data simulated from a log-log model with a quadratic trend. The data are described in Example 3 of Liu and Zhang (2017).

Usage

data(df3)

Format

A data frame with 2000 rows and 2 variables.

  • x The predictor variable.

  • y The response variable; an ordered factor.

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).

Examples

head(df3)

Simulated proportionality data

Description

Data simulated from two separate probit models. The data are described in Example 5 of Liu and Zhang (2017).

Usage

data(df4)

Format

A data frame with 4000 rows and 2 variables.

  • x The predictor variable.

  • y The response variable; an ordered factor.

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).

Examples

head(df4)

Simulated interaction data

Description

Data simulated from an ordered probit model with an interaction effect.

Usage

data(df5)

Format

A data frame with 2000 rows and 3 variables.

  • x1 A continuous predictor variable.

  • x2 A factor with two levels: "Control" and "Treatment".

  • y The response variable; an ordered factor.

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).

Examples

head(df5)

Goodness-of-Fit Simulation

Description

Simulate p-values from a goodness-of-fit test.

Usage

gof(object, nsim = 10, test = c("ks", "ad", "cvm"), ...)

## Default S3 method:
gof(object, nsim = 10, test = c("ks", "ad", "cvm"), ...)

## S3 method for class 'gof'
plot(x, ...)

Arguments

object

An object of class clm, glm, lrm, orm, polr, or vglm.

nsim

Integer specifying the number of bootstrap replicates to use.

test

Character string specifying which goodness-of-fit test to use. Current options are "ks" for the Kolmogorov-Smirnov test, "ad" for the Anderson-Darling test, and "cvm" for the Cramér-von Mises test. Default is "ks".

...

Additional optional arguments. (Currently ignored.)

x

An object of class "gof".

Details

Under the null hypothesis, the simulated p-values should be approximately uniformly distributed on the interval [0, 1]. This can be investigated visually using the plot method: a plot that follows the 45-degree line is indicative of a "good" fit.

Value

A numeric vector of class c("gof", "numeric") containing the simulated p-values.

Examples

# See ?resids for an example
?resids
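
The idea described under Details, that p-values are uniform on [0, 1] when the null hypothesis holds, can be seen with a small base R simulation. This is only an illustration of the principle using ks.test on simulated samples; it does not reproduce how gof computes its p-values, and the sample sizes and the shift of 0.5 are arbitrary choices.

```r
set.seed(42)
nsim <- 200

# Null holds: standard normal samples tested against N(0, 1)
p_null <- replicate(nsim, ks.test(rnorm(100), "pnorm")$p.value)

# Null fails: samples with mean 0.5 are still tested against N(0, 1)
p_alt <- replicate(nsim, ks.test(rnorm(100, mean = 0.5), "pnorm")$p.value)

# Sorted p-values against uniform quantiles; the dashed line is the
# 45-degree reference
u <- ppoints(nsim)
plot(u, sort(p_null), xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Uniform quantile", ylab = "Sorted p-value",
     main = "Simulated p-values")
points(u, sort(p_alt), col = "red2", pch = 4)
abline(a = 0, b = 1, lty = 2)
```

The p-values simulated under the null should track the reference line, while those simulated under the shifted alternative pile up near zero.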

Surrogate residuals

Description

Simulate surrogate response values for cumulative link regression models using the latent method described in Liu and Zhang (2017).

Usage

resids(
  object,
  nsim = 1L,
  method = c("latent", "jitter"),
  jitter.scale = c("response", "probability"),
  ...
)

Arguments

object

An object of class clm, glm, lrm, orm, polr, or vglm.

nsim

Integer specifying the number of bootstrap replicates to use. Default is 1L meaning no bootstrap samples.

method

Character string specifying which method to use to generate the surrogate response values. Current options are "latent" and "jitter". Default is "latent".

jitter.scale

Character string specifying the scale on which to perform the jittering whenever method = "jitter". Current options are "response" and "probability". Default is "response".

...

Additional optional arguments. (Currently ignored.)

Value

A numeric vector of class c("numeric", "surrogate") containing the simulated surrogate response values. Additionally, if nsim > 1, then the result will contain the attributes:

boot_reps

A matrix with nsim columns, one for each bootstrap replicate of the surrogate values. Note, these are random and do not correspond to the original ordering of the data;

boot_id

A matrix with nsim columns. Each column contains the observation number each surrogate value corresponds to in boot_reps. (This is used for plotting purposes.)

Note

Surrogate response values require sampling from a continuous distribution; consequently, the result will be different with every call to resids unless the random seed is fixed beforehand (e.g., with set.seed). The internal functions used for sampling from truncated distributions are based on modified versions of rtrunc and qtrunc.

For "glm" objects, only the binomial() family is supported.
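
Because the residuals are random draws, repeatability depends on the state of the random number generator. The sketch below shows one way to check this and to look at the attributes described under Value when nsim > 1. It is guarded so that it only calls the package when it is installed; the seed value and object names are arbitrary.

```r
# Simulated binary data and a probit fit (same recipe as the example below)
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3 * x - 1 * x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)
fit <- glm(y ~ x + I(x^2), family = binomial(link = "probit"))

if (requireNamespace("sure", quietly = TRUE)) {
  # Fixing the seed immediately before each call makes the draws repeatable
  set.seed(2017)
  r1 <- sure::resids(fit)
  set.seed(2017)
  r2 <- sure::resids(fit)
  print(isTRUE(all.equal(r1, r2)))

  # A further call without resetting the seed uses new random draws
  r3 <- sure::resids(fit)
  print(isTRUE(all.equal(r1, r3)))

  # With nsim > 1, the bootstrap replicates are stored as attributes
  rb <- sure::resids(fit, nsim = 10)
  str(attr(rb, "boot_reps"))
  str(attr(rb, "boot_id"))
}
```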

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). URL http://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1292915?journalCode=uasa20

Nadarajah, Saralees and Kotz, Samuel. R Programs for Truncated Distributions. Journal of Statistical Software, Code Snippet, 16(2), 1-8, 2006. URL https://www.jstatsoft.org/v016/c02.

Examples

# Generate data from a quadratic probit model
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3 * x - 1 * x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)

# Scatterplot matrix
pairs(~ x + y + z)

# Setup for side-by-side plots
par(mfrow = c(1, 2))

# Misspecified mean structure
fm1 <- glm(y ~ x, family = binomial(link = "probit"))
scatter.smooth(x, y = resids(fm1),
               main = "Misspecified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")

# Correctly specified mean structure
fm2 <- glm(y ~ x + I(x ^ 2), family = binomial(link = "probit"))
scatter.smooth(x, y = resids(fm2),
               main = "Correctly specified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")

sure: An R package for constructing surrogate-based residuals and diagnostics for ordinal and general regression models.

Description

The sure package provides surrogate-based residuals for fitted ordinal and general (e.g., binary) regression models of class clm, glm, lrm, orm, polr, or vglm.

Details

The development version can be found on GitHub: https://github.com/AFIT-R/sure. Currently, sure exports the following functions:

  • resids - construct (surrogate-based) residuals;

  • autoplot - plot diagnostics using ggplot2-based graphics;

  • gof - simulate p-values from a goodness-of-fit test.

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).


Surrogate response

Description

Simulate surrogate response values for cumulative link regression models using the latent method described in Liu and Zhang (2017).

Usage

surrogate(
  object,
  nsim = 1L,
  method = c("latent", "jitter"),
  jitter.scale = c("response", "probability"),
  ...
)

Arguments

object

An object of class clm, glm, lrm, orm, polr, or vglm.

nsim

Integer specifying the number of bootstrap replicates to use. Default is 1L meaning no bootstrap samples.

method

Character string specifying which method to use to generate the surrogate response values. Current options are "latent" and "jitter". Default is "latent".

jitter.scale

Character string specifying the scale on which to perform the jittering whenever method = "jitter". Current options are "response" and "probability". Default is "response".

...

Additional optional arguments. (Currently ignored.)

Value

A numeric vector of class c("numeric", "surrogate") containing the simulated surrogate response values. Additionally, if nsim > 1, then the result will contain the attributes:

boot_reps

A matrix with nsim columns, one for each bootstrap replicate of the surrogate values. Note, these are random and do not correspond to the original ordering of the data;

boot_id

A matrix with nsim columns. Each column contains the observation number each surrogate value corresponds to in boot_reps. (This is used for plotting purposes.)

Note

Surrogate response values require sampling from a continuous distribution; consequently, the result will be different with every call to surrogate. The internal functions used for sampling from truncated distributions are based on modified versions of rtrunc and qtrunc.

For "glm" objects, only the binomial() family is supported.
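
The package's internal samplers are not reproduced in this manual. As a minimal base R sketch of the general idea behind the latent method for a binary probit model, the surrogate can be drawn from a normal distribution centred at the linear predictor and truncated to the side of zero that agrees with the observed response. rtnorm_sketch is a hypothetical helper written for this illustration, and the plain inverse-CDF approach it uses can lose precision far out in the upper tail.

```r
# Hypothetical helper (not part of 'sure'): inverse-CDF draws from
# N(mean, 1) truncated to the interval (lower, upper)
rtnorm_sketch <- function(mean, lower, upper) {
  p_lo <- pnorm(lower, mean = mean)
  p_hi <- pnorm(upper, mean = mean)
  u <- runif(length(mean), min = p_lo, max = p_hi)
  qnorm(u, mean = mean)
}

# Simulated binary data and a correctly specified probit fit
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3 * x - 1 * x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)
fit <- glm(y ~ x + I(x^2), family = binomial(link = "probit"))
eta <- fit$linear.predictors

# y = 1 means the latent variable is positive; y = 0 means it is <= 0
s <- rtnorm_sketch(eta,
                   lower = ifelse(y == 1, 0, -Inf),
                   upper = ifelse(y == 1, Inf, 0))

# Surrogate residual: surrogate response minus the linear predictor
r <- s - eta
summary(r)
```

If the model is correctly specified, the values in r should look roughly like a sample from a standard normal distribution, which is what the residual plots in the example below check.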

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). URL http://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1292915?journalCode=uasa20

Nadarajah, Saralees and Kotz, Samuel. R Programs for Truncated Distributions. Journal of Statistical Software, Code Snippet, 16(2), 1-8, 2006. URL https://www.jstatsoft.org/v016/c02.

Examples

# Generate data from a quadratic probit model
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3 * x - 1 * x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)

# Scatterplot matrix
pairs(~ x + y + z)

# Setup for side-by-side plots
par(mfrow = c(1, 2))

# Misspecified mean structure
fm1 <- glm(y ~ x, family = binomial(link = "probit"))
s1 <- surrogate(fm1)
scatter.smooth(x, s1 - fm1$linear.predictors,
               main = "Misspecified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")

# Correctly specified mean structure
fm2 <- glm(y ~ x + I(x ^ 2), family = binomial(link = "probit"))
s2 <- surrogate(fm2)
scatter.smooth(x, s2 - fm2$linear.predictors,
               main = "Correctly specified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")