| Title: | Multivariate Covariance Generalized Linear Models |
|---|---|
| Description: | Fitting multivariate covariance generalized linear models (McGLMs) to data. McGLM is a general framework for non-normal multivariate data analysis, designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The models are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of different types of response variables and covariance structures, including multivariate extensions of repeated measures, time series, longitudinal, spatial and spatio-temporal structures. The package offers a user-friendly interface for fitting McGLMs similar to the glm() R function. See Bonat (2018) <doi:10.18637/jss.v084.i04>, for more information and examples. |
| Authors: | Wagner Hugo Bonat [aut, cre] |
| Maintainer: | Wagner Hugo Bonat <[email protected]> |
| License: | GPL-3 | file LICENSE |
| Version: | 0.9.0 |
| Built: | 2026-06-08 07:41:10 UTC |
| Source: | https://github.com/bonatwagner/mcglm |
The Australian Health Survey (AHS) was used by Bonat and Jorgensen (2016) as an example of multivariate count regression modeling. The dataset contains five count response variables related to health system usage and nine covariates related to social conditions in Australia for the years 1987-88.
data(ahs)data(ahs)
A data.frame with 5190 observations and 15 variables:
sexFactor with levels male and female.
ageRespondent's age in years divided by 100.
incomeRespondent's annual income in Australian dollars divided by 1000.
levyplusFactor indicating coverage by private health insurance for private patients in public hospital with doctor of choice (1) or otherwise (0).
freepoorFactor indicating government coverage due to low income, recent immigration, or unemployment (1) or otherwise (0).
freerepaFactor indicating government coverage due to old-age/disability pension, veteran status, or family of deceased veteran (1) or otherwise (0).
illnesNumber of illnesses in the past two weeks, capped at 5.
actdaysNumber of days of reduced activity in the past two weeks due to illness or injury.
hscoreGeneral health questionnaire score (Goldberg's method); higher scores indicate poorer health.
chcondFactor with levels: limited (chronic condition with activity limitation), nonlimited (chronic condition without limitation), otherwise (reference level).
NdocNumber of consultations with a doctor or specialist (response variable).
NndocNumber of consultations with health professionals (response variable).
NadmNumber of admissions to hospital, psychiatric hospital, nursing, or convalescence home in the past 12 months (response variable).
NhospNumber of nights in a hospital during the most recent admission.
NmedTotal number of prescribed and non-prescribed medications used in the past two days.
Deb, P. and Trivedi, P. K. (1997) "Demand for medical care by the elderly: A finite mixture approach." Journal of Applied Econometrics, 12(3):313–336.
Bonat, W. H. and Jorgensen, B. (2016) "Multivariate covariance generalized linear models." Journal of the Royal Statistical Society: Series C, 65:649–675.
## Not run: library(mcglm) data(ahs, package = "mcglm") form1 <- Ndoc ~ income + age form2 <- Nndoc ~ income + age Z0 <- mc_id(ahs) fit.ahs <- mcglm(linear_pred = c(form1, form2), matrix_pred = list(Z0, Z0), link = c("log", "log"), variance = c("poisson_tweedie", "poisson_tweedie"), data = ahs) summary(fit.ahs) ## End(Not run)## Not run: library(mcglm) data(ahs, package = "mcglm") form1 <- Ndoc ~ income + age form2 <- Nndoc ~ income + age Z0 <- mc_id(ahs) fit.ahs <- mcglm(linear_pred = c(form1, form2), matrix_pred = list(Z0, Z0), link = c("log", "log"), variance = c("poisson_tweedie", "poisson_tweedie"), data = ahs) summary(fit.ahs) ## End(Not run)
Performs Wald chi-square tests for assessing the significance of
fixed-effect terms in the linear predictors of an mcglm model.
The tests are conducted separately for each response variable and
are particularly useful for joint hypothesis testing of regression
coefficients associated with categorical covariates with more than
two levels. This function is not intended for model comparison.
## S3 method for class 'mcglm' anova(object, ..., verbose = TRUE)## S3 method for class 'mcglm' anova(object, ..., verbose = TRUE)
object |
An object of class |
... |
Additional arguments. Currently ignored. |
verbose |
Logical indicating whether the Wald test results should be printed
to the console. If |
The Wald tests are computed using the observed covariance matrix of the regression parameter estimates. For each response variable, joint tests are performed for sets of parameters corresponding to the same model term, as defined by the design matrix.
A list of data frames, one for each response variable. Each data frame contains the results of Wald chi-square tests for the fixed-effect terms in the corresponding linear predictor, with the following columns:
Name of the covariate or model term tested.
Value of the Wald chi-square statistic.
Degrees of freedom associated with the test.
P-value of the Wald test.
The returned object is invisible and is primarily intended for programmatic use.
Wagner Hugo Bonat, [email protected]
summary.mcglm, coef.mcglm,
vcov.mcglm
x1 <- seq(-1, 1, length.out = 100) x2 <- gl(5, 20) beta <- c(5, 0, -2, -1, 1, 2) X <- model.matrix(~ x1 + x2) set.seed(123) y <- rnorm(100, mean = X %*% beta, sd = 1) data <- data.frame(y = y, x1 = x1, x2 = x2) fit <- mcglm(c(y ~ x1 + x2), list(mc_id(data)), data = data) anova(fit)x1 <- seq(-1, 1, length.out = 100) x2 <- gl(5, 20) beta <- c(5, 0, -2, -1, 1, 2) X <- model.matrix(~ x1 + x2) set.seed(123) y <- rnorm(100, mean = X %*% beta, sd = 1) data <- data.frame(y = y, x1 = x1, x2 = x2) fit <- mcglm(c(y ~ x1 + x2), list(mc_id(data)), data = data) anova(fit)
Extract regression, dispersion and correlation parameter estimates
from objects of class mcglm.
## S3 method for class 'mcglm' coef( object, std.error = FALSE, response = NA, type = c("beta", "tau", "power", "correlation"), ... )## S3 method for class 'mcglm' coef( object, std.error = FALSE, response = NA, type = c("beta", "tau", "power", "correlation"), ... )
object |
An object of class |
std.error |
Logical indicating whether standard errors should be returned
alongside the parameter estimates. Default is |
response |
Integer vector indicating for which response variables the
coefficients should be returned. If |
type |
Character vector specifying which type of coefficients should be
returned. Possible values are |
... |
Additional arguments. Currently ignored and included for
compatibility with the generic |
A data.frame with one row per parameter, containing:
Estimates: parameter estimates;
Std.error: standard errors (if requested);
Parameters: parameter names;
Type: parameter type;
Response: response variable index.
Wagner Hugo Bonat, [email protected]
Computes Wald-type confidence intervals for parameter estimates
from a fitted mcglm model, based on asymptotic normality.
## S3 method for class 'mcglm' confint(object, parm, level = 0.95, ...)## S3 method for class 'mcglm' confint(object, parm, level = 0.95, ...)
object |
A fitted object of class |
parm |
Optional specification of parameters for which confidence intervals are required. Can be a numeric vector of indices or a character vector of parameter names. If omitted, confidence intervals for all parameters are returned. |
level |
Numeric value giving the confidence level. Must be between 0 and 1.
Default is |
... |
Additional arguments. Currently ignored and included for
compatibility with the generic
|
A numeric matrix with two columns corresponding to the lower and upper confidence limits. Rows correspond to model parameters.
Wagner Hugo Bonat, [email protected]
Extract the generalized error sum of squares (ESS) for
objects of mcglm class.
ESS(object, verbose = TRUE)ESS(object, verbose = TRUE)
object |
an object or a list of objects representing a model
of |
verbose |
logical. Print or not the ESS value. |
An invisible list with a single numeric component:
The generalized error sum of squares, scaled by the residual degrees of freedom.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
Wang, M. (2014). Generalized Estimating Equations in Longitudinal Data Analysis: A Review and Recent Developments. Advances in Statistics, 1(1)1–13.
gof, plogLik, pAIC, pKLIC,
GOSHO and RJC.
This function implements the two main algorithms used for fitting multivariate covariance generalized linear models. The chaser and the reciprocal likelihood algorithms.
fit_mcglm(list_initial, list_link, list_variance, list_covariance, list_X, list_Z, list_offset, list_Ntrial, list_power_fixed, list_sparse, y_vec, correct, max_iter, tol, method, tuning, verbose, weights)fit_mcglm(list_initial, list_link, list_variance, list_covariance, list_X, list_Z, list_offset, list_Ntrial, list_power_fixed, list_sparse, y_vec, correct, max_iter, tol, method, tuning, verbose, weights)
list_initial |
a list of initial values for regression and covariance parameters. |
list_link |
a list specifying the link function names. |
list_variance |
a list specifying the variance function names.
Options are: |
list_covariance |
a list of covariance function names. Options
are: |
list_X |
a list of design matrices.
See |
list_Z |
a list of knowm matrices to compose the matrix linear predictor. |
list_offset |
a list of offset values. Default NULL. |
list_Ntrial |
a list of number of trials, useful only when analysing binomial data. Default 1. |
list_power_fixed |
a list of logicals indicating if the power
parameters should be estimated or not. Default |
list_sparse |
a list of logicals indicating if the matrices should be set up as sparse matrices. This argument is useful only when using exponential-matrix covariance link function. In the other cases the algorithm detects automatically if the matrix should be sparse or not. |
y_vec |
a vector of the stacked response variables. |
correct |
a logical indicating if the algorithm will use the
correction term or not. Default |
max_iter |
maximum number of iterations. Default |
tol |
a numeric specyfing the tolerance. Default |
method |
a string specyfing the method used to fit the models
( |
tuning |
a numeric value in general close to zero for the rc
method and close to 1 for the chaser method. This argument control
the step-length. Default |
verbose |
a logical if TRUE print the values of the covariance
parameters used on each iteration. Default |
weights |
Vector of weights for model fitting. |
A list with the results of the estimation procedure for multivariate covariance generalized linear models.
The object contains regression parameter estimates, covariance (dispersion) parameter estimates, fitted values, residuals and information about the iterative fitting process, such as the number of iterations, convergence status, sensitivity and variability matrices.
The returned object is intended for internal use. End users should
rely on the output provided by mcglm, which wraps this
function.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. and Jorgensen, B. (2016) Multivariate covariance generalized linear models. Journal of Royal Statistical Society - Series C 65:649–675.
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
mcglm, mc_matrix_linear_predictor,
mc_link_function and mc_variance_function.
Extract fitted (mean) values from a fitted mcglm model.
For multivariate responses, fitted values are returned in matrix
form, with one column per response variable.
## S3 method for class 'mcglm' fitted(object, ...)## S3 method for class 'mcglm' fitted(object, ...)
object |
A fitted object of class |
... |
Additional arguments. Currently ignored and included for
compatibility with the generic
|
A numeric matrix of fitted values. Rows correspond to observations and columns correspond to response variables.
Wagner Hugo Bonat, [email protected]
Extract the pseudo Gaussian log-likelihood (plogLik),
pseudo Akaike Information Criterion (pAIC), pseudo Kullback-Leibler
Information Criterion (pKLIC) and pseudo Bayesian Information Criterion (pBIC)
for objects of mcglm class.
gof(object)gof(object)
object |
an object or a list of objects representing a model
of |
A data frame with the following columns:
Numeric value of the pseudo Gaussian log-likelihood.
Integer giving the number of estimated parameters.
Numeric value of the pseudo Akaike Information Criterion.
Numeric value of the pseudo Kullback–Leibler Information Criterion.
Numeric value of the pseudo Bayesian Information Criterion.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
Wang, M. (2014). Generalized Estimating Equations in Longitudinal Data Analysis: A Review and Recent Developments. Advances in Statistics, 1(1)1–13.
plogLik, pAIC, pKLIC and pBIC.
Extract the Gosho Information Criterion (GOSHO)
for an object of mcglm class.
WARNING: This function is limited to models with ONE response variable.
This function is general useful only for longitudinal data analysis.
GOSHO(object, id, verbose = TRUE)GOSHO(object, id, verbose = TRUE)
object |
an object of |
id |
a vector which identifies the clusters or groups. The length and order of id should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula. |
verbose |
logical. Print or not the GOSHO value. |
An invisible list with a single numeric component:
The Gosho Information Criterion, computed for longitudinal data with a single response variable and scaled by the number of clusters.
Wagner Hugo Bonat, [email protected]
Wang, M. (2014). Generalized Estimating Equations in Longitudinal Data Analysis: A Review and Recent Developments. Advances in Statistics, 1(1)1–13.
gof, plogLik, pAIC, pKLIC,
ESS and RJC.
This dataset contains a case study analyzed in Bonat et al. (2017) regarding animals hunted in the village of Basile Fang, Bioko Norte Province, Bioko Island, Equatorial Guinea. Monthly counts of blue duikers and other small animals shot or snared were collected from a random sample of 52 commercial hunters between August 2010 and September 2013. For each animal caught, the species, sex, capture method, and altitude were recorded. The dataset contains 1216 observations.
data(Hunting)data(Hunting)
A data.frame with 1216 observations and 11 variables:
ALTFactor with five levels indicating the altitude where the animal was caught.
SEXFactor with levels Female and Male.
METHODFactor with levels Escopeta and Trampa indicating the method of capture.
OTMonthly number of other small animals hunted.
BDMonthly number of blue duikers hunted.
OFFSETMonthly number of hunter days.
HUNTERHunter index.
MONTHMonth index.
MONTHCALENDARMonth as calendar number (1 = January, ..., 12 = December).
YEARCalendar year (2010–2013).
HUNTER.MONTHIndex indicating observations taken for the same hunter and month.
Bonat, W. H., et al. (2017). "Modelling the covariance structure in marginal multivariate count models: Hunting in Bioko Island." Journal of Agricultural, Biological and Environmental Statistics, 22(4):446–464.
Bonat, W. H. (2018). "Multiple Response Variables Regression Models in R: The mcglm Package." Journal of Statistical Software, 84(4):1–30.
## Not run: library(mcglm) library(Matrix) data(Hunting, package = "mcglm") formu <- OT ~ METHOD*ALT + SEX + ALT*poly(MONTH, 4) Z0 <- mc_id(Hunting) Z1 <- mc_mixed(~0 + HUNTER.MONTH, data = Hunting) fit <- mcglm(linear_pred = c(formu), matrix_pred = list(c(Z0, Z1)), link = c("log"), variance = c("poisson_tweedie"), power_fixed = c(FALSE), control_algorithm = list(max_iter = 100), offset = list(log(Hunting$OFFSET)), data = Hunting) summary(fit) anova(fit) ## End(Not run)## Not run: library(mcglm) library(Matrix) data(Hunting, package = "mcglm") formu <- OT ~ METHOD*ALT + SEX + ALT*poly(MONTH, 4) Z0 <- mc_id(Hunting) Z1 <- mc_mixed(~0 + HUNTER.MONTH, data = Hunting) fit <- mcglm(linear_pred = c(formu), matrix_pred = list(c(Z0, Z1)), link = c("log"), variance = c("poisson_tweedie"), power_fixed = c(FALSE), control_algorithm = list(max_iter = 100), offset = list(log(Hunting$OFFSET)), data = Hunting) summary(fit) anova(fit) ## End(Not run)
Performs Wald chi-square tests for dispersion (covariance) parameters
by response variable in multivariate covariance generalized linear
models fitted with mcglm. This function is intended for
joint hypothesis testing of dispersion coefficients associated with
categorical covariates with more than two levels. It is not designed
for model comparison.
mc_anova_disp(object, idx_list, names_list, ...)mc_anova_disp(object, idx_list, names_list, ...)
object |
An object of class |
idx_list |
A list of integer vectors indexing dispersion parameters to be jointly tested for each response. |
names_list |
A list of character vectors with covariate names to be displayed in the output tables. |
... |
Currently not used. |
The object is a list of data frames, one per response variable. Each data frame contains the following columns:
Name of the covariate associated with the dispersion parameters being tested.
Wald chi-square test statistic.
Degrees of freedom of the test.
P-value associated with the chi-square test.
Compute bias-corrected standard error for regression
parameters in the context of clustered observations for an
object of mcglm class. It is also robust and has improved
finite sample properties.
mc_bias_corrected_std(object, id)mc_bias_corrected_std(object, id)
object |
an object of |
id |
a vector which identifies the clusters. The length and
order of |
A list with two elements. A vector of standard error and a variance-covariance matrix computed based on a bias corrected approach as described in the reference.
Wagner Hugo Bonat, [email protected]
Nuamah, I. F. and Qu, Y. and Aminu, S. B. (1996). A SAS macro for stepwise correlated binary regression. Computer Methods and Programs in Biomedicine 49, 199–210.
mc_robust_std.
The function mc_car helps to build the components
of the matrix linear predictor used for fitting conditional
autoregressive models. This function is used in general for fitting
spatial areal data using the well known conditional autoregressive
models (CAR). This function depends on a list of neighboors, such a
list can be constructed, for example using the
tri2nb function from the spdep package
based on spatial coordinates. This way to specify the matrix linear
predictor can also be applied for spatial continuous data,
as an approximation.
mc_car(list_neigh, intrinsic = FALSE)mc_car(list_neigh, intrinsic = FALSE)
list_neigh |
list of neighboors. |
intrinsic |
logical. |
A list of a matrix (intrinsic = TRUE) or two matrices
(intrinsic = FALSE). The main use of these matrices are as input in the
mcglm function as linear covariance models in the argument matrix_pred.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
mc_id, mc_compute_rho, mc_conditional_test,
mc_dist, mc_ma, mc_rw
and mc_mixed.
The function mc_complete_data completes a data
set with NA values for helping to construct the components of the
matrix linear predictor in models that require equal number of
observations by subjects (id).
mc_complete_data(data, id, index, id.exp)mc_complete_data(data, id, index, id.exp)
data |
a data.frame to be completed with NA. |
id |
name of the column (string) containing the subject id. |
index |
name of the column (string) containing the index to be completed. |
id.exp |
how the index is expected to be for all subjects. |
A data.frame with the same number of observations by subject. It is intended as a helper function to build the linear matrix predictor for models that require the same number of observations by subjects.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
mc_dglm, mc_ns, mc_ma and mc_rw.
Compute autocorrelation estimates based on a fitted model
using the mc_car structure. The mcglm approach fits
models using a linear covariance structure, but in general in this
parametrization for spatial models the parameters have no simple
interpretation in terms of spatial autocorrelation.
The function mc_compute_rho computes the autocorrelation
based on a fitted model.
mc_compute_rho(object, level = 0.975)mc_compute_rho(object, level = 0.975)
object |
an object or a list of objects representing a model
of |
level |
the confidence level required. |
Returns a data frame with parameter estimate, standard error and
confidential interval for the spatial autocorrelation parameter. It is used
in the case of spatial models using the mc_car specification.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
mc_car and mc_conditional_test.
Compute conditional hypotheses tests for fitted
mcglm model class.
When fitting models with extra power parameters, the standard errors
associated with the dispersion parameters can be large. In that cases,
we suggest to conduct conditional hypotheses test instead of the
orthodox marginal test for the dispersion parameters.
The function mc_conditional_test offers an ease way to
conduct such conditional test. Furthermore, the function is quite
flexible and can be used for any other conditional hypotheses test.
mc_conditional_test(object, parameters, test, fixed)mc_conditional_test(object, parameters, test, fixed)
object |
an object representing a model of |
parameters |
which parameters will be included in the conditional test. |
test |
index indicating which parameters will be tested given the values of the other parameters. |
fixed |
index indicating which parameters should be fixed on the conditional test. |
Returns a data frame with parameter estimates, conditional standard errors and Z-statistics. It is used to compute conditional hypothesis tests in mcglm fitted models.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
The function mc_dglm builds the components
of the matrix linear predictor used for fitting double generalized
linear models.
mc_dglm(formula, id, data)mc_dglm(formula, id, data)
formula |
a formula spefying the components of the covariance structure. |
id |
name of the column (string) containing the subject index. (If ts is not repeated measures, use id = 1 for all observations). |
data |
data set. |
A list containing diagonal matrices with entries defined by the
covariates assumed to describe the matrix linear predictor.
Each matrix corresponds to one component of the covariance model and
is intended to be supplied to the matrix_pred argument of
mcglm.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
mc_id, mc_dist, mc_ma, mc_rw
and mc_mixed.
The function mc_dist helps to build the components
of the matrix linear predictor using matrices based on distances.
This function is generaly used for the analysis of longitudinal and
spatial data. The idea is to use the inverse of some measure of distance
as for example the Euclidean distance to model the covariance structure
within response variables. The model can also use the inverse of
distance squared or high order power.
mc_dist(id, time, data, method = "euclidean")mc_dist(id, time, data, method = "euclidean")
id |
name of the column (string) containing the subject index.
For spatial data use the same |
time |
name of the column (string) containing the index indicating the time. For spatial data use the same index for all observations. |
data |
data set. |
method |
distance measure to be used. |
The distance measure must be one of "euclidean",
"maximum", "manhattan", "canberra",
"binary" or "minkowski". This function is a customize
call of the dist function.
A list containing a sparse matrix of class dgCMatrix.
This matrix represents the design matrix for the linear predictor and
is intended to be supplied to the matrix_pred argument of
mcglm.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
dist, mc_id,
mc_conditional_test, mc_car, mc_ma,
mc_rw and mc_mixed.
id <- rep(1:2, each = 4) time <- rep(1:4, 2) data <- data.frame("id" = id, "time" = time) mc_dist(id = "id", time = "time", data = data) mc_dist(id = "id", time = "time", data = data, method = "canberra")id <- rep(1:2, each = 4) time <- rep(1:4, 2) data <- data.frame("id" = id, "time" = time) mc_dist(id = "id", time = "time", data = data) mc_dist(id = "id", time = "time", data = data, method = "canberra")
Builds an identity matrix to be used as a component of the matrix linear predictor. It is in general the first component of the matrix linear predictor, a kind of intercept matrix.
mc_id(data)mc_id(data)
data |
the data set to be used. |
A list with a single component:
A sparse identity matrix of class ddiMatrix with dimension
equal to the number of observations, representing the intercept component
of the matrix linear predictor.
It is intended to be supplied to the matrix_pred argument of
mcglm.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
mc_dist, mc_ma, mc_rw and mc_mixed.
This function provides initial values to be used when
fitting multivariate covariance generalized linear models by using
the function mcglm. In general the users do not need to use
this function, since it is already employed when setting the argument
control_initial = "automatic" in the mcglm function.
However, if the users want to change some of the initial values,
this function can be useful.
mc_initial_values(linear_pred, matrix_pred, link, variance, covariance, offset, Ntrial, contrasts, data)mc_initial_values(linear_pred, matrix_pred, link, variance, covariance, offset, Ntrial, contrasts, data)
linear_pred |
a list of formula see |
matrix_pred |
a list of known matrices to be used on the matrix
linear predictor. |
link |
a list of link functions names, see
|
variance |
a list of variance functions names, see
|
covariance |
a list of covariance link functions names, see
|
offset |
a list of offset values if any. |
Ntrial |
a list of the number of trials on Bernoulli
experiments. It is useful only for |
contrasts |
list of contrasts to be used in the
|
data |
data frame. |
To obtain initial values for multivariate covariance
generalized linear models the function mc_initial_values fits
a generalized linear model (GLM) using the function glm with
the specified linear predictor and link function for each response
variables considering independent observations. The family
argument is always specified as quasi. The link function depends
on the specification of the argument link.
The variance function is always specified as "mu" the only
excession appears when using variance = "constant" then the
family argument in the glm function is specified as
quasi(link = link, variance = "constant"). The estimated value
of the dispersion parameter from the glm function is used as
initial value for the first component of the matrix linear predictor,
for all other components the value zero is used.
For the
cases covariance = "inverse" and covariance = "expm"
the inverse and the logarithm of the estimated dispersion parameter
is used as initial value for the first component of the matrix linear
predictor. The value of the power parameter is always started at 1.
In the cases of multivariate models the correlation between response
variables is always started at 0.
A list containing initial values for model parameters:
A list of numeric vectors with initial values for the regression parameters of each response variable.
A list of numeric vectors with initial values for the power parameters associated with the variance functions.
A list of numeric vectors with initial values for the dispersion and covariance-related parameters in the matrix linear predictor.
A numeric vector with initial values for the correlation parameters between response variables.
Wagner Hugo Bonat, [email protected]
The mc_link_function is a customized call of the
make.link function.
Given the name of a link function, it returns a list with two
elements. The first element is the inverse of the link function
applied on the linear predictor The
second element is the derivative of with respect to the
regression parameters .
It will be useful when computing the quasi-score function.
mc_link_function(beta, X, offset, link) mc_logit(beta, X, offset) mc_probit(beta, X, offset) mc_cauchit(beta, X, offset) mc_cloglog(beta, X, offset) mc_loglog(beta, X, offset) mc_identity(beta, X, offset) mc_log(beta, X, offset) mc_sqrt(beta, X, offset) mc_invmu2(beta, X, offset) mc_inverse(beta, X, offset)mc_link_function(beta, X, offset, link) mc_logit(beta, X, offset) mc_probit(beta, X, offset) mc_cauchit(beta, X, offset) mc_cloglog(beta, X, offset) mc_loglog(beta, X, offset) mc_identity(beta, X, offset) mc_log(beta, X, offset) mc_sqrt(beta, X, offset) mc_invmu2(beta, X, offset) mc_inverse(beta, X, offset)
beta |
a numeric vector of regression parameters. |
X |
a design matrix, see |
offset |
a numeric vector of offset values. It will be sum up on
the linear predictor as a covariate with known regression
parameter equals one ( |
link |
a string specifying the name of the link function.
Options are: |
The link function is an important component of the
multivariate covariance generalized linear models, since it links
the expectation of the response variable with the covariates.
Let be a (p x 1) regression parameter vector and
be an (n x p) design matrix. The expected value of
the response variable is given by
where is the link function and is the linear predictor. Let be a (n x p)
matrix whose entries are given by the derivatives of
with respect to . Such a matrix will be required for the
fitting algorithm. The function mc_link_function returns a
list where the first element is (n x 1) vector
and the second is the D (n x p) matrix.
A user defined function can also be used. It must be a function
with arguments beta, X and offset
(set to NULL if non needed). The function must return a
length 2 named list with mu and D elements as a
vector and a matrix of proper dimensions.
A list with the following components:
A numeric vector of length containing the mean
response values obtained by applying the inverse link function to
the linear predictor.
A numeric matrix of dimension containing
the derivatives of with respect to the regression parameters
.
Wagner Hugo Bonat, [email protected]
x1 <- seq(-1, 1, l = 5) X <- model.matrix(~ x1) mc_link_function(beta = c(1,0.5), X = X, offset = NULL, link = 'log') mc_link_function(beta = c(1,0.5), X = X, offset = rep(10,5), link = 'identity')x1 <- seq(-1, 1, l = 5) X <- model.matrix(~ x1) mc_link_function(beta = c(1,0.5), X = X, offset = NULL, link = 'log') mc_link_function(beta = c(1,0.5), X = X, offset = rep(10,5), link = 'identity')
Builds components of the matrix linear predictor associated with moving average (MA) covariance structures. This function is mainly intended for longitudinal data analysis, but can also be used for time series data
mc_ma(id, time, data, order = 1)mc_ma(id, time, data, order = 1)
id |
name of the column (string) containing the subject index.
Note that this structure was designed to deal with longitudinal data.
For times series data use the same |
time |
name of the column (string) containing the index indicating the time. |
data |
data set. |
order |
An integer specifying the order of the moving average process. |
This function was primarily designed for longitudinal data,
but it can also be used for time series analysis. In this case, the
id argument should contain a single identifier, representing
one observational unit. Internally, the function constructs block-diagonal
band matrices using bandSparse.
A list with the following component:
A sparse matrix of class nsCMatrix representing the
moving average component of the matrix linear predictor. The matrix
has dimension equal to the total number of observations and is
constructed as a block-diagonal matrix, with one block per subject
(or time series), each block encoding a moving average structure of
the specified order.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
mc_id, mc_dist, mc_car,
mc_rw and mc_mixed.
id <- rep(1:2, each = 4) time <- rep(1:4, 2) data <- data.frame("id" = id, "time" = time) mc_ma(id = "id", time = "time", data = data, order = 1) mc_ma(id = "id", time = "time", data = data, order = 2)id <- rep(1:2, each = 4) time <- rep(1:4, 2) data <- data.frame("id" = id, "time" = time) mc_ma(id = "id", time = "time", data = data, order = 1) mc_ma(id = "id", time = "time", data = data, order = 2)
Performs a MANOVA-type Wald test for multivariate covariance
generalized linear models fitted using mcglm.
The test is based on quadratic forms of the estimated regression
parameters and their covariance matrix, yielding statistics
analogous to the Hotelling–Lawley trace.
mc_manova(object, ...)mc_manova(object, ...)
object |
An object of class |
... |
Further arguments (currently not used). |
A data frame containing the MANOVA-type test results with the following columns:
Names of the tested model effects.
Degrees of freedom associated with each effect.
Hotelling–Lawley trace statistic.
Chi-square test statistic.
P-values from the chi-square approximation.
Wagner Hugo Bonat, [email protected]
Performs a MANOVA-type Wald test for the dispersion parameters
of multivariate covariance generalized linear models fitted
using mcglm. The test is based on quadratic forms
of the estimated dispersion parameters and their covariance
matrix, yielding statistics analogous to the Hotelling–Lawley trace.
mc_manova_disp(object, idx, effect_names, ...)mc_manova_disp(object, idx, effect_names, ...)
object |
An object of class |
idx |
An integer vector defining the grouping structure of dispersion parameters to be tested. |
effect_names |
A character vector with labels for the tested dispersion effects. |
... |
Further arguments (currently not used). |
A data frame containing the MANOVA-type test results for the dispersion parameters with the following columns:
Names of the tested dispersion effects.
Degrees of freedom associated with each effect.
Hotelling–Lawley trace statistic.
Chi-square test statistic.
P-values from the chi-square approximation.
Wagner Hugo Bonat, [email protected]
mcglm, mc_manova,
coef.mcglm, vcov.mcglm
Computes the matrix linear predictor used in multivariate covariance generalized linear models. The matrix linear predictor is defined as a linear combination of known matrices weighted by dispersion parameters.
mc_matrix_linear_predictor(tau, Z)mc_matrix_linear_predictor(tau, Z)
tau |
A numeric vector of dispersion parameters. |
Z |
A list of known matrices with compatible dimensions. |
Given a list of known matrices and a vector
of dispersion parameters , this function
computes their weighted sum. This object is typically used as a
component of the matrix linear predictor in covariance modeling.
A matrix of class Matrix representing the
matrix linear predictor
The returned matrix has the same dimensions as the elements of Z.
The returned object is intended for internal use only.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
Bonat, W. H. and Jorgensen, B. (2016). Multivariate covariance generalized linear models. Journal of the Royal Statistical Society: Series C, 65:649–675.
mc_id, mc_dist, mc_ma, mc_rw,
mc_mixed, mc_car
Z0 <- Matrix::Diagonal(5, 1) Z1 <- Matrix::Matrix(rep(1, 5) %*% t(rep(1, 5))) Z <- list(Z0, Z1) mc_matrix_linear_predictor(tau = c(1, 0.8), Z = Z)Z0 <- Matrix::Diagonal(5, 1) Z1 <- Matrix::Matrix(rep(1, 5) %*% t(rep(1, 5))) Z <- list(Z0, Z1) mc_matrix_linear_predictor(tau = c(1, 0.8), Z = Z)
Constructs the components of the matrix linear predictor associated with mixed-effects covariance structures in multivariate covariance generalized linear models. The function builds symmetric matrices representing variance and covariance components as functions of known covariates, following a linear mixed model formulation.
The mc_mixed function is primarily intended for repeated measures
and longitudinal data, where observations are collected within a fixed
number of groups, subjects, or experimental units.
mc_mixed(formula, data)mc_mixed(formula, data)
formula |
A model formula specifying the structure of the matrix
linear predictor for the dispersion component. The first term must
remove the intercept ( |
data |
A |
The formula argument follows a syntax similar to that used for
linear mixed models. The grouping variable must be provided as the
second term in the formula and must be a factor; no internal
coercion is performed. Covariates specified after the slash
(/) may be continuous or categorical and define additional
variance and covariance components. When only the grouping variable
is specified (e.g., ~ 0 + SUBJECT), the resulting structure
corresponds to the compound symmetry covariance model.
By default, all pairwise interaction terms between components are included in the matrix linear predictor. Interaction terms may be excluded by removing the corresponding components from the resulting list.
A list of symmetric sparse matrices of class "dsCMatrix", each
corresponding to a variance or covariance component of the matrix
linear predictor for the dispersion structure. The list includes
matrices associated with main effects and, by default, their pairwise
interaction terms as implied by the mixed-effects specification in the
formula. These matrices are used internally to construct the linear
predictor of the covariance model in mcglm. It is intended to be
supplied to the matrix_pred argument of mcglm.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
Bonat, W. H., et al. (2016). Modelling the covariance structure in marginal multivariate count models: Hunting in Bioko Island. Journal of Agricultural, Biological, and Environmental Statistics, 22(4), 446–464.
mc_id, mc_conditional_test, mc_dist,
mc_ma, mc_rw, mc_car
SUBJECT <- gl(2, 6) x1 <- rep(1:6, 2) x2 <- rep(gl(2, 3), 2) data <- data.frame(SUBJECT, x1, x2) # Compound symmetry structure mc_mixed(~ 0 + SUBJECT, data = data) # Compound symmetry with random slope for x1 mc_mixed(~ 0 + SUBJECT/x1, data = data) # Compound symmetry with random slopes for x1 and x2 and interactions mc_mixed(~ 0 + SUBJECT/(x1 + x2), data = data)SUBJECT <- gl(2, 6) x1 <- rep(1:6, 2) x2 <- rep(gl(2, 3), 2) data <- data.frame(SUBJECT, x1, x2) # Compound symmetry structure mc_mixed(~ 0 + SUBJECT, data = data) # Compound symmetry with random slope for x1 mc_mixed(~ 0 + SUBJECT/x1, data = data) # Compound symmetry with random slopes for x1 and x2 and interactions mc_mixed(~ 0 + SUBJECT/(x1 + x2), data = data)
Constructs the components of the matrix linear predictor associated with a fully non-structured covariance model in multivariate covariance generalized linear models. This specification allows each pair of observations within a unit to have its own covariance parameter, resulting in a highly flexible but parameter-intensive model.
Due to the quadratic growth in the number of parameters, this structure is typically suitable only for datasets with a small number of repeated measurements per unit.
mc_ns(id, data, group = NULL, marca = NULL)mc_ns(id, data, group = NULL, marca = NULL)
id |
A character string giving the name of the column in
|
data |
A |
group |
An optional character string giving the name of a column in
|
marca |
An optional character string specifying the level of
|
The function requires a balanced design, meaning that all units
identified by id must have the same number of observations.
An error is raised otherwise. When group and marca are
provided, covariance components are generated only for units not
belonging to the specified level marca; for those units, the
corresponding blocks are set to zero.
A list of symmetric block-diagonal matrices, each representing one
covariance component of the non-structured matrix linear predictor.
The length of the list is equal to , where is
the number of observations per unit. Each element of the list is a
sparse matrix of class "dgCMatrix" obtained by stacking unit-
specific covariance blocks along the diagonal. These matrices are used
internally to construct the dispersion linear predictor in
mcglm.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
mc_id, mc_dglm, mc_dist, mc_ma,
mc_rw, mc_mixed
Computes cluster-robust (sandwich-type) standard errors for the
regression parameters of an object of class mcglm, accounting
for within-cluster correlation.
mc_robust_std(object, id)mc_robust_std(object, id)
object |
An object of class |
id |
An integer or factor vector identifying clusters or subjects. Its length and ordering must match the number and ordering of the observations used to fit the model. |
The robust variance–covariance matrix is obtained using an empirical estimator based on clustered residuals and the sensitivity matrix of the estimating equations. The implementation assumes that the data are correctly ordered such that observations belonging to the same cluster are stored in contiguous rows.
A list with two components:
A numeric vector containing the robust standard errors of the regression parameter estimates.
A numeric matrix giving the robust variance–covariance matrix of the regression parameter estimates.
The returned objects are computed under the assumption that the data are in the correct cluster order.
Wagner Hugo Bonat, [email protected]
Nuamah, I. F., Qu, Y., and Aminu, S. B. (1996). A SAS macro for stepwise correlated binary regression. Computer Methods and Programs in Biomedicine, 49, 199–210.
mc_bias_correct_std
Constructs the components of the matrix linear predictor associated with random walk (RW) models for longitudinal or time series data. The user may specify the order of the random walk process.
mc_rw(id, time, data, order = 1, proper = FALSE)mc_rw(id, time, data, order = 1, proper = FALSE)
id |
A character string giving the name of the column in
|
time |
A character string giving the name of the column in
|
data |
A data frame containing the variables specified in
|
order |
A positive integer specifying the order of the random walk model. |
proper |
Logical indicating whether a proper random walk specification should be used. |
This function builds sparse precision matrix components corresponding
to random walk structures of a given order. It is primarily intended
for longitudinal data indexed by a subject identifier and a time
variable. For pure time series data, the same id value should
be used for all observations. When proper = TRUE, the precision
structure is decomposed into diagonal and off-diagonal components.
If proper = FALSE, a list with a single component:
A sparse matrix of class dgCMatrix representing the
random walk precision structure.
If proper = TRUE, a list with two components:
A sparse diagonal matrix of class dgCMatrix.
A sparse off-diagonal matrix of class dgCMatrix.
The matrices are ordered consistently with the original data.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
mc_id, mc_dist, mc_car,
mc_ma, mc_mixed, mc_compute_rho
id <- rep(1:2, each = 4) time <- rep(1:4, 2) data <- data.frame(id = id, time = time) mc_rw(id = "id", time = "time", data = data, order = 1, proper = FALSE) mc_rw(id = "id", time = "time", data = data, order = 1, proper = TRUE) mc_rw(id = "id", time = "time", data = data, order = 2, proper = TRUE)id <- rep(1:2, each = 4) time <- rep(1:4, 2) data <- data.frame(id = id, time = time) mc_rw(id = "id", time = "time", data = data, order = 1, proper = FALSE) mc_rw(id = "id", time = "time", data = data, order = 1, proper = TRUE) mc_rw(id = "id", time = "time", data = data, order = 2, proper = TRUE)
Computes the Score Information Criterion (SIC) for regression
components of a fitted mcglm object. The SIC can be used for
selecting covariates in the linear predictor and supports stepwise
selection procedures.
mc_sic(object, scope, data, response, penalty = 2, weights)mc_sic(object, scope, data, response, penalty = 2, weights)
object |
An object of class |
scope |
A character vector with the names of covariates to be tested for inclusion in the linear predictor. |
data |
A data frame containing all variables involved in the model. |
response |
An integer indicating the response variable for which the SIC is computed. |
penalty |
A numeric penalty term applied to the SIC (default is 2). |
weights |
An optional numeric vector of weights used in model fitting. If not provided, unit weights are assumed. |
The SIC is computed using the quasi-score function associated with the
regression parameters. For each candidate covariate in scope,
the method evaluates its contribution via a score-based test statistic
and applies a penalty for model complexity.
A data frame with the following columns:
Score Information Criterion value.
Name of the candidate covariate.
Degrees of freedom associated with the test.
Total number of regression parameters in the extended model.
Score-based test statistic.
Reference chi-squared quantile with 95% confidence level.
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
Bonat, W. H., et al. (2016). Modelling the covariance structure in marginal multivariate count models: Hunting in Bioko Island. Journal of Agricultural, Biological and Environmental Statistics, 22(4), 446–464.
set.seed(123) x1 <- runif(100, -1, 1) x2 <- gl(2, 50) beta <- c(5, 0, 3) X <- model.matrix(~ x1 + x2) y <- rnorm(100, mean = X %*% beta, sd = 1) data <- data.frame(y, x1, x2) Z0 <- mc_id(data) fit0 <- mcglm( linear_pred = c(y ~ 1), matrix_pred = list(Z0), data = data ) mc_sic(fit0, scope = c("x1", "x2"), data = data, response = 1)set.seed(123) x1 <- runif(100, -1, 1) x2 <- gl(2, 50) beta <- c(5, 0, 3) X <- model.matrix(~ x1 + x2) y <- rnorm(100, mean = X %*% beta, sd = 1) data <- data.frame(y, x1, x2) Z0 <- mc_id(data) fit0 <- mcglm( linear_pred = c(y ~ 1), matrix_pred = list(Z0), data = data ) mc_sic(fit0, scope = c("x1", "x2"), data = data, response = 1)
Computes the Score Information Criterion (SIC) for covariance
components of a fitted mcglm object. The SIC-covariance is used
to select components of the matrix linear predictor and can be
employed in stepwise selection procedures.
mc_sic_covariance(object, scope, idx, data, penalty = 2, response, weights)mc_sic_covariance(object, scope, idx, data, penalty = 2, response, weights)
object |
An object of class |
scope |
A list of matrices to be tested for inclusion in the matrix linear predictor. |
idx |
An integer vector indicating which matrices in |
data |
A data frame containing all variables involved in the model. |
penalty |
A numeric penalty term applied to the SIC (default is 2). |
response |
An integer indicating the response variable for which the SIC-covariance is computed. |
weights |
An optional numeric vector of weights used in model fitting. If not provided, unit weights are assumed. |
The SIC-covariance is computed using the Pearson estimating function.
For each group of matrices defined by idx, a score-based test
statistic is calculated to assess the contribution of the associated
covariance components, penalized by model complexity.
A data frame with the following columns:
Score Information Criterion value.
Degrees of freedom associated with the test.
Total number of covariance parameters in the extended model.
Score-based test statistic.
Reference chi-squared quantile with 95% confidence level.
Bonat, W. H., et al. (2016). Modelling the covariance structure in marginal multivariate count models: Hunting in Bioko Island. Journal of Agricultural, Biological and Environmental Statistics, 22(4), 446–464.
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
set.seed(123) SUBJECT <- gl(10, 10) y <- rnorm(100) data <- data.frame(y, SUBJECT) Z0 <- mc_id(data) Z1 <- mc_mixed(~ 0 + SUBJECT, data = data) fit0 <- mcglm( linear_pred = c(y ~ 1), matrix_pred = list(Z0), data = data ) mc_sic_covariance( fit0, scope = Z1, idx = 1, data = data, response = 1 )set.seed(123) SUBJECT <- gl(10, 10) y <- rnorm(100) data <- data.frame(y, SUBJECT) Z0 <- mc_id(data) Z1 <- mc_mixed(~ 0 + SUBJECT, data = data) fit0 <- mcglm( linear_pred = c(y ~ 1), matrix_pred = list(Z0), data = data ) mc_sic_covariance( fit0, scope = Z1, idx = 1, data = data, response = 1 )
Constructs the components of the matrix linear predictor for twin data analysis under ACDE-type models. The function generates covariance structures suitable for monozygotic (MZ) and dizygotic (DZ) twins and supports several biologically motivated and flexible model parameterizations.
mc_twin(id, twin.id, type, replicate = NULL, structure, data) mc_twin_bio(id, twin.id, type, replicate = NULL, structure, data) mc_twin_full(id, twin.id, type, replicate, formula, data)mc_twin(id, twin.id, type, replicate = NULL, structure, data) mc_twin_bio(id, twin.id, type, replicate = NULL, structure, data) mc_twin_full(id, twin.id, type, replicate, formula, data)
id |
A string indicating the name of the column in |
twin.id |
A string indicating the name of the column in |
type |
A string indicating the name of the column in |
replicate |
An optional string indicating the name of the column in |
structure |
A string specifying the covariance structure to be constructed.
Available options are |
data |
A data frame containing all variables referenced by the model. |
formula |
Internal argument used to define flexible and unstructured covariance models. Not intended for direct user specification. |
For biologically motivated structures ("ACE", "ADE",
"AE", "CE", "E"), the function builds covariance
matrices based on classical twin modeling assumptions. For flexible
and unstructured options ("full", "flex", "uns"),
the covariance structure is constructed using matrix linear predictors.
A list of sparse matrices of class dgCMatrix, representing the
components of the matrix linear predictor to be used in the
matrix_pred argument of mcglm.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
mc_id, mc_dist, mc_car,
mc_rw, mc_ns, mc_dglm,
mc_mixed.
id <- rep(1:5, each = 4) id.twin <- rep(1:2, 10)id <- rep(1:5, each = 4) id.twin <- rep(1:2, 10)
Computes the variance function and its derivatives with respect to regression, dispersion, and power parameters. This function supports standard power variance functions as well as binomial responses. Intended primarily for internal use in model fitting.
mc_variance_function( mu, power, Ntrial, variance, inverse, derivative_power, derivative_mu ) mc_power(mu, power, inverse, derivative_power, derivative_mu) mc_binomialP(mu, power, inverse, Ntrial, derivative_power, derivative_mu) mc_binomialPQ(mu, power, inverse, Ntrial, derivative_power, derivative_mu)mc_variance_function( mu, power, Ntrial, variance, inverse, derivative_power, derivative_mu ) mc_power(mu, power, inverse, derivative_power, derivative_mu) mc_binomialP(mu, power, inverse, Ntrial, derivative_power, derivative_mu) mc_binomialPQ(mu, power, inverse, Ntrial, derivative_power, derivative_mu)
mu |
Numeric vector of expected values. Typically obtained from
|
power |
Numeric value (for |
Ntrial |
Positive integer or numeric. Number of trials for binomial response variables. |
variance |
Character string specifying the variance function type:
|
inverse |
Logical. If |
derivative_power |
Logical. If |
derivative_mu |
Logical. If |
The function computes the variance function and its derivatives used in
the estimation of generalized linear models for multiple response variables.
For binomial responses, it accounts for the number of trials and supports
both single (binomialP) and double (binomialPQ) power specifications.
A named list containing one or more of the following elements, depending on the combination of logical arguments:
Square root of the variance function.
Inverse square root of the variance function.
Derivative of V_sqrt with respect to the power parameter.
Derivative of V_inv_sqrt with respect to the power parameter.
Derivative of V_sqrt with respect to mu.
Derivative of V_inv_sqrt with respect to mu.
Wagner Hugo Bonat
Bonat, W. H. and Jorgensen, B. (2016) Multivariate covariance generalized linear models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65:649–675.
x1 <- seq(-1, 1, length.out = 5) X <- model.matrix(~x1) mu <- mc_link_function(beta = c(1, 0.5), X = X, offset = NULL, link = "logit") mc_variance_function(mu = mu$mu, power = c(2, 1), Ntrial = 1, variance = "binomialPQ", inverse = FALSE, derivative_power = TRUE, derivative_mu = TRUE)x1 <- seq(-1, 1, length.out = 5) X <- model.matrix(~x1) mu <- mc_link_function(beta = c(1, 0.5), X = X, offset = NULL, link = "logit") mc_variance_function(mu = mu$mu, power = c(2, 1), Ntrial = 1, variance = "binomialPQ", inverse = FALSE, derivative_power = TRUE, derivative_mu = TRUE)
Fits multivariate covariance generalized linear models. Models are specified through lists defining the linear predictors and matrix linear predictors. The user can choose among different link, variance, and covariance functions. Model fitting is based on an estimating function approach, combining quasi-score functions for regression parameters and Pearson estimating functions for covariance parameters.
mcglm(linear_pred, matrix_pred, link, variance, covariance, offset, Ntrial, power_fixed, data, control_initial, contrasts, weights, control_algorithm)mcglm(linear_pred, matrix_pred, link, variance, covariance, offset, Ntrial, power_fixed, data, control_initial, contrasts, weights, control_algorithm)
linear_pred |
A list of model formulas, one for each response.
See |
matrix_pred |
A list of known matrices defining the matrix linear
predictor for the covariance structure. See
|
link |
A list of link function names, one for each response.
Possible values are |
variance |
A list of variance function names.
Possible values are |
covariance |
A list of covariance link function names.
Possible values are |
offset |
A list of numeric vectors specifying offsets for each
response. Use |
Ntrial |
A list of numeric vectors specifying the number of trials
for binomial responses. Only used for |
power_fixed |
A list of logical values indicating whether the power
parameter should be fixed ( |
data |
a data frame. |
control_initial |
A list of initial values for the fitting algorithm.
If set to |
contrasts |
An optional list of contrasts passed to
|
weights |
A list of numeric vectors of observation weights.
Each element must have length equal to the number of observations.
Missing observations should be coded as |
control_algorithm |
A list of control parameters passed to the
fitting algorithm. See |
An object of class "mcglm" representing a fitted multivariate
covariance generalized linear model.
The returned object is a list produced by the fitting routine
fit_mcglm, augmented with additional components used by
post–estimation methods. The main components include:
A list of character vectors giving the names of the regression coefficients for each response variable.
A list of logical values indicating whether the power parameters were fixed or estimated.
A list of initial values used in the fitting algorithm.
An integer giving the number of observations.
A list of link functions used in the model.
A list of variance functions used in the model.
A list of covariance link functions used in the model.
A list of formulas defining the linear predictors.
A list of matrices defining the matrix linear predictors.
A list of design matrices corresponding to the linear predictors.
A matrix of observed response values, with rows corresponding to observations and columns to response variables.
A list containing the number of trials for each response variable, when applicable.
A list of offset vectors used in the model.
A list of logical values indicating whether each matrix linear predictor is treated as sparse.
A numeric vector of weights used in the fitting process.
The data frame used to fit the model.
A list of control parameters used by the fitting algorithm.
Additional components may be present for internal use by methods such as
print, summary and predict.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. and Jorgensen, B. (2016) Multivariate covariance generalized linear models. Journal of Royal Statistical Society - Series C 65:649–675.
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
fit_mcglm, mc_link_function and
mc_variance_function.
The NewBorn dataset is from a prospective study assessing the effect of respiratory physiotherapy on cardiopulmonary function in ventilated preterm newborn infants with birth weight less than 1500 g. The dataset was collected by the nursing team of Waldemar Monastier Hospital, Campo Largo, PR, Brazil, and analyzed in Bonat and Jorgensen (2016) as an example of mixed outcomes regression models.
data(NewBorn)data(NewBorn)
A data.frame with 270 observations and 21 variables:
SexFactor with levels Female and Male.
GAGestational age in weeks.
BWBirth weight in grams.
APGAR1MAPGAR index at the first minute of life.
APGAR5MAPGAR index at the fifth minute of life.
PREFactor indicating prematurity (YES/NO).
HDFactor indicating Hansen's disease (YES/NO).
SURFactor indicating surfactant administration (YES/NO).
JAUFactor indicating jaundice (YES/NO).
PNEFactor indicating pneumonia (YES/NO).
PDAFactor indicating persistence of ductus arteriosus (YES/NO).
PPIFactor indicating primary pulmonary infection (YES/NO).
OTHERSFactor indicating other diseases (YES/NO).
DAYSAge in days.
AUXFactor indicating type of respiratory auxiliary (HOOD/OTHERS).
RRRespiratory rate (continuous).
HRHeart rate (continuous).
SPO2Oxygen saturation (bounded).
TREATFactor with three levels: Respiratory physiotherapy, Evaluation 1, Evaluation 2, Evaluation 3.
NBINewborn index.
TIMEDays of treatment.
Bonat, W. H. and Jorgensen, B. (2016). "Multivariate covariance generalized linear models." Journal of Royal Statistical Society, Series C, 65:649–675.
## Not run: library(mcglm) library(Matrix) data(NewBorn, package = "mcglm") # Linear predictor example formu <- SPO2 ~ Sex + APGAR1M + APGAR5M + PRE + HD + SUR Z0 <- mc_id(NewBorn) fit <- mcglm( linear_pred = c(formu), matrix_pred = list(Z0), link = "logit", variance = "binomialP", power_fixed = TRUE, data = NewBorn, control_algorithm = list(verbose = FALSE, tuning = 0.5) ) summary(fit) ## End(Not run)## Not run: library(mcglm) library(Matrix) data(NewBorn, package = "mcglm") # Linear predictor example formu <- SPO2 ~ Sex + APGAR1M + APGAR5M + PRE + HD + SUR Z0 <- mc_id(NewBorn) fit <- mcglm( linear_pred = c(formu), matrix_pred = list(Z0), link = "logit", variance = "binomialP", power_fixed = TRUE, data = NewBorn, control_algorithm = list(verbose = FALSE, tuning = 0.5) ) summary(fit) ## End(Not run)
Computes the pseudo Akaike information criterion (pAIC) for fitted multivariate covariance generalized linear models. The pAIC is defined as
where is the pseudo log-likelihood and
denotes the effective number of parameters in the model.
This criterion is intended for model comparison within the class of
mcglm models fitted to the same data.
pAIC(object, verbose = TRUE)pAIC(object, verbose = TRUE)
object |
An object of class |
verbose |
Logical indicating whether the pAIC value should be
printed to the console. Defaults to |
The pAIC is based on the pseudo log-likelihood returned by
plogLik and should be used with caution, as it does not
correspond to a true likelihood-based information criterion.
Comparisons are meaningful only for models fitted to the same response
data.
An (invisible) named list with a single element:
A numeric value giving the pseudo Akaike information criterion associated with the fitted model(s).
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
gof, plogLik, ESS, pKLIC,
GOSHO, RJC
Computes the pseudo Bayesian information criterion (pBIC) for fitted multivariate covariance generalized linear models. The pBIC is defined as
where is the pseudo log-likelihood, is the
effective number of parameters, and is the total number of
observed responses.
This criterion provides a more strongly penalized alternative to
pAIC, favoring more parsimonious models when comparing
mcglm fits to the same data.
pBIC(object, verbose = TRUE)pBIC(object, verbose = TRUE)
object |
An object of class |
verbose |
Logical indicating whether the pBIC value should be
printed to the console. Defaults to |
The sample size used in the penalty term corresponds to the
total number of observed responses, obtained from the
observed component of the fitted mcglm object(s).
As the pBIC is based on a pseudo log-likelihood, it should be used
cautiously and only for relative comparisons among models fitted to
the same data set.
An (invisible) named list with a single element:
A numeric value giving the pseudo Bayesian information criterion associated with the fitted model(s).
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4), 1–30.
gof, plogLik, ESS, pAIC,
pKLIC, GOSHO, RJC
Extract the pseudo Kullback-Leibler information criterion
(pKLIC) for objects of mcglm class.
pKLIC(object, verbose = TRUE)pKLIC(object, verbose = TRUE)
object |
an object or a list of objects representing a model
of |
verbose |
logical. Print or not the pKLIC value. |
An invisible list with a single numeric component:
The pseudo Kullback–Leibler Information Criterion computed from the pseudo log-likelihood and a penalty term based on the sensitivity and variability matrices.
Wagner Hugo Bonat, [email protected]
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
gof, plogLik, ESS, pAIC,
GOSHO and RJC.
Computes the Gaussian pseudo-loglikelihood for fitted multivariate
covariance generalized linear models. The pseudo-loglikelihood is
obtained by assuming a multivariate normal distribution for the
stacked response vector, using the estimated mean vector and
covariance matrix from the fitted mcglm object.
plogLik(object, verbose = TRUE)plogLik(object, verbose = TRUE)
object |
An object of class |
verbose |
Logical indicating whether the pseudo-loglikelihood value
should be printed to the console. Defaults to |
The Gaussian pseudo-loglikelihood is computed as
where is the stacked vector of observed responses, is
the stacked vector of fitted means, and is the estimated
covariance matrix. For a list of mcglm objects, block-diagonal
covariance matrices are constructed internally.
This quantity is used mainly for model comparison purposes and as a
building block for pseudo-information criteria such as pAIC
and pBIC. It is not a true likelihood unless the Gaussian
assumption holds.
An invisible list with the following components:
A numeric value giving the Gaussian pseudo-loglikelihood.
An integer giving the total number of estimated parameters (degrees of freedom) used in the model.
Wagner Hugo Bonat, [email protected]
pAIC, pBIC, ESS, pKLIC, GOSHO,
RJC
Produces diagnostic plots for fitted mcglm objects.
Available diagnostics include residual analysis, inspection of
the fitting algorithm convergence and partial residual plots.
## S3 method for class 'mcglm' plot(x, type = c("residuals", "algorithm", "partial_residuals"), ...)## S3 method for class 'mcglm' plot(x, type = c("residuals", "algorithm", "partial_residuals"), ...)
x |
A fitted object of class |
type |
Character string specifying the type of diagnostic plot to be produced.
Possible values are |
... |
Additional arguments. Currently ignored and included for compatibility
with the generic |
No return value, called for its side effects (diagnostic plots).
Wagner Hugo Bonat, [email protected]
Prints a concise summary of a fitted mcglm object, including
the model call, link and variance functions, regression coefficients
and dispersion parameters for each response variable.
## S3 method for class 'mcglm' print(x, ...)## S3 method for class 'mcglm' print(x, ...)
x |
A fitted object of class |
... |
Further arguments passed to or from other methods. |
No return value, called for its side effects.
Wagner Hugo Bonat, [email protected]
Computes residuals for a fitted mcglm object. Different types
of residuals can be extracted, depending on the specified argument
type.
## S3 method for class 'mcglm' residuals(object, type = c("raw", "pearson", "standardized"), ...)## S3 method for class 'mcglm' residuals(object, type = c("raw", "pearson", "standardized"), ...)
object |
An object of class |
type |
A character string specifying the type of residuals to be returned. Options are:
|
... |
Further arguments passed to or from other methods. Currently ignored. |
A numeric matrix of class Matrix with dimensions
, where is the number of observations and
is the number of response variables.
Wagner Hugo Bonat, [email protected]
Computes the Rotnitzky–Jewell information criterion (RJC) for objects of
class mcglm. This criterion is based on quasi-likelihood theory
and is intended for model assessment in marginal models.
RJC(object, id, verbose = TRUE)RJC(object, id, verbose = TRUE)
object |
An object of class |
id |
An integer or factor vector identifying the clusters. Its length and ordering must match the number and ordering of the observations used to fit the model. |
verbose |
Logical. If |
The RJC is defined using the sensitivity and variability structures of the estimating equations and measures the discrepancy between them. The implementation assumes that the data are correctly ordered such that observations belonging to the same cluster are stored in contiguous rows.
Warning: This function is restricted to models with a single response variable.
An invisible list with a single component:
A numeric scalar giving the value of the Rotnitzky–Jewell information criterion.
Wagner Hugo Bonat, [email protected]
Wang, M. (2014). Generalized estimating equations in longitudinal data analysis: A review and recent developments. Advances in Statistics, 1(1), 1–13.
gof, plogLik, pAIC, pKLIC,
ESS, GOSHO
Soil chemistry properties measured on a regular grid of 10 × 25 points, spaced by 5 meters. Each record contains the chemical composition and coordinates of the soil sample.
data(soil)data(soil)
A data.frame with 250 observations and 9 variables:
COORD.XX coordinate of the sampling point.
COORD.YY coordinate of the sampling point.
SANDProportion of sand in the soil sample.
SILTProportion of silt in the soil sample.
CLAYProportion of clay in the soil sample.
PHWATERSoil pH measured in water.
CACalcium content of the soil.
MGMagnesium content of the soil.
KPotassium content of the soil.
Bonat, W. H. (2018). "Multiple Response Variables Regression Models in R: The mcglm Package." Journal of Statistical Software, 84(4):1–30.
## Not run: library(mcglm) data(soil, package = "mcglm") # Spatial model (tri2nb could be used but takes long) Z1 <- mc_id(soil) # Linear predictor example form.ca <- CA ~ COORD.X*COORD.Y + SAND + SILT + CLAY + PHWATER fit.ca <- mcglm( linear_pred = c(form.ca), matrix_pred = list(Z1), link = "log", variance = "tweedie", covariance = "inverse", power_fixed = TRUE, data = soil, control_algorithm = list( max_iter = 1000, tuning = 0.1, verbose = FALSE, tol = 1e-03 ) ) ## End(Not run)## Not run: library(mcglm) data(soil, package = "mcglm") # Spatial model (tri2nb could be used but takes long) Z1 <- mc_id(soil) # Linear predictor example form.ca <- CA ~ COORD.X*COORD.Y + SAND + SILT + CLAY + PHWATER fit.ca <- mcglm( linear_pred = c(form.ca), matrix_pred = list(Z1), link = "log", variance = "tweedie", covariance = "inverse", power_fixed = TRUE, data = soil, control_algorithm = list( max_iter = 1000, tuning = 0.1, verbose = FALSE, tol = 1e-03 ) ) ## End(Not run)
Dataset from an experiment conducted in a vegetation house with soybeans.
Each plot contained two plants and the experiment involved three levels of
soil water (water) and five levels of potassium fertilization (pot),
arranged in five blocks (block). Three response variables are recorded:
grain yield, number of seeds, and number of viable peas per plant. The dataset
contains 75 observations and 7 variables.
data(soya)data(soya)
A data.frame with 75 observations and 7 variables:
potFactor with five levels of potassium fertilization.
waterFactor with three levels of amount of water in the soil.
blockFactor with five levels representing experimental blocks.
grainContinuous variable representing grain yield per plant.
seedsCount variable representing number of seeds per plant.
viablepeasBinomial variable representing number of viable peas per plant.
totalpeasBinomial variable representing total number of peas per plant.
Bonat, W. H. (2018). Multiple Response Variables Regression Models in R: The mcglm Package. Journal of Statistical Software, 84(4):1–30.
## Not run: library(mcglm) library(Matrix) data(soya, package = "mcglm") # Linear predictor example formu <- grain ~ block + factor(water) * factor(pot) Z0 <- mc_id(soya) fit <- mcglm(linear_pred = c(formu), matrix_pred = list(Z0), data = soya) anova(fit) ## End(Not run)## Not run: library(mcglm) library(Matrix) data(soya, package = "mcglm") # Linear predictor example formu <- grain ~ block + factor(water) * factor(pot) Z0 <- mc_id(soya) fit <- mcglm(linear_pred = c(formu), matrix_pred = list(Z0), data = soya) anova(fit) ## End(Not run)
Produces a summary of a fitted mcglm object, including estimates,
standard errors, Wald statistics and p-values for regression,
dispersion, power and correlation parameters.
## S3 method for class 'mcglm' summary( object, verbose = TRUE, print = c("Regression", "power", "Dispersion", "Correlation"), ... )## S3 method for class 'mcglm' summary( object, verbose = TRUE, print = c("Regression", "power", "Dispersion", "Correlation"), ... )
object |
An object of class |
verbose |
Logical indicating whether the summary should be printed to the console.
If |
print |
A character vector specifying which components of the model summary
should be printed. Possible values are |
... |
Further arguments passed to or from other methods. Currently ignored. |
Invisibly returns a list containing summary tables for each response variable, and optionally a correlation summary when applicable.
Wagner Hugo Bonat, [email protected]
Extracts the variance-covariance matrix of the estimated parameters
from a fitted mcglm object.
## S3 method for class 'mcglm' vcov(object, ...)## S3 method for class 'mcglm' vcov(object, ...)
object |
An object of class |
... |
Further arguments passed to or from other methods. Currently ignored. |
A numeric matrix representing the variance-covariance matrix of all estimated model parameters. Row and column names correspond to the parameter identifiers.
Wagner Hugo Bonat, [email protected]