A Data Scientist's blog: R: Generalized Extreme Value Distribution Family Function

gev {VGAM}

R Documentation

Generalized Extreme Value Distribution Family Function

Description

Maximum likelihood estimation of the 3-parameter generalized extreme value (GEV) distribution.

Usage

gev(llocation = "identity", lscale = "loge", lshape = "logoff",
    elocation = list(), escale = list(),
    eshape = if(lshape=="logoff") list(offset=0.5) else
    if(lshape=="elogit") list(min=-0.5, max=0.5) else list(),
    percentiles = c(95, 99),
    iscale=NULL, ishape = NULL,
    method.init = 1, gshape=c(-0.45, 0.45), tshape0=0.001, zero = 3)
egev(llocation = "identity", lscale = "loge", lshape = "logoff",
     elocation = list(), escale = list(),
     eshape = if(lshape=="logoff") list(offset=0.5) else
     if(lshape=="elogit") list(min=-0.5, max=0.5) else list(),
     percentiles = c(95, 99),
     iscale=NULL,  ishape = NULL,
     method.init=1, gshape=c(-0.45, 0.45), tshape0=0.001, zero = 3)

Arguments

`llocation, lscale, lshape`	Parameter link functions for mu, sigma and xi respectively. See `Links` for more choices.
`elocation, escale, eshape`	List. Extra argument for the respective links. See `earg` in `Links` for general information. For the shape parameter, if the `logoff` link is chosen then the offset is called A below; and then the linear/additive predictor is log(xi+A) which means that xi > -A. For technical reasons (see Details) it is a good idea for A=0.5.
`percentiles`	Numeric vector of percentiles used for the fitted values. Values should be between 0 and 100. However, if `percentiles=NULL`, then the mean mu + sigma (gamma(1-xi)-1)/xi* is returned, and this is only defined if xi<1.
`iscale, ishape`	Numeric. Initial value for sigma and xi. A `NULL` means a value is computed internally. The argument `ishape` is more important than the other two because they are initialized from the initial xi. If a failure to converge occurs, or even to obtain initial values occurs, try assigning `ishape` some value (positive or negative; the sign can be very important). Also, in general, a larger value of `iscale` is better than a smaller value.
`method.init`	Initialization method. Either the value 1 or 2. Method 1 involves choosing the best xi on a course grid with endpoints `gshape`. Method 2 is similar to the method of moments. If both methods fail try using `ishape`.
`gshape`	Numeric, of length 2. Range of xi used for a grid search for a good initial value for xi. Used only if `method.init` equals 1.
`tshape0`	Positive numeric. Threshold/tolerance value for resting whether xi is zero. If the absolute value of the estimate of xi is less than this value then it will be assumed zero and Gumbel derivatives etc. will be used.
`zero`	An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,3} corresponding respectively to mu, sigma, xi. If `zero=NULL` then all linear/additive predictors are modelled as a linear combination of the explanatory variables. For many data sets having `zero=3` is a good idea.

Details

The GEV distribution function can be written

G(y) = exp( -[ (y- mu)/ sigma ]_{+}^{- 1/ xi})

where sigma > 0, -Inf < mu < Inf, and 1 + xi*(y-mu)/sigma > 0. Here, x_+ = max(x,0). The mu, sigma, xi are known as the location, scale and shape parameters respectively. The cases xi>0, xi<0, xi=0 correspond to the Frechet, Weibull, and Gumbel types respectively. It can be noted that the Gumbel (or Type I) distribution accommodates many commonly-used distributions such as the normal, lognormal, logistic, gamma, exponential and Weibull.
For the GEV distribution, the kth moment about the mean exists if xi < 1/k. Provided they exist, the mean and variance are given by mu + sigma { Gamma(1-xi)-1} / xi and sigma^2 { Gamma(1-2 xi) - Gamma^2 (1- xi) } / xi^2 respectively, where Gamma is the gamma function.
Smith (1985) established that when xi > -0.5, the maximum likelihood estimators are completely regular. To have some control over the estimated xi try using lshape="logoff" and the eshape=list(offset=0.5), say, or lshape="elogit" and eshape=list(min=-0.5, max=0.5), say.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

Warning

Currently, if an estimate of xi is too close to zero then an error will occur for gev() with multivariate responses. In general, egev() is more reliable than gev().
Fitting the GEV by maximum likelihood estimation can be numerically fraught. If 1 + xi*(y-mu)/sigma <= 0 then some crude evasive action is taken but the estimation process can still fail. This is particularly the case if vgam with s is used; then smoothing is best done with vglm with regression splines (bs or ns) because vglm implements half-stepsizing whereas vgam doesn't (half-stepsizing helps handle the problem of straying outside the parameter space.)

Note

The VGAM family function gev can handle a multivariate (matrix) response. If so, each row of the matrix is sorted into descending order and NAs are put last. With a vector or one-column matrix response using egev will give the same result but be faster and it handles the xi=0 case. The function gev implements Tawn (1988) while egev implements Prescott and Walden (1980).
The shape parameter xi is difficult to estimate accurately unless there is a lot of data. Convergence is slow when xi is near -0.5. Given many explanatory variables, it is often a good idea to make sure zero=3. The range restrictions of the parameter xi are not enforced; thus it is possible for a violation to occur.
Successful convergence often depends on having a reasonably good initial value for xi. If failure occurs try various values for the argument ishape, and if there are covariates, having zero=3 is advised.

Author(s)

T. W. Yee

References

Yee, T. W. and Stephenson, A. G. (2007) Vector generalized linear and additive extreme value models. Extremes, 10, 1–19.
Tawn, J. A. (1988) An extreme-value theory model for dependent observations. Journal of Hydrology, 101, 227–250.
Prescott, P. and Walden, A. T. (1980) Maximum likelihood estimation of the parameters of the generalized extreme-value distribution. Biometrika, 67, 723–724.
Smith, R. L. (1985) Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72, 67–90.

Examples

# Multivariate example
data(venice)
y = as.matrix(venice[,paste("r", 1:10, sep="")])
fit1 = vgam(y[,1:2] ~ s(year, df=3), gev(zero=2:3), venice, trace=TRUE)
coef(fit1, matrix=TRUE)
fitted(fit1)[1:4,]
## Not run: 
par(mfrow=c(1,2), las=1)
plot(fit1, se=TRUE, lcol="blue", scol="forestgreen",
     main="Fitted mu(year) function (centered)")
attach(venice)
matplot(year, y[,1:2], ylab="Sea level (cm)", col=1:2,
        main="Highest 2 annual sealevels and fitted 95 percentile")
lines(year, fitted(fit1)[,1], lty="dashed", col="blue")
detach(venice)
## End(Not run)

# Univariate example
data(oxtemp)
(fit = vglm(maxtemp ~ 1, egev, data=oxtemp, trace=TRUE))
fitted(fit)[1:3,]
coef(fit, mat=TRUE)
Coef(fit)
vcov(fit)
vcov(fit, untransform=TRUE)
sqrt(diag(vcov(fit)))   # Approximate standard errors
## Not run:  rlplot(fit)

[Package VGAM version 0.7-4 Index]

A Data Scientist's blog

Search This Blog

Sunday, April 01, 2012

R: Generalized Extreme Value Distribution Family Function