A Data Scientist's blog: EM algorithm for incomplete categorical data

Sunday, February 12, 2012

EM algorithm for incomplete categorical data

em.cat {cat}

R Documentation

Description

Finds ML estimate or posterior mode of cell probabilities under the saturated multinomial model.
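
To make the idea concrete, here is a minimal base-R sketch of EM for a 2 x 2 table in which some units are classified on only one of the two variables. It is an illustration under stated assumptions, not the code used by `em.cat`: the function `em_2x2`, its arguments, and the counts are all hypothetical, and the stopping rule only imitates the proportional-change criterion described for `eps`.

```r
# Hypothetical helper (not part of the cat package): EM for a 2 x 2 table
# with fully classified counts `full`, counts observed on the row variable
# only (`rowonly`), and counts observed on the column variable only (`colonly`).
em_2x2 <- function(full, rowonly, colonly, prior = 1,
                   maxits = 1000, eps = 1e-4) {
  theta <- matrix(1 / length(full), nrow(full), ncol(full))  # uniform start
  old <- NULL
  for (it in seq_len(maxits)) {
    # E-step: spread partially classified units over cells in proportion
    # to the current conditional probabilities.
    row_cond <- theta / rowSums(theta)                # P(col j | row i)
    col_cond <- sweep(theta, 2, colSums(theta), "/")  # P(row i | col j)
    expected <- full + row_cond * rowonly + sweep(col_cond, 2, colonly, "*")
    # M-step: posterior mode under a Dirichlet prior (prior = 1 gives the MLE).
    theta <- (expected + prior - 1) / sum(expected + prior - 1)
    # Stop when the largest proportional change in expected counts is small.
    if (!is.null(old) &&
        max(abs(expected - old) / pmax(old, .Machine$double.eps)) < eps) break
    old <- expected
  }
  list(theta = theta, iterations = it)
}

# Made-up counts for illustration only.
full    <- matrix(c(20, 10, 5, 15), 2, 2)
rowonly <- c(10, 6)
colonly <- c(4, 8)
fit <- em_2x2(full, rowonly, colonly)
fit$theta
```

With no partially classified units the sketch reduces to the usual sample proportions, which is a quick sanity check on the two steps.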

Usage

em.cat(s, start, prior=1, showits=TRUE, maxits=1000, eps=0.0001)

Arguments

`s`	summary list of an incomplete categorical dataset produced by the function `prelim.cat`.
`start`	optional starting value of the parameter. This is an array with dimensions `s$d` whose elements sum to one. The default starting value is a uniform array (equal probabilities in all cells). If structural zeros appear in the table, `start` should contain zeros in those positions and nonzero (e.g. uniform) values elsewhere.
`prior`	optional vector of hyperparameters for a Dirichlet prior distribution. The default is a uniform prior distribution (all hyperparameters = 1) on the cell probabilities, which will result in maximum likelihood estimation. If structural zeros appear in the table, a prior should be supplied with `NA`s in those cells.
`showits`	if `TRUE`, reports the iterations of EM so the user can monitor the progress of the algorithm.
`maxits`	maximum number of iterations performed. The algorithm will stop if the parameter still has not converged after this many iterations.
`eps`	convergence criterion. EM is considered to have converged when the largest proportional change in any expected cell count from one iteration to the next falls below this value. Any expected cell count that drops below 1E-07 times the average cell probability (1 divided by the number of cells that are not structural zeros) is set to zero during the iterations.
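
The structural-zero conventions for `start` and `prior` can be sketched in base R as follows. The 2 x 2 dimensions and the choice of cell [2, 2] as the structural zero are assumptions made purely for illustration; in practice the dimensions come from `s$d`.

```r
# Hypothetical 2 x 2 table in which cell [2, 2] is a structural zero.
d <- c(2, 2)                          # stands in for s$d
structural <- array(FALSE, d)
structural[2, 2] <- TRUE

# Starting value: zero in the structural-zero cell, uniform elsewhere,
# so that the elements sum to one.
start <- array(0, d)
start[!structural] <- 1 / sum(!structural)

# Prior: NA marks the structural zero; 1 elsewhere gives ML estimation.
prior <- array(1, d)
prior[structural] <- NA

# Threshold described under `eps`: 1E-07 times the average cell probability.
threshold <- 1e-7 / sum(!structural)

# With the cat package loaded and `s` produced by prelim.cat(), the call
# would then look like:
# thetahat <- em.cat(s, start = start, prior = prior)
```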

Value

Array of dimension `s$d` containing the ML estimate or posterior mode, assuming that EM has converged within `maxits` iterations.

Note

If zero cell counts occur in the observed-data table, the maximum likelihood estimate may not be unique, and the algorithm may converge to different stationary values depending on the starting value. Also, if zero cell counts occur in the observed-data table, the ML estimate may lie on the boundary of the parameter space. Supplying a prior with hyperparameters greater than one will give a unique posterior mode in the interior of the parameter space. Estimated probabilities for structural zero cells will always be zero.
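
The effect of hyperparameters greater than one is easiest to see with complete data, where the posterior mode has a closed form. The sketch below uses made-up counts and a hypothetical helper `post_mode`; with incomplete data, the M-step of EM applies the same formula to expected rather than observed counts.

```r
# Hypothetical complete-data counts with one sampling zero (second cell).
x <- c(5, 0, 3, 2)

# Mode of the Dirichlet posterior for complete multinomial counts:
# (x + alpha - 1) / sum(x + alpha - 1).
post_mode <- function(x, alpha) {
  (x + alpha - 1) / sum(x + alpha - 1)
}

theta_ml  <- post_mode(x, alpha = 1)  # uniform prior: identical to x / sum(x)
theta_map <- post_mode(x, alpha = 2)  # hyperparameters > 1: interior mode

theta_ml
theta_map
```

The ML estimate puts probability zero on the empty cell, which is the boundary case the note warns about, while the mode under the prior with hyperparameters of 2 is strictly positive in every cell.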

References

Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall, Section 7.3.

Examples

library(cat)                                 # provides em.cat, prelim.cat, crimes
data(crimes)
crimes
s <- prelim.cat(crimes[,1:2],crimes[,3])     # preliminary manipulations
thetahat <- em.cat(s)                        # mle under saturated model
logpost.cat(s,thetahat)                      # loglikelihood at thetahat
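
Because EM can stop at different stationary values depending on where it starts, one way to probe this is to rerun it from several random starting arrays and compare the resulting log-likelihoods. The following is a sketch only: the number of restarts and the seed are arbitrary choices, and the block is skipped when the cat package is not installed.

```r
if (requireNamespace("cat", quietly = TRUE)) {
  library(cat)
  data(crimes)
  s <- prelim.cat(crimes[, 1:2], crimes[, 3])

  set.seed(1)                                # arbitrary seed for the starts
  fits <- lapply(1:5, function(i) {
    start <- array(runif(prod(s$d)), s$d)    # random positive start
    start <- start / sum(start)              # elements must sum to one
    theta <- em.cat(s, start = start, showits = FALSE)
    list(theta = theta, loglik = logpost.cat(s, theta))
  })

  # Similar log-likelihoods across starts suggest a common solution.
  sapply(fits, function(f) f$loglik)
}
```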
