
Tuesday, March 20, 2012

R: Multivariate independence test based on the empirical copula process


empcop.test {copula}    R Documentation

Multivariate independence test based on the empirical copula process

Description

Multivariate independence test based on the empirical copula process as proposed by Christian Genest and Bruno Rémillard. The test can be seen as composed of three steps: (i) a simulation step, which consists in simulating the distribution of the test statistics under independence for the sample size under consideration; (ii) the test itself, which consists in computing the approximate p-values of the test statistics with respect to the empirical distributions obtained in step (i); and (iii) the display of a graphic, called a dependogram, enabling one to understand the type of departure from independence, if any. More details can be found in the articles cited in the reference section.
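A compressed sketch of steps (i)-(iii), calling the functions exactly as declared under Usage below (the data, sample size, and dimension here are illustrative; a full worked example appears in the Examples section):

x <- matrix(rnorm(300), 100, 3)   # illustrative data: n = 100 observations, p = 3 variables
d <- empcop.simulate(100, 3, 2)   # (i) simulate the null distribution of the statistics, m = 2
test <- empcop.test(x, d, 2)      # (ii) approximate p-values against the simulated distribution
dependogram(test)                 # (iii) display the dependogram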

Usage

empcop.simulate(n, p, m, N = 2000)
empcop.test(x, d, m, alpha = 0.05)
dependogram(test, pvalues = FALSE)

Arguments

n Sample size when simulating the distribution of the test statistics under independence.
p Dimension of the data when simulating the distribution of the test statistics under independence.
m Maximum cardinality of the subsets for which a test statistic is to be computed. It makes sense to consider m << p especially when p is large. In this case, the null hypothesis corresponds to the independence of all sub-random vectors composed of at most m variables.
N Number of repetitions when simulating under independence.
x Data frame or data matrix containing realizations (one per line) of the random vector whose independence is to be tested.
d Object of class empcop.distribution as returned by the function empcop.simulate. It can be regarded as the empirical distribution of the test statistics under independence.
alpha Significance level used in the computation of the critical values for the test statistics.
test Object of class empcop.test as returned by the function empcop.test.
pvalues Logical indicating whether the dependogram should be drawn from test statistics or the corresponding p-values.

Details

See the references below for more details, especially the third one.

Value

The function empcop.simulate returns an object of class empcop.distribution whose attributes are: sample.size, data.dimension, max.card.subsets, number.repetitons, subsets (list of the subsets for which test statistics have been computed), subsets.binary (subsets in binary 'integer' notation) and dist.statistics.independence (a N line matrix containing the values of the test statistics for each subset and each repetition).
The function empcop.test returns an object of class empcop.test whose attributes are: subsets, statistics, critical.values, pvalues, fisher.pvalue (a p-value resulting from a combination à la Fisher of the previous p-values), tippett.pvalue (a p-value resulting from a combination à la Tippett of the previous p-values), alpha (global significance level of the test) and beta (1 - beta is the significance level per statistic).

Author(s)

Ivan Kojadinovic, ivan.kojadinovic@univ-nantes.fr

References

P. Deheuvels (1979), La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance, Acad. Roy. Belg. Bull. Cl. Sci. 5th Ser. 65, 274-292.
P. Deheuvels (1981), A non parametric test for independence, Publ. Inst. Statist. Univ. Paris 26, 29-50.
C. Genest and B. Rémillard (2004). Tests of independence and randomness based on the empirical copula process. Test, 13, 335-369.
C. Genest, J.-F. Quessy and B. Rémillard (2006). Local efficiency of a Cramér-von Mises test of independence. Journal of Multivariate Analysis, 97, 274-294.
C. Genest, J.-F. Quessy and B. Rémillard (2007). Asymptotic local efficiency of Cramér-von Mises tests for multivariate independence. The Annals of Statistics, 35, in press.

Examples

## Consider the following example taken from
## Genest and Remillard (2004), p 352:

x <- matrix(rnorm(500),100,5)
x[,1] <- abs(x[,1]) * sign(x[,2] * x[,3])
x[,5] <- x[,4]/2 + sqrt(3) * x[,5]/2

## In order to test for independence "within" x, the first step consists
## in simulating the distribution of the test statistics under
## independence for the same sample size and dimension,
## i.e. n=100 and p=5. As we are going to consider all the subsets of
## {1,...,5} whose cardinality is between 2 and 5, we set m=5.
## This may take a while...

d <- empcop.simulate(100,5,5)

## The next step consists in performing the test itself:

test <- empcop.test(x,d,5)

## Let us see the results:

test

## Display the dependogram:

dependogram(test)

## We could have tested for a weaker form of independence, for instance,
## by only computing statistics for subsets whose cardinality is between 2
## and 3:

test <- empcop.test(x,d,3)
test
dependogram(test)

## Consider now the following data:
y <- matrix(runif(500),100,5)
## and perform the test:
test <- empcop.test(y,d,3)
test
dependogram(test)

## NB: In order to save d for future use, the save function can be used.

[Package copula version 0.5-3 Index]

R: Bivariate Empirical Copulae


EmpiricalCopulae {fCopulae}    R Documentation

Bivariate Empirical Copulae

Description

A collection and description of functions to investigate bivariate empirical copulae.

Empirical Copulae Functions:
pempiricalCopula computes empirical copula probability,
dempiricalCopula computes empirical copula density.

Usage

pempiricalCopula(u, v, N = 10)
dempiricalCopula(u, v, N = 10) 

Arguments

N [empiricalCopula] -
... .
u, v [*evCopula][*archmCopula] -
two numeric values or vectors of the same length at which the copula will be computed. If u is a list then the $x and $y elements will be used as u and v. If u is a two column matrix then the first column will be used as u and the second as v.

Value

The functions *Spec return an S4 object of class "fCOPULA". The object contains the following slots:
@call the function call.
@copula the name of the copula.
@param a list whose elements specify the model parameters.
@title a character string with the name of the copula. This can be overwritten specifying a user defined input argument.
@description a character string with an optional user defined description. By default just the current date when the test was applied will be returned.
The function pcopula returns a numeric matrix of probabilities computed at grid positions x|y.

The function parchmCopula returns a numeric matrix with values computed for the Archimedean copula.

The function darchmCopula returns a numeric matrix with values computed for the density of the Archimedean copula.

The functions Phi* return a numeric vector with the values computed from the Archimedean generator, its derivatives, or its inverse.

The functions cK and cKInv return a numeric vector with the values of the density and inverse for Archimedean copulae.

Author(s)

Diethelm Wuertz for the Rmetrics R-port.
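The page above lists no examples. The following is a minimal, hedged sketch: it calls pempiricalCopula and dempiricalCopula with the signature shown under Usage, assuming u and v are the sample's pseudo-uniform observations (obtained by ranking); because the return structure is not documented above, the result is only inspected with str() rather than plotted.

require("fCopulae")
set.seed(42)
z <- matrix(rnorm(400), ncol = 2)
z[, 2] <- 0.6 * z[, 1] + 0.8 * z[, 2]     # introduce dependence between the two columns
u <- rank(z[, 1]) / (nrow(z) + 1)         # pseudo-observations on (0, 1)
v <- rank(z[, 2]) / (nrow(z) + 1)
C.emp <- pempiricalCopula(u, v, N = 10)   # empirical copula probability
c.emp <- dempiricalCopula(u, v, N = 10)   # empirical copula density
str(C.emp)                                # inspect the returned structure before plotting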

[Package fCopulae version 2110.78 Index]

R: Empirical Copulas and Hedge Basis Risk

Empirical Copulas and Hedge Basis Risk

October 10, 2011
The recently introduced proxy hedge model and corresponding empirical proxy quantiles share an implicit dependence on the joint covariation between underlying and proxy hedge. Of particular interest is understanding the dynamics of basis risk under extreme scenarios (both up and down), which are driven by time-varying stochastic joint covariation.
This post quantifies and visualizes such joint covariation and basis risk via copulas, including modeling and empirically fitting both marginal and joint distributions using fat-tailed student-t distributions. Copulas exploit multidimensional sample ranking, and thus are thematically similar to empirical quantiles. This analysis also seeks to exemplify practical use of R for copula analysis.
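To make "multidimensional sample ranking" concrete before the full code below, here is a two-line illustration of the pseudo-observation transform on simulated stand-in data (not the CRM / QQQ returns used later in the post):

x <- cbind(rnorm(100), rnorm(100))       # stand-in for a two-column return matrix
u <- apply(x, 2, rank) / (nrow(x) + 1)   # ranks rescaled to (0,1): the copula "sees" only these
plot(u)                                  # scatter of the pseudo-observations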

Brief review of the shortcomings of classic dependence statistics (such as correlation and covariance) motivates use of copulas and related techniques:
  • Normality: assumption of joint normality in one guise or another, irrespective of suitability
  • Summary statistics: point estimates which reduce complex covariation relationships into single numbers
  • Marginal-joint conflation: considerations of both marginal and joint are conflated, rather than providing clean and independent separation
  • Poor visualization: few visualization techniques exist, reducing potential for geometric and topological intuition
These shortcomings were resolved by the following beautiful decomposition, due to Sklar (1959):
   H(x, y) = C( F(x), G(y) )
where F and G are univariate distributions, H is a two-dimensional distribution function with marginal distributions F and G, and C is a copula. Note that any or all of F, G, and C may be fitted empirically. Trivedi and Zimmer (2005) or Cherubini et al. (2004) are suggested for readers unfamiliar with copulas.
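As a concrete illustration of this gluing, here is a minimal sketch using the older copula-package calls that the code later in this post also relies on; the copula parameter (0.7) and the marginals (a t with 4 df and a normal) are made up for illustration:

require("copula")
set.seed(1)
cop <- normalCopula(0.7, dim = 2)        # C: the copula
u   <- rcopula(cop, 1000)                # uniform draws from C
x   <- qt(u[, 1], df = 4)                # F^{-1}: a fat-tailed marginal
y   <- qnorm(u[, 2], mean = 0, sd = 2)   # G^{-1}: a normal marginal
cor(x, y, method = "kendall")            # the dependence comes from C, not from F or G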
Conceptually, this technique provides several useful benefits for analyzing proxy hedging basis risk:
  • Mechanics: Any joint distribution can be bi-directionally “glued together” by two marginals and a copula
  • Uniqueness: Copula is unique, under conditions reasonable for current purposes
  • Completeness: Joint covariation can be fully characterized by a copula, independent from the marginals
  • Visualization: Copulas can be graphically visualized, in both contour and density plots
Without further ado, the following plots visualize the daily joint covariation of a well-known tech stock (CRM) and QQQ linear returns over the longitudinal period via an empirical proxy copula (1254 daily observations), as introduced in Empirical Quantiles and Proxy Selection. Note these plots illustrate joint covariation independent from the marginal densities of CRM / QQQ:
Proxy Copula Summary
The top left plot illustrates scatter of ranked pseudo observations; top right illustrates scatter of 1000 random samples from the fitted copula; bottom left illustrates empirical copula contour; and bottom right illustrates the empirical copula perspective. Compare this diversity of visualization versus a single number (e.g. correlation statistic, which happens to equal 0.777).
Diving into the marginal and copula distributions is necessary to understand this relationship further. Consistent with standard convention, all distributions are assumed to be student-t with empirically fitted degrees of freedom. The parameters of the marginals are:
CRM location 0.0003 scale 0.0218 df 3.489
QQQ location 0.001 scale 0.0100 df 2.767

This indicates the marginal distributions diverge strongly from normality, with fat tails due to the small degrees of freedom. This matches the 3 df estimate by Schoeffel in his recent (2011) article on futures (note the difference in frequency and the use of log returns).
Similarly, the copula is assumed to be distributed student-t with estimated df of 3.975 and ρ of 0.6868. The bivariate association measures for the empirical proxy copula are:
tau 0.481
rho 0.669
tail index: 0.381 0.381

This indicates the copula also diverges strongly from normality, with fat tails.
In summary: these plots and fitted distributions confirm observed conclusions from the previous post: although CRM and QQQ covary, there is high basis risk—including numerous observations with nearly inverse correlation. In other words, a QQQ proxy is likely to result in fairly costly hedging errors.

R code to generate the above empirical proxy copula analysis (and more, possibly to be covered in a subsequent post):
001require("copula")
002require("fSeries")
003 
004exploreProxyDist <- function(p, doExcess=TRUE, partitions=1)
005{
006  # Analyze distribution and copula of proxy daily returns.
007  #
008  # Args:
009  #   p: matrix of instrument price data, including valid colnames
010  #   doExcess: flag indicating whether to perform analysis on excess returns,
011  #             in addition to raw returns
012  #   partitions: if not 1, partition the returns and perform subanalysis
013  #
014  # Returns: None
015   
016  oldpar <- par(mfrow=c(2,2))
017  n <- nrow(p)
018 
019  # first differences (not logged)
020  pROC <- ROC(p, type="discrete", na.pad=FALSE)
021  exploreProxyDistROC(pROC)
022   
023  if (partitions > 1)
024  {
025    frac <- floor(n / partitions)
026    sapply(c(0:(partitions-1)), function(p) { cat("\n",
027                                                (p+1),"-th partition:",((p*frac)+1),
028                                                ((p+1)*frac),"\n");
029                                                partition <- pROC[((p*frac)+1):((p+1)*frac),]
030                                                exploreProxyDistROC(partition) } )
031  }
032   
033  if (doExcess)
034  {
035    cat("\nExcess Copula\n")
036     
037    # calculate excess returns, subtracting off market
038    excess <- pROC[,1] - pROC[,2]
039    excessROC <- cbind(excess, pROC[,2])
040     
041    par(oldpar)
042    plot(cumprod(1+excess), main="Excess Cumulative Returns", ylab="Return")
043    oldpar <- par(mfrow=c(2,2))
044   
045    exploreProxyDistROC(excessROC)
046     
047    if (partitions > 1)
048    {
049      frac <- floor(n / partitions)
050      sapply(c(0:(partitions-1)), function(p) { cat("\n",
051                                                  (p+1),"-th Excess Partition:",((p*frac)+1),
052                                                  ((p+1)*frac),"\n");
053                                                  partition  <- excessROC[((p*frac)+1):((p+1)*frac),]
054                                                  exploreProxyDistROC(partition) } )
055    }
056  }
057     
058  par(oldpar)
059}
060 
061exploreProxyDistROC <- function(pROC)
062{
063  # Analyze distribution and copula of proxy daily returns.
064  #
065  # Args:
066  #   p: matrix of instrument price data, including valid colnames
067  #
068  # Returns: list of copula fit and empirical copula
069 
070  n <- nrow(pROC)
071  cnames <- colnames(pROC)
072   
073  # t-distribution fits
074  p1Fit <- fitdistr(pROC[,1], "t")$estimate
075  p2Fit <- fitdistr(pROC[,2], "t")$estimate
076   
077  cat(cnames[1], "location", p1Fit[1], "scale", p1Fit[2], "df", p1Fit[3], "\n")
078  cat(cnames[2], "location", p2Fit[1], "scale", p2Fit[2], "df", p2Fit[3], "\n")
079   
080  # empirical copula
081  tau <- cor(pROC, method="kendall")[2]
082  t.cop <- tCopula(tau, dim=2, dispstr="un", df=3)
083  psuedo <- apply(pROC, 2, rank) / (n + 1)
084  plot(psuedo, main="Empirical Scatterplot", xlab=cnames[1], ylab=cnames[2])
085   
086  fit.mpl <- fitCopula(t.cop, psuedo, method="mpl", estimate.variance=FALSE)
087  print(fit.mpl)
088 
089   
090  # build empirical copula and plot
091  empiricalCopula <- tCopula(fit.mpl@estimate[1], dim=2, dispstr="un", df=fit.mpl@estimate[2])
092  plot(rcopula(empiricalCopula, 1000), main="Sampled Scatterplot", xlab=cnames[1], ylab=cnames[2])
093  contour(empiricalCopula, pcopula, main="Empirical Contour", xlab=cnames[1], ylab=cnames[2])
094  persp(empiricalCopula, dcopula, main="Empirical Perspective",
095    xlab=cnames[1], ylab=cnames[2], zlab="Density")
096   
097  cat("Empirical tau:", kendallsTau(empiricalCopula), "\n")
098  cat("Empirical rho:", spearmansRho(empiricalCopula), "\n")
099  cat("Empirical tail index:", tailIndex(empiricalCopula), "\n")
100   
101  return (list(fit=fit.mpl, copula=empiricalCopula))
102}

R: Gauss Copula Simulation

rcopula.gauss {QRMlib}    R Documentation

Gauss Copula Simulation

Description

generates a random sample from the Gaussian copula

Usage

rcopula.gauss(n, Sigma=equicorr(d, rho), d=2, rho=0.7)

Arguments

n number of observations
Sigma correlation matrix
d dimension of copula
rho correlation parameter for specifying an equicorrelation structure

Details

This function is set up to allow quick simulation of Gauss copulas with an equicorrelation structure. Simply enter a value for the dimension d and the correlation parameter rho. For more general correlation matrices specify Sigma.

Value

a matrix with n rows and d columns

See Also

rAC, rcopula.gumbel, rcopula.clayton, rcopula.frank, rcopula.t

Examples

data <- rcopula.gauss(2000,d=6,rho=0.7); 
pairs(data); 
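## Added sketch (not from the package documentation): the Details section also
## allows a general correlation matrix via Sigma; the entries below are illustrative.
Sigma <- matrix(c(1.0, 0.8, 0.3,
                  0.8, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3)
data2 <- rcopula.gauss(2000, Sigma=Sigma);
pairs(data2);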

Results

R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing

> library(QRMlib)
> png(filename="rcopula.gauss_%03d_med.png", width=480, height=480)
> ### Name: rcopula.gauss
> ### Title: Gauss Copula Simulation
> ### Aliases: rcopula.gauss
> ### Keywords: distribution
> 
> ### ** Examples
> 
> data <- rcopula.gauss(2000,d=6,rho=0.7); 
> pairs(data); 
> 
> 
> 
> 
> dev.off()
null device 
          1 

R: rcopula.gauss problem with sum(diag(correlationmatrix))

I keep getting rcopula.gauss to error out due to machine/R significant digits.
The test in the method, sum(diag(sigma)) == d, fails because one of my diag
elements is not quite 1.  I had to execute a ceiling function on the diag
elements of the correlation matrix to get the method not to fail.

Is this a problem for anyone else?
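A minimal sketch of the floating-point behaviour being described, plus the simplest workaround (the matrix entries are made up; the equality test shown is the sum(diag(sigma)) == d check mentioned above):

Sigma <- matrix(c(1, 0.5, 0.5, 1 - 1e-15), 2, 2)  # second diagonal entry prints as 1 but is not exactly 1
diag(Sigma)                      # looks like c(1, 1)
sum(diag(Sigma)) == 2            # FALSE: the exact-equality test inside rcopula.gauss fails
all.equal(sum(diag(Sigma)), 2)   # TRUE: a tolerance-based comparison would pass
diag(Sigma) <- 1                 # workaround: force an exact unit diagonal before calling rcopula.gauss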

My example is
             
             
              require(VGAM)
              require(QRMlib)
              require(gamlss)
              require(fCopulae)
              require(fGarch)
              require(copula)
              require(date)
              require(rgl)
             
              df<-7.5
              sig1<-.5
              var2<-2.5
              seed<-1234
              set.seed(seed);
              #base on sas code from web site
              #generate t distributed data
              tdata<-data.frame(date=seq(as.numeric(as.date('06/01/2001')),
                as.numeric(as.date('11/01/2002')),1))
              tdata$tdata<-.05*tdata$date+rTF(length(tdata$date),mu=5000,
              sigma=sig1,nu=df)
             
              #estimate model data~a * date + constant
                  fam<-TF
                  y1<-gamlss(tdata~date+1,data=tdata,family=fam,na.rm=T,
                    nu.start=16.58,sigma.formula=~1
                    ,nu.formula=~1)
                  y1<-gamlss(tdata~date+1,data=tdata,family=
                  TF(mu.link = identity,sigma.link = log, nu.link = log))
                  #plot(y1,na.rm=T)
                 
               
               #generate normal garch distributed data
               set.seed(12345)
                ndata<-data.frame(date=seq(as.numeric(as.date('06/01/2001')),
                  as.numeric(as.date('11/01/2002')),1))
                ndata$rnd<-rnorm(length(ndata$date))
                ndata$v<-NaN
                ndata$e<-NaN
                ndata$v[1]<-.0001+.25*var2^2 +.75* var2
                ndata$e[1]<-sqrt(ndata$v[1])*ndata$rnd[1]
                for (i in 2:length(ndata$date) ) {
                  ndata$v[i]<- 0.0001 + 0.2 * ndata$e[(i - 1)]^2 + 0.75 *
                    ndata$v[(i - 1)]
                  ndata$e[i]<-sqrt(ndata$v[i])+ndata$rnd[i]
                }
                gs<-garchSim(model = list( mu=25,omega = .0001, alpha = .2,
                beta = .75), n = length(ndata$date),
                  n.start = 100, presample = NULL, cond.dist = c("rnorm"),
                  rseed = 12345)
             
                # add constant 25
                ndata$ndata<-as.numeric(gs); #25+ndata$e
                # estimate garch 1,1 model of normally distributed data
                yn<-garchFit(ndata~garch(1,1),data=ndata,trace=F)
               
               #y = rcauchy(519, loc=-4, scale=1)
               # generate cauchy data
               set.seed(seed);
               cdata<-data.frame(date=seq(as.numeric(as.date('06/01/2001')),
                as.numeric(as.date('11/01/2002')),1))
               cauch<-unlist(-4+tan(runif(length(cdata$date))-.5)*pi)
               set.seed(seed);
               cdata$cdata<-rcauchy(length(cdata$date),loc=-4,scale=1)
               #estimate cauchy model
               yc<-vglm(cdata ~ 1 , data=cdata,cauchy1(lloc="identity"),
               trace=TRUE, crit="c")
               
               
               
               #create transformed uniform residuals for all estimated
               #series' residuals
               resids<-data.frame(date=tdata$date,tresids=y1$residuals,
                nresids=yn@residuals,cresids=yc@residuals)
               resids$tresids<-pnorm(pTF(resids$tresids/
                exp(y1$sigma.coefficient),y1$nu.coefficients))
               resids$cresids<-pnorm(pcauchy(resids$cresids,yc@coefficients[1]))
               resids$nresids<-pnorm(resids$nresids,mean(resids$nresids),
                sd=sd(resids$nresids))
               yall<-merge(tdata,merge(cdata,ndata,by.x="date",by.y="date",
                all=T),by.x="date",by.y="date",all=T)
               rnds<-data.frame(rmvnorm(2000,sigma=cov(resids[-1]),
                mean=mean(resids[-1])))
               names(rnds)<-c('T',"Normal",'Cauchy')
              #fit a normal copula to the uniform residuals
               cop.fit<-fit.norm(resids[,2:4])
              #simulate the normal copula
               cop.sim1<-rmnorm(2000, Sigma=cop.fit$Sigma, mu=cop.fit$mu)
               
 ## to get rcopula.gauss to not fail
 # diag(cop.fit$cor) <- ceiling(diag(cop.fit$cor))
 # uncomment the above line to get rcopula.gauss to work
 cop.sim<-rcopula.gauss(2000, Sigma=cop.fit$cor)
 
           ####################
           sims<-data.frame(cop.sim)
           names(sims)<- c('T',"Normal",'Cauchy')
           # for rmnorm
           #sims$T<-qTF(pnorm(sims$T),mu=y1$mu.coefficients[1],
           # sigma=exp(y1$sigma.coefficients), nu=y1$nu.coefficients)
           #sims$Normal<-qnorm(pnorm(sims$Normal),mean(resids$nresids),
           #sd=sd(resids$nresids))
           #sims$Cauchy<-qcauchy(pnorm(sims$Cauchy),location=yc@coefficients[1])
           #sims$T<-seq(as.numeric(as.date('11/01/2002'))+1,
           #as.numeric(as.date('11/01/2002'))+2000,1) *
           #  y1$mu.coefficients[2] + sims$T
           # for rcopula.gauss
           sims$T<-qTF(sims$T,mu=y1$mu.coefficients[1],
            sigma=exp(y1$sigma.coefficients), nu=y1$nu.coefficients)
           sims$Normal<-qnorm(sims$Normal,mean(resids$nresids),
            sd=sd(resids$nresids))
           sims$Cauchy<-qcauchy(sims$Cauchy,location=yc@coefficients[1])
           sims$T<-seq(as.numeric(as.date('11/01/2002'))+1,
            as.numeric(as.date('11/01/2002'))+2000,1) *
             y1$mu.coefficients[2] + sims$T
         
          f2<-kde2d(sims$T,sims$Cauchy,lims=c(range(sims$T),c(-150,150)))
          open3d()
          bg3d("white")
          material3d(col="black")
           persp3d(x=f2$x,y=f2$y,z=f2$z, aspect=c(1, 1, 0.5), col = "lightblue",
            xlab='T',ylab='Cauchy')

R: Does R have a package for generating random numbers in multi-dimensional space? For example, suppose I want to generate 1000 points inside a cuboid or a sphere.

I have some functions for hypercube and n-sphere selection that generate data frames with Cartesian coordinates and guarantee a uniform distribution through the hypercube or n-sphere for an arbitrary number of dimensions:
GenerateCubiclePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),l=1){

    x <-  matrix(runif(nrPoints*nrDim,-1,1),ncol=nrDim)
    x <-  as.data.frame(
            t(apply(x*(l/2),1,'+',center))
          )
    names(x) <- make.names(seq_len(nrDim))
    x}
The result is a set of points in a cube/hypercube of nrDim dimensions, centered at center, with l the length of one side.
For an n-sphere with nrDim dimensions, you can do something similar, where r is the radius:
GenerateSpherePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),r=1){
    #generate the polar coordinates!
    x <-  matrix(runif(nrPoints*nrDim,-pi,pi),ncol=nrDim)
    x[,nrDim] <- x[,nrDim]/2
    #recalculate them to cartesians
    sin.x <- sin(x)
    cos.x <- cos(x)
    cos.x[,nrDim] <- 1  # see the formula for n.spheres

    y <- sapply(1:nrDim, function(i){
        if(i==1){
          cos.x[,1]
        } else {
          cos.x[,i]*apply(sin.x[,1:(i-1),drop=F],1,prod)
        }
    })*sqrt(runif(nrPoints,0,r^2))

    y <-  as.data.frame(
            t(apply(y,1,'+',center))
          )

    names(y) <- make.names(seq_len(nrDim))
    y}
In 2 dimensions, these give:
[Image: scatterplots of the generated square and circular point clouds]
From code:
 T1 <- GenerateCubiclePoints(10000,2,c(4,3),5)
 T2 <- GenerateSpherePoints(10000,2,c(-5,3),2)
 op <- par(mfrow=c(1,2))
 plot(T1)
 plot(T2)
 par(op)
Cuboid:
df <- data.frame(
    x = runif(1000),
    y = runif(1000),
    z = runif(1000)
)

head(df)

          x           y         z
1 0.7522104 0.579833314 0.7878651
2 0.2846864 0.520284731 0.8435828
3 0.2240340 0.001686003 0.2143208
4 0.4933712 0.250840233 0.4618258
5 0.6749785 0.298335804 0.4494820
6 0.7089414 0.141114804 0.3772317
Sphere:
df <- data.frame(
    radius = runif(1000),
    inclination = 2*pi*runif(1000),
    azimuth = 2*pi*runif(1000)
)


head(df)

     radius inclination  azimuth
1 0.1233281    5.363530 1.747377
2 0.1872865    5.309806 4.933985
3 0.2371039    5.029894 6.160549
4 0.2438854    2.962975 2.862862
5 0.5300013    3.340892 1.647043
6 0.6972793    4.777056 2.381325
Note: edited to include code for sphere
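If Cartesian coordinates are needed, the spherical coordinates above convert directly; a small added sketch follows (note that, unlike GenerateSpherePoints in the previous answer, drawing the radius uniformly does not give points uniformly distributed inside the ball):

df <- transform(df,
    x = radius * sin(inclination) * cos(azimuth),
    y = radius * sin(inclination) * sin(azimuth),
    z = radius * cos(inclination)
)
head(df)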
A couple of years ago, I made a package called geozoo. It is available on CRAN.
install.packages("geozoo")
library(geozoo)
It has many different functions to produce objects in N-dimensions.
p = 4
n = 1000
# Cube with points on its face.
# A 3D version would be a box with solid walls and a hollow interior.
cube.face(p)
# Hollow sphere
sphere.hollow(p, n)

# Solid cube
cube.solid.random(p, n)
cube.solid.grid(p, 10) # evenly spaced points
# Solid Sphere
sphere.solid.random(p, n)
sphere.solid.grid(p, 10) # evenly spaced points
One of my favorite ones to watch animate is a cube with points along its edges, because it was one of the first objects that I made. It also gives you a sense of distance between vertices.
# Cube with points along its edges.
cube.dotline(4)
Also, check out the website: http://streaming.stat.iastate.edu/~dicook/geometric-data/. It contains pictures and downloadable data sets.
Hope it meets your needs!

R: Example of how to use a Gaussian copula to create random draws from normally distributed data

# example of how to use a Gaussian copula to create random draws from
# normally distributed data
require("reshape")
require("plyr")
require("QRMlib")
require("Matrix")
#make this reproducable
set.seed(2)
#how many draws in our starting data set?
n <- 1e4
# how many draws do we want from this distribution?
drawCount <- 1e4
# ourData will be my starting data which we'll
# base our model on
myData <- rnorm(n, 5, .6)
yourData <- myData + rnorm(n, 8, .25)
hisData <- myData + rnorm(n, 6, .4)
herData <- hisData + rnorm(n, 8, .35)
ourData <- data.frame(myData, yourData, hisData, herData)
# now we have raw correlations in the 70s, 80s, and 90s. Funny how that
# works out
cor(ourData)
#set up some simple functions for normalizing (Z Score) and
#denormalizing
normalMoments <- function(t) {
as.list(c(mean=mean(t), sd=sd(t)))
}
normalize <- function(x) {
   (x - mean(x)) / sd(x)
}
deNormalize <- function(x, sampleMean, sampleSd) {
    (x * sampleSd + sampleMean)
}
# create an object with the mean and sd of each
# of the margins... you can do this without plyr
# using apply() if you're into that sort of thing
# but alply makes it so easy to work with the results
# we will use these later to denormalize
normalMomentList <- alply(ourData, 2, normalMoments)
# normalize i.e. create Z score
# I think you don't HAVE to do this with gaussian marginals,
# but you DO with non-gaussian... so I'm in the habit.
ourDataZ <- data.frame(apply(ourData, 2, normalize))
# now ourDataZ is a data frame of Z scores.
# prove this to yourself by checking the
# mean and SD of each margin
apply(ourDataZ, 2, mean)
apply(ourDataZ, 2, sd)
#looks like mean of 0 and sd of 1 to me
# this fixes positive def, fits a copula, and makes draws
# in one line of code. Try that in Excel, bitches.
myDraws <- rcopula.gauss(n=drawCount,
                         Sigma=as.matrix(nearPD(Spearman(ourDataZ),corr=TRUE)$mat),
                         d=ncol(ourData))
# rcopula.gauss spits out points from 0-1 (i.e. q values) so we need to turn those into
# Z scores by doing a little norm inversion.
myDraws <- qnorm(myDraws)
#check the mean and sd
apply(myDraws, 2, mean)
apply(myDraws, 2, sd)
#should be 0, 1... and they are close..
# but we can do better than that... we know statistics!
# let's do a Kolmogorov-Smirnov test to see if these
# samples come from the same distribution
# KS does not check correlation,
# it only tests if two sets of samples
# came from same dist.. we'll check each column
for (i in 1:ncol(ourData)){
   # myDraws is a matrix here, so take the i-th column with [, i]
   print(ks.test(myDraws[, i], ourDataZ[[i]]))
}
# if the p-value of the KS test is < .05 then we
# reject that the distributions are equal
# they all look > .05 to me.
# Kolmogorov-Smirnov makes me want to drink, for some odd reason
# If your starting sample is small, you'll notice that a couple of the variables
# fail or just barely pass the KS test. This is common because KS is non-parametric and
# the starting sample will be 'lumpy' and not that big. If starting n gets up
# to, say, 10K, then they all do better. That's why I started with 10K and still
# it can be hit or miss
# so we have a metric shit ton of random draws. But they are Z scores
# and we want them put back in their original shapes. So let's do that:
myDrawsDenorm <- myDraws
for (i in 1:ncol(myDrawsDenorm)) {
   myDrawsDenorm[,i] <- deNormalize(myDraws[,i],
                                     normalMomentList[[i]][[1]],
                                     normalMomentList[[i]][[2]])
}
myDrawsDenorm <- data.frame(myDrawsDenorm)
names(myDrawsDenorm) <- names(ourData)
#let's look at the mean and standard dev of the starting data
apply(ourData, 2, mean)
apply(ourData, 2, sd)
#compare that with our sample data
apply(myDrawsDenorm, 2, mean)
apply(myDrawsDenorm, 2, sd)
# so myDrawsDenorm contains the final draws
# let's check Kolmogorov-Smirnov between the starting data
# and the final draws
for (i in 1:ncol(ourData)){
   print(ks.test(myDrawsDenorm[[i]], ourData[[i]]))
}
# holy shit. it works!
#look at the correlation matrices
cor(myDrawsDenorm)
cor(ourData)
#it's fun to plot the variables and see if the PDFs line up
#you could do this for each variable. It's a good sanity check.
plot(density(myDrawsDenorm$myData))
lines(density(ourData$myData), col="red")
# there's a test to see if the corr matrices are the same
# but I'm too lazy to google for it