Title: | Learning Bayesian Inference |
---|---|
Description: | Contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling. |
Authors: | Jim Albert |
Maintainer: | Jim Albert <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.17 |
Built: | 2024-11-11 02:58:05 UTC |
Source: | https://github.com/bayesball/learnbayes |
Achievement data for a group of Austrian school children
achievement
achievement
A data frame with 109 observations on the following 7 variables.
gender of child where 0 is male and 1 is female
age in months
iq score
test score on mathematics computation
test score on mathematics problem solving
test score on reading speed
test score on reading comprehension
Abraham, B., and Ledolter, J. (2006), Introduction to Regression Modeling, Duxbury.
Head to head records for all teams in the 1964 National League baseball season. Teams are coded as Cincinnati (1), Chicago (2), Houston (3), Los Angeles (4), Milwaukee (5), New York (6), Philadelphia (7), Pittsburgh (8), San Francisco (9), and St. Louis (10).
baseball.1964
baseball.1964
A data frame with 45 observations on the following 4 variables.
Number of team 1
Number of team 2
Number of games won by team 1
Number of games won by team 2
www.baseball-reference.com website.
Computes probability intervals for the log precision parameter K in a beta-binomial model for all "leave one out" models using sampling importance resampling
bayes.influence(theta,data)
bayes.influence(theta,data)
theta |
matrix of simulated draws from the posterior of (logit eta, log K) |
data |
matrix with columns of counts and sample sizes |
summary |
vector of 5th, 50th, 95th percentiles of log K for complete sample posterior |
summary.obs |
matrix where the ith row contains the 5th, 50th, 95th percentiles of log K for posterior when the ith observation is removed |
Jim Albert
data(cancermortality) start=array(c(-7,6),c(1,2)) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) theta=sir(betabinexch,tpar,1000,cancermortality) intervals=bayes.influence(theta,cancermortality)
data(cancermortality) start=array(c(-7,6),c(1,2)) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) theta=sir(betabinexch,tpar,1000,cancermortality) intervals=bayes.influence(theta,cancermortality)
Using Zellner's G priors, computes the log marginal density for all possible regression models
bayes.model.selection(y, X, c, constant=TRUE)
bayes.model.selection(y, X, c, constant=TRUE)
y |
vector of response values |
X |
matrix of covariates |
c |
parameter of the G prior |
constant |
logical variable indicating if a constant term is in the matrix X |
mod.prob |
data frame specifying the model, the value of the log marginal density and the value of the posterior model probability |
converge |
logical vector indicating if the laplace algorithm converged for each model |
Jim Albert
data(birdextinct) logtime=log(birdextinct$time) X=cbind(1,birdextinct$nesting,birdextinct$size,birdextinct$status) bayes.model.selection(logtime,X,100)
data(birdextinct) logtime=log(birdextinct$time) X=cbind(1,birdextinct$nesting,birdextinct$size,birdextinct$status) bayes.model.selection(logtime,X,100)
Gives a simulated sample from the joint posterior distribution of the regression vector for a binary response regression model with a probit link and a informative normal(beta, P) prior. Also computes the log marginal likelihood when a subjective prior is used.
bayes.probit(y,X,m,prior=list(beta=0,P=0))
bayes.probit(y,X,m,prior=list(beta=0,P=0))
y |
vector of binary responses |
X |
covariate matrix |
m |
number of simulations desired |
prior |
list with components beta, the prior mean, and P, the prior precision matrix |
beta |
matrix of simulated draws of regression vector beta where each row corresponds to one draw |
log.marg |
simulation estimate at log marginal likelihood of the model |
Jim Albert
response=c(0,1,0,0,0,1,1,1,1,1) covariate=c(1,2,3,4,5,6,7,8,9,10) X=cbind(1,covariate) prior=list(beta=c(0,0),P=diag(c(.5,10))) m=1000 s=bayes.probit(response,X,m,prior)
response=c(0,1,0,0,0,1,1,1,1,1) covariate=c(1,2,3,4,5,6,7,8,9,10) X=cbind(1,covariate) prior=list(beta=c(0,0),P=diag(c(.5,10))) m=1000 s=bayes.probit(response,X,m,prior)
Computes the posterior probabilities that Bayesian residuals exceed a cutoff value for a linear regression model with a noninformative prior
bayesresiduals(lmfit,post,k)
bayesresiduals(lmfit,post,k)
lmfit |
output of the regression function lm |
post |
list with components beta, matrix of simulated draws of regression parameter, and sigma, vector of simulated draws of sampling standard deviation |
k |
cut-off value that defines an outlier |
vector of posterior outlying probabilities
Jim Albert
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) lmfit=lm(temp~X) m=1000 post=blinreg(temp,X,m) k=2 bayesresiduals(lmfit,post,k)
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) lmfit=lm(temp~X) m=1000 post=blinreg(temp,X,m) k=2 bayesresiduals(lmfit,post,k)
Yields of bermuda grass for a factorial design of nutrients nitrogen, phosphorus, and potassium.
bermuda.grass
bermuda.grass
A data frame with 64 observations on the following 4 variables.
yield of bermuda grass in tons per acre
level of nitrogen
level of phosphorus
level of potassium
McCullagh, P., and Nelder, J. (1989), Generalized Linear Models, Chapman and Hall.
Finds the shape parameters of a beta density that matches knowledge of two quantiles of the distribution.
beta.select(quantile1, quantile2)
beta.select(quantile1, quantile2)
quantile1 |
list with components p, the value of the first probability, and x, the value of the first quantile |
quantile2 |
list with components p, the value of the second probability, and x, the value of the second quantile |
vector of shape parameters of the matching beta distribution
Jim Albert
# person believes the median of the prior is 0.25 # and the 90th percentile of the prior is 0.45 quantile1=list(p=.5,x=0.25) quantile2=list(p=.9,x=0.45) beta.select(quantile1,quantile2)
# person believes the median of the prior is 0.25 # and the 90th percentile of the prior is 0.45 quantile1=list(p=.5,x=0.25) quantile2=list(p=.9,x=0.45) beta.select(quantile1,quantile2)
Computes the log posterior density of logit mean and log precision for a Binomial/beta exchangeable model
betabinexch(theta,data)
betabinexch(theta,data)
theta |
vector of parameter values of logit eta and log K |
data |
a matrix with columns y (counts) and n (sample sizes) |
value of the log posterior
Jim Albert
n=c(20,20,20,20,20) y=c(1,4,3,6,10) data=cbind(y,n) theta=c(-1,0) betabinexch(theta,data)
n=c(20,20,20,20,20) y=c(1,4,3,6,10) data=cbind(y,n) theta=c(-1,0) betabinexch(theta,data)
Computes the log posterior density of mean and precision for a Binomial/beta exchangeable model
betabinexch0(theta,data)
betabinexch0(theta,data)
theta |
vector of parameter values of eta and K |
data |
a matrix with columns y (counts) and n (sample sizes) |
value of the log posterior
Jim Albert
n=c(20,20,20,20,20) y=c(1,4,3,6,10) data=cbind(y,n) theta=c(.1,10) betabinexch0(theta,data)
n=c(20,20,20,20,20) y=c(1,4,3,6,10) data=cbind(y,n) theta=c(.1,10) betabinexch0(theta,data)
Computes the logarithm of the integral of the Bayes factor for testing homogeneity of a set of proportions
bfexch(theta,datapar)
bfexch(theta,datapar)
theta |
value of the logit of the prior mean hyperparameter |
datapar |
list with components data, matrix with columns y (counts) and n (sample sizes), and K, prior precision hyperparameter |
value of the logarithm of the integral
Jim Albert
y=c(1,3,2,4,6,4,3) n=c(10,10,10,10,10,10,10) data=cbind(y,n) K=20 datapar=list(data=data,K=K) theta=1 bfexch(theta,datapar)
y=c(1,3,2,4,6,4,3) n=c(10,10,10,10,10,10,10) data=cbind(y,n) K=20 datapar=list(data=data,K=K) theta=1 bfexch(theta,datapar)
Computes a Bayes factor against independence for a two-way contingency table assuming a "close to independence" alternative model
bfindep(y,K,m)
bfindep(y,K,m)
y |
matrix of counts |
K |
Dirichlet precision hyperparameter |
m |
number of simulations |
bf |
value of the Bayes factor against hypothesis of independence |
nse |
estimate of the simulation standard error of the computed Bayes factor |
Jim Albert
y=matrix(c(10,4,6,3,6,10),c(2,3)) K=20 m=1000 bfindep(y,K,m)
y=matrix(c(10,4,6,3,6,10),c(2,3)) K=20 m=1000 bfindep(y,K,m)
Computes the parameters and mixing probabilities for a binomial sampling problem where the prior is a discrete mixture of beta densities.
binomial.beta.mix(probs,betapar,data)
binomial.beta.mix(probs,betapar,data)
probs |
vector of probabilities of the beta components of the prior |
betapar |
matrix where each row contains the shape parameters for a beta component of the prior |
data |
vector of number of successes and number of failures |
probs |
vector of probabilities of the beta components of the posterior |
betapar |
matrix where each row contains the shape parameters for a beta component of the posterior |
Jim Albert
probs=c(.5, .5) beta.par1=c(15,5) beta.par2=c(10,10) betapar=rbind(beta.par1,beta.par2) data=c(20,15) binomial.beta.mix(probs,betapar,data)
probs=c(.5, .5) beta.par1=c(15,5) beta.par2=c(10,10) betapar=rbind(beta.par1,beta.par2) data=c(20,15) binomial.beta.mix(probs,betapar,data)
Measurements on breedings pairs of landbird species were collected from 16 islands about Britain over several decades.
birdextinct
birdextinct
A data frame with 62 observations on the following 5 variables.
name of bird species
average time of extinction on the islands
average number of nesting pairs
size of the species, 1 or 0 if large or small
staus of the species, 1 or 0 if resident or migrant
Pimm, S., Jones, H., and Diamond, J. (1988), On the risk of extinction, American Naturalists, 132, 757-785.
Dobson describes a study where one is interested in predicting a baby's birthweight based on the gestational age and the baby's gender.
birthweight
birthweight
A data frame with 24 observations on the following 3 variables.
gestational age in weeks
gender of the baby where 0 (1) is male (female)
birthweight of baby in grams
Dobson, A. (2001), An Introduction to Generalized Linear Models, New York: Chapman and Hall.
Gives a simulated sample from the joint posterior distribution of the regression vector and the error standard deviation for a linear regression model with a noninformative or g prior.
blinreg(y,X,m,prior=NULL)
blinreg(y,X,m,prior=NULL)
y |
vector of responses |
X |
design matrix |
m |
number of simulations desired |
prior |
list with components c0 and beta0 of Zellner's g prior |
beta |
matrix of simulated draws of beta where each row corresponds to one draw |
sigma |
vector of simulated draws of the error standard deviation |
Jim Albert
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) m=1000 s=blinreg(temp,X,m)
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) m=1000 s=blinreg(temp,X,m)
Simulates draws of the posterior distribution of an expected response for a linear regression model with a noninformative prior
blinregexpected(X1,theta.sample)
blinregexpected(X1,theta.sample)
X1 |
matrix where each row corresponds to a covariate set |
theta.sample |
list with components beta, matrix of simulated draws of regression vector, and sigma, vector of simulated draws of sampling error standard deviation |
matrix where a column corresponds to the simulated draws of the expected response for a given covariate set
Jim Albert
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) m=1000 theta.sample=blinreg(temp,X,m) covset1=c(1,15) covset2=c(1,20) X1=rbind(covset1,covset2) blinregexpected(X1,theta.sample)
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) m=1000 theta.sample=blinreg(temp,X,m) covset1=c(1,15) covset2=c(1,20) X1=rbind(covset1,covset2) blinregexpected(X1,theta.sample)
Simulates draws of the predictive distribution of a future response for a linear regression model with a noninformative prior
blinregpred(X1,theta.sample)
blinregpred(X1,theta.sample)
X1 |
matrix where each row corresponds to a covariate set |
theta.sample |
list with components beta, matrix of simulated draws of regression vector, and sigma, vector of simulated draws of sampling error standard deviation |
matrix where a column corresponds to the simulated draws of the predicted response for a given covariate set
Jim Albert
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) m=1000 theta.sample=blinreg(temp,X,m) covset1=c(1,15) covset2=c(1,20) X1=rbind(covset1,covset2) blinregpred(X1,theta.sample)
chirps=c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1) temp=c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3) X=cbind(1,chirps) m=1000 theta.sample=blinreg(temp,X,m) covset1=c(1,15) covset2=c(1,20) X1=rbind(covset1,covset2) blinregpred(X1,theta.sample)
Gives a simulated sample for fitted probabilities for a binary response regression model with a probit link and noninformative prior.
bprobit.probs(X1,fit)
bprobit.probs(X1,fit)
X1 |
matrix where each row corresponds to a covariate set |
fit |
simulated matrix of draws of the regression vector |
matrix of simulated draws of the fitted probabilities, where a column corresponds to a particular covariate set
Jim Albert
response=c(0,1,0,0,0,1,1,1,1,1) covariate=c(1,2,3,4,5,6,7,8,9,10) X=cbind(1,covariate) m=1000 fit=bayes.probit(response,X,m) x1=c(1,3) x2=c(1,8) X1=rbind(x1,x2) fittedprobs=bprobit.probs(X1,fit$beta)
response=c(0,1,0,0,0,1,1,1,1,1) covariate=c(1,2,3,4,5,6,7,8,9,10) X=cbind(1,covariate) m=1000 fit=bayes.probit(response,X,m) x1=c(1,3) x2=c(1,8) X1=rbind(x1,x2) fittedprobs=bprobit.probs(X1,fit$beta)
Computes the log posterior density of the talent parameters and the log standard deviation for a Bradley Terry model with normal random effects
bradley.terry.post(theta,data)
bradley.terry.post(theta,data)
theta |
vector of talent parameters and log standard deviation |
data |
data matrix with columns team1, team2, wins by team1, and wins by team2 |
value of the log posterior
Jim Albert
data(baseball.1964) team.strengths=rep(0,10) log.sigma=0 bradley.terry.post(c(team.strengths,log.sigma),baseball.1964)
data(baseball.1964) team.strengths=rep(0,10) log.sigma=0 bradley.terry.post(c(team.strengths,log.sigma),baseball.1964)
Collett (1994) describes a study to evaluate the effectiveness of a histochemical marker in predicting the survival experience of women with breast cancer.
breastcancer
breastcancer
A data frame with 45 observations on the following 3 variables.
survival time in months
censoring indicator where 1 (0) indicates a complete (censored) survival time
indicates by a 0 (1) if tumor was negatively (positively) stained
Collett, D. (1994), Modelling Survival Data in Medical Research, London: Chapman and Hall.
Grades and other variables collected for a sample of calculus students.
calculus.grades
calculus.grades
A data frame with 100 observations on the following 3 variables.
indicates if student received a A or B in class
indicates if student received a A in prerequisite math class
score on the ACT math test
Collected by a colleague of the author at his university.
Number of cancer deaths and number at risk for 20 cities in Missouri.
cancermortality
cancermortality
A data frame with 20 observations on the following 2 variables.
number of cancer deaths
number at risk
Tsutakawa, R., Shoop, G., and Marienfeld, C. (1985), Empirical Bayes Estimation of Cancer Mortality Rates, Statistics in Medicine, 4, 201-212.
Setups the data matrices for the use of WinBUGS in the career trajectory application.
careertraj.setup(data)
careertraj.setup(data)
data |
data matrix for ballplayers with variables Player, Year, Age, G, AB, R, H, X2B, X3B, HR, RBI, BB, SO |
player.names |
vector of player names |
y |
matrix of home runs for players where a row corresponds to the home runs for a player during all the years of his career |
n |
matrix of AB-SO for all players |
x |
matrix of ages for all players for all years of their careers |
T |
vector of number of seasons for all players |
N |
number of players |
Jim Albert
data(sluggerdata) careertraj.setup(sluggerdata)
data(sluggerdata) careertraj.setup(sluggerdata)
Computes the log posterior density of (M,log S) when a sample is taken from a Cauchy density with location M and scale S and a uniform prior distribution is taken on (M, log S)
cauchyerrorpost(theta,data)
cauchyerrorpost(theta,data)
theta |
vector of parameter values of M and log S |
data |
vector containing sample of observations |
value of the log posterior
Jim Albert
data=c(108, 51, 7, 43, 52, 54, 53, 49, 21, 48) theta=c(40,1) cauchyerrorpost(theta,data)
data=c(108, 51, 7, 43, 52, 54, 53, 49, 21, 48) theta=c(40,1) cauchyerrorpost(theta,data)
Edmunson et al (1979) studied the effect of different chemotherapy treatments following surgical treatment of ovarian cancer.
chemotherapy
chemotherapy
A data frame with 26 observations on the following 5 variables.
patient number
survival time in days following treatment
indicates if time is censored (0) or actually observed (1)
control group (0) or treatment group (1)
age of the patient
Edmonson, J., Felming, T., Decker, D., Malkasian, G., Jorgensen, E., Jefferies, J.,Webb, M., and Kvols, L. (1979), Different chemotherapeutic sensitivities and host factors affecting prognosis in advanced ovarian carcinoma versus minimal residual disease, Cancer Treatment Reports, 63, 241-247.
Computes a Bayes factor against independence for a two-way contingency table assuming uniform prior distributions
ctable(y,a)
ctable(y,a)
y |
matrix of counts |
a |
matrix of prior hyperparameters |
value of the Bayes factor against independence
Jim Albert
y=matrix(c(10,4,6,3,6,10),c(2,3)) a=matrix(rep(1,6),c(2,3)) ctable(y,a)
y=matrix(c(10,4,6,3,6,10),c(2,3)) a=matrix(rep(1,6),c(2,3)) ctable(y,a)
Fifteen differences of the heights of cross and self fertilized plants quoted by Fisher (1960)
darwin
darwin
A data frame with 15 observations on the following 1 variable.
difference of heights of two types of plants
Fisher, R. (1960), Statistical Methods for Research Workers, Edinburgh: Oliver and Boyd.
Computes a highest probability interval for a discrete probability distribution
discint(dist, prob)
discint(dist, prob)
dist |
probability distribution written as a matrix where the first column contain the values and the second column the probabilities |
prob |
probability content of interest |
prob |
exact probability content of interval |
set |
set of values of the probability interval |
Jim Albert
x=0:10 probs=dbinom(x,size=10,prob=.3) dist=cbind(x,probs) pcontent=.8 discint(dist,pcontent)
x=0:10 probs=dbinom(x,size=10,prob=.3) dist=cbind(x,probs) pcontent=.8 discint(dist,pcontent)
Computes the posterior distribution for an arbitrary one parameter distribution for a discrete prior distribution.
discrete.bayes(df,prior,y,...)
discrete.bayes(df,prior,y,...)
df |
name of the function defining the sampling density |
prior |
vector defining the prior density; names of the vector define the parameter values and entries of the vector define the prior probabilities |
y |
vector of data values |
... |
any further fixed parameter values used in the sampling density function |
prob |
vector of posterior probabilities |
pred |
scalar with prior predictive probability |
Jim Albert
prior=c(.25,.25,.25,.25) names(prior)=c(.2,.25,.3,.35) y=5 n=10 discrete.bayes(dbinom,prior,y,size=n)
prior=c(.25,.25,.25,.25) names(prior)=c(.2,.25,.3,.35) y=5 n=10 discrete.bayes(dbinom,prior,y,size=n)
Computes the posterior distribution for an arbitrary two parameter distribution for a discrete prior distribution.
discrete.bayes.2(df,prior,y=NULL,...)
discrete.bayes.2(df,prior,y=NULL,...)
df |
name of the function defining the sampling density of two parameters |
prior |
matrix defining the prior density; the row names and column names of the matrix define respectively the values of parameter 1 and values of parameter 2 and the entries of the matrix give the prior probabilities |
y |
y is a matrix of data values, where each row corresponds to a single observation |
... |
any further fixed parameter values used in the sampling density function |
prob |
matrix of posterior probabilities |
pred |
scalar with prior predictive probability |
Jim Albert
p1 = seq(0.1, 0.9, length = 9) p2 = p1 prior = matrix(1/81, 9, 9) dimnames(prior)[[1]] = p1 dimnames(prior)[[2]] = p2 discrete.bayes.2(twoproplike,prior)
p1 = seq(0.1, 0.9, length = 9) p2 = p1 prior = matrix(1/81, 9, 9) dimnames(prior)[[1]] = p1 dimnames(prior)[[2]] = p2 discrete.bayes.2(twoproplike,prior)
Computes the density of a multivariate normal distribution
dmnorm(x, mean = rep(0, d), varcov, log = FALSE)
dmnorm(x, mean = rep(0, d), varcov, log = FALSE)
x |
vector of length d or matrix with d columns, giving the coordinates of points where density is to evaluated |
mean |
numeric vector giving the location parameter of the distribution |
varcov |
a positive definite matrix representing the scale matrix of the distribution |
log |
a logical value; if TRUE, the logarithm of the density is to be computed |
vector of density values
Jim Albert
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) x <- c(2,14,0) f <- dmnorm(x, mu, Sigma)
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) x <- c(2,14,0) f <- dmnorm(x, mu, Sigma)
Computes the density of a multivariate t distribution
dmt(x, mean = rep(0, d), S, df = Inf, log=FALSE)
dmt(x, mean = rep(0, d), S, df = Inf, log=FALSE)
x |
vector of length d or matrix with d columns, giving the coordinates of points where density is to evaluated |
mean |
numeric vector giving the location parameter of the distribution |
S |
a positive definite matrix representing the scale matrix of the distribution |
df |
degrees of freedom |
log |
a logical value; if TRUE, the logarithm of the density is to be computed |
vector of density values
Jim Albert
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) df <- 4 x <- c(2,14,0) f <- dmt(x, mu, Sigma, df)
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) df <- 4 x <- c(2,14,0) f <- dmt(x, mu, Sigma, df)
Data contains the age, gender and survival status for 45 members of the Donner Party who experienced difficulties in crossing the Sierra Nevada mountains in California.
donner
donner
A data frame with 45 observations on the following 3 variables.
age of person
gender that is 1 (0) if person is male (female)
survival status, 1 or 0 if person survived or died
Grayson, D. (1960), Donner party deaths: a demographic assessment, Journal of Anthropological Assessment, 46, 223-242.
For each of the Florida counties in the 2000 presidential election, the number of votes for George Bush, Al Gore, and Pat Buchanan is recorded. Also the number of votes for the minority candidate Ross Perot in the 1996 presidential election is recorded.
election
election
A data frame with 67 observations on the following 5 variables.
name of Florida county
number of votes for Ross Perot in 1996 election
number of votes for Al Gore in 2000 election
number of votes for George Bush in 2000 election
number of votes for Pat Buchanan in 2000 election
Results of recent state polls in the 2008 United States Presidential Election between Barack Obama and John McCain.
election.2008
election.2008
A data frame with 51 observations on the following 4 variables.
name of the state
percentage of poll survey for McCain
precentage of poll survey for Obama
number of electoral votes
Data collected by author in November 2008 from www.cnn.com website.
Game outcomes and point spreads for 672 professional American football games.
footballscores
footballscores
A data frame with 672 observations on the following 8 variables.
year of game
indicates if favorite is the home team
score of favorite team
score of underdog team
point spread
name of favorite team
name of underdog team
week number of the season
Gelman, A., Carlin, J., Stern, H., and Rubin, D. (2003), Bayesian Data Analysis, Chapman and Hall.
Implements a Metropolis-within-Gibbs sampling algorithm for an arbitrary real-valued posterior density defined by the user
gibbs(logpost,start,m,scale,...)
gibbs(logpost,start,m,scale,...)
logpost |
function defining the log posterior density |
start |
array with a single row that gives the starting value of the parameter vector |
m |
the number of iterations of the chain |
scale |
vector of scale parameters for the random walk Metropolis steps |
... |
data that is used in the function logpost |
par |
a matrix of simulated values where each row corresponds to a value of the vector parameter |
accept |
vector of acceptance rates of the Metropolis steps of the algorithm |
Jim Albert
data=c(6,2,3,10) start=array(c(1,1),c(1,2)) m=1000 scale=c(2,2) s=gibbs(logctablepost,start,m,scale,data)
data=c(6,2,3,10) start=array(c(1,1),c(1,2)) m=1000 scale=c(2,2) s=gibbs(logctablepost,start,m,scale,data)
Computes the log posterior density of (M,log S) for normal sampling where the data is observed in grouped form
groupeddatapost(theta,data)
groupeddatapost(theta,data)
theta |
vector of parameter values M and log S |
data |
list with components int.lo, a vector of left endpoints, int.hi, a vector of right endpoints, and f, a vector of bin frequencies |
value of the log posterior
Jim Albert
int.lo=c(-Inf,10,15,20,25) int.hi=c(10,15,20,25,Inf) f=c(2,5,8,4,2) data=list(int.lo=int.lo,int.hi=int.hi,f=f) theta=c(20,1) groupeddatapost(theta,data)
int.lo=c(-Inf,10,15,20,25) int.hi=c(10,15,20,25,Inf) f=c(2,5,8,4,2) data=list(int.lo=int.lo,int.hi=int.hi,f=f) theta=c(20,1) groupeddatapost(theta,data)
The number of deaths within 30 days of heart transplant surgery for 94 U.S. hospitals that performed at least 10 heart transplant surgeries. Also the exposure, the expected number of deaths, is recorded for each hospital.
hearttransplants
hearttransplants
A data frame with 94 observations on the following 2 variables.
expected number of deaths (the exposure)
observed number of deaths within 30 days of heart transplant surgery
Christiansen, C. and Morris, C. (1995), Fitting and checking a two-level Poisson model: modeling patient mortality rates in heart transplant patients, in Berry, D. and Stangl, D., eds, Bayesian Biostatistics, Marcel Dekker.
Implements Gibbs sampling for estimating a two-way table of means under a hierarchical regression model.
hiergibbs(data,m)
hiergibbs(data,m)
data |
data matrix with columns observed sample means, sample sizes, and values of two covariates |
m |
number of cycles of Gibbs sampling |
beta |
matrix of simulated values of regression vector |
mu |
matrix of simulated values of cell means |
var |
vector of simulated values of second-stage prior variance |
Jim Albert
data(iowagpa) m=1000 s=hiergibbs(iowagpa,m)
data(iowagpa) m=1000 s=hiergibbs(iowagpa,m)
Computes the density of a probability distribution defined on a set of equal-width intervals
histprior(p,midpts,prob)
histprior(p,midpts,prob)
p |
vector of values for which density is to be computed |
midpts |
vector of midpoints of the intervals |
prob |
vector of probabilities of the intervals |
vector of values of the probability density
Jim Albert
midpts=c(.1,.3,.5,.7,.9) prob=c(.2,.2,.4,.1,.1) p=seq(.01,.99,by=.01) plot(p,histprior(p,midpts,prob),type="l")
midpts=c(.1,.3,.5,.7,.9) prob=c(.2,.2,.4,.1,.1) p=seq(.01,.99,by=.01) plot(p,histprior(p,midpts,prob),type="l")
Computes the logarithm of a dependent prior on two proportions proposed by Howard in a Statistical Science paper in 1998.
howardprior(xy,par)
howardprior(xy,par)
xy |
vector of proportions p1 and p2 |
par |
vector containing parameter values alpha, beta, gamma, delta, sigma |
value of the log posterior
Jim Albert
param=c(1,1,1,1,2) p=c(.1,.5) howardprior(p,param)
param=c(1,1,1,1,2) p=c(.1,.5) howardprior(p,param)
Implements importance sampling to compute the posterior mean of a function using a multivariate t proposal density
impsampling(logf,tpar,h,n,data)
impsampling(logf,tpar,h,n,data)
logf |
function that defines the logarithm of the density of interest |
tpar |
list of parameters of t proposal density including the mean m, scale matrix var, and degrees of freedom df |
h |
function that defines h(theta) |
n |
number of simulated draws from proposal density |
data |
data and or parameters used in the function logf |
est |
estimate at the posterior mean |
se |
simulation standard error of estimate |
theta |
matrix of simulated draws from proposal density |
wt |
vector of importance sampling weights |
Jim Albert
data(cancermortality) start=c(-7,6) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) myfunc=function(theta) return(theta[2]) theta=impsampling(betabinexch,tpar,myfunc,1000,cancermortality)
data(cancermortality) start=c(-7,6) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) myfunc=function(theta) return(theta[2]) theta=impsampling(betabinexch,tpar,myfunc,1000,cancermortality)
Simulates iterates of an independence Metropolis chain with a normal proposal density for an arbitrary real-valued posterior density defined by the user
indepmetrop(logpost,proposal,start,m,stuff)
indepmetrop(logpost,proposal,start,m,stuff)
logpost |
function defining the log posterior density |
proposal |
a list containing mu, an estimated mean and var, an estimated variance-covariance matrix, of the normal proposal density |
start |
vector containing the starting value of the parameter |
m |
the number of iterations of the chain |
stuff |
data and prior info that is used in the function logpost |
par |
a matrix of simulated values where each row corresponds to a value of the vector parameter |
accept |
the acceptance rate of the algorithm |
Jim Albert
data=c(6,2,3,10) proposal=list(mu=array(c(2.3,-.1),c(2,1)),var=diag(c(1,1))) start=array(c(0,0),c(1,2)) m=1000 fit=indepmetrop(logctablepost,proposal,start,m,data)
data=c(6,2,3,10) proposal=list(mu=array(c(2.3,-.1),c(2,1)),var=diag(c(1,1))) start=array(c(0,0),c(1,2)) m=1000 fit=indepmetrop(logctablepost,proposal,start,m,data)
Students at a major university are categorized with respect to their high school rank and their ACT score. For each combination of high school rank and ACT score, one records the mean grade point average (GPA).
iowagpa
iowagpa
A data frame with 40 observations on the following 4 variables.
mean grade point average
sample size
high school rank
act score
Albert, J. (1994), A Bayesian approach to estimation of GPA's of University of Iowa freshmen under order restrictions, Journal of Educational Statistics, 19, 1-22.
Batting data for the baseball player Derek Jeter for all 154 games in the 2004 season.
jeter2004
jeter2004
A data frame with 154 observations on the following 10 variables.
the game number
the number of at-bats
the number of runs scored
the number of hits
the number of doubles
the number of triples
the number of home runs
the number of runs batted in
the number of walks
the number of strikeouts
Collected from game log data from www.retrosheet.org.
For a general posterior density, computes the posterior mode, the associated variance-covariance matrix, and an estimate at the logarithm at the normalizing constant.
laplace(logpost,mode,stuff)
laplace(logpost,mode,stuff)
logpost |
function that defines the logarithm of the posterior density |
mode |
vector that is a guess at the posterior mode |
stuff |
data and parameters associated with the function logpost |
mode |
current estimate at the posterior mode |
var |
current estimate at the associated variance-covariance matrix |
int |
estimate at the logarithm of the normalizing constant |
converge |
indication (TRUE or FALSE) if the algorithm converged |
stuff |
data and parameters associated with the function logpost |
logpost |
function that defines the log posterior |
Jim Albert
logpost=function(theta,data) { s=5 sum(-log(1+(data-theta)^2/s^2)) } data=c(10,12,14,13,12,15) start=10 laplace(logpost,start,data)
logpost=function(theta,data) { s=5 sum(-log(1+(data-theta)^2/s^2)) } data=c(10,12,14,13,12,15) start=10 laplace(logpost,start,data)
Computes the logarithm of a bivariate normal density
lbinorm(xy,par)
lbinorm(xy,par)
xy |
vector of values of two variables x and y |
par |
list with components m, a vector of means, and v, a variance-covariance matrix |
value of the kernel of the log density
Jim Albert
mean=c(0,0) varcov=diag(c(1,1)) value=c(1,1) param=list(m=mean,v=varcov) lbinorm(value,param)
mean=c(0,0) varcov=diag(c(1,1)) value=c(1,1) param=list(m=mean,v=varcov) lbinorm(value,param)
Computes the log posterior density for the difference and sum of logits in a 2x2 contingency table for independent binomial samples and uniform prior placed on the logits
logctablepost(theta,data)
logctablepost(theta,data)
theta |
vector of parameter values "difference of logits" and "sum of logits") |
data |
vector containing number of successes and failures for first sample, and then second sample |
value of the log posterior
Jim Albert
s1=6; f1=2; s2=3; f2=10 data=c(s1,f1,s2,f2) theta=c(2,4) logctablepost(theta,data)
s1=6; f1=2; s2=3; f2=10 data=c(s1,f1,s2,f2) theta=c(2,4) logctablepost(theta,data)
Computes the log posterior density of (beta0, beta1) when yi are independent binomial(ni, pi) and logit(pi)=beta0+beta1*xi and a uniform prior is placed on (beta0, beta1)
logisticpost(beta,data)
logisticpost(beta,data)
beta |
vector of parameter values beta0 and beta1 |
data |
matrix of columns of covariate values x, sample sizes n, and number of successes y |
value of the log posterior
Jim Albert
x = c(-0.86,-0.3,-0.05,0.73) n = c(5,5,5,5) y = c(0,1,3,5) data = cbind(x, n, y) beta=c(2,10) logisticpost(beta,data)
x = c(-0.86,-0.3,-0.05,0.73) n = c(5,5,5,5) y = c(0,1,3,5) data = cbind(x, n, y) beta=c(2,10) logisticpost(beta,data)
Computes the logarithm of the posterior density of a Poisson log mean with a gamma prior
logpoissgamma(theta,datapar)
logpoissgamma(theta,datapar)
theta |
vector of values of the log mean parameter |
datapar |
list with components data, vector of observations, and par, vector of parameters of the gamma prior |
vector of values of the log posterior for all values in theta
Jim Albert
data=c(2,4,3,6,1,0,4,3,10,2) par=c(1,1) datapar=list(data=data,par=par) theta=c(-1,0,1,2) logpoissgamma(theta,datapar)
data=c(2,4,3,6,1,0,4,3,10,2) par=c(1,1) datapar=list(data=data,par=par) theta=c(-1,0,1,2) logpoissgamma(theta,datapar)
Computes the logarithm of the posterior density of a Poisson log mean with a normal prior
logpoissnormal(theta,datapar)
logpoissnormal(theta,datapar)
theta |
vector of values of the log mean parameter |
datapar |
list with components data, vector of observations, and par, vector of parameters of the normal prior |
vector of values of the log posterior for all values in theta
Jim Albert
data=c(2,4,3,6,1,0,4,3,10,2) par=c(0,1) datapar=list(data=data,par=par) theta=c(-1,0,1,2) logpoissnormal(theta,datapar)
data=c(2,4,3,6,1,0,4,3,10,2) par=c(0,1) datapar=list(data=data,par=par) theta=c(-1,0,1,2) logpoissnormal(theta,datapar)
Running times in minutes for twenty male runners between the ages 20 and 29 who ran the New York Marathon.
marathontimes
marathontimes
A data frame with 20 observations on the following 1 variable.
running time
www.nycmarathon.org website.
Computes a Bayesian test of the hypothesis that a normal mean is less than or equal to a specified value
mnormt.onesided(m0,normpar,data)
mnormt.onesided(m0,normpar,data)
m0 |
value of the normal mean to be tested |
normpar |
vector of mean and standard deviation of the normal prior distribution |
data |
vector of sample mean, sample size, and known value of the population standard deviation |
BF |
Bayes factor in support of the null hypothesis |
prior.odds |
prior odds of the null hypothesis |
post.odds |
posterior odds of the null hypothesis |
postH |
posterior probability of the null hypothesis |
Jim Albert
y=c(182,172,173,176,176,180,173,174,179,175) pop.s=3 data=c(mean(y),length(data),pop.s) m0=175 normpar=c(170,1000) mnormt.onesided(m0,normpar,data)
y=c(182,172,173,176,176,180,173,174,179,175) pop.s=3 data=c(mean(y),length(data),pop.s) m0=175 normpar=c(170,1000) mnormt.onesided(m0,normpar,data)
Bayesian test that a normal mean is equal to a specified value using a normal prior
mnormt.twosided(m0, prob, t, data)
mnormt.twosided(m0, prob, t, data)
m0 |
value of the mean to be tested |
prob |
prior probability of the hypothesis |
t |
vector of values of the prior standard deviation under the alternative hypothesis |
data |
vector containing the sample mean, the sample size, and the known value of the population standard deviation |
bf |
vector of values of the Bayes factor in support of the null hypothesis |
post |
vector of posterior probabilities of the null hypothesis |
Jim Albert
m0=170 prob=.5 tau=c(.5,1,2,4,8) samplesize=10 samplemean=176 popsd=3 data=c(samplemean,samplesize,popsd) mnormt.twosided(m0,prob,tau,data)
m0=170 prob=.5 tau=c(.5,1,2,4,8) samplesize=10 samplemean=176 popsd=3 data=c(samplemean,samplesize,popsd) mnormt.twosided(m0,prob,tau,data)
For a general two parameter density, draws a contour graph where the contour lines are drawn at 10 percent, 1 percent, and .1 percent of the height at the mode.
mycontour(logf,limits,data,...)
mycontour(logf,limits,data,...)
logf |
function that defines the logarithm of the density |
limits |
limits (xlo, xhi, ylo, yhi) where the graph is to be drawn |
data |
vector or list of parameters associated with the function logpost |
... |
further arguments to pass to contour |
A contour graph of the density is drawn
Jim Albert
m=array(c(0,0),c(2,1)) v=array(c(1,.6,.6,1),c(2,2)) normpar=list(m=m,v=v) mycontour(lbinorm,c(-4,4,-4,4),normpar)
m=array(c(0,0),c(2,1)) v=array(c(1,.6,.6,1),c(2,2)) normpar=list(m=m,v=v) mycontour(lbinorm,c(-4,4,-4,4),normpar)
Computes the parameters and mixing probabilities for a normal sampling problem, variance known, where the prior is a discrete mixture of normal densities.
normal.normal.mix(probs,normalpar,data)
normal.normal.mix(probs,normalpar,data)
probs |
vector of probabilities of the normal components of the prior |
normalpar |
matrix where each row contains the mean and variance parameters for a normal component of the prior |
data |
vector of observation and sampling variance |
probs |
vector of probabilities of the normal components of the posterior |
normalpar |
matrix where each row contains the mean and variance parameters for a normal component of the posterior |
Jim Albert
probs=c(.5, .5) normal.par1=c(0,1) normal.par2=c(2,.5) normalpar=rbind(normal.par1,normal.par2) y=1; sigma2=.5 data=c(y,sigma2) normal.normal.mix(probs,normalpar,data)
probs=c(.5, .5) normal.par1=c(0,1) normal.par2=c(2,.5) normalpar=rbind(normal.par1,normal.par2) y=1; sigma2=.5 data=c(y,sigma2) normal.normal.mix(probs,normalpar,data)
Finds the mean and standard deviation of a normal density that matches knowledge of two quantiles of the distribution.
normal.select(quantile1, quantile2)
normal.select(quantile1, quantile2)
quantile1 |
list with components p, the value of the first probability, and x, the value of the first quantile |
quantile2 |
list with components p, the value of the second probability, and x, the value of the second quantile |
mean |
mean of the matching normal distribution |
sigma |
standard deviation of the matching normal distribution |
Jim Albert
# person believes the 15th percentile of the prior is 100 # and the 70th percentile of the prior is 150 quantile1=list(p=.15,x=100) quantile2=list(p=.7,x=150) normal.select(quantile1,quantile2)
# person believes the 15th percentile of the prior is 100 # and the 70th percentile of the prior is 150 quantile1=list(p=.15,x=100) quantile2=list(p=.7,x=150) normal.select(quantile1,quantile2)
Computes the log of the posterior density of a mean M and a variance S2 when a sample is taken from a normal density and a standard noninformative prior is used.
normchi2post(theta,data)
normchi2post(theta,data)
theta |
vector of parameter values M and S2 |
data |
vector containing the sample observations |
value of the log posterior
Jim Albert
parameter=c(25,5) data=c(20, 32, 21, 43, 33, 21, 32) normchi2post(parameter,data)
parameter=c(25,5) data=c(20, 32, 21, 43, 33, 21, 32) normchi2post(parameter,data)
Computes the log posterior density of mean and log standard deviation for a Normal/Normal exchangeable model where (mean, log sd) is given a uniform prior.
normnormexch(theta,data)
normnormexch(theta,data)
theta |
vector of parameter values of mu and log tau |
data |
a matrix with columns y (observations) and v (sampling variances) |
value of the log posterior
Jim Albert
s.var <- c(0.05, 0.05, 0.05, 0.05, 0.05) y.means <- c(1, 4, 3, 6,10) data=cbind(y.means, s.var) theta=c(-1, 0) normnormexch(theta,data)
s.var <- c(0.05, 0.05, 0.05, 0.05, 0.05) y.means <- c(1, 4, 3, 6,10) data=cbind(y.means, s.var) theta=c(-1, 0) normnormexch(theta,data)
Given simulated draws from the posterior from a normal sampling model, outputs simulated draws from the posterior predictive distribution of a statistic of interest.
normpostpred(parameters,sample.size,f=min)
normpostpred(parameters,sample.size,f=min)
parameters |
list of simulated draws from the posterior where mu contains the normal mean and sigma2 contains the normal variance |
sample.size |
size of sample of future sample |
f |
function defining the statistic |
simulated sample of the posterior predictive distribution of the statistic
Jim Albert
# finds posterior predictive distribution of the min statistic of a future sample of size 15 data(darwin) s=normpostsim(darwin$difference) sample.size=15 sim.stats=normpostpred(s,sample.size,min)
# finds posterior predictive distribution of the min statistic of a future sample of size 15 data(darwin) s=normpostsim(darwin$difference) sample.size=15 sim.stats=normpostpred(s,sample.size,min)
Gives a simulated sample from the joint posterior distribution of the mean and variance for a normal sampling prior with a noninformative or informative prior. The prior assumes mu and sigma2 are independent with mu assigned a normal prior with mean mu0 and variance tau2, and sigma2 is assigned a inverse gamma prior with parameters a and b.
normpostsim(data,prior=NULL,m=1000)
normpostsim(data,prior=NULL,m=1000)
data |
vector of observations |
prior |
list with components mu, a vector with the prior mean and variance, and sigma2, a vector of the inverse gamma parameters |
m |
number of simulations desired |
mu |
vector of simulated draws of normal mean |
sigma2 |
vector of simulated draws of normal variance |
Jim Albert
data(darwin) s=normpostsim(darwin$difference)
data(darwin) s=normpostsim(darwin$difference)
Implements Gibbs sampling for estimating a two-way table of means under a order restriction.
ordergibbs(data,m)
ordergibbs(data,m)
data |
data matrix with first two columns observed sample means and sample sizes |
m |
number of cycles of Gibbs sampling |
matrix of simulated draws of the normal means where each row represents one simulated draw
Jim Albert
data(iowagpa) m=1000 s=ordergibbs(iowagpa,m)
data(iowagpa) m=1000 s=ordergibbs(iowagpa,m)
Computes predictive distribution for number of successes of future binomial experiment with a beta prior distribution for the proportion.
pbetap(ab, n, s)
pbetap(ab, n, s)
ab |
vector of parameters of the beta prior |
n |
size of future binomial sample |
s |
vector of number of successes for future binomial experiment |
vector of predictive probabilities for the values in the vector s
Jim Albert
ab=c(3,12) n=10 s=0:10 pbetap(ab,n,s)
ab=c(3,12) n=10 s=0:10 pbetap(ab,n,s)
Bayesian test that a proportion is equal to a specified value using a beta prior
pbetat(p0,prob,ab,data)
pbetat(p0,prob,ab,data)
p0 |
value of the proportion to be tested |
prob |
prior probability of the hypothesis |
ab |
vector of parameter values of the beta prior under the alternative hypothesis |
data |
vector containing the number of successes and number of failures |
bf |
the Bayes factor in support of the null hypothesis |
post |
the posterior probability of the null hypothesis |
Jim Albert
p0=.5 prob=.5 ab=c(10,10) data=c(5,15) pbetat(p0,prob,ab,data)
p0=.5 prob=.5 ab=c(10,10) data=c(5,15) pbetat(p0,prob,ab,data)
Computes the posterior distribution for a proportion for a discrete prior distribution.
pdisc(p, prior, data)
pdisc(p, prior, data)
p |
vector of proportion values |
prior |
vector of prior probabilities |
data |
vector consisting of number of successes and number of failures |
vector of posterior probabilities
Jim Albert
p=c(.2,.25,.3,.35) prior=c(.25,.25,.25,.25) data=c(5,10) pdisc(p,prior,data)
p=c(.2,.25,.3,.35) prior=c(.25,.25,.25,.25) data=c(5,10) pdisc(p,prior,data)
Computes predictive distribution for number of successes of future binomial experiment with a discrete distribution for the proportion.
pdiscp(p, probs, n, s)
pdiscp(p, probs, n, s)
p |
vector of proportion values |
probs |
vector of probabilities |
n |
size of future binomial sample |
s |
vector of number of successes for future binomial experiment |
vector of predictive probabilities for the values in the vector s
Jim Albert
p=c(.1,.2,.3,.4,.5,.6,.7,.8,.9) prob=c(0.05,0.10,0.10,0.15,0.20,0.15,0.10,0.10,0.05) n=10 s=0:10 pdiscp(p,prob,n,s)
p=c(.1,.2,.3,.4,.5,.6,.7,.8,.9) prob=c(0.05,0.10,0.10,0.15,0.20,0.15,0.10,0.10,0.05) n=10 s=0:10 pdiscp(p,prob,n,s)
plot.posterior constructs a plot of the posterior density of one or two variables. For 2-parameter problems, a contour plot is produced.
## S3 method for class 'posterior' plot(x, ...)
## S3 method for class 'posterior' plot(x, ...)
x |
posterior object |
... |
optional arguments; use exact=TRUE to plot the exact posterior density and the scale argument indicates the range of the axes of the plot |
Jim Albert
data <- c(2,4,3,6,1,0,4,3,10,2) par <- c(0,1) datapar <- list(data=data,par=par) fit <- laplace(logpoissnormal, 1, datapar) plot(fit, exact=TRUE, scale=4)
data <- c(2,4,3,6,1,0,4,3,10,2) par <- c(0,1) datapar <- list(data=data,par=par) fit <- laplace(logpoissnormal, 1, datapar) plot(fit, exact=TRUE, scale=4)
Computes the log posterior density of log alpha and log mu for a Poisson/gamma exchangeable model
poissgamexch(theta,datapar)
poissgamexch(theta,datapar)
theta |
vector of parameter values of log alpha and log mu |
datapar |
list with components data, a matrix with columns e and y, and z0, prior hyperparameter |
value of the log posterior
Jim Albert
e=c(532,584,672,722,904) y=c(0,0,2,1,1) data=cbind(e,y) theta=c(-4,0) z0=.5 datapar=list(data=data,z0=z0) poissgamexch(theta,datapar)
e=c(532,584,672,722,904) y=c(0,0,2,1,1) data=cbind(e,y) theta=c(-4,0) z0=.5 datapar=list(data=data,z0=z0) poissgamexch(theta,datapar)
Computes the parameters and mixing probabilities for a Poisson sampling problem where the prior is a discrete mixture of gamma densities.
poisson.gamma.mix(probs,gammapar,data)
poisson.gamma.mix(probs,gammapar,data)
probs |
vector of probabilities of the gamma components of the prior |
gammapar |
matrix where each row contains the shape and rate parameters for a gamma component of the prior |
data |
list with components y, vector of counts, and t, vector of time intervals |
probs |
vector of probabilities of the gamma components of the posterior |
gammapar |
matrix where each row contains the shape and rate parameters for a gamma component of the posterior |
Jim Albert
probs=c(.5, .5) gamma.par1=c(1,1) gamma.par2=c(10,2) gammapar=rbind(gamma.par1,gamma.par2) y=c(1,3,2,4,10); t=c(1,1,1,1,1) data=list(y=y,t=t) poisson.gamma.mix(probs,gammapar,data)
probs=c(.5, .5) gamma.par1=c(1,1) gamma.par2=c(10,2) gammapar=rbind(gamma.par1,gamma.par2) y=c(1,3,2,4,10); t=c(1,1,1,1,1) data=list(y=y,t=t) poisson.gamma.mix(probs,gammapar,data)
For a proportion problem with a beta prior, plots the prior predictive distribution of the number of successes in n trials and displays the observed number of successes.
predplot(prior,n,yobs)
predplot(prior,n,yobs)
prior |
vector of parameters for beta prior |
n |
sample size |
yobs |
observed number of successes |
Jim Albert
prior=c(3,10) # proportion has a beta(3, 10) prior n=20 # sample size yobs=10 # observed number of successes predplot(prior,n,yobs)
prior=c(3,10) # proportion has a beta(3, 10) prior n=20 # sample size yobs=10 # observed number of successes predplot(prior,n,yobs)
Constructs a discrete uniform prior distribution for two parameters
prior.two.parameters(parameter1, parameter2)
prior.two.parameters(parameter1, parameter2)
parameter1 |
vector of values of first parameter |
parameter2 |
vector of values of second parameter |
matrix of uniform probabilities where the rows and columns are labelled with the parameter values
Jim Albert
prior.two.parameters(c(1,2,3,4),c(2,4,7))
prior.two.parameters(c(1,2,3,4),c(2,4,7))
Measurements on breedings of the common puffin on different habits at Great Island, Newfoundland.
puffin
puffin
A data frame with 38 observations on the following 5 variables.
nesting frequency (burrows per 9 square meters)
grass cover (percentage)
mean soil depth (in centimeters)
angle of slope (in degrees)
distance from cliff edge (in meters)
Peck, R., Devore, J., and Olsen, C. (2005), Introduction to Statistics And Data Analysis, Thomson Learning.
Simulates a sample from a Dirichlet distribution
rdirichlet(n,par)
rdirichlet(n,par)
n |
number of simulations required |
par |
vector of parameters of the Dirichlet distribution |
matrix of simulated draws where each row corresponds to a single draw
Jim Albert
par=c(2,5,4,10) n=10 rdirichlet(n,par)
par=c(2,5,4,10) n=10 rdirichlet(n,par)
Computes the log posterior of (beta, log sigma) for a normal regression model with a g prior with parameters beta0 and c0.
reg.gprior.post(theta, dataprior)
reg.gprior.post(theta, dataprior)
theta |
vector of components of beta and log sigma |
dataprior |
list with components data and prior; data is a list with components y and X, prior is a list with components b0 and c0 |
value of the log posterior
Jim Albert
data(puffin) data=list(y=puffin$Nest, X=cbind(1,puffin$Distance)) prior=list(b0=c(0,0), c0=10) reg.gprior.post(c(20,-.5,1),list(data=data,prior=prior))
data(puffin) data=list(y=puffin$Nest, X=cbind(1,puffin$Distance)) prior=list(b0=c(0,0), c0=10) reg.gprior.post(c(20,-.5,1),list(data=data,prior=prior))
Collapses a matrix by summing over a specific number of rows
regroup(data,g)
regroup(data,g)
data |
a matrix |
g |
a positive integer beween 1 and the number of rows of data |
reduced matrix found by summing over rows
Jim Albert
data=matrix(c(1:20),nrow=4,ncol=5) g=2 regroup(data,2)
data=matrix(c(1:20),nrow=4,ncol=5) g=2 regroup(data,2)
Implements a rejection sampling algorithm for a probability density using a multivariate t proposal density
rejectsampling(logf,tpar,dmax,n,data)
rejectsampling(logf,tpar,dmax,n,data)
logf |
function that defines the logarithm of the density of interest |
tpar |
list of parameters of t proposal density including the mean m, scale matrix var, and degrees of freedom df |
dmax |
logarithm of the rejection sampling constant |
n |
number of simulated draws from proposal density |
data |
data and or parameters used in the function logf |
matrix of simulated draws from density of interest
Jim Albert
data(cancermortality) start=c(-7,6) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) theta=rejectsampling(betabinexch,tpar,-569.2813,1000,cancermortality)
data(cancermortality) start=c(-7,6) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) theta=rejectsampling(betabinexch,tpar,-569.2813,1000,cancermortality)
Simulates from a inverse gamma (a, b) distribution with density proportional to $y^(-a-1) exp(-b/y)$
rigamma(n, a, b)
rigamma(n, a, b)
n |
number of random numbers to be generated |
a |
inverse gamma shape parameter |
b |
inverse gamma rate parameter |
vector of n simulated draws
Jim Albert
a=10 b=5 n=20 rigamma(n,a,b)
a=10 b=5 n=20 rigamma(n,a,b)
Simulates from a multivariate normal distribution
rmnorm(n = 1, mean = rep(0, d), varcov)
rmnorm(n = 1, mean = rep(0, d), varcov)
n |
number of random numbers to be generated |
mean |
numeric vector giving the mean of the distribution |
varcov |
a positive definite matrix representing the variance-covariance matrix of the distribution |
matrix of n rows of random vectors
Jim Albert
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) x <- rmnorm(10, mu, Sigma)
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) x <- rmnorm(10, mu, Sigma)
Simulates from a multivariate t distribution
rmt(n = 1, mean = rep(0, d), S, df = Inf)
rmt(n = 1, mean = rep(0, d), S, df = Inf)
n |
number of random numbers to be generated |
mean |
numeric vector giving the location parameter of the distribution |
S |
a positive definite matrix representing the scale matrix of the distribution |
df |
degrees of freedom |
matrix of n rows of random vectors
Jim Albert
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) df <- 4 x <- rmt(10, mu, Sigma, df)
mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) df <- 4 x <- rmt(10, mu, Sigma, df)
Implements Gibbs sampling for a robust t sampling model with location mu, scale sigma, and degrees of freedom v
robustt(y,v,m)
robustt(y,v,m)
y |
vector of data values |
v |
degrees of freedom for t model |
m |
the number of cycles of the Gibbs sampler |
mu |
vector of simulated values of mu |
s2 |
vector of simulated values of sigma2 |
lam |
matrix of simulated draws of lambda, where each row corresponds to a single draw |
Jim Albert
data=c(-67,-48,6,8,14,16,23,24,28,29,41,49,67,60,75) fit=robustt(data,4,1000)
data=c(-67,-48,6,8,14,16,23,24,28,29,41,49,67,60,75) fit=robustt(data,4,1000)
Simulates a sample from a truncated distribution where the functions for the cdf and inverse cdf are available.
rtruncated(n,lo,hi,pf,qf,...)
rtruncated(n,lo,hi,pf,qf,...)
n |
size of simulated sample |
lo |
low truncation point |
hi |
high truncation point |
pf |
function containing cdf of untruncated distribution |
qf |
function containing inverse cdf of untruncated distribution |
... |
parameters used in the functions pf and qf |
vector of simulated draws from distribution
Jim Albert
# want a sample of 10 from normal(2, 1) distribution truncated below by 3 n=10 lo=3 hi=Inf rtruncated(n,lo,hi,pnorm,qnorm,mean=2,sd=1) # want a sample of 20 from beta(2, 5) distribution truncated to (.3, .8) n=20 lo=0.3 hi=0.8 rtruncated(n,lo,hi,pbeta,qbeta,2,5)
# want a sample of 10 from normal(2, 1) distribution truncated below by 3 n=10 lo=3 hi=Inf rtruncated(n,lo,hi,pnorm,qnorm,mean=2,sd=1) # want a sample of 20 from beta(2, 5) distribution truncated to (.3, .8) n=20 lo=0.3 hi=0.8 rtruncated(n,lo,hi,pbeta,qbeta,2,5)
Simulates iterates of a random walk Metropolis chain for an arbitrary real-valued posterior density defined by the user
rwmetrop(logpost,proposal,start,m,stuff)
rwmetrop(logpost,proposal,start,m,stuff)
logpost |
function defining the log posterior density |
proposal |
a list containing var, an estimated variance-covariance matrix, and scale, the Metropolis scale factor |
start |
vector containing the starting value of the parameter |
m |
the number of iterations of the chain |
stuff |
data and prior info that is used in the function logpost |
par |
a matrix of simulated values where each row corresponds to a value of the vector parameter |
accept |
the acceptance rate of the algorithm |
Jim Albert
data=c(6,2,3,10) varcov=diag(c(1,1)) proposal=list(var=varcov,scale=2) start=array(c(1,1),c(1,2)) m=1000 s=rwmetrop(logctablepost,proposal,start,m,data)
data=c(6,2,3,10) varcov=diag(c(1,1)) proposal=list(var=varcov,scale=2) start=array(c(1,1),c(1,2)) m=1000 s=rwmetrop(logctablepost,proposal,start,m,data)
Batting statistics for the baseball player Mike Schmidt during all the seasons of his career.
schmidt
schmidt
A data frame with 18 observations on the following 14 variables.
year of the season
Schmidt's age that season
games played
at-bats
runs scored
number of hits
number of doubles
number of triples
number of home runs
number of runs batted in
number of stolen bases
number of times caught stealing
number of walks
number of strikeouts
Sean Lahman's baseball database from www.baseball1.com.
For a general two parameter density defined on a grid, simulates a random sample.
simcontour(logf,limits,data,m)
simcontour(logf,limits,data,m)
logf |
function that defines the logarithm of the density |
limits |
limits (xlo, xhi, ylo, yhi) that cover the joint probability density |
data |
vector or list of parameters associated with the function logpost |
m |
size of simulated sample |
x |
vector of simulated draws of the first parameter |
y |
vector of simulated draws of the second parameter |
Jim Albert
m=array(c(0,0),c(2,1)) v=array(c(1,.6,.6,1),c(2,2)) normpar=list(m=m,v=v) s=simcontour(lbinorm,c(-4,4,-4,4),normpar,1000) plot(s$x,s$y)
m=array(c(0,0),c(2,1)) v=array(c(1,.6,.6,1),c(2,2)) normpar=list(m=m,v=v) s=simcontour(lbinorm,c(-4,4,-4,4),normpar,1000) plot(s$x,s$y)
simulate.posterior will simulate draws from a posterior density.
## S3 method for class 'posterior' simulate(object, nsim=1000, seed=NULL, ...)
## S3 method for class 'posterior' simulate(object, nsim=1000, seed=NULL, ...)
object |
posterior object |
nsim |
number of simulate iterates |
seed |
starting seed of random number generator |
... |
optional arguments; use exact=TRUE to simulate from the exact posterior density and the scale argument indicates the scale value for the Metropolis random walk algorithm |
sample |
matrix of simulated draws of posterior |
arate |
acceptance rate if the Metropolis random walk algorithm is used |
Jim Albert
data <- c(2,4,3,6,1,0,4,3,10,2) par <- c(0,1) datapar <- list(data=data,par=par) fit <- laplace(logpoissnormal, 1, datapar) simulate(fit, exact=TRUE, scale=4)
data <- c(2,4,3,6,1,0,4,3,10,2) par <- c(0,1) datapar <- list(data=data,par=par) fit <- laplace(logpoissnormal, 1, datapar) simulate(fit, exact=TRUE, scale=4)
Implements sampling importance resampling for a multivariate t proposal density.
sir(logf,tpar,n,data)
sir(logf,tpar,n,data)
logf |
function defining logarithm of density of interest |
tpar |
list of parameters of multivariate t proposal density including the mean m, the scale matrix var, and the degrees of freedom df |
n |
number of simulated draws from the posterior |
data |
data and parameters used in the function logf |
matrix of simulated draws from the posterior where each row corresponds to a single draw
Jim Albert
data(cancermortality) start=c(-7,6) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) theta=sir(betabinexch,tpar,1000,cancermortality)
data(cancermortality) start=c(-7,6) fit=laplace(betabinexch,start,cancermortality) tpar=list(m=fit$mode,var=2*fit$var,df=4) theta=sir(betabinexch,tpar,1000,cancermortality)
Career hitting statistics for ten great baseball players
sluggerdata
sluggerdata
A data frame with 199 observations on the following 13 variables.
names of the ballplayer
season played
age of the player during the season
games played
number of at-bats
number of runs scored
number of hits
number of doubles
number of triples
number of home runs
runs batted in
number of base on balls
number of strikeouts
Sean Lahman's baseball database from www.baseball1.com.
Number of goals scored by a single professional soccer team during the 2006 Major League Soccer season
soccergoals
soccergoals
A data frame with 35 observations on the following 1 variable.
number of goals scored
Collected by author from the www.espn.com website.
Heart transplant data for 82 patients from Stanford Heart Transplanation Program
stanfordheart
stanfordheart
A data frame with 82 observations on the following 4 variables.
survival time in months
variable that is 1 or 0 if patient had transplant or not
time a transplant patient waits for operation
variable that is 1 or 0 if time is censored or not
Turnbull, B., Brown, B. and Hu, M. (1974), Survivorship analysis of heart transplant data, Journal of the American Statistical Association, 69, 74-80.
For all professional baseball players in the 2004 season, dataset gives the number of strikeouts and at-bats when runners are in scoring position and when runners are not in scoring position.
strikeout
strikeout
A data frame with 438 observations on the following 4 variables.
number of strikeouts of player when runners are not in scoring position
number of at-bats of player when runners are not in scoring position
number of strikeouts of player when runners are in scoring position
number of at-bats of player when runners are in scoring position
Collected from www.espn.com website.
Answers to a sheet of questions given to a large number of students in introductory statistics classes
studentdata
studentdata
A data frame with 657 observations on the following 11 variables.
student number
height in inches
gender
number of pairs of shoes owned
number chosen between 1 and 10
name of movie dvds owned
time the person went to sleep the previous night (hours past midnight)
time the person woke up the next morning
cost of last haircut including tip
number of hours working on a job per week
usual drink at suppertime among milk, water, and pop
Collected by the author during the Fall 2006 semester.
summary.posterior will display univariate summaries from a posterior density.
## S3 method for class 'posterior' summary(object, ...)
## S3 method for class 'posterior' summary(object, ...)
object |
posterior object |
... |
any other arguments needed |
Jim Albert
data <- c(2,4,3,6,1,0,4,3,10,2) par <- c(0,1) datapar <- list(data=data,par=par) fit <- laplace(logpoissnormal, 1, datapar) summary(fit)
data <- c(2,4,3,6,1,0,4,3,10,2) par <- c(0,1) datapar <- list(data=data,par=par) fit <- laplace(logpoissnormal, 1, datapar) summary(fit)
Computes the log posterior density of (log tau, log lambda, log p) for a Pareto model for survival data
transplantpost(theta,data)
transplantpost(theta,data)
theta |
vector of parameter values of log tau, log lambda, and log p |
data |
data matrix with columns survival time, transplant indicator, time to transplant, and censoring indicator |
value of the log posterior
Jim Albert
data(stanfordheart) theta=c(0,3,-1) transplantpost(theta,stanfordheart)
data(stanfordheart) theta=c(0,3,-1) transplantpost(theta,stanfordheart)
For a proportion problem with a beta prior, plots the prior, likelihood and posterior on one graph.
triplot(prior,data,where="topright")
triplot(prior,data,where="topright")
prior |
vector of parameters for beta prior |
data |
vector consisting of number of successes and number of failures |
where |
the location of the legend for the plot |
Jim Albert
prior=c(3,10) # proportion has a beta(3, 10) prior data=c(10,6) # observe 10 successes and 6 failures triplot(prior,data)
prior=c(3,10) # proportion has a beta(3, 10) prior data=c(10,6) # observe 10 successes and 6 failures triplot(prior,data)
Computes the log posterior density of (log sigma, mu, beta) for a Weibull proportional odds regression model
weibullregpost(theta,data)
weibullregpost(theta,data)
theta |
vector of parameter values log sigma, mu, and beta |
data |
data matrix with columns survival time, censoring variable, and covariate matrix |
value of the log posterior
Jim Albert
data(chemotherapy) attach(chemotherapy) d=cbind(time,status,treat-1,age) theta=c(-.6,11,.6,0) weibullregpost(theta,d)
data(chemotherapy) attach(chemotherapy) d=cbind(time,status,treat-1,age) theta=c(-.6,11,.6,0) weibullregpost(theta,d)