Title: | Probability and Bayesian Modeling |
---|---|
Description: | Functions and datasets to accompany J. Albert and J. Hu, "Probability and Bayesian Modeling", CRC Press, (2019, ISBN: 1138492566). |
Authors: | Jim Albert <[email protected]> |
Maintainer: | Jim Albert <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2025-03-03 05:03:55 UTC |
Source: | https://github.com/bayesball/probbayes |
Ratings for a set of 2010 animation movies
animation_ratings
A data frame with 55 observations on the following 6 variables.
user ID
movie ID
numerical rating
time when the rating was recorded
name of the movie
numerical ID of movie
MovieLens by GroupLens Research
Arm span and height measurements for a sample of students
arm_height
A data frame with 20 observations on the following 2 variables.
length of arm span in cm
height in cm
Sample of college students
Constructs frequency bar plot of a vector of numeric data or a vector of character data
bar_plot(y, ...)
y |
vector of outcomes |
... |
title of the graph |
A ggplot2 object containing the bar graph.
Jim Albert
s <- spinner_data(c(1, 2, 2, 1), nsim=100)
bar_plot(s, "Spinner Data")
y <- c(rep("a", 10), rep("b", 5), rep("c", 8), rep("d", 4))
bar_plot(y)
Batting statistics collected for all players during the first month and remainder of 2018 baseball season
batting_2018
A data frame with 549 observations on the following 5 variables.
name of player
number of at bats in first month
number of hits in first month
number of at bats in remainder of season
number of hits in remainder of season
Data collected from Retrosheet.org.
Given a data table with columns Prior and Likelihood, computes posterior probabilities
bayesian_crank(d)
d |
data frame with columns Prior and Likelihood |
data frame with new columns Product and Posterior
Jim Albert
df <- data.frame(p=c(.1, .3, .5, .7, .9), Prior=rep(1/5, 5))
y <- 5
n <- 10
df$Likelihood <- dbinom(y, prob=df$p, size=n)
df <- bayesian_crank(df)
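The computation described for bayesian_crank can be sketched in base R. This is a minimal illustration, not the package source: the helper name `crank_sketch` is hypothetical, and it assumes that Product is Prior times Likelihood and that Posterior is Product rescaled to sum to one.

```r
# Hypothetical stand-in for the documented behaviour (assumption, not package code).
crank_sketch <- function(d) {
  d$Product <- d$Prior * d$Likelihood        # unnormalized posterior
  d$Posterior <- d$Product / sum(d$Product)  # rescale so probabilities sum to 1
  d
}

df <- data.frame(p = c(.1, .3, .5, .7, .9), Prior = rep(1/5, 5))
df$Likelihood <- dbinom(5, size = 10, prob = df$p)
out <- crank_sketch(df)
```

With a uniform prior and 5 successes in 10 trials, the largest posterior probability falls on p = 0.5, as the symmetry of the binomial likelihood suggests.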
Trend Estimates for 28 Grassland Bird Species
BBS_survey
A data frame with 28 observations on the following 4 variables.
name of bird species
trend estimate
standard error of estimate
number of observations at site
North American Breeding Bird Survey
Computes and Displays Areas Under a Beta Curve
beta_area(lo, hi, shape_par, Color = "orange")
lo |
lower bound of interval |
hi |
upper bound of interval |
shape_par |
vector of shape parameters of the beta curve |
Color |
color of shading in the graph |
ggplot2 object containing the graphical display.
Jim Albert
lo <- .2
hi <- .4
shape_par <- c(2, 5)
beta_area(lo, hi, shape_par)
Simulate random data from a beta curve
beta_data(shape_par, nsim=1000)
shape_par |
vector of shape parameters of the beta curve |
nsim |
number of simulations |
A vector of random draws from the beta distribution
Jim Albert
shape_par <- c(12, 8)
beta_data(shape_par, 10)
Draw a Beta Curve
beta_draw(shape_pars)
shape_pars |
vector of shape parameters of the beta curve |
ggplot2 object containing the graphical display.
Jim Albert
shape_pars <- c(2, 5)
beta_draw(shape_pars)
Computes Probability Interval for a Beta Curve
beta_interval(prob, shape_par, Color = "orange")
prob |
value of coverage probability |
shape_par |
vector of shape parameters of the beta curve |
Color |
color of shading in the graph |
ggplot2 object containing the graphical display.
Jim Albert
shape_par <- c(2, 5)
beta_interval(.5, shape_par)
Plot of Prior and Posterior Beta Curves
beta_prior_post(prior_shapes, post_shapes)
prior_shapes |
vector of shape parameters of the beta prior |
post_shapes |
vector of shape parameters of the beta posterior |
ggplot2 object containing the graphical display.
Jim Albert
prior_shapes <- c(4, 6)
post_shapes <- c(19, 16)
beta_prior_post(prior_shapes, post_shapes)
Displays a Quantile of a Beta Curve
beta_quantile(prob, shape_par, Color = "orange")
prob |
probability value of interest |
shape_par |
vector of shape parameters of the beta curve |
Color |
color of shading in the graph |
ggplot2 object containing the graphical display.
Jim Albert
# find the .50 quantile (the median)
prob <- 0.5
shape_par <- c(2, 5)
beta_quantile(prob, shape_par)
# find the .90 quantile (90th percentile)
prob <- 0.9
beta_quantile(prob, shape_par)
Text statistics for a collection of books sold at Amazon.com
book_stats
A data frame with 21 observations on the following 3 variables.
name of book
percentage of words in the book with three or more syllables
number of years of formal education required to read and understand a passage of text
Data collected from Amazon.com website.
Total snowfall in inches for 20 Januarys in Buffalo, New York
buffalo_jan
A data frame with 20 observations on the following 2 variables.
Season
inches of total snowfall
National Weather Service, www.weather.gov
Season on-base statistics for a collection of MLB players who were born in 1978
career_1978
A data frame with 399 observations on the following 6 variables.
last name of player
id of player
age of player
deviation of age from 30
number of plate appearances
number of on-base events
Data collected from Lahman database.
Centers and increases font size of a ggplot2 graphic title
centertitle(Color = "blue")
Color |
color of the text in the ggplot2 title |
ggplot2 theme code to center the title
Jim Albert
library(ggplot2)
df <- data.frame(p=c(.1, .3, .5, .7, .9), Prior=rep(1/5, 5))
ggplot(df, aes(p, Prior)) +
  geom_point() +
  ggtitle("My Prior") +
  centertitle()
Expenditures of U.S. Households
CEsample
A data frame with 1000 observations on the following 3 variables.
urban/rural status of the consumer unit (CU): 1 = urban, 2 = rural
amount of CU income before taxes in the last 12 months
CU's total expenditure in the last quarter
U.S. Bureau of Labor Statistics
Interactively choose beta curve by selecting the .5 and .9 quantiles
ChooseBeta()
None
Jim Albert
Variables on a sample of personal computers
ComputerPriceSample
A data frame with 500 observations on the following 5 variables.
sales price
clock speed in MHz
size of hard drive in MB
size of RAM in MB
premium status of manufacturer
Unknown
Data from study to learn about personality determinants of volunteering
Cowles
A data frame with 1421 observations on the following 5 variables.
subject number
measurement of neuroticism
measurement of extraversion
male or female
no or yes
Unknown.
Reported deaths from heart attack for hospitals in New York City
DeathHeartAttackDataNYCfull
A data frame with 45 observations on the following 5 variables.
name of hospital
borough in New York City
type of hospital
number of heart attack cases
number of deaths
New York State Department of Health
Reported deaths from heart attack for hospitals in Manhattan in New York City
DeathHeartAttackManhattan
A data frame with 13 observations on the following 4 variables.
name of hospital
type of hospital
number of heart attack cases
number of deaths
New York State Department of Health
Constructs a graph of the probability distribution of two proportions
draw_two_p(prob_matrix, ...)
prob_matrix |
matrix of probabilities of two proportions with the rows and columns labeled by the values |
... |
other arguments such as the title of the plot |
ggplot2 object containing the graphical display.
Jim Albert
prob_matrix <- testing_prior()
draw_two_p(prob_matrix, title="Testing Prior")
Hypergeometric sampling density
dsampling(sample_b, pop_N, pop_B, sample_n)
sample_b |
number of black balls in sample |
pop_N |
number of balls in population |
pop_B |
number of black balls in population |
sample_n |
number of balls in sample |
Value of hypergeometric sampling probability
Jim Albert
pop_N <- 10
pop_B <- 4
sample_n <- 3
sample_b <- 2
dsampling(sample_b, pop_N, pop_B, sample_n)
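For reference, the hypergeometric probability that dsampling is described as returning can be reproduced with base R alone. This sketch assumes the standard hypergeometric formula and uses `stats::dhyper`; it does not call the package function, and the variable names `by_formula` and `by_dhyper` are illustrative.

```r
pop_N <- 10     # balls in the population
pop_B <- 4      # black balls in the population
sample_n <- 3   # balls drawn without replacement
sample_b <- 2   # black balls observed in the sample

# Counting formula: C(B, b) * C(N - B, n - b) / C(N, n)
by_formula <- choose(pop_B, sample_b) *
  choose(pop_N - pop_B, sample_n - sample_b) /
  choose(pop_N, sample_n)

# Same quantity from base R's hypergeometric density
by_dhyper <- dhyper(sample_b, m = pop_B, n = pop_N - pop_B, k = sample_n)
```

Both expressions evaluate to 36/120 = 0.3 for these inputs.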
Computes likelihoods for spinner outcomes
dspinner(x, Prob)
x |
vector of spinner observations |
Prob |
matrix of spinner probabilities where each row corresponds to a different spinner |
column vector consisting of the likelihoods for the different spinners
Jim Albert
Prob <- matrix(c(.25, .25, .25, .25,
                 .50, .125, .125, .5,
                 .25, .5, .25, 0),
               3, 4, byrow=TRUE)
x <- c(1, 2, 1, 3, 4)
dspinner(x, Prob)
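A rough sketch of how such likelihoods could be computed, assuming the spins are independent so that each spinner's likelihood is the product of the probabilities of the observed outcomes. The matrix values below are illustrative (each row sums to one) and the name `lik` is hypothetical; this is not the package implementation.

```r
# One row per spinner, one column per outcome 1..4 (illustrative values).
Prob <- matrix(c(.25, .25,  .25,  .25,
                 .50, .125, .125, .25,
                 .25, .50,  .25,  0),
               nrow = 3, ncol = 4, byrow = TRUE)
x <- c(1, 2, 1, 3, 4)  # observed spins

# For each spinner, multiply the probabilities of the observed spins.
lik <- apply(Prob, 1, function(pr) prod(pr[x]))
```

The third spinner assigns probability zero to outcome 4, so observing a 4 gives that spinner a likelihood of zero.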
Electricity bills collected for all months for five years
electricbills
A data frame with 62 observations on the following 3 variables.
year
number of month
electricity bill in dollars
Data collected for one household in Ohio
Frequency use of words for Federalist Papers written by either Alexander Hamilton or James Madison
federalist_word_study
A data frame with 56853 observations on the following 7 variables.
name of Federalist paper
total number of words
word that is counted
frequency of the word
fraction of words with that word
author of paper
is authorship disputed?
http://www.gutenberg.org/ebooks/18
Measurements of time to serve for 20 serves of the tennis player Roger Federer
federer_time_to_serve
A data frame with 20 observations on the following one variable.
time to serve in seconds
https://github.com/JeffSackmann
The number of fire calls and building fires for ten zip codes in Montgomery County, Pennsylvania
fire_calls
A data frame with 10 observations on the following 3 variables.
zip code
number of fire calls
number of building fires
kaggle.com
Field goal attempt data for three seasons of professional football
football_field_goals
A data frame with 3025 observations on the following 5 variables.
name of team
football season
last name of kicker
distance in feet of attempt
attempt was successful (1) or not (0)
Data collected by Michael Lopez.
Measurements of average temperature and natural gas bill for each month in 2017
gas2017
A data frame with 12 observations on the following 3 variables.
abbreviation of month
average temperature
natural gas bill in dollars
Personal data collected by a homeowner in Ohio
Implements Gibbs sampling of the beta-binomial distribution
gibbs_betabin(n, a, b, p = 0.5, iter = 1000)
n |
binomial sample size |
a |
first beta shape parameter |
b |
second beta shape parameter |
p |
starting value of proportion in algorithm |
iter |
number of iterations |
matrix of simulated draws from the algorithm
Jim Albert
sp <- gibbs_betabin(20, 5, 5, iter = 100)
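As an illustration of the kind of sampler described here, the sketch below alternates between the two standard conditionals of the beta-binomial model: y given p is Binomial(n, p), and p given y is Beta(y + a, n - y + b). The function name `gibbs_betabin_sketch` and the column names are hypothetical, and the package's own code may differ in detail.

```r
gibbs_betabin_sketch <- function(n, a, b, p = 0.5, iter = 1000) {
  draws <- matrix(0, iter, 2, dimnames = list(NULL, c("y", "p")))
  for (j in seq_len(iter)) {
    y <- rbinom(1, size = n, prob = p)   # draw y from [y | p]
    p <- rbeta(1, y + a, n - y + b)      # draw p from [p | y]
    draws[j, ] <- c(y, p)
  }
  draws
}

set.seed(1)
sp <- gibbs_betabin_sketch(20, 5, 5, iter = 100)
```

Passing `iter` by name avoids the fourth positional argument being read as the starting value `p`.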
Implements Gibbs sampling for an arbitrary bivariate discrete distribution
gibbs_discrete(p, i = 1, iter = 1000)
p |
matrix defining the probability distribution |
i |
starting row of the matrix |
iter |
number of cycles of algorithm |
matrix of simulated draws from algorithm
Jim Albert
p <- matrix(c(4, 3, 2, 1,
              3, 4, 3, 2,
              2, 3, 4, 3,
              1, 2, 3, 4) / 40, 4, 4, byrow = TRUE)
out <- gibbs_discrete(p, 1, 100)
Implements Gibbs sampling for normal sampling with independent priors on the mean and precision
gibbs_normal(s, P = 0.002, iter = 1000)
s |
a list with components y, the observed data, mu0, the prior mean of mu, sigma0, the prior standard deviation of mu, a, the shape parameter of the gamma prior on P, b, the rate parameter of the gamma prior on P |
P |
starting value of the precision parameter |
iter |
number of iterations |
matrix of simulated draws of (mu, P) from the algorithm
Jim Albert
s <- list(y = rnorm(20, 5, 2), mu0 = 10, sigma0 = 3, a = 1, b = 1)
out <- gibbs_normal(s, P = 0.01, iter=100)
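One way to picture this sampler, under the usual semi-conjugate setup (a normal prior on mu with mean mu0 and standard deviation sigma0, and a gamma prior on P with shape a and rate b), is the base R sketch below. The name `gibbs_normal_sketch` is hypothetical and the code is an assumption about the textbook algorithm, not a copy of the package source.

```r
gibbs_normal_sketch <- function(s, P = 0.002, iter = 1000) {
  n <- length(s$y)
  ybar <- mean(s$y)
  P0 <- 1 / s$sigma0^2                      # prior precision of mu
  draws <- matrix(0, iter, 2, dimnames = list(NULL, c("mu", "P")))
  for (j in seq_len(iter)) {
    # [mu | P, y]: normal with precision P0 + n * P
    P1 <- P0 + n * P
    mu <- rnorm(1, (P0 * s$mu0 + n * P * ybar) / P1, sqrt(1 / P1))
    # [P | mu, y]: gamma with updated shape and rate
    P <- rgamma(1, shape = s$a + n / 2,
                rate = s$b + sum((s$y - mu)^2) / 2)
    draws[j, ] <- c(mu, P)
  }
  draws
}

set.seed(2)
s <- list(y = rnorm(20, 5, 2), mu0 = 10, sigma0 = 3, a = 1, b = 1)
out <- gibbs_normal_sketch(s, P = 0.01, iter = 100)
```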
Study to see what variables are helpful in determining admission to Graduate School
GradSchoolAdmission
A data frame with 400 observations on the following 3 variables.
student was admitted (1) or not admitted (0)
GRE score
grade point average
Unknown.
Frequency use of "can" for Federalist Papers written by Alexander Hamilton
Hamilton_can
A data frame with 49 observations on the following 6 variables.
name of Federalist paper
total number of words
word that is counted
frequency of the word
fraction of words with that word
author of paper
http://www.gutenberg.org/ebooks/18
Measurements of house size and selling price for a collection of homes in a city in Ohio
house_prices
A data frame with 24 observations on the following 2 variables.
selling price in $1000
square footage of house
Zillow.com
Weekly hours spent on homework for students from five schools
HWhours5schools
A data frame with 116 observations on the following 2 variables.
school number of student
weekly hours spent on homework
Unknown.
Increases font size on all text in a ggplot2 graphic
increasefont(Size = 18)
Size |
font size of all textual elements in a ggplot2 graphic |
ggplot2 theme code to increase the font size
Jim Albert
library(ggplot2)
df <- data.frame(p=c(.1, .3, .5, .7, .9), Prior=rep(1/5, 5))
ggplot(df, aes(p, Prior)) +
  geom_point() +
  increasefont()
Model script for JAGS to fit a particular Bayesian model. Currently the possible models are "beta_binomial", "hier_normal", "hier_trajectory", "normal", "regression", "regression_cond_means", and "trajectory".
JAGS_script(model)
model |
name of the model |
A character string containing the model script
Ratings of Korean dramas broadcast on different days of the week by different producers
KDramaData
A data frame with 101 observations on the following 5 variables.
name of drama
indicator of what day the drama was broadcast
indicator of the producer of the drama
rating of the drama
date of rating
AGB Nielsen Media Research Group
U.S. women labor participation and family income
LaborParticipation
A data frame with 753 observations on the following 2 variables.
labor participation of the wife
family income exclusive of wife's income in $1000
University of Michigan Panel Study of Income Dynamics
Frequency use of "can" for Federalist Papers written by James Madison
Madison_can
A data frame with 49 observations on the following 6 variables.
name of Federalist paper
total number of words
word that is counted
frequency of the word
fraction of words with that word
author of paper
http://www.gutenberg.org/ebooks/18
Graph of several normal curves
many_normal_plots(list_normal_par)
list_normal_par |
list of vectors, where each vector is a mean and standard deviation for a normal distribution |
ggplot2 object containing the graphical display.
Jim Albert
list_normal_par <- list(c(100, 15), c(110, 15), c(120, 15))
many_normal_plots(list_normal_par)
Graphs a collection of spinners
many_spinner_plots(list_regions)
list_regions |
list of vectors of integer areas for the spins 1, 2, ... |
A ggplot2 object containing the spinner displays
Jim Albert
regions1 <- c(1, 1, 1)
regions2 <- c(2, 1, 2, 1)
many_spinner_plots(list(regions1, regions2))
Annual marriage counts per 1000 of the population in Italy from 1936 to 1951
marriage_counts
A data frame with 16 observations on the following 2 variables.
year
count of marriages per 1000 people
Unknown.
Serving size and calories for a selection of sandwiches from McDonalds
mcdonalds
A data frame with 11 observations on the following 3 variables.
name of sandwich
serving size in grams
calories of sandwich
McDonalds restaurant
Implements Metropolis sampling for an arbitrary continuous probability distribution
metropolis(logpost, current, C, iter, ...)
logpost |
function definition of the log probability function |
current |
starting value of algorithm |
C |
half-width of proposal interval |
iter |
number of iterations |
... |
other inputs needed in logpost function |
S |
vector of simulated values |
accept_rate |
acceptance rate of algorithm |
Jim Albert
lpost <- function(theta, s){
  dnorm(s$ybar, theta, s$se, log = TRUE) +
    dcauchy(theta, s$loc, s$scale, log = TRUE)
}
s <- list(ybar = 20, se = 0.4, loc = 10, scale = 2)
post <- metropolis(lpost, 10, 20, 100, s)
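The documented arguments suggest a random-walk Metropolis scheme with a uniform proposal of half-width C. A minimal base R sketch under that assumption follows; `metropolis_sketch` is a hypothetical name, and the acceptance rule shown is the standard one rather than a transcription of the package code.

```r
metropolis_sketch <- function(logpost, current, C, iter, ...) {
  S <- numeric(iter)
  n_accept <- 0
  for (j in seq_len(iter)) {
    # propose uniformly within C of the current value
    candidate <- runif(1, current - C, current + C)
    log_ratio <- logpost(candidate, ...) - logpost(current, ...)
    # accept with probability min(1, exp(log_ratio))
    if (log(runif(1)) < log_ratio) {
      current <- candidate
      n_accept <- n_accept + 1
    }
    S[j] <- current
  }
  list(S = S, accept_rate = n_accept / iter)
}

lpost <- function(theta, s) {
  dnorm(s$ybar, theta, s$se, log = TRUE) +
    dcauchy(theta, s$loc, s$scale, log = TRUE)
}
s <- list(ybar = 20, se = 0.4, loc = 10, scale = 2)
set.seed(3)
post <- metropolis_sketch(lpost, 10, 20, 100, s)
```

Working on the log scale avoids numerical underflow when the posterior density is very small at the starting value.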
Weekend and gross sales for a selection of movies released in 2017
movies2017
A data frame with 10 observations on the following 3 variables.
name of movie
opening weekend sales in millions of dollars
gross sales in millions of dollars
Internet Movie Database
Field goal and free throw shooting data for a collection of great NBA point guards
nba_guards
A data frame with 230 observations on the following 6 variables.
name of player
age of player
field goals
field goal attempts
free throws
free throw attempts
Data collected from Basketball-Reference.com.
Computes and Displays Area Under a Normal Curve
normal_area(lo, hi, normal_pars, Color = "orange")
lo |
lower bound of interval |
hi |
upper bound of interval |
normal_pars |
vector of mean and standard deviation of the normal curve |
Color |
color of shading in plot |
ggplot2 object containing the graphical display.
Jim Albert
lo <- 10
hi <- 20
normal_pars <- c(25, 10)
normal_area(lo, hi, normal_pars)
Draws a Normal Curve
normal_draw(normal_pars, Color = "red")
normal_pars |
vector of mean and standard deviation of the normal curve |
Color |
color of line in plot |
ggplot2 object containing the graphical display.
Jim Albert
normal_pars <- c(2, 1)
normal_draw(normal_pars)
Computes "equal-tails" probability interval for a normal curve
normal_interval(prob, normal_pars, Color = "orange")
prob |
value of coverage probability |
normal_pars |
vector of mean and standard deviation of the normal curve |
Color |
color of shading in plot |
ggplot2 object containing the graphical display.
Jim Albert
normal_pars <- c(2, 0.5)
prob <- 0.5
normal_interval(prob, normal_pars)
Displays a Quantile of a Normal Curve
normal_quantile(prob, normal_pars, Color = "orange")
prob |
probability value of interest |
normal_pars |
vector of mean and standard deviation of the normal curve |
Color |
color of shading in plot |
ggplot2 object containing the graphical display.
Jim Albert
normal_pars <- c(100, 10)
prob <- 0.7
normal_quantile(prob, normal_pars)
Finds the parameters of the normal posterior with normal data and a normal prior
normal_update(prior, data, teach=FALSE)
prior |
vector with components mean and sd of the normal prior |
data |
vector with components the sample mean and the standard error of the estimate |
teach |
logical variable indicating the form of the output |
If teach = TRUE, returns data frame that displays the mean, precision, and standard deviation for the prior, data, and posterior. If teach = FALSE, returns a vector with mean and standard deviation of the posterior.
Jim Albert
prior <- c(100, 10)
data <- c(110, 15)
normal_update(prior, data)
normal_update(prior, data, teach=TRUE)
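The normal-normal update can be written out by hand using precisions (reciprocal variances). This sketch assumes the standard conjugate result, in which precisions add and the posterior mean is the precision-weighted average of the prior mean and the sample mean; the variable names are illustrative and the code does not call the package.

```r
prior <- c(100, 10)   # prior mean and standard deviation
data <- c(110, 15)    # sample mean and standard error

prior_prec <- 1 / prior[2]^2
data_prec <- 1 / data[2]^2
post_prec <- prior_prec + data_prec           # precisions add

post_mean <- (prior_prec * prior[1] + data_prec * data[1]) / post_prec
post_sd <- sqrt(1 / post_prec)
post <- c(post_mean, post_sd)
```

For these inputs the posterior mean is about 103.08 and the posterior standard deviation about 8.32, so the posterior sits closer to the prior mean because the prior is the more precise of the two sources.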
Winning times in seconds for the men's and women's 100m butterfly race for the Olympics from 1964 through 2016.
olympic_butterfly
A data frame with 28 observations on the following 3 variables.
year of Olympics
gender
winning time in seconds
https://www.olympic.org/swimming/
Graphs prior and posterior probabilities from a discrete Bayesian model
prior_post_plot(d, Color = "orange")
d |
data frame whose first column holds the model values, with additional columns named Prior and Posterior |
Color |
fill color for the bars |
ggplot2 object containing the graphical display.
Jim Albert
d <- data.frame(p=c(.1, .3, .5, .7, .9), Prior=rep(1/5, 5))
y <- 5
n <- 10
d$Likelihood <- dbinom(y, prob=d$p, size=n)
d <- bayesian_crank(d)
prior_post_plot(d, "red")
Constructs a graph of a discrete probability distribution
prob_plot(d, Color = "red", Size = 1.5)
d |
data frame where the first two columns are the variable and associated probabilities |
Color |
color of line in plot |
Size |
width of line in plot |
A ggplot2 object containing the plot display
Jim Albert
d <- data.frame(x=1:5, Probability=c(.1, .2, .3, .3, .1))
prob_plot(d)
Study on inputs that impact a salary of a professor
ProfessorSalary
A data frame with 397 observations on the following 7 variables.
subject id
professor rank
A is theoretical and B is applied
number of years since receipt of doctorate
number of years of service
Female or Male
nine-month salary in dollars
Unknown.
Prices of a sample of one carat diamonds
pt100price
A data frame with 25 observations on the following 2 variables.
index of diamond
price divided by 100
Unknown.
Prices of a sample of 0.99 carat diamonds
pt99price
A data frame with 23 observations on the following 2 variables.
index of diamond
price divided by 100
Unknown.
Final standings of the MLB baseball teams in the 2018 season
pythag2018
A data frame with 30 observations on the following 7 variables.
team abbreviation
league abbreviation
number of wins
number of losses
proportion of wins
average runs scored
average runs allowed
Lahman database
Implements Metropolis sampling for an arbitrary discrete probability distribution
random_walk(pd, start, num_steps)
pd |
function containing discrete probability function on the integers 1, 2, ... |
start |
starting value of algorithm |
num_steps |
number of iterations of algorithm |
A vector of simulated values
Jim Albert
# random walk through a binomial distribution
pd <- function(x){
  dbinom(x, size = 10, prob = 0.5)
}
start <- 4
num_steps <- 50
out <- random_walk(pd, start, num_steps)
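A discrete Metropolis random walk of this kind might look like the sketch below. It assumes a proposal that moves one step left or right with equal probability and accepts with probability min(1, pd(candidate) / pd(current)); the name `random_walk_sketch` is hypothetical and the package may implement the steps differently.

```r
random_walk_sketch <- function(pd, start, num_steps) {
  path <- numeric(num_steps)
  current <- start   # must have positive probability under pd
  for (j in seq_len(num_steps)) {
    candidate <- current + sample(c(-1, 1), 1)   # step left or right
    # accept with probability min(1, pd(candidate) / pd(current))
    if (runif(1) < pd(candidate) / pd(current)) current <- candidate
    path[j] <- current
  }
  path
}

pd <- function(x) dbinom(x, size = 10, prob = 0.5)
set.seed(4)
out <- random_walk_sketch(pd, start = 4, num_steps = 50)
```

Candidates outside 0 to 10 have probability zero under this `pd`, so they are never accepted and the walk stays on the support of the binomial distribution.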
Scores on a 20-question T/F exam
ScoreData
A data frame with 30 observations on the following 2 variables.
subject id
number correct in 20-question exam
Data randomly generated.
Sample of sleeping times for a single night for a sample of college students
sleeping_times
A data frame with 14 observations on the following single variable.
number of hours of sleep
Personal collection
Computes and plots the posterior distribution of spinners given a sequence of spins
spinner_bayes(list_regions, prior, data, plot=TRUE)
list_regions |
list of vectors of integer areas for the spins 1, 2, ... |
prior |
a vector containing the prior probabilities for the spinners |
data |
a vector containing the spin values where 1, 2, 3, ... are the possible spins |
plot |
if plot=TRUE, a comparative graph of the prior and posterior probabilities is displayed |
A data frame with variables Spinner, Prior, Likelihood, Product, and Posterior
Jim Albert
regions1 <- c(1, 1, 1)
regions2 <- c(2, 1, 2, 1)
data <- c(1, 1, 1, 2)
spinner_bayes(list(regions1, regions2), prior=c(0.5, 0.5), data)
Simulate random data from a spinner
spinner_data(regions, nsim=1000)
regions |
vector of integer values for the spins 1, 2, ... |
nsim |
number of spins |
A vector of random spins from the spinner
Jim Albert
regions <- c(2, 1, 1, 2)
spinner_data(regions, nsim=20)
Computes likelihood matrix for many spinners
spinner_likelihoods(regions)
regions |
list of vectors of integer areas for the spins 1, 2, ... |
A matrix where each row corresponds to the outcome probabilities for one spinner.
Jim Albert
sp1 <- c(2, 1, 1)
sp2 <- c(1, 1, 1, 1)
regions <- list(sp1, sp2)
spinner_likelihoods(regions)
Constructs a spinner with different regions
spinner_plot(probs, ...)
probs |
vector of probabilities for the spins 1, 2, ... |
... |
optional vector of values and title |
A ggplot2 object containing the spinner display
Jim Albert
probs <- rep(.2, 5)
spinner_plot(probs, values=c("A", "B", "C", "D", "E"), title="My Spinner")
# probs does not need to be normalized
spinner_plot(c(1, 2, 1, 2))
Display probability distribution for a spinner
spinner_probs(regions)
regions |
vector of positive values for the spins 1, 2, ... |
Dataframe with variables Region and Prob
Jim Albert
regions <- c(2, 1, 1, 2)
spinner_probs(regions)
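The probability table can be approximated in base R by dividing each region's area by the total area. This is a sketch under that assumption; the column names follow the documented return value, while `probs_sketch` is a hypothetical name.

```r
regions <- c(2, 1, 1, 2)

# Each region's probability is its share of the total area.
probs_sketch <- data.frame(Region = seq_along(regions),
                           Prob = regions / sum(regions))
```

Here regions 1 and 4 each receive probability 1/3, and regions 2 and 3 each receive 1/6.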
Sample of taxi fares from a particular city
taxi_fares
A data frame with 20 observations on the following single variable.
taxi cab fare
Personal collection
Data on time to serve for six professional tennis players
tennis_serve
A data frame with 6 observations on the following 3 variables.
last name of player
number of serves
mean time to serve
https://github.com/JeffSackmann
Constructs a discrete distribution for two proportions under a testing or uniform hypothesis
testing_prior(lo=.1, hi=.9, n_values=9, pequal=0.5, uniform=FALSE)
lo |
minimum value of each proportion |
hi |
maximum value of each proportion |
n_values |
number of values of each proportion |
pequal |
probability of the equality of the two proportions |
uniform |
indicates if a uniform prior is desired |
matrix of probabilities where the rows and columns are labeled by the values of the proportions
Jim Albert
# testing prior where each proportion is
# .1, .3, .5, .7, .9
Prob <- testing_prior(.1, .9, 5)
# uniform prior over same proportion values
Prob <- testing_prior(.1, .9, 5, uniform=TRUE)
Launch speed and distance traveled for a sample of balls hit by the baseball player Mike Trout
trout20
A data frame with 25 observations on the following 2 variables.
launch speed in mph
distance in feet
Major League Baseball Advanced Media
Computes posterior of difference P2 - P1 of a probability matrix of two proportions
two_p_summarize(prob_matrix)
prob_matrix |
probability matrix where the rows and columns are labeled with the values of the proportions |
data frame with variables diff21 and Prob where diff21 = P2 - P1
Jim Albert
# use uniform prior over values .2, .3, .4
prob_matrix <- testing_prior(.2, .4, 3, uniform=TRUE)
two_p_summarize(prob_matrix)
Computes posterior distribution of two proportions with a discrete prior
two_p_update(prior, s1f1, s2f2)
prior |
prior probability matrix where the rows and columns are labeled with the values of the proportions |
s1f1 |
number of successes and number of failures from first sample |
s2f2 |
number of successes and number of failures from second sample |
posterior probability matrix
Jim Albert
prior <- testing_prior()
s1f1 <- c(3, 10)
s2f2 <- c(8, 20)
two_p_update(prior, s1f1, s2f2)
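A hedged sketch of the update in base R: it assumes the rows of the matrix index the first proportion and the columns the second, that the two samples are independent binomial samples, and it uses a uniform prior on a 5 by 5 grid built by hand rather than the output of testing_prior. All names below are illustrative.

```r
p <- seq(0.1, 0.9, by = 0.2)                     # grid of proportion values
prior <- matrix(1 / 25, 5, 5, dimnames = list(p, p))

s1f1 <- c(3, 10)   # successes, failures in sample 1
s2f2 <- c(8, 20)   # successes, failures in sample 2

# Binomial kernels for each proportion on the grid
lik1 <- p^s1f1[1] * (1 - p)^s1f1[2]
lik2 <- p^s2f2[1] * (1 - p)^s2f2[2]

# Posterior is proportional to prior times the joint likelihood
product <- prior * outer(lik1, lik2)
post <- product / sum(product)
```

The binomial coefficients are omitted because they are constant across the grid and cancel when the matrix is normalized.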
Time-to-serve measurements for serves by the tennis players Roger Federer and Rafael Nadal
two_players_time_to_serve
A data frame with 100 observations on the following 2 variables.
last name of player
time to serve in seconds
https://github.com/JeffSackmann
Number of visits to a blog website for different weeks and days of the week
web_visits
A data frame with 28 observations on the following 3 variables.
week number
day of the week
number of website visits
Personal data collected from Wordpress.com