Sampling Distribution: if the Parent Distribution is Normal

x<-rnorm(500, mean= 35, sd= 9) # draw 500 random values from a normal distribution with mean = 35 and sd = 9

Histogram of the data

hist(x, prob=TRUE, col="grey")
 curve(dnorm(x,mean=35,sd=9),0,70,add=TRUE,lwd=2,col="red")


What mean and standard deviation do we actually get from the simulated data?

mean(x)
## [1] 35.02
sd(x)
## [1] 8.858

Now we are going to draw 500 random samples of size 5 from the same parent distribution (normal with mean 35 and sd 9) and find the average of each sample. We will have 500 sample means. Our interest is to analyse the distribution of these 500 averages.

mu= 35
sigma=9
n=5

#Sample mean
xbar = rep(0,500) # repeats zero 500 times

for (i in 1:500) { xbar[i]=mean(rnorm(n,mean=mu,sd=sigma)) }
hist(xbar, prob=TRUE , col="grey")


The distribution of these 500 sample averages (the sampling distribution) is also normal.

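Conclusion: If the parent distribution is normal, then the sampling distribution of the sample mean is also normal. Mathematically, if \(X \sim N(\text{mean} = \mu, \text{sd} = \sigma)\), then \(\bar{X} \sim N(\text{mean} = \mu, \text{sd} = \frac{\sigma}{\sqrt{n}})\), where \(n\) is the sample size. In the above example, \(n = 5\), so the standard deviation of the sample mean is \(9/\sqrt{5} \approx 4.02\).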
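The conclusion above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it repeats the simulation with `replicate()` instead of a loop, compares the simulated mean and standard deviation of the sample means with \(\mu\) and \(\sigma/\sqrt{n}\), and overlays the theoretical normal curve. The seed and the name `theo_se` are assumptions made only for this illustration.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

mu <- 35
sigma <- 9
n <- 5

# 500 sample means, each from a fresh sample of size n
xbar <- replicate(500, mean(rnorm(n, mean = mu, sd = sigma)))

# theoretical standard deviation of the sample mean
theo_se <- sigma / sqrt(n)

# simulated versus theoretical values
c(mean_xbar = mean(xbar), sd_xbar = sd(xbar), theoretical_sd = theo_se)

# histogram of the sample means with the N(mu, sigma/sqrt(n)) density on top
hist(xbar, prob = TRUE, col = "grey")
curve(dnorm(x, mean = mu, sd = theo_se), add = TRUE, lwd = 2, col = "red")
```

The simulated values should sit close to 35 and to \(9/\sqrt{5}\), but they will not match exactly because only 500 sample means are drawn.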

Sampling Distribution: if the Parent Distribution is Non-Normal

Let \(Y\) follow a beta distribution with parameters alpha = 12 and beta = 1.

y<- rbeta(1000, 12,1)
hist(y, prob=TRUE, col= "grey")


Caution: the data above are highly skewed, with a long left tail.

Now we are going to draw 10000 random samples of size 5 from this beta distribution and find the average of each sample. We will have 10000 sample means. Our interest this time is to analyse the distribution of these 10000 averages.

alpha=12
beta=1
n=5

ybar=rep(0,10000)

for(i in 1:10000) {
ybar[i]= mean(rbeta(n, alpha, beta)) # mean of one sample of size n
}

hist(ybar, prob=TRUE, col= "grey",  main= "sampling distribution when n= 5")


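Wow! Even though the parent distribution is strongly skewed, the sampling distribution already looks roughly normal, although with n = 5 some left skew remains.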

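Conclusion: Regardless of the shape of the parent distribution (as long as its variance is finite), the sampling distribution of the mean will be approximately normal if the sample size is large enough (a common rule of thumb is at least 30).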
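One way to judge how close the n = 5 sampling distribution is to normal is to overlay the normal curve that the approximation predicts. The sketch below is an illustration under stated assumptions, not the author's code: it uses the standard formulas for the mean and variance of a beta distribution, and the names `a`, `b`, `beta_mean` and `beta_sd` as well as the seed are chosen here for the example.

```r
set.seed(2)  # arbitrary seed, used only to make this sketch reproducible

a <- 12  # first shape parameter (alpha)
b <- 1   # second shape parameter (beta)
n <- 5

# 10000 sample means, each from a fresh sample of size n
ybar <- replicate(10000, mean(rbeta(n, a, b)))

# mean and standard deviation of the Beta(a, b) parent distribution
beta_mean <- a / (a + b)
beta_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))

# histogram of the sample means with the approximating normal density
hist(ybar, prob = TRUE, col = "grey", main = "sampling distribution when n = 5")
curve(dnorm(x, mean = beta_mean, sd = beta_sd / sqrt(n)),
      add = TRUE, lwd = 2, col = "red")
```

Where the histogram departs from the red curve, the normal approximation is still imperfect at this sample size.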

What happens if we increase the sample size from 5 to 10, 15, 20 and 30?

y<- rbeta(1000, 12,1)

# for n = 5
n= 5
se = sd(y)/ sqrt(n) # standard error; named se so it does not mask the sd() function
se
## [1] 0.03263

# for n = 10
n= 10
se = sd(y)/ sqrt(n)
se
## [1] 0.02307

# for n = 15
n= 15
se = sd(y)/ sqrt(n)
se
## [1] 0.01884

# for n = 20
n= 20
se = sd(y)/ sqrt(n)
se
## [1] 0.01632

# for n = 30
n= 30
se = sd(y)/ sqrt(n)
se
## [1] 0.01332
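The five calculations above can also be done in one vectorised step and compared with the standard deviation of simulated sample means. This is a rough sketch rather than a definitive implementation: the names `sizes`, `se_formula` and `se_simulated`, the seed, and the choice of 2000 replicates per sample size are assumptions made for the illustration.

```r
set.seed(3)  # arbitrary seed, used only to make this sketch reproducible

a <- 12
b <- 1
sizes <- c(5, 10, 15, 20, 30)

# standard error from the formula sd / sqrt(n), using one sample of 1000 values
y <- rbeta(1000, a, b)
se_formula <- sd(y) / sqrt(sizes)

# standard error estimated directly: sd of 2000 simulated sample means per size
se_simulated <- sapply(sizes, function(n) {
  sd(replicate(2000, mean(rbeta(n, a, b))))
})

data.frame(n = sizes,
           se_formula = round(se_formula, 5),
           se_simulated = round(se_simulated, 5))
```

The two columns should be close to each other, and both should shrink as n grows.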

As the sample size increases, the standard deviation of ybar, also called the standard error, decreases. Let's see the sampling distribution of 10000 sample means of size 30 (from the beta distribution).

alpha=12
beta=1
n=30

ybar=rep(0,10000)

for(i in 1:10000) {
ybar[i]= mean(rbeta(n, alpha, beta)) # mean of one sample of size n
}

hist(ybar, prob=TRUE, col= "grey", main= "sampling distribution when n= 30")


This looks better! As the sample size increases, the standard error decreases, and the skewness of the sampling distribution shrinks as well (for independent observations, the skewness of the sample mean is the parent skewness divided by \(\sqrt{n}\)). Thus, the histogram of the sampling distribution with sample size 30 looks more symmetrical than that with sample size 5.
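The visual comparison can be backed by a number. The sketch below estimates the skewness of the simulated sample means for n = 5 and n = 30. Base R does not provide a skewness function, so `skew` is a hypothetical helper defined here (third central moment divided by the 1.5 power of the second); the seed and variable names are likewise assumptions for this example.

```r
set.seed(4)  # arbitrary seed, used only to make this sketch reproducible

a <- 12
b <- 1

# hypothetical helper: moment-based sample skewness
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# 10000 sample means for each of the two sample sizes
ybar5 <- replicate(10000, mean(rbeta(5, a, b)))
ybar30 <- replicate(10000, mean(rbeta(30, a, b)))

# negative values indicate a longer left tail; values nearer 0 are more symmetric
c(n5 = skew(ybar5), n30 = skew(ybar30))
```

Both values should be negative, with the n = 30 value clearly closer to zero.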