`x <- rnorm(500, mean = 35, sd = 9) # draw 500 random values from a normal distribution with mean = 35 and sd = 9`

Histogram of the data

```
hist(x, prob=TRUE, col="grey")
curve(dnorm(x,mean=35,sd=9),0,70,add=TRUE,lwd=2,col="red")
```

What exactly do we get for the `mean` and `sd` of this sample?

`mean(x)`

`## [1] 35.02`

`sd(x)`

`## [1] 8.858`

Now we are going to draw 500 random samples of size 5 from the same normal distribution as `x` and find the average of each sample. This gives 500 sample means. Our interest is in analyzing the distribution of these 500 averages.

```
mu= 35
sigma=9
n=5
#Sample mean
xbar = rep(0,500) # repeats zero 500 times
for (i in 1:500) { xbar[i]=mean(rnorm(n,mean=mu,sd=sigma)) }
hist(xbar, prob=TRUE , col="grey")
```

The distribution of these 500 sample averages (the `sampling distribution`) is also normal.

Conclusion: If the parent distribution is normal, then the sampling distribution of the sample mean is also normal. Mathematically, if \(X \sim N(\text{mean}= \mu, \text{sd}= \sigma)\) then \(\bar{X} \sim N(\text{mean}= \mu, \text{sd}= \frac{\sigma}{\sqrt{n}})\), where \(n\) is the `sample size`. In the above example, \(n=5\).
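As a quick numerical check of this result, the sketch below re-runs the simulation with more replications and compares the mean and standard deviation of the sample means with \(\mu\) and \(\frac{\sigma}{\sqrt{n}}\). The seed, the number of replications `reps` and the name `xbar_check` are illustrative choices, not part of the original example.

```r
set.seed(1)            # arbitrary seed, only for reproducibility
mu <- 35
sigma <- 9
n <- 5
reps <- 10000          # assumption: more replications than above, to reduce simulation noise

# each replication draws n normal values and stores their mean
xbar_check <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbar_check)       # should be close to mu
sd(xbar_check)         # should be close to sigma / sqrt(n)
sigma / sqrt(n)        # theoretical standard deviation of the sample mean
```

The last two values should agree to roughly two decimal places; the small gap that remains is simulation noise.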

Let \(Y\) follow a beta distribution with parameters alpha = 12 and beta = 1.

```
y<- rbeta(1000, 12,1)
hist(y, prob=TRUE, col= "grey")
```

Caution: the data above are highly skewed, with a long left tail.
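One way to put a number on that skew is the sample skewness (the third standardized moment); a negative value indicates a longer left tail. Base R does not ship a skewness function, so `skewness_hat` below is a hypothetical helper written for this sketch, and the seed and the name `y_demo` are arbitrary.

```r
set.seed(2)                      # arbitrary seed, only for reproducibility
y_demo <- rbeta(1000, 12, 1)     # same distribution as y above

# hypothetical helper: moment-based sample skewness
skewness_hat <- function(v) {
  z <- v - mean(v)               # centre the data
  mean(z^3) / mean(z^2)^1.5      # third central moment over sd cubed
}

skewness_hat(y_demo)             # negative for a left-skewed sample
```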

Now we are going to draw 10000 random samples of size 5 from the same beta distribution as `y` and find the average of each sample. This gives 10000 sample means. This time our interest is in analyzing the distribution of these 10000 averages.

```
alpha=12
beta=1
n=5
ybar=rep(0,10000)
for(i in 1:10000) {
ybar[i]= mean(rbeta(n, alpha, beta)) # use the parameters defined above
}
hist(ybar, prob=TRUE, col= "grey", main= "sampling distribution when n= 5")
```

Wow!! The sampling distribution already looks roughly bell-shaped, although some left skew remains at a sample size of 5.

Conclusion: Regardless of the parent distribution (provided its variance is finite), the sampling distribution of the mean will be approximately normal if the sample size is large enough (a common rule of thumb is `at least 30`).
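To see how well the normal approximation fits, the sketch below overlays the normal curve suggested by the central limit theorem on a histogram of simulated sample means. It relies on the standard formulas for a beta distribution, mean \(\frac{\alpha}{\alpha+\beta}\) and variance \(\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\); the seed, the choice of \(n = 30\) and the names `a`, `b`, `mu_y`, `sigma_y` and `ybar_demo` are assumptions made for illustration.

```r
set.seed(3)                         # arbitrary seed, only for reproducibility
a <- 12                             # alpha (shape1)
b <- 1                              # beta (shape2)
n <- 30                             # sample size, chosen to match the rule of thumb
reps <- 10000                       # number of simulated samples

# theoretical mean and standard deviation of a Beta(a, b) variable
mu_y <- a / (a + b)
sigma_y <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))

# simulate the sample means
ybar_demo <- replicate(reps, mean(rbeta(n, a, b)))

hist(ybar_demo, prob = TRUE, col = "grey",
     main = "sample means (n = 30) with normal approximation")
# normal density with mean mu_y and sd sigma_y / sqrt(n)
curve(dnorm(x, mean = mu_y, sd = sigma_y / sqrt(n)),
      add = TRUE, lwd = 2, col = "red")
```

Re-running the sketch with a smaller `n` gives a rough sense of how quickly the histogram approaches the red curve.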

What happens if we increase the sample size from 5 to 10, 15, 20, and 30?

```
y<- rbeta(1000, 12,1)
# estimated standard error for n = 5
n= 5
se = sd(y)/ sqrt(n) # named se so that the function sd() is not masked
se
```

`## [1] 0.03263`

```
# estimated standard error for n = 10
n= 10
se = sd(y)/ sqrt(n)
se
```

`## [1] 0.02307`

```
# estimated standard error for n = 15
n= 15
se = sd(y)/ sqrt(n)
se
```

`## [1] 0.01884`

```
# estimated standard error for n = 20
n= 20
se = sd(y)/ sqrt(n)
se
```

`## [1] 0.01632`

```
# estimated standard error for n = 30
n= 30
se = sd(y)/ sqrt(n)
se
```

`## [1] 0.01332`
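The five calculations above can also be done in one step, because `sqrt()` works on whole vectors. The sketch below computes the estimated standard error for every sample size at once; the seed and the names `y_demo`, `sizes` and `se_all` are illustrative, so the numbers will differ slightly from the output shown above.

```r
set.seed(4)                          # arbitrary seed, only for reproducibility
y_demo <- rbeta(1000, 12, 1)         # same distribution as y above
sizes <- c(5, 10, 15, 20, 30)        # sample sizes to compare

# sd(y_demo) is a single number; dividing by sqrt(sizes) gives one value per size
se_all <- sd(y_demo) / sqrt(sizes)

data.frame(n = sizes, standard_error = se_all)
```

Because the standard error is proportional to \(\frac{1}{\sqrt{n}}\), quadrupling the sample size (from 5 to 20) halves it.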

As the sample size increases, the standard deviation of `ybar`, also called the `standard error`, decreases. Let's look at the sampling distribution of 10000 sample means of size 30 (from the beta distribution).

```
alpha=12
beta=1
n=30
ybar=rep(0,10000)
for(i in 1:10000) {
ybar[i]= mean(rbeta(n, alpha, beta)) # use the parameters defined above
}
hist(ybar, prob=TRUE, col= "grey", main= "sampling distribution when n= 30")
```

This looks better!! Increasing the sample size reduces the standard error, and it also reduces the skewness of the sampling distribution. Thus, the histogram of the sampling distribution with sample size 30 looks more symmetrical than that with sample size 5.
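To back up the visual impression with numbers, the sketch below estimates the skewness of the simulated sample means for both sample sizes. `skewness_hat` and `sim_means` are hypothetical helpers defined here (base R does not provide a skewness function), and the seed is arbitrary.

```r
set.seed(5)                        # arbitrary seed, only for reproducibility

# hypothetical helper: moment-based sample skewness
skewness_hat <- function(v) {
  z <- v - mean(v)
  mean(z^3) / mean(z^2)^1.5
}

# hypothetical helper: means of `reps` samples of size n from Beta(12, 1)
sim_means <- function(n, reps = 10000) {
  replicate(reps, mean(rbeta(n, 12, 1)))
}

skew_5 <- skewness_hat(sim_means(5))
skew_30 <- skewness_hat(sim_means(30))
c(n5 = skew_5, n30 = skew_30)      # both negative, with n = 30 closer to zero
```

For independent draws, the skewness of a sample mean shrinks in proportion to \(\frac{1}{\sqrt{n}}\), which is consistent with the more symmetrical histogram at sample size 30.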