Statistics Applets

Sometimes it's hard to really get a grasp on statistical concepts based on what a textbook says or the material presented in a lecture.  These applets provide a step-by-step guide to some important statistical concepts and let you actively explore and experiment on your own.  To develop a better feel for these concepts, try playing around with each applet: change some parameters and see how the results are affected.

Some key terms used in the applets are:

mu = the population mean
var = the population variance
X-bar = the sample mean
s = the sample standard deviation
N = sample size

Applet 1: Standardizing a normally distributed random variable

In order to calculate probabilities associated with a normally distributed random variable (one that follows a symmetric bell-shaped curve), you first have to convert it to the standard normal variable.  Standardizing takes any normally distributed random variable, subtracts its mean (mu), and divides by its standard deviation (the square root of var). The result is the standard normal variable (Z), which has a mean of 0 and a variance of 1. Your statistics book contains a standard normal table, from which you can then determine any probability of interest. For an interactive exercise on standardizing a normally distributed random variable click here.
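As a rough illustration of standardizing, here is a minimal Python sketch. The numbers (a mean of 100, a variance of 225, and a value of 130) are made up for the example, and the standard library's NormalDist stands in for the standard normal table in your book.

```python
from statistics import NormalDist

def standardize(x, mu, var):
    """Convert a value x of a normal variable with mean mu and variance var to a Z value."""
    return (x - mu) / var ** 0.5

# Hypothetical example: X is normal with mean 100 and variance 225 (standard deviation 15).
mu, var = 100.0, 225.0
z = standardize(130.0, mu, var)   # (130 - 100) / 15

# NormalDist() with no arguments is the standard normal (mean 0, variance 1);
# its cdf plays the role of the standard normal table.
p_below = NormalDist().cdf(z)     # P(X <= 130) = P(Z <= z)
p_above = 1.0 - p_below           # P(X > 130)

print(f"z = {z:.2f}")
print(f"P(X <= 130) = {p_below:.4f}")
print(f"P(X > 130)  = {p_above:.4f}")
```

Try changing the mean, the variance, or the value of interest and see how z and the probabilities move.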

Applet 2: The Central Limit Theorem

The Central Limit Theorem is one of the most important theorems in statistical theory. It states that as the sample size increases, the distribution of the sample mean gets closer and closer to a normal distribution, regardless of the shape of the population distribution (as long as the population variance is finite). This means that we can use the normal distribution to describe the sample mean from any such population, even a non-normal one, if we have a large enough sample. The general rule of thumb is that you need a sample of at least 30 observations for the Central Limit Theorem to apply (i.e., for the distribution of the sample mean to be reasonably approximated by the normal distribution).  For an interactive exercise on the Central Limit Theorem click here.
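You can also watch the Central Limit Theorem at work in a small simulation. The sketch below is only an illustration under assumed settings: the population is exponential (strongly skewed, mean 1), we draw 5000 samples for each sample size, and we use skewness as a rough measure of how far the distribution of X-bar is from a symmetric bell shape (a normal distribution has skewness 0).

```python
import random
from statistics import fmean, pstdev

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population (mean 1)
    and return the sample mean of each one."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Average cubed standardized deviation; 0 for a perfectly symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(0)   # fixed seed so the run is repeatable
results = {}
for n in (1, 5, 30):
    means = sample_means(n, 5000, rng)
    results[n] = skewness(means)
    print(f"N = {n:2d}: average of X-bar = {fmean(means):.3f}, skewness = {results[n]:.3f}")
```

The skewness of the sample means should shrink toward 0 as N grows, even though the population itself stays just as skewed.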

Applet 3: Confidence Intervals

A confidence interval, or interval estimate, is a range of values, calculated from a sample, that contains the population mean with a level of confidence that the researcher chooses. The most common levels of confidence are 90%, 95%, and 99%.  For example, a 95% confidence interval comes from a procedure that, over many repeated random samples, produces a range of values containing the population mean 95% of the time.  For an interactive exercise on confidence intervals (interval estimates) click here.
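Here is one way to sketch a confidence interval for the population mean in Python and check how often it captures mu. Everything numeric is an assumption made for the example: a normal population with mu = 50 and a standard deviation of 10, samples of N = 40, and a critical value taken from the standard normal distribution (a textbook would often use the t distribution when s is estimated from the sample).

```python
import random
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for X-bar +/- z * s / sqrt(N), using a normal critical value."""
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)                                    # sample standard deviation
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * s / n ** 0.5
    return x_bar - margin, x_bar + margin

# Hypothetical population and sampling plan.
rng = random.Random(1)
mu, sigma, n, reps = 50.0, 10.0, 40, 2000

hits = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = mean_confidence_interval(sample, 0.95)
    if low <= mu <= high:
        hits += 1

coverage = hits / reps
print(f"Share of intervals containing mu: {coverage:.3f}")
```

The share should come out close to 0.95. Any single interval either contains mu or it does not; the 95% describes how the procedure behaves across many samples.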

Applet 4:  Hypothesis Tests of the Population Mean

Once we have used a sample to produce an estimate of the population mean, we often need to use that estimate to make a decision.  For example, suppose you're told the average grade in a particular class is 75. You collect a random sample of grades and observe a mean of 65 -- is that sufficient evidence to decide that the average grade is not 75?  There are two approaches to formally making such a decision: the critical regions approach and the p-value approach. For an interactive exercise on both approaches to hypothesis testing click here.
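The grade example can be worked through with both approaches in a few lines of Python. Only the hypothesized mean of 75 and the sample mean of 65 come from the example above; the sample standard deviation (s = 20), the sample size (N = 36), and the level of significance (alpha = 0.05) are made-up values, and the sketch assumes a two-sided test using the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 75.0    # hypothesized population mean (from the example)
x_bar = 65.0   # observed sample mean (from the example)
s = 20.0       # hypothetical sample standard deviation
n = 36         # hypothetical sample size
alpha = 0.05   # hypothetical level of significance

# Test statistic: how many standard errors X-bar lies from the hypothesized mean.
z = (x_bar - mu_0) / (s / sqrt(n))

# Critical regions approach: reject if z falls beyond the critical values.
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_region = abs(z) > critical

# p-value approach: reject if a result at least this extreme is rarer than alpha.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_by_p = p_value < alpha

print(f"z = {z:.2f}, critical values = +/-{critical:.2f}, reject: {reject_by_region}")
print(f"p-value = {p_value:.4f}, reject: {reject_by_p}")
```

The two approaches always lead to the same decision for a given alpha; they are just two ways of looking at the same test statistic.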

Applet 5: Simple Linear Regression

Simple linear regression is a popular tool for describing the relationship between two random variables.  Regression analysis presumes that one variable (Y) depends linearly on another variable (X).  Regression involves finding the line that best represents the relationship between Y and X based on sample points (X,Y).  To determine how well the estimated line fits the data, analysis of variance is conducted. This involves figuring out how much of the variation in Y is explained by variation in X and how much is unexplained, or random.

Some key terms used in this applet are:

Uy = the population mean of Y

Yest1, Yest2, ... = the values of Y1, Y2, ... predicted, or estimated, by the regression line (Y-hat)

alpha = the level of significance

SST = Sum of Squares Total, a measure of all the variation in Y about its mean

SSE = Sum of Squared Errors, the sum of the squared distances between actual Y and predicted Y. This measures how much variation in Y is not explained by the regression line.

SSR = Sum of Squares Regression, a measure of how much variation in Y is explained by Y's linear relationship with X (i.e., variation in Y due to variation in X).

For an interactive exercise on linear regression and analysis of variance click here. Caution: If you draw in a sample line that fits the data very poorly just to see what happens, you will find that you get some nonsensical results. So, try to draw in a line that fits the data.
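To see how SST, SSE, and SSR fit together, here is a small Python sketch with five made-up sample points. It fits the least-squares line and then, for comparison, evaluates a deliberately poor hand-picked line (intercept 10, slope -1, also made up). For the least-squares line SST = SSR + SSE; for an arbitrary line that identity generally fails, which is one likely source of the nonsensical results mentioned in the caution above.

```python
from statistics import fmean

# Hypothetical sample points (X, Y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def anova_terms(x, y, intercept, slope):
    """Return (SST, SSE, SSR) for the line Y-hat = intercept + slope * X."""
    y_bar = fmean(y)
    y_hat = [intercept + slope * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in Y
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
    return sst, sse, ssr

# Least-squares estimates of the slope and intercept.
x_bar, y_bar = fmean(x), fmean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

sst, sse, ssr = anova_terms(x, y, intercept, slope)
r_squared = ssr / sst   # share of the variation in Y explained by X
print(f"Best line: Y-hat = {intercept:.2f} + {slope:.2f} X")
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, R-squared = {r_squared:.4f}")

# A deliberately bad line: the pieces no longer add up to SST.
sst_bad, sse_bad, ssr_bad = anova_terms(x, y, 10.0, -1.0)
print(f"Bad line:  SST = {sst_bad:.3f}, SSR + SSE = {ssr_bad + sse_bad:.3f}")
```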

Applet 6: Quality Control using a Control Chart for the Sample Mean

Firms often use statistical analysis to monitor and maintain the quality of their products.  One tool used in quality control analysis is the control chart for the sample mean.  This device helps a firm determine if some aspect of its production process has a serious problem that needs to be investigated and repaired.  The firm's problem is to distinguish between normal (random) variation in its product and systematic (non-random) variation due to a problem with inputs or the production process.  Click here for an interactive exercise on using a Control Chart for the Sample Mean.
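A control chart for the sample mean can be sketched in a few lines of Python. This example assumes the conventional limits of three standard errors on either side of the process mean, which may differ from what the applet uses. The process mean (mu = 50), process variance (var = 4), sample size (N = 4), and the list of observed sample means are all made up.

```python
from math import sqrt

def xbar_control_limits(mu, var, n, width=3.0):
    """Return (lower, upper) control limits: mu +/- width * sqrt(var / n)."""
    standard_error = sqrt(var / n)
    return mu - width * standard_error, mu + width * standard_error

# Hypothetical process: mean 50, variance 4, samples of 4 items each.
mu, var, n = 50.0, 4.0, 4
lcl, ucl = xbar_control_limits(mu, var, n)

# Hypothetical sample means from successive production runs.
observed_means = [50.4, 49.1, 52.2, 53.6, 48.8, 46.5]

# Flag the samples that fall outside the control limits.
out_of_control = [i for i, m in enumerate(observed_means) if m < lcl or m > ucl]

print(f"Lower limit = {lcl:.2f}, upper limit = {ucl:.2f}")
for i, m in enumerate(observed_means):
    status = "investigate" if i in out_of_control else "ok"
    print(f"Sample {i}: X-bar = {m:.1f} ({status})")
```

Sample means inside the limits are treated as normal (random) variation; those outside suggest systematic variation worth investigating.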

Applet 7: Quality Control using a Control Chart for the Sample Proportion

In addition to control charts that track the sample mean, quality control analysis sometimes also uses control charts for the sample proportion. The sample proportion is the fraction of sample observations that have some characteristic of interest.  This is another device that helps firms determine if some aspect of their production process has a serious problem that needs to be investigated and repaired.  Click here for an interactive exercise on using a Control Chart for the Sample Proportion.
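The same idea carries over to the sample proportion. The sketch below again assumes conventional three-standard-error limits, with the lower limit cut off at 0 because a proportion cannot be negative. The long-run proportion of defective items (0.10), the sample size (N = 100), and the observed sample proportions are all made up.

```python
from math import sqrt

def proportion_control_limits(p, n, width=3.0):
    """Return (lower, upper) control limits: p +/- width * sqrt(p * (1 - p) / n),
    clipped to the range 0 to 1."""
    standard_error = sqrt(p * (1 - p) / n)
    lower = max(0.0, p - width * standard_error)
    upper = min(1.0, p + width * standard_error)
    return lower, upper

# Hypothetical process: 10% of items are defective in the long run, samples of 100 items.
p, n = 0.10, 100
lcl, ucl = proportion_control_limits(p, n)

# Hypothetical sample proportions from successive production runs.
observed_proportions = [0.08, 0.12, 0.22, 0.10, 0.15]

# Flag the samples that fall outside the control limits.
out_of_control = [i for i, q in enumerate(observed_proportions) if q < lcl or q > ucl]

print(f"Lower limit = {lcl:.3f}, upper limit = {ucl:.3f}")
print(f"Samples to investigate: {out_of_control}")
```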