ABSTRACT
Exercises between the covers of a statistics textbook are nice:
the data are quantitative and come from normal distributions, independence
runs rampant, the "user" knows exactly the confidence level desired, the
population variances are known from prior experience, no harsh political
realities impede the undertaking of statistically indicated recommendations,
etc.
Real-world problems as seen by the statistical practitioner are messy
in comparison: data may be qualitative, non-normal, or incomplete;
the correlation Hydra frequently grows uglier heads; the user expects a
binary answer; population variances are unknown (but large); and competing
political factions hope to use statistics as a drunkard uses a lamppost
-- not for light, but for support.
The insights collected in this paper will help:
· the newly trained statistician making the transition from
textbook exercises to practical application
· the manager assisting the acclimatization of newly hired statisticians
or consultants to their corporate assignments
· the educator eager to give his or her students a "jump-start"
on preparation for a statistician's typical day-to-day work
· the recipient of statistical consulting services who is eager
to understand the "behind-the-scenes" motivation for a competent consultant’s
concerns and questions.
INTRODUCTION
A variety of excerpts from a typical statistics course highlights the
sharp contrasts between textbook exercises and real-world situations.
THE DATA TAXONOMY
Data may be either qualitative (e.g., a rating of poor, fair, good,
or excellent) or quantitative (e.g., a pressure of 1.21 teradynes).
Many statistics texts and classes, by omission and/or implicit assumption,
give the student the impression that almost all data are quantitative.
Certainly almost all statistical computations, beginning with the calculation
of sample mean and standard deviation, require quantitative data.
Qualitative data may be further subdivided into nominal and ordinal
data. Nominal data can be numerically coded, but have no ordering.
For example, a researcher is free to code "male=1" and "female=2" or vice
versa, but in no sense is "male < female" or vice versa. The statistical
measure best suited to summarize nominal data is the mode (the most frequently
occurring value). By contrast, ordinal data, when suitably coded numerically,
have a natural ordering (hence the term "ordinal"). For example,
the ratings "poor," "fair," "good," and "excellent" can be numerically
coded 1 through 4 respectively, with the natural ordering poor < fair
< good < excellent. Either the mode or the median (the 50th
percentile of the data) serves well to summarize ordinal data.
Similarly, quantitative data may be further subdivided into interval
and continuous data. Whereas interval data have no true zero (0 degrees
Celsius does not represent zero heat), continuous data have a true zero
(0 Kelvin does represent zero heat). Interval and continuous data
share the property of "constant difference" -- for example, on either temperature
scale, the difference between 3 degrees and 2 degrees equals the difference
between 2 degrees and 1 degree. The ordinal example above is not quantitative:
the difference "excellent minus good" doesn’t equal the difference "good
minus fair." Meaningful computations of means and variances require
quantitative data. Descriptive analysis of qualitative data typically
follows an important truism: "First make pictures and plots of your
data; then, number-crunch it." This truism works well for quantitative
data too!
To recapitulate this taxonomy, observe that the four categories themselves
-- nominal, ordinal, interval, continuous -- form ordinal data:
they have a natural ordering but are not interval, because the increase
in applicability of statistical analysis methods is much greater for the
ordinal-to-interval step than for either the nominal-to-ordinal step or
the interval-to-continuous step.
UNDERSTANDING PERCENT
The basic concept of percent is well understood by most people in our
society over the age of 10: 34% implies, on average, 34 out of every 100.
Why is it then that so many seem to be confused about comparing percents,
or at least in the proper expression of such comparisons?
As an example of the latter concern, suppose a sample shows that 20%
favor Plan A, 50% favor Plan B, and the remaining 30% favor Plan C.
Now, consider the following often-heard and totally erroneous comment:
"30% more favor Plan B than Plan A." By itself, our well-understood
basic concept interprets this as "on average, for every 100 who favor Plan
A, 130 favor Plan B." But this is clearly not the case. In
fact, one can easily verify that if 100 favor Plan A, then 250 favor Plan
B (and 150 favor Plan C). Two correct ways to compare the sample
results for Plans A and B are:
1) approval for Plan B exceeds that of Plan A by 30 percentage
points, or, probably better,
2) two-and-a-half times as many (or 150% more) people favor Plan
B as favor Plan A.
Percents should usually be compared by dividing, not subtracting.
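A few lines of Python (for illustration only) make the distinction between the two correct comparisons concrete:

```python
# The two correct comparisons from the Plan A / Plan B example:
# a difference in percentage points, and a ratio.
pct_a, pct_b = 20, 50  # percent favoring Plan A and Plan B

diff_points = pct_b - pct_a        # 30 percentage points
ratio = pct_b / pct_a              # 2.5 times as many
pct_more = (ratio - 1) * 100       # 150% more

print(f"Plan B exceeds Plan A by {diff_points} percentage points")
print(f"{ratio:.1f} times as many favor Plan B ({pct_more:.0f}% more)")
```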
Expressing results in percent can easily hide a very important piece
of background information: what was the sample size? Sample percents
used as point estimates of true but unknown population proportions should
always be accompanied by an estimate of variation, commonly called a "margin
of error." For example, if the sample size was relatively small,
a test result of "50% favor Plan B" might really be "between 35% and 65%
favor Plan B with 95% confidence."
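As a sketch of the margin-of-error computation, the following Python lines use the standard normal-approximation formula; the sample size n = 43 is a hypothetical value chosen to reproduce roughly the 35% to 65% interval above:

```python
import math

# Wald 95% confidence interval for a sample proportion.
# n = 43 is an invented "relatively small" sample size.
p_hat, n, z = 0.50, 43, 1.96  # z = 1.96 for 95% confidence

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin
print(f"50% favor Plan B, 95% CI: {low:.0%} to {high:.0%}")
```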
Sometimes a numerical comparison is simply easier to understand with
actual counts than with standardized values such as percents.
Note also that one cannot take a simple average of percents; doing so is
equivalent to adding fractions without first finding a common denominator.
Consider the following counter-intuitive anomaly. Baseball player
A hit .300 (he got a hit in 30% of his times at bat) during the first half
of the season, while player B hit .280. In the second half, A hit .400,
B hit .390. For the entire season, however, Player B had a higher
batting average than Player A! This seemingly illogical result becomes
clear when one looks at the counts behind these percents. In the
first half, A had 90 hits in 300 at bats, B 56 hits in 200 at bats.
In the second half, A had 80 hits in 200 at bats, while B had 117 hits
in 300 at bats. Now we see that both players had 500 total at bats,
but while A had 170 hits (.340), B accumulated 173 hits (.346). Note
that .340 is the weighted average of .300 and .400, with the associated
at bats (denominator, "sample size") being the weights.
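The weighted-average arithmetic behind this anomaly is easy to verify, for example in Python:

```python
# Season batting averages are the *weighted* means of the half-season
# averages, with at-bats (the denominators) as the weights.
a = [(90, 300), (80, 200)]   # Player A: (hits, at bats) per half
b = [(56, 200), (117, 300)]  # Player B

def season_avg(halves):
    hits = sum(h for h, _ in halves)
    at_bats = sum(ab for _, ab in halves)
    return hits / at_bats

print(f"A: {season_avg(a):.3f}")  # .340 despite halves of .300 and .400
print(f"B: {season_avg(b):.3f}")  # .346 -- B wins the season
```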
SIGNIFICANT DIGITS
A personal sore point with at least one of the authors is the common
practice of reporting more significant digits (or fewer) in numerical estimates
than is warranted or necessary. Simply put, if data are recorded with
four-significant-digit accuracy, then all results summarizing the data
(in the original units) should use the same four significant digits, no
more and no fewer, including means, projections, and labels on charts and
graphs. The mean of the data set 12.43, 12.60, 12.48, 12.49 is 12.50,
not 12.5. Furthermore, one is not at liberty to report the second
data value in the set as 12.6. The zero in the hundredths place is
not there to satisfy a number nerd’s neurosis, but rather indicates the
level of precision of the measuring instrument. Also, note that the
sum of the four data values is 50.00 (not 50), which divided by four yields
exactly 12.50. Depending on the rounding convention applied, any
four such values having a sum between 49.98 and 50.01 should have the mean
reported as 12.50. A projection made on the basis of the data cannot
be 12.5438 for the same reason, but needs to be rounded to 12.54.
As a (perhaps) more subtle example, suppose three costs are given as
$350,000, $400,000, and $250,000. Now the first, for example, may
really turn out to be $348,256.55, but we don’t know that from what is
given; these data (without more information) contain at most two significant
digits. Hence, the average should be reported as $330,000, not $333,333.33,
which assumes a level of precision clearly unwarranted.
Teachers should counsel students to do calculations retaining all digits
available on their calculator for intermediary values, then to round the
final answer to the number of significant digits indicated by the original
data. This procedure assures a result free of gross rounding errors
with "significant digit integrity."
Perhaps the technology revolution which has produced the hand-held
calculator and the Pentium processor, wonderful tools for speeding mundane
mountains of data manipulations, is also at least partly to blame for the
seemingly mindless acceptance of the numerical answers produced, regardless
of reasonability. Until these contraptions are made to reason (heaven
help us), that responsibility remains a human one. Teach your children
well!
GETTING THE RESULT YOU WANT: REPETITION AND RANDOM CHANCE
A nasty, distasteful attitude which persists among some who would enlist
the aid of a statistician in order to make data-driven decisions is that
data can (eventually) be made to support whatever point of view one wishes
to impress upon others. As an example, suppose we test 20 randomly
selected parts to see if the part exceeds a strength specification with
90% confidence. The appropriate t-test yields a test statistic with
a p-value of 0.16, so that we cannot conclude that the specification is
exceeded. Assuming that both the specification and the desired confidence
level are reasonable and appropriate, the correct response would probably
be to investigate the cause of the deficiency, attempt to fix it, then
take another random sample to see whether the data will indicate that an
improvement was made. Believe it or not, however, there are times
when management will not support that course of action (too expensive to
stop production, re-tool, tie up busy engineers, etc.), but will ask for
another 20 parts from the current system to be tested, hoping for a better
result. It too fails, but suppose that on the fifth such attempt
a p-value of 0.08 is realized, with corresponding conclusion that the part
now exceeds the specification with 90% confidence. Perhaps so, but
over time, a 90% confidence test will, by definition, yield a significant
result one time in ten through random chance alone, even when no real
improvement has occurred.
Suppose the part in question is a seat belt. Enough said.
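The arithmetic behind this warning is simple. Assuming the repeated tests are independent and the process genuinely unimproved, a short Python sketch:

```python
# Each test of an unimproved process has a 10% chance of "passing"
# at the 90% confidence level by luck alone; repetition compounds it.
alpha, attempts = 0.10, 5

p_spurious = 1 - (1 - alpha) ** attempts
print(f"P(at least one lucky pass in {attempts} tries) = {p_spurious:.2f}")
```

With five attempts, the chance of at least one spurious "pass" is about 41% -- hardly reassuring for a seat belt.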
USE GRAPHS -- EMPLOY A STANDARD SCALE
A picture is truly worth a thousand words; well-constructed graphs convey
the nature of a data set much more rapidly and effectively to more people
than any number of numerical tables. Graphs are easy to abuse, however.
Care must be taken not to distort the data. One important principle
which is often overlooked or ignored is to construct scales for the x-
and/or y-axes which cover the appropriate range of the data. For
example, for continuous data having a true zero point, zero should usually
be included on the axis, unless it is made clear that a truncated scale
has been employed (Schmid 1983).
Even then, for multiple graphs depicting different variables or combinations
of variables from the same data set, a standard scale should be employed
to allow for more accurate comparisons among variables. With computer
graphics software, the usual default scaling of each graph corresponds
only to the range of the data subset relating to that graph, so that relative
variation is obscured. The solution is to identify the minimum and
maximum values over all variables to be graphed and override the default
scale accordingly. Of course, it may be desirable to "blow up" a
graph to reveal more detail, but the first look should be on the standardized
scale.
INDEPENDENCE
Another nice-to-have property, much more common in textbooks than in
real life, is independence (the probability of event B occurring is unaffected
by event A having occurred or not occurred). Many frequently used
statistical methods assume it, beginning with the basic probability identity:
Prob(both A and B) = Prob(A) * Prob(B).
As an example of appropriate warning, consider the following exercise
from a widely-used statistics text (Mendenhall and Sincich 1992):
This exercise and its back-of-the-book answer tacitly assume independence
between successive disks manufactured. The alert instructor will,
however, draw students’ attention to this assumption by asking for an assessment
of its plausibility. Students then quickly realize that independence
is unlikely -- if a given disk is defective, the cause may be a defect
in raw material or a malfunction in the assembly process, either of which
increases the probability that the next disk manufactured will be likewise
defective.
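A tiny numeric sketch (the probabilities below are invented for illustration) shows how far the product rule can stray when defects cluster:

```python
# Hypothetical numbers: defects cluster, so the conditional probability
# of a second defect far exceeds the unconditional one.
p_a = 0.01             # P(a given disk is defective)
p_b_given_a = 0.30     # P(next disk defective | this one defective)

p_both_dependent = p_a * p_b_given_a      # correct under dependence
p_both_if_independent = p_a * p_a         # what the product rule assumes

print(p_both_dependent, p_both_if_independent)  # 30-fold disagreement
```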
As another example of the importance, yet elusiveness, of independence,
consider the following exercise from a statistics text (Mendenhall and
Sincich 1995):
The nuclear mishap on Three Mile Island near Harrisburg, Pennsylvania, on March 28, 1979, forced many local residents to evacuate their homes -- some temporarily, others permanently. In order to assess the impact of the accident on the area population, a questionnaire was designed and mailed to a sample of 150 households within 2 weeks after the accident occurred. Residents were asked how they felt both before and after the accident about having some of their electricity generated from nuclear power. The summary results are provided in the table.
[Table: residents' responses about nuclear-generated electricity, before and after the accident.]
This exercise appears at the end of a section "Estimation of the Difference Between Two Population Proportions," which correctly stresses the importance of "two independent binomial experiments" (emphasis added). In the context of this exercise, the alert instructor mentions the following items of almost daily concern in real-life statistical work:
a) these experiments are hardly independent -- the samples are
actually "pairwise"
b) in class, the student knows what method to use on a problem
-- the one in the section at whose end the problem appears! -- but
in actual practice, this aid disappears
c) real-life surveys often struggle to achieve 30% response rates,
leading to the ongoing concern
"Are the respondents representative of the population being studied?" (Fecso
et al. 1996). By what means did these researchers achieve a 100% response
rate, especially when people were perforce changing addresses on short
notice?
RANDOMIZE!
Most experimental design textbooks and instructors emphasize the importance
of randomizing experimental units at each stage of a test, in order to
avoid as much as possible unwanted bias creeping in. Careful randomization
will help to assure that significant differences observed are due to experimental
factors and not to unknown "noise" factors.
Effective methods of randomization may not be as forthcoming, however.
For example, a naive investigator may assume that for the first stage of
an experiment, one can take the first four parts from an assembly line
and subject them to Process A, take the next four parts for Process B,
and the last four for Process C, since "all parts coming off the line are
independent and equal." That assumption may not hold (e.g., environmental
conditions may change after part #5 has left the line), and furthermore,
is unnecessary. Simply pair each of the twelve parts with the numbers
1 to 12 using a device such as a table of random numbers (found in many
statistics texts), then put parts 1-4 in Process A, etc. The idea
is to "average out" unknown chance factors among the experimental units.
Even experimenters insulted by the suggestion that they might be "that
naive" could fall prey to the notion that they need no further randomization
when ready to proceed to Stage 2. Thus, one might choose any two parts
from each of Processes A, B, and C to subject to Process D, leaving the
remaining six for Process E, believing (perhaps subconsciously) that each
set of six parts has been chosen "randomly." The best practice is
never to assume that one can make random selections using "common sense,"
but rather always to use a "mindless" random number generator which is
unaffected by the complex human psyche and its agendas, both conscious
and unconscious.
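In Python, for example, such "mindless" randomization of the twelve-part example takes only a few lines:

```python
import random

# Shuffle part numbers 1-12, then assign four parts to each process --
# no human judgment involved in the selection.
parts = list(range(1, 13))
random.shuffle(parts)

assignment = {
    "A": sorted(parts[0:4]),
    "B": sorted(parts[4:8]),
    "C": sorted(parts[8:12]),
}
print(assignment)
```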
WHICH CONDITIONAL PROBABILITY?
The probability that A will occur, given that B occurred, may be very
different from the probability that B occurred, given that A occurred (Paulos
1990). The former is typically written "P(A|B)"; the latter, "P(B|A)."
As a practical example, consider a test capable of identifying employees
suitable for a given position with 80% accuracy. That is, given that
an employee is suitable, the test indicates suitability with probability
0.8, and given that an employee is unsuitable, the test indicates unsuitability
with probability 0.8. Suppose that 1000 prospective employees take
the test; unknown to the hiring official, 10% (100 of them) are suitable.
The total number of test-takers judged suitable will be 80 suitable ones
(80% of the 100) and 180 unsuitable ones (20% of the remaining 900).
Hence, the probability that an applicant who "passed" the test is suitable
is 80/(80+180), or less than one-third. The vendors selling the test
will emphasize: P(test indicates suitability|suitability) = 0.8.
But the hiring officer is more interested in the much smaller: P(suitability|test
indicates suitability) = 0.31.
The instructor should drive home this inequality of conditional
probabilities with a concrete numerical example such as the one above.
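The arithmetic of the hiring-test example is easily recomputed, here as a Python sketch applying Bayes' rule:

```python
# Recomputing the hiring-test example.
n = 1000
p_suitable = 0.10      # 100 of 1000 applicants are suitable
accuracy = 0.80        # the test is 80% accurate either way

true_pass = n * p_suitable * accuracy               # 80 suitable pass
false_pass = n * (1 - p_suitable) * (1 - accuracy)  # 180 unsuitable pass

p_suitable_given_pass = true_pass / (true_pass + false_pass)
print(f"P(suitable | passed) = {p_suitable_given_pass:.2f}")  # ~0.31
```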
NORMALITY ISN'T NORMALCY
A common textbook exercise is a clone of this one (Hines and Montgomery 1980):
The diameters of bolts produced by a certain manufacturing process are known to have a standard deviation of 0.0001 inch. A random sample of 10 bolts yields an average diameter of 0.2546 inch. Test the hypothesis that the true mean diameter of bolts equals 0.2550 inch, using α = 0.05.
Several important issues are closely related to this exercise. The instructor might well ask "How does the working statistician know the diameters are normally distributed?" Good answers would be the assurance provided by a bell-shaped histogram (again, note the importance of making pictures of the data before number-crunching it), the results of a chi-square goodness-of-fit test (usually covered in a first course), or results of a Kolmogorov-Smirnov or Anderson-Darling goodness-of-fit test (usually not covered in a first course). Next comes the question "If the diameters aren't normally distributed, then how can the hypothesis in the original question be tested?" Good answers would be "Take a more advanced class to learn the use of nonparametric methods" or "Seek the help of a statistician who knows the applicability and use of nonparametric methods." In summary, normality isn’t always normal, unless the data points are themselves averages, in which case the Central Limit Theorem (the distribution of averages is approximately normal and becomes more so as the sample size increases) applies.
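The Central Limit Theorem can be demonstrated numerically. The following Python sketch (sample sizes and random seed are arbitrary choices for illustration) draws from a flat, decidedly non-normal uniform distribution, yet the sample means cluster tightly and symmetrically about the true mean:

```python
import random
import statistics

# Individual draws are uniform on (0, 1) -- flat, not bell-shaped --
# but means of samples of size 30 behave approximately normally.
random.seed(1)
sample_size = 30

means = [statistics.mean(random.uniform(0, 1) for _ in range(sample_size))
         for _ in range(2000)]

print(f"mean of sample means:   {statistics.mean(means):.3f}")   # near 0.5
print(f"spread of sample means: {statistics.stdev(means):.3f}")  # near 0.053
```

The observed spread agrees with the theoretical value of the standard error, 1/sqrt(12 * 30) ≈ 0.053.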
HOW MANY SAMPLES DO I NEED?
Statisticians in industry soon learn that their instructors' warning
was well-founded: engineers will come to them for analysis only after
an experiment has yielded data, without the benefit of statistical
assistance at the design stage or input on suitable techniques.
But they might also discover that would-be experimenters do come to statisticians
beforehand with that one all-important question, eager to learn one magical
quantity as if the mystery of the universe could be unlocked if one only
knew: HOW MANY SAMPLES DO I NEED?
Given an answer (provided the budget will support it -- statisticians,
of course, always want unreasonably large sample sizes), the eager
engineer will happily trot off to carry out his or her DOE [Design Of Experiments],
believing that accurate results are assured thanks to adequate sample size
alone and that he or she needn't worry about other data integrity issues.
Perhaps the statistician should not be held responsible for such behavior.
On the other hand, how one responds to the request for sample size
is perhaps as important as providing a "good" one (based on textbook formulas
requiring an estimate of variation and desired confidence level, for instance).
A key issue here is that a large but biased sample is worth less than a
smaller but representative, unbiased one, and may in fact lead to disastrously
wrong conclusions. Hence, stress to the experimenter that obtaining
a sample truly representative of the system to be tested is more important
than achieving any particular sample size -- tempered with the understanding
that the larger the sample, the more confidently one can detect significant
differences which are in fact present but small in magnitude.
It cannot be stressed enough that excellent analysis cannot compensate
for poor data. Statistical results can be no better than the data used
to create them.
EXPERIMENTAL UNITS VS. REPLICATES
Even experienced researchers (and yes, even statisticians!) can be confused
about the distinction between experimental units and replicates.
Suppose an engineer wants to test the corrosion resistance of a car door
hem design with and without the external sealer and adhesive currently
used in production. If the design does as well (or better) without
sealer and/or adhesive, the engineer could recommend a cost savings and
make his or her manager, who presents the idea to the car program director,
a hero.
The engineer prepares four door hems, one with each of the experimental
combinations. In order to provide a larger sample size (replicates),
and avoid the cost of buying more whole doors, he or she then cuts each
door into five pieces, mounts all 20 "samples" onto an experimental buck
and begins to cycle them in and out of a humidity chamber, simulating exposure
of the doors to a corrosive environment. Upon completion, corrosion
is measured on each of the pieces and analysis carried out using ANOVA
[Analysis Of Variance] with an assumed total sample size of 20.
Unfortunately, this method did not produce independent replicates for
the experiment. The experimental units were the four original doors,
and no amount of cutting can increase the sample size beyond four.
To truly have replicates which are individual experimental units, individual
preparation is required, preferably of whole doors assigned to treatments
in a random manner.
In industry it is often too costly to obtain large samples, and decisions
need to be made on the basis of a relatively small amount of data.
(Here again, we note the importance of the quality of the sample.)
But in the above example, using an error term with an artificially acquired
number of degrees of freedom can lead to declaring significant differences
which do not in reality exist.
HOW MUCH STATISTICAL SIGNIFICANCE YIELDS PRACTICAL SIGNIFICANCE?
Exercises like the one on page 6 are useful springboards for the following
discussion: "How did the user decide that a statistical significance
level of 0.05 was the threshold of practical significance in the sense
of a fork in the managerial decision tree?" In the day-to-day work
of a practicing statistician, the user is hesitant to specify such a significance
level. In such cases, the statistician serves the user well by conducting
a hypothesis test and reporting the observed significance level (p-value)
of the test. It is then left to the recipient of the results to decide,
at leisure, what action to take based on the reported p-value (Hooke 1983),
(Schervish 1996).
JUST ONE MORE INDEPENDENT VARIABLE, PLEASE
Newcomers to regression analysis find the temptation to add independent variables to a model almost irresistible (Kotz and Stroup 1983). Adding another independent variable never decreases -- and almost always increases -- the multiple correlation coefficient. Too often, such an increase becomes an end in itself (in contrast to its proper role as one of several guidelines to model refinement and improvement). The teacher eager to instruct students in "statistical street smarts" as well as standard analytical methods will present example models having a multiple correlation coefficient near 1.0, yet also suffering from one or more of the following deficiencies:
a) strong linear relationships among the (supposedly) independent
variables cause those variables to "fight" among themselves to explain
the same variations in the dependent variable
b) inclusion of too many cross-product and/or power (quadratic,
cubic) terms causes the response function to oscillate violently between
observed data points, rendering it useless for the common need of
interpolation
c) the predictions of the model ostensibly (and nonsensically)
appear more precise than the data gathered to build the model in
the first place.
As a rule of thumb, to avoid overfitting a regression model, collect
at least 4*(k+1) data points to fit a model with k predictor variables,
and strive for a parsimonious model -- one which fits well with as few
predictor variables as possible.
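Deficiency (b) above is easy to demonstrate numerically. The following Python sketch uses Runge's classic function as a stand-in for observed data (an assumption made purely for illustration), forces a degree-10 polynomial through 11 data points, and evaluates it between the last two points:

```python
def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for xj in xs[:i] + xs[i + 1:]:
            term *= (x - xj) / (xi - xj)
        total += term
    return total

def runge(x):
    # Runge's function, sampled here to play the role of observed data
    return 1 / (1 + 25 * x * x)

xs = [-1 + 0.2 * i for i in range(11)]   # 11 equally spaced points
ys = [runge(x) for x in xs]              # all values lie between 0 and 1

x_mid = 0.95                             # between the last two data points
print(f"true value at {x_mid}:  {runge(x_mid):.3f}")
print(f"interpolant at {x_mid}: {lagrange(xs, ys, x_mid):.3f}")
```

The interpolant's value between the nodes lies far above the entire range of the data, rendering it useless for interpolation.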
BEWARE EXTRAPOLATION
Discovering that a statistical model fits collected data very well can
be exhilarating for engineer and statistician alike, especially when confirmed
by further verification testing. Before vital decisions are made
on the basis of the model, however, the wise statistician concerned with
the reputation of the consultee (not to mention his or her own) will be
sure to caution that results, however certain, are known by this analysis
to apply only to the population from which the sample data were drawn.
Extrapolation outside the range of the data used to fit a model is
risky at best and disastrous at worst. For example, suppose it is
shown that the heat resistance of a material improves linearly with the
addition of a certain inexpensive additive, with percent added tested from
2% to 5% in increments of 0.25%. Elated, the manager orders the material
process to run with 8% additive, calculating with the help of the model
(shown to be at least 95% accurate with 99% confidence) that the heat resistance
property of the resultant material will then be superior to that of their
competitors. What the analysis which produced the model could not
show, unfortunately, was that for this material heat resistance begins
to decline exponentially when more than 6.4% of the additive is used.
(For that matter, how do we know that optimum resistance does not occur
for a 1% solution?)
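The story can be sketched numerically. All numbers below are invented to mirror the illustration, including a hypothetical "true" response that peaks at 6.4% additive:

```python
# Hypothetical ground truth, unknown to the experimenter: resistance
# rises linearly to 6.4% additive, then falls off.
def true_resistance(pct):
    return 100 + 10 * pct if pct <= 6.4 else 164 - 40 * (pct - 6.4)

xs = [2 + 0.25 * i for i in range(13)]     # tested range: 2% to 5%
ys = [true_resistance(x) for x in xs]

# ordinary least-squares line fitted to the tested range
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

predicted_at_8 = intercept + slope * 8     # the manager's extrapolation
actual_at_8 = true_resistance(8)           # what the material really does
print(f"model predicts {predicted_at_8:.0f}; truth is {actual_at_8:.0f}")
```

Within the tested range the line fits perfectly; at 8% additive it badly overestimates the true resistance.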
Perhaps the illustration given above is simplistic, but the warning
implied is clear. Recipients of data analysis which supports a point of
view favorable from their perspective can be over-eager to extend the ramifications
beyond what can be reasonably justified from the sample tested. The
moral: always be aware of the population for which inferences are desired,
and take pains to obtain a sample to test which is truly representative
of that population. Failing that, put in writing the risks inherent in
unwarranted extrapolations.
"MANY A SLIP BETWIXT TEXT AND SOFTWARE"
Well might students gasp in dismay when the instructor, using a statistical package, produces a nonsensical answer to this routine exercise:
The lifetime y (in hours) of the central processing unit of a certain type of microcomputer is an exponential random variable with parameter b = 1,000. What is the probability that a central processing unit will have a lifetime of at least 2,000 hours?
Given this dismay, students are well-motivated to learn the cause.
The textbook is using the exponential density function
f(y) = (1/b) e^(-y/b) for y > 0, in which b is the mean lifetime.
The statistical software package (yes, let's open its manual!) is using
the exponential density function f(y) = b e^(-by) for y > 0, in which
b is the rate. Hence, each b is the reciprocal of the other.
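With b = 1,000 and y = 2,000 as in the exercise, a short Python sketch shows the practical consequence of overlooking the needed transformation:

```python
import math

b, y = 1000, 2000

# textbook parameterization: b is the mean, so P(Y >= y) = e^(-y/b)
p_textbook = math.exp(-y / b)   # e^-2, about 0.135

# software parameterization: b is the rate, so feeding the textbook's
# b = 1000 straight in computes e^(-b*y) -- a nonsensical answer
p_software = math.exp(-b * y)   # e^-2000000, effectively zero

print(p_textbook, p_software)
```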
Now, students are receptive to the sermon:
Wrong: Rush to assume that textbooks and statistical-software manuals
use exactly matching formulas.
Right: Check the formulas against each other to avoid overlooking
needed transformations.
ACCURATE USE OF STATISTICAL TABLES
A typical binomial-distribution exercise (reducing to "what is the probability
of success on exactly 7 of 10 independent trials, given that p [probability
of success on any one trial] = 0.9?") provides an opening for yet another
sermon of practical importance. Alert students notice that the back-of-the-book
answer for this exercise is incorrect, but would be correct if the question
referred to 8 instead of 7 successes. The wise instructor now presents
the wrong and right ways to read tables such as a typical binomial-probability
table:
Wrong: While walking down the aisle and conferring with a co-worker,
open the book to the table, run an index finger approximately horizontally
across the table, and use the number near it.
Right: Sit in a chair next to a desk or table. Open the book flat
on the table. Using a ruler or the straight edge of a blank sheet of
paper as a "mask," read the needed entry from precisely the row and
column of the table pertinent to the problem.
Similar cautions are in order for other tables. For example,
before using a normal probability table, the practicing statistician checks
its legend -- does the shaded area begin at x = minus infinity or at x
= 0?
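Such table-reading errors can also be caught by computing the entry directly. With n = 10 and p = 0.9 as in the exercise, a short Python check:

```python
import math

# Binomial probability mass function, computed rather than read
# (possibly off by a row) from a printed table.
def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

print(f"P(X = 7) = {binom_pmf(7, 10, 0.9):.4f}")   # 0.0574
print(f"P(X = 8) = {binom_pmf(8, 10, 0.9):.4f}")   # 0.1937
```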
BEWARE THE "HAWTHORNE EFFECT"
A typical exercise might present a list of numbers representing cycle
times of an assembly-line worker performing a manual operation, and request
the mean and standard deviation of those numbers. How effortlessly
such data appear between the covers of a textbook! However, in practice,
the statistician needs to ask "How were these data collected?" A
typical answer is "The process engineer, clipboard and stopwatch in hand,
stood beside the worker." Enter the Hawthorne effect -- the very act of
gathering the data affects it (Thurkow 1996). Very likely, the worker,
consciously or unconsciously, performed the operation at atypical speed.
To explain the Hawthorne effect to students, a physical analogy is useful.
Visualize measuring the air pressure in a tire -- the very act of pressing
the tire gauge against the valve stem allows air to escape from the tire,
slightly lowering the pressure. Hence the working statistician needs
to intervene before the operational data is collected, asking "How can
these data be collected unobtrusively?" and explaining that the best statistical
analyses, applied to atypical data, will lead to misleading conclusions.
WORKING WITHIN ORGANIZATIONAL REALITY
The most rigorously correct statistical results are useless, and the
time spent to obtain them wasted, if they are disbelieved and consequently
lead to no beneficial action. As an example, consider the following
canonical chi-square exercise:
[Table: counts of defective and acceptable parts produced by each assembly line.]
The answer in the back of the book will certainly be "Yes," with a suitably
high value of the chi-square statistic. In workaday practice, a statistician
who presented this result to the supervisor of Line B would quickly hear
"Of course you come to see me. Of course we had more defectives,
since we produced more parts. It just goes to show ‘You can prove
anything with enough statistics.’" The expert statistician, wise
by experience, would not only be aware of the ability of the chi-square
test to work with fixed marginal totals, but would be able to explain the
value of using that approach to a non-statistician. This situation
is typical of many in which the "street-smart" statistical analyst aims
not for theoretical perfection, but for the best business outcome given
the organizational constraints sometimes imposed on direct application
of statistical theory and methods.
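A sketch of the underlying computation may help in such an explanation. The counts below are invented for illustration: Line B has twice the defectives of Line A merely because it produced far more parts, and the chi-square statistic, comparing rates against expectations based on the fixed marginal totals, correctly finds no evidence that Line B is worse:

```python
observed = [[20, 980],    # Line A: defective, acceptable (invented counts)
            [40, 2460]]   # Line B: twice the defectives, 2.5x the parts

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

def expected(i, j):
    # expected count under independence, marginal totals held fixed
    return row_totals[i] * col_totals[j] / n

chi_square = sum((observed[i][j] - expected(i, j)) ** 2 / expected(i, j)
                 for i in range(2) for j in range(2))
print(f"chi-square = {chi_square:.2f}")  # well below 3.84, the 1-df
                                         # critical value at alpha = 0.05
```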
SUMMARY AND CONCLUSIONS
All of the above are examples, actually met in practical work, of what
the competent statistical analyst or consultant must do to extend the benefits
of statistics from the textbook or research journal to the office, store,
factory, laboratory, clinic, or warehouse. As such, they provide
useful guidance to the statistician, manager, educator, or the recipient
of statistical results and concurrent consultation.
ACKNOWLEDGMENTS
Robert M. Czech and John M. Dennis, of Quality & Product Information
Systems, Ford Motor Company, made valuable criticisms to improve the clarity
and presentation of this paper.
AUTHOR BIOGRAPHIES
Edward J. Williams holds bachelor’s and master’s degrees in mathematics (Michigan State University, 1967; University of Wisconsin, 1968). From 1969 to 1971, he did statistical programming and analysis of biomedical data at Walter Reed Army Hospital, Washington, D.C. He joined Ford in 1972, where he works as a computer software analyst supporting statistical and simulation software. Since 1980, he has also taught evening classes at the University of Michigan, including both undergraduate and graduate statistics classes (supporting software Excel™, Minitab™, SAS™, and SPSS™) and simulation classes (using GPSS/H™, SLAM II™, or SIMAN™). He is a member of the Association for Computing Machinery [ACM] and its Simulation Special Interest Group [SIGSIM], the Institute of Electrical and Electronics Engineers [IEEE], the Society for Computer Simulation [SCS], the Society for Manufacturing Engineers [SME], the Institute of Industrial Engineers [IIE], and the American Statistical Association [ASA]. He serves on the editorial board of the International Journal of Industrial Engineering – Applications and Practice.
John Harder earned Master’s degrees in mathematics from Wichita State University (1987) and in statistics from Kansas State University (1992). He taught high school mathematics in Peabody, Kansas (1981-85), and at the college level at Wichita State (1985-87), Tarkio College, Missouri (1987-88), and Kansas State (1988-92). At KSU, he was Instructor of Statistics and served as Graduate Student Teaching Coordinator. In July 1992 he joined the Materials, Fasteners and Corrosion Protection Engineering Department at Ford Motor Company in Dearborn, Michigan as statistical analyst. His major responsibilities include planning and leadership of corrosion field surveys and subsequent data analysis and report writing, warranty data analysis, and assisting engineers with DOE’s. At Ford he has also had a few months’ experience with each of the following activities: Analytical Software Support (under Edward Williams), Auto Safety Office, Body and Assembly Quality Office, and Body Engineering Reliability and Statistical Methods. He is a member of the American Statistical Association. John lives in Windsor, Ontario with wife Julie and two pre-teenage daughters.
BIBLIOGRAPHY
Fecso, Ronald S., William D. Kalsbeek, Sharon L. Lohr, Richard L. Scheaffer, Fritz J. Scheuren, and Elizabeth A. Stasny. 1996. "Teaching Survey Sampling." The American Statistician 50(4):328-340.
Hines, William W., and Douglas C. Montgomery. 1980. Probability and Statistics in Engineering and Management Science, second edition. New York, New York: John Wiley & Sons.
Hooke, Robert. 1983. How to Tell the Liars from the Statisticians. New York, New York: Marcel Dekker.
Kotz, Samuel, and Donna F. Stroup. 1983. Educated Guessing. New York, New York: Marcel Dekker.
Mendenhall, William, and Terry Sincich. 1992. Statistics for Engineering and the Sciences, third edition. San Francisco, California: Dellen.
Mendenhall, William, and Terry Sincich. 1995. Statistics for Engineering and the Sciences, fourth edition. Englewood Cliffs, New Jersey: Prentice-Hall, Incorporated.
Paulos, John Allen. 1990. Innumeracy. New York, New York: Vintage.
Schervish, Mark J. 1996. "P Values: What They Are and What They Are Not." The American Statistician 50(3):203-206.
Schmid, Calvin F. 1983. Statistical Graphics. New York, New York: John Wiley & Sons, Incorporated.
Thurkow, Niki M. 1996. "The Use of Security Video Film in Video Work Sampling." In Proceedings of the 1st Annual International Conference on Industrial Engineering Applications and Practice, eds. Jacob Jen-Gwo Chen and Anil Mital, 1145-1149.