TEACHING STATISTICS -- INDUSTRIAL REALITIES
NOT BETWEEN THE TEXTBOOK COVERS
 
Edward J. Williams
206-2 Engineering Computer Center
Mail Drop 3
Ford Motor Company
Post Office Box 2053
Dearborn, Michigan 48121-2053
williame@umdsun2.umd.umich.edu
 
John Harder
1C072 Building 5
Mail Drop 5006
Ford Motor Company
Post Office Box 2053
Dearborn, Michigan 48121-2053
jharder@gw.ford.com

 
ABSTRACT

Exercises between the covers of a statistics textbook are nice:  the data are quantitative and come from normal distributions, independence runs rampant, the "user" knows exactly the confidence level desired, the population variances are known from prior experience, no harsh political realities impede the undertaking of statistically indicated recommendations, etc.
Real-world problems as seen by the statistical practitioner are messy in comparison:  data may be qualitative, non-normal, or incomplete; the correlation Hydra grows uglier heads frequently, the user expects a binary answer, population variances are unknown (but large), and competing political factions hope to use statistics as a drunkard uses a lamppost -- not for light, but for support.
The insights collected in this paper will help:
· the newly trained statistician making the transition from textbook exercises to practical application
· the manager assisting the acclimatization of newly hired statisticians or consultants to their corporate assignments
· the educator eager to give his or her students a "jump-start" on preparation for a statistician's typical day-to-day work
· the recipient of statistical consulting services who is eager to understand the "behind-the-scenes" motivation for a competent consultant’s concerns and questions.
 

INTRODUCTION

A variety of excerpts from a typical statistics course highlights the sharp contrasts between textbook exercises and real-world situations.
 

THE DATA TAXONOMY

Data may be either qualitative (e.g., a rating of poor, fair, good, or excellent) or quantitative (e.g., a pressure of 1.21 teradynes).  Many statistics texts and classes, by omission and/or implicit assumption, give the student the impression that almost all data are quantitative. Certainly almost all statistical computations, beginning with the calculation of sample mean and standard deviation, require quantitative data.
Qualitative data may be further subdivided into nominal and ordinal data.  Nominal data can be numerically coded, but have no ordering.  For example, a researcher is free to code "male=1" and "female=2" or vice versa, but in no sense is "male < female" or vice versa.  The statistical measure best suited to summarize nominal data is the mode (the most frequently occurring value). By contrast, ordinal data, when suitably coded numerically, have a natural ordering (hence the term "ordinal").  For example, the ratings "poor," "fair," "good," and "excellent" can be numerically coded 1 through 4 respectively, with the natural ordering poor < fair < good < excellent.  Either the mode or the median (the 50th percentile of the data) serves well to summarize ordinal data.
Similarly, quantitative data may be further subdivided into interval and continuous data. Whereas interval data have no true zero (0 degrees Celsius does not represent zero heat), continuous data have a true zero (0 Kelvin does represent zero heat).  Interval and continuous data share the property of "constant difference" -- for example, on either temperature scale, the difference between 3 degrees and 2 degrees equals the difference between 2 degrees and 1 degree. The ordinal example above is not quantitative:  the difference "excellent minus good" doesn’t equal the difference "good minus fair."  Meaningful computations of means and variances require quantitative data.  Descriptive analysis of qualitative data typically follows an important truism:  "First make pictures and plots of your data; then, number-crunch it."  This truism works well for quantitative data too!
To recapitulate this taxonomy, observe that these categories themselves are ordinal data:

   nominal < ordinal < interval < continuous.

They are not interval -- the increase in applicability of statistical analysis methods is much greater for the ordinal-to-interval step than for either the nominal-to-ordinal step or the interval-to-continuous step.
 

UNDERSTANDING PERCENT

The basic concept of percent is well understood by most people in our society over the age of 10: 34% implies, on average, 34 out of every 100.  Why is it then that so many seem to be confused about comparing percents, or at least about the proper expression of such comparisons?
As an example of the latter concern, suppose a sample shows that 20% favor Plan A, 50% favor Plan B, and the remaining 30% favor Plan C.  Now, consider the following often-heard and totally erroneous comment: "30% more favor Plan B than Plan A."  By itself, our well-understood basic concept interprets this as "on average, for every 100 who favor Plan A, 130 favor Plan B."  But this is clearly not the case.  In fact, one can easily verify that if 100 favor Plan A, then 250 favor Plan B (and 150 favor Plan C).  Two correct ways to compare the sample results for Plans A and B are:
1)  approval for Plan B exceeds that of Plan A by 30 percentage points, or, probably better,
2)  two-and-a-half times as many (or 250% as many) people favor Plan B as favor Plan A.
Percents should usually be compared by dividing, not subtracting.
Expressing results in percent can easily hide a very important piece of background information: what was the sample size?  Sample percents used as point estimates of true but unknown population proportions should always be accompanied by an estimate of variation, commonly called a "margin of error."  For example, if the sample size was relatively small, a test result of "50% favor Plan B" might really be "between 35% and 65% favor Plan B with 95% confidence."
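The arithmetic is easy to sketch.  The following Python fragment assumes a hypothetical sample of 44 respondents (the original statement gives no sample size) and uses the usual large-sample formula for a proportion:

  # Margin of error for a sample proportion -- a minimal sketch.
  # The sample size below is hypothetical; it is not given in the text.
  import math

  n = 44           # hypothetical number of respondents
  p_hat = 0.5      # sample proportion favoring Plan B
  z = 1.96         # z-value for 95% confidence

  margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
  print(f"{p_hat:.0%} favor Plan B, margin of error about {margin:.0%}")
  print(f"95% CI: roughly {p_hat - margin:.0%} to {p_hat + margin:.0%}")
  # -> roughly 35% to 65%, as in the example above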
Sometimes it is just easier to understand a numerical comparison with actual counts rather than with standardized values such as percent.  For example, one cannot take a simple average of percents; doing so is equivalent to adding fractions without first finding a common denominator.  Consider the following counter-intuitive anomaly.  Baseball player A hit .300 (he got a hit in 30% of his times at bat) during the first half of the season, while player B hit .280. In the second half, A hit .400, B hit .390.  For the entire season, however, Player B had a higher batting average than Player A!  This seemingly illogical result becomes clear when one looks at the counts behind these percents.  In the first half, A had 90 hits in 300 at bats, B 56 hits in 200 at bats.  In the second half, A had 80 hits in 200 at bats, while B had 117 hits in 300 at bats.  Now we see that both players had 500 total at bats, but while A had 170 hits (.340), B accumulated 173 hits (.346).  Note that .340 is the weighted average of .300 and .400, with the associated at bats (denominator, "sample size") being the weights.
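The season-long figures can be checked in a few lines of Python; nothing is assumed beyond the hits and at bats quoted above:

  # Simple vs. weighted averages: the batting-average anomaly, recomputed.
  first_half  = {"A": (90, 300), "B": (56, 200)}    # (hits, at bats)
  second_half = {"A": (80, 200), "B": (117, 300)}

  for player in ("A", "B"):
      h1, ab1 = first_half[player]
      h2, ab2 = second_half[player]
      print(player,
            round(h1 / ab1, 3),                 # first-half average
            round(h2 / ab2, 3),                 # second-half average
            round((h1 + h2) / (ab1 + ab2), 3))  # season average, weighted by at bats
  # A leads in each half (.300 vs .280, .400 vs .390),
  # yet B leads for the season (.346 vs .340).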
 

SIGNIFICANT DIGITS

A personal sore point with at least one of the authors is the common practice of reporting more significant digits (or fewer) in numerical estimates than is warranted or necessary.  Simply put, if data are recorded with four-significant-digit accuracy, then all results summarizing the data (in the original units) should use the same four significant digits, no more and no fewer, including means, projections, and labels on charts and graphs.  The mean of the data set 12.43, 12.60, 12.48, 12.49 is 12.50, not 12.5.  Furthermore, one is not at liberty to report the second data value in the set as 12.6.  The zero in the hundredths place is not there to satisfy a number nerd’s neurosis, but rather indicates the level of precision of the measuring instrument.  Also, note that the sum of the four data values is 50.00 (not 50), which divided by four yields exactly 12.50.  Depending on the rounding convention applied, any four such values having a sum between 49.98 and 50.01 should have the mean reported as 12.50.  A projection made on the basis of the data cannot be 12.5438 for the same reason, but needs to be rounded to 12.54.
As a (perhaps) more subtle example, suppose three costs are given as $350,000, $400,000, and $250,000.  Now the first, for example, may really turn out to be $348,256.55, but we don’t know that from what is given; these data (without more information) contain at most two significant digits.  Hence, the average should be reported as $330,000, not $333,333.33, which assumes a level of precision clearly unwarranted.
Teachers should counsel students to do calculations retaining all digits available on their calculator for intermediary values, then to round the final answer to the number of significant digits indicated by the original data.  This procedure assures a result free of gross rounding errors with "significant digit integrity."
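A short Python sketch illustrates the advice; round_sig below is our own illustrative helper, not a library routine:

  # Keep full precision in intermediate work; round only the final answer.
  from math import floor, log10

  def round_sig(x, sig):
      """Round x to the given number of significant digits."""
      if x == 0:
          return 0.0
      return round(x, sig - 1 - floor(log10(abs(x))))

  data = [12.43, 12.60, 12.48, 12.49]
  mean = sum(data) / len(data)            # keep all available digits here
  print(f"{round_sig(mean, 4):.2f}")      # report as 12.50, four significant digits
  print(round_sig(12.5438, 4))            # a projection of 12.5438 reports as 12.54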
Perhaps the technology revolution which has produced the hand-held calculator and the Pentium processor, wonderful tools for speeding mundane mountains of data manipulations, is also at least partly to blame for the seemingly mindless acceptance of the numerical answers produced, regardless of reasonability.  Until these contraptions are made to reason (heaven help us), that responsibility remains a human one.  Teach your children well!
 

GETTING THE RESULT YOU WANT:  REPETITION AND RANDOM CHANCE

A nasty, distasteful attitude which persists among some who would enlist the aid of a statistician in order to make data-driven decisions is that data can (eventually) be made to support whatever point of view one wishes to impress upon others.  As an example, suppose we test 20 randomly selected parts to see if the part exceeds a strength specification with 90% confidence.  The appropriate t-test yields a test statistic with a p-value of 0.16, so that we cannot conclude that the specification is exceeded.  Assuming that both the specification and the desired confidence level are reasonable and appropriate, the correct response would probably be to investigate the cause of the deficiency, attempt to fix it, then take another random sample to see whether the data will indicate that an improvement was made.  Believe it or not, however, there are times when management will not support that course of action (too expensive to stop production, re-tool, tie up busy engineers, etc.), but will ask for another 20 parts from the current system to be tested, hoping for a better result.  It too fails, but suppose that on the fifth such attempt a p-value of 0.08 is realized, with corresponding conclusion that the part now exceeds the specification with 90% confidence.  Perhaps so, but over time, a 90% confidence test will, by definition, yield a significant result one time out of ten based on random chance factors alone.  Suppose the part in question is a seat belt.  Enough said.
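A small simulation makes that last point concrete.  The strength distribution and specification below are hypothetical, chosen so that the process has no genuine excess over the specification; the repeated one-sided t-test at the 90% level then "succeeds" about one time in ten by chance alone:

  # Repeated testing of an unchanged process -- a minimal simulation sketch.
  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  spec = 100.0                     # hypothetical strength specification
  true_mean, sigma = 100.0, 5.0    # process mean exactly at the specification

  trials, alpha, rejections = 10_000, 0.10, 0
  for _ in range(trials):
      sample = rng.normal(true_mean, sigma, size=20)
      t, p = stats.ttest_1samp(sample, spec, alternative="greater")
      rejections += (p < alpha)
  print(rejections / trials)       # close to 0.10: one "success" in ten by chance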
 

USE GRAPHS -- EMPLOY A STANDARD SCALE

A picture is truly worth a thousand words; well-constructed graphs convey the nature of a data set much more rapidly and effectively to more people than any number of numerical tables.  Graphs are easy to abuse, however.  Care must be taken not to distort the data.  One important principle which is often overlooked or ignored is to construct scales for the x- and/or y-axes which cover the appropriate range of the data.  For example, for continuous data having a true zero point, zero should usually be included on the axis, unless it is made clear that a truncated scale has been employed (Schmid 1983).
Even then, for multiple graphs depicting different variables or combinations of variables from the same data set, a standard scale should be employed to allow for more accurate comparisons among variables.  With computer graphics software, the usual default scaling of each graph corresponds only to the range of the data subset relating to that graph, so that relative variation is obscured.  The solution is to identify the minimum and maximum values over all variables to be graphed and override the default scale accordingly.  Of course, it may be desirable to "blow up" a graph to reveal more detail, but the first look should be on the standardized scale.
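As an illustration with made-up numbers, the following Python/matplotlib sketch overrides the per-plot defaults and forces a common y-axis range:

  # Plot two variables from the same data set on one standard scale.
  import matplotlib.pyplot as plt

  var1 = [10.2, 10.4, 10.1, 10.3, 10.2]     # hypothetical measurements
  var2 = [8.0, 12.5, 9.1, 11.8, 10.4]

  lo = min(min(var1), min(var2))
  hi = max(max(var1), max(var2))

  fig, axes = plt.subplots(1, 2)
  for ax, y, name in zip(axes, (var1, var2), ("Variable 1", "Variable 2")):
      ax.plot(y, marker="o")
      ax.set_title(name)
      ax.set_ylim(lo - 0.5, hi + 0.5)       # common scale, not the per-plot default
  plt.show()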
 

INDEPENDENCE

Another nice-to-have property, much more common in textbooks than in real life, is independence (the probability of event B occurring is unaffected by event A having occurred or not occurred).  Many frequently used statistical methods assume it, beginning with the basic probability identity:   Prob(both A and B) = Prob(A) * Prob(B).
As an example of appropriate warning, consider the following exercise from a widely-used statistics text (Mendenhall and Sincich 1992):

  Experience has shown that a manufacturer of computer software
  produces, on the average, only 1 defective blank diskette in 100.
  a.  Of the next three blank diskettes manufactured, what is the
       probability that all three will be nondefective?
  b.  In general, if k blank diskettes are manufactured, what is the
       probability that at least one of the k will be defective?

This exercise and its back-of-the-book answer tacitly assume independence between successive disks manufactured.  The alert instructor will, however, draw students’ attention to this assumption by asking for an assessment of its plausibility.  Students then quickly realize that independence is unlikely -- if a given disk is defective, the cause may be a defect in raw material or a malfunction in the assembly process, either of which increases the probability that the next disk manufactured will be likewise defective.
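Under the independence assumption the exercise makes, the back-of-the-book arithmetic is straightforward; a brief Python check:

  # Diskette probabilities, computed under the (questionable) independence assumption.
  p_defective = 0.01

  # a.  All of the next three diskettes nondefective:
  print((1 - p_defective) ** 3)                    # about 0.9703

  # b.  At least one defective among k diskettes:
  for k in (3, 10, 100):
      print(k, 1 - (1 - p_defective) ** k)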
As another example of the importance, yet elusiveness, of independence, consider the following exercise from a statistics text (Mendenhall and Sincich 1995):

The nuclear mishap on Three Mile Island near Harrisburg, Pennsylvania, on March 28, 1979, forced many local residents to evacuate their homes -- some temporarily, others permanently.  In order to assess the impact of the accident on the area population, a questionnaire was designed and mailed to a sample of 150 households within 2 weeks after the accident occurred.  Residents were asked how they felt both before and after the accident about having some of their electricity generated from nuclear power.  The summary results are provided in the table.

ATTITUDE TOWARD NUCLEAR POWER

                      Favor   Oppose   Indifferent   Totals
  Before accident        62       35            53      150
  After accident         52       72            26      150
 
Construct a 99% confidence interval for the difference in the true proportions of Three Mile Island residents who favor nuclear power before and after the accident.

This exercise appears at the end of a section "Estimation of the Difference Between Two Population Proportions,"  which correctly stresses the importance of "two independent binomial experiments" (emphasis added).  In the context of this exercise, the alert instructor mentions the following items of almost daily concern in real-life statistical work:

a)  these experiments are hardly independent -- the samples are actually "pairwise" (see the sketch following this list)
b)  in class, the student knows what method to use on a problem -- the one in the section at whose end the problem appears! -- but in actual practice, this aid disappears
c)  real-life surveys often struggle to achieve 30% response rates, leading to ongoing concern "Are the respondents representative of the population being studied?" (Fecso et al. 1996).  By what means did these researchers achieve a 100% response rate, especially when people were perforce changing addresses on short notice?
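For completeness, the calculation the textbook section intends (treating the two samples as independent binomial experiments) is sketched below; the comments flag the assumption that point (a) calls into question:

  # 99% confidence interval for a difference of proportions -- the textbook
  # method, which assumes two INDEPENDENT samples.  Here the same 150
  # households answered both before and after, so the samples are paired,
  # not independent, and this formula is not really appropriate.
  import math

  n = 150
  p_before = 62 / n       # proportion favoring nuclear power before the accident
  p_after  = 52 / n       # ... and after
  z = 2.576               # z-value for 99% confidence

  se = math.sqrt(p_before * (1 - p_before) / n + p_after * (1 - p_after) / n)
  print(f"{p_before - p_after:.3f} +/- {z * se:.3f}")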
 
RANDOMIZE!

Most experimental design textbooks and instructors emphasize the importance of randomizing experimental units at each stage of a test, in order to avoid as much as possible unwanted bias creeping in.  Careful randomization will help to assure that significant differences observed are due to experimental factors and not to unknown "noise" factors.
Effective methods of randomization may not be as forthcoming, however.  For example, a naive investigator may assume that for the first stage of an experiment, one can take the first four parts from an assembly line and subject them to Process A, take the next four parts for Process B, and the last four for Process C, since "all parts coming off the line are independent and equal."  That assumption may not hold (e.g., environmental conditions may change after part #5 has left the line), and furthermore, is unnecessary.  Simply pair each of the twelve parts with the numbers 1 to 12 using a device such as a table of random numbers (found in many statistics texts), then put parts 1-4 in Process A, etc.  The idea is to "average out" unknown chance factors among the experimental units.
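In practice a software random number generator replaces the printed table; a minimal Python sketch of the assignment step:

  # Randomly assign twelve parts to Processes A, B, and C, four parts each.
  import numpy as np

  rng = np.random.default_rng()             # seed with a fixed value to reproduce a run
  parts = rng.permutation(12) + 1           # part numbers 1..12 in random order

  process_A, process_B, process_C = parts[:4], parts[4:8], parts[8:]
  print("Process A gets parts", sorted(process_A))
  print("Process B gets parts", sorted(process_B))
  print("Process C gets parts", sorted(process_C))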
Even experimenters insulted by the suggestion that they might be "that naive" could fall prey to the notion that they need no further randomization when ready to proceed to Stage 2. Thus, one might choose any two parts from each of Processes A, B, and C to subject to Process D, leaving the remaining six for Process E, believing (perhaps subconsciously) that each set of six parts has been chosen "randomly."  The best practice is never to assume that one can make random selections using "common sense," but rather always to use a "mindless" random number generator which is unaffected by the complex human psyche and its agendas both conscious and unconscious.
 
WHICH CONDITIONAL PROBABILITY?

The probability that A will occur, given that B occurred, may be very different from the probability that B occurred, given that A occurred (Paulos 1990).  The former is typically written "P(A|B);" the latter, "P(B|A)."  As a practical example, consider a test capable of identifying employees suitable for a given position with 80% accuracy.  That is, given that an employee is suitable, the test indicates suitability with probability 0.8, and given that an employee is unsuitable, the test indicates unsuitability with probability 0.8.  Suppose that 1000 prospective employees take the test; unknown to the hiring official, 10% (100 of them) are suitable.  The total number of test-takers judged suitable will be 80 suitable ones (80% of the 100) and 180 unsuitable ones (20% of the remaining 900).  Hence, the probability that an applicant who "passed" the test is suitable is 80/(80+180), or less than one-third.  The vendors selling the test will emphasize:  P(test indicates suitability|suitability) = 0.8.  But the hiring officer is more interested in the much smaller:  P(suitability|test indicates suitability) = 0.31.
The instructor should drive home this inequality of conditional probabilities with an example such as the one above.
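The arithmetic of the example, spelled out in a few lines of Python:

  # Which conditional probability?  The hiring-test example, recomputed.
  applicants = 1000
  suitable = 100                        # 10% of applicants are actually suitable
  accuracy = 0.8                        # the test is correct 80% of the time, either way

  true_positives = accuracy * suitable                        # 80 suitable applicants pass
  false_positives = (1 - accuracy) * (applicants - suitable)  # 180 unsuitable applicants pass

  p_pass_given_suitable = accuracy                            # 0.8 -- the vendor's figure
  p_suitable_given_pass = true_positives / (true_positives + false_positives)
  print(p_pass_given_suitable, round(p_suitable_given_pass, 2))   # 0.8 versus about 0.31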
 

NORMALITY ISN'T NORMALCY

A common textbook exercise is a clone of this one (Hines and Montgomery 1980):

The diameters of bolts produced by a certain manufacturing process are known to have a standard deviation of 0.0001 inch.  A random sample of 10 bolts yields an average diameter of 0.2546 inch.  Test the hypothesis that the true mean diameter of bolts equals 0.2550 inch, using α = 0.05.

Several important issues are closely related to this exercise.  The instructor might well ask "How does the working statistician know the diameters are normally distributed?"  Good answers would be the assurance provided by a bell-shaped histogram (again, note the importance of making pictures of the data before number-crunching it), the results of a chi-square goodness-of-fit test (usually covered in a first course), or results of a Kolmogorov-Smirnov or Anderson-Darling goodness-of-fit test (usually not covered in a first course).  Next comes the question "If the diameters aren't normally distributed, then how can the hypothesis in the original question be tested?"  Good answers would be "Take a more advanced class to learn the use of nonparametric methods" or "Seek the help of a statistician who knows the applicability and use of nonparametric methods."  In summary, normality isn’t always normal, unless the data points are themselves averages, in which case the Central Limit Theorem (the distribution of averages is approximately normal and becomes more so as the sample size increases) applies.
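A brief Python sketch ties these steps together for the bolt exercise.  The ten raw diameters are hypothetical (only the sample mean is given in the exercise), generated merely to show where the normality check would fit:

  # Check normality first, then test the mean (sigma known, so a z-test).
  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(7)
  diameters = rng.normal(0.2546, 0.0001, size=10)    # hypothetical raw data

  # Picture/check the data before number-crunching: Anderson-Darling for normality.
  print(stats.anderson(diameters, dist="norm"))

  # Two-sided test of H0: mu = 0.2550 with sigma = 0.0001 known.
  z = (diameters.mean() - 0.2550) / (0.0001 / np.sqrt(len(diameters)))
  p_value = 2 * stats.norm.sf(abs(z))
  print(z, p_value)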

HOW MANY SAMPLES DO I NEED?

Statisticians in industry soon learn that their instructors' warning was justified: engineers will come to them for analysis only after an experiment has already yielded data, never having sought help at the design stage or input on suitable techniques. But they might also discover that would-be experimenters do come to statisticians beforehand with that one all-important question, eager to learn one magical quantity as if the mystery of the universe could be unlocked if one only knew:  HOW MANY SAMPLES DO I NEED?
Given an answer (provided the budget will support it -- statisticians, of course, always want unreasonably large sample sizes), the erstwhile engineer will happily trot off to carry out his or her DOE [Design Of Experiment], believing that accurate results are assured thanks to adequate sample size alone and that he or she needn’t worry about other data integrity issues. Perhaps the statistician should not be held responsible for such behavior.
On the other hand, how one responds to the request for sample size is perhaps as important as providing a "good" one (based on textbook formulas requiring an estimate of variation and desired confidence level, for instance).  A key issue here is that a large but biased sample is worth less than a smaller but representative, unbiased one, and may in fact lead to disastrously wrong conclusions.  Hence, stress to the experimenter that obtaining a sample truly representative of the system to be tested is more important than achieving a certain sample size, tempered with the understanding that the larger the sample, the more confidently one can detect differences which are real but small in magnitude.
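For reference, the textbook sample-size formula for estimating a mean to within a margin of error E is easy to sketch; the variation estimate and target precision below are hypothetical:

  # Sample size for estimating a mean -- necessary, but not sufficient:
  # the sample must also be representative of the system under study.
  import math

  sigma = 4.0     # estimated standard deviation, e.g. from prior data (hypothetical)
  E = 1.5         # desired margin of error, in the same units as the data
  z = 1.96        # z-value for 95% confidence

  n = math.ceil((z * sigma / E) ** 2)
  print(n)        # about 28 for these hypothetical inputs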
It cannot be stressed enough that excellent analysis cannot compensate for poor data. Statistical results can be no better than the data used to create them.
 

EXPERIMENTAL UNITS VS. REPLICATES

Even experienced researchers (and yes, even statisticians!) can be confused about the distinction between experimental units and replicates.  Suppose an engineer wants to test the corrosion resistance of a car door hem design with and without the external sealer and adhesive currently used in production.  If the design does as well (or better) without sealer and/or adhesive, the engineer could recommend a cost savings and make his or her manager, who presents the idea to the car program director, a hero.
The engineer prepares four door hems, one with each of the experimental combinations.  In order to provide a larger sample size (replicates), and avoid the cost of buying more whole doors, he or she then cuts each door into five pieces, mounts all 20 "samples" onto an experimental buck and begins to cycle them in and out of a humidity chamber, simulating exposure of the doors to a corrosive environment.  Upon completion, corrosion is measured on each of the pieces and analysis carried out using ANOVA [Analysis Of Variance] with an assumed total sample size of 20.
Unfortunately, this method did not produce independent replicates for the experiment.  The experimental units were the four original doors, and no amount of cutting can increase the sample size beyond four.  To obtain true replicates which are individual experimental units, each unit must be prepared individually, preferably as a whole door, with treatments assigned in a random manner.
In industry it is often too costly to obtain large samples, and decisions need to be made on the basis of a relatively small amount of data.  (Here again, we note the importance of the quality of the sample.)  But in the above example, using an error term with an artificially acquired number of degrees of freedom can lead to declaring significant differences which do not in reality exist.
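A simulation sketch (with hypothetical variance components) shows how severe the penalty can be: with no true treatment effect at all, an ANOVA that treats the five pieces per door as twenty independent observations rejects far more often than its nominal 5% error rate:

  # Pseudo-replication: pieces cut from the same door share that door's effect.
  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(3)
  sigma_door, sigma_piece = 1.0, 0.3        # hypothetical variance components
  trials, rejections = 2000, 0
  for _ in range(trials):
      groups = []
      for door in range(4):                 # one door per treatment combination
          door_effect = rng.normal(0, sigma_door)
          groups.append(door_effect + rng.normal(0, sigma_piece, size=5))
      F, p = stats.f_oneway(*groups)        # naive ANOVA, pretending n = 20
      rejections += (p < 0.05)
  print(rejections / trials)                # far above the nominal 0.05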
 

HOW MUCH STATISTICAL SIGNIFICANCE YIELDS PRACTICAL SIGNIFICANCE?

Exercises like the bolt-diameter hypothesis test above are useful springboards for the following discussion:  "How did the user decide that a statistical significance level of 0.05 was the threshold of practical significance in the sense of a fork in the managerial decision tree?"  In the day-to-day work of a practicing statistician, the user is hesitant to specify such a significance level.  In such cases, the statistician serves the user well by conducting a hypothesis test and reporting the observed significance level (p-value) of the test.  It is then left to the recipient of the results to decide, at leisure, what action to take based on the reported p-value (Hooke 1983; Schervish 1996).
 

JUST ONE MORE INDEPENDENT VARIABLE, PLEASE

Newcomers to regression analysis find the temptation to add independent variables to a model almost irresistible (Kotz and Stroup 1983).  Adding another independent variable increases (or at least never decreases) the multiple correlation coefficient.  Too often, such an increase becomes an end in itself (in contrast to its proper role as one of several guidelines to model refinement and improvement).  The teacher eager to instruct students in "statistical street smarts" as well as standard analytical methods will present example models having a multiple correlation coefficient near +1.0 or -1.0, yet also suffering from one or more of the following deficiencies:

a)  strong linear relationships among the (supposedly) independent variables cause those variables to "fight" among themselves to explain the same variations in the dependent variable
b)  inclusion of too many cross-product and/or power (quadratic, cubic) terms causes the response function to oscillate violently between observed data points, rendering it useless for the common need of interpolation
c)  the predictions of the model ostensibly (and nonsensically) appear more precise than the data gathered to build the model in the first place.

As a rule of thumb, to avoid overfitting a regression model, collect at least 4*(k+1) data points to fit a model with k predictor variables, and strive for a parsimonious model -- one which fits well with as few predictor variables as possible.
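A small simulated illustration: even predictors that are pure noise never lower the multiple R-squared, so a rising R-squared by itself says little about model quality:

  # R-squared never decreases as predictors are added -- even useless ones.
  import numpy as np

  rng = np.random.default_rng(5)
  n = 30
  x = rng.normal(size=n)
  y = 2.0 * x + rng.normal(size=n)          # the true model involves only x

  def r_squared(X, y):
      X = np.column_stack([np.ones(len(y)), X])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      resid = y - X @ beta
      return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

  predictors = x.reshape(-1, 1)
  for _ in range(6):
      print(predictors.shape[1], round(r_squared(predictors, y), 4))
      predictors = np.column_stack([predictors, rng.normal(size=n)])   # add pure noise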
 

BEWARE EXTRAPOLATION

Discovering that a statistical model fits collected data very well can be exhilarating for engineer and statistician alike, especially when confirmed by further verification testing.  Before vital decisions are made on the basis of the model, however, the wise statistician concerned with the reputation of the consultee (not to mention his or her own) will be sure to caution that results, however certain, are known by this analysis to apply only to the population from which the sample data were drawn.
Extrapolation outside the range of the data used to fit a model is risky at best and disastrous at worst.  For example, suppose it is shown that the heat resistance of a material improves linearly with the addition of a certain inexpensive additive, with percent added tested from 2% to 5% in increments of 0.25%.  Elated, the manager orders the material process to run with 8% additive, calculating with the help of the model (shown to be at least 95% accurate with 99% confidence) that the heat resistance property of the resultant material will then be superior to that of their competitors.  What the analysis which produced the model could not show, unfortunately, was that for this material heat resistance begins to decline exponentially when more than 6.4% of the additive is used.  (For that matter, how do we know that optimum resistance does not occur for a 1% solution?)
Perhaps the illustration given above is simplistic, but the warning implied is clear. Recipients of data analysis which supports a point of view favorable from their perspective can be over-eager to extend the ramifications beyond what can be reasonably justified from the sample tested.  The moral: always be aware of the population for which inferences are desired, and take pains to obtain a sample to test which is truly representative of that population. Failing that, put in writing the risks inherent in unwarranted extrapolations.
 

"MANY A SLIP BETWIXT TEXT AND SOFTWARE"

Well might students gasp in dismay when the instructor, using a statistical package, produces a nonsensical answer to this routine exercise:

The lifetime y (in hours) of the central processing unit of a certain type of microcomputer is an exponential random variable with parameter b = 1,000.  What is the probability that a central processing unit will have a lifetime of at least 2,000 hours?

Given this dismay, students are well-motivated to learn the cause:  the textbook is using the exponential density function

     f(y) = (1/b) e^(-y/b)

The statistical software package (yes, let’s open its manual!) is using the exponential density function

     f(y) = b e^(-b y)

Hence, each b is the reciprocal of the other.
Now, students are receptive to the sermon:
 Wrong:  Rush to assume that textbooks and statistical-software manuals use
    exactly matching formulas.
 Right:     Check the formulas against each other to avoid overlooking needed
     transformations.
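A brief check shows how large the discrepancy is; note that scipy's exponential distribution happens to be parameterized by its scale (mean), matching the textbook convention:

  # P(lifetime >= 2000 hours) under the two readings of b = 1,000.
  import math
  from scipy import stats

  b = 1000.0

  p_textbook = math.exp(-2000.0 / b)              # b as the mean: about 0.135
  p_misread  = math.exp(-b * 2000.0)              # b misread as a rate: essentially 0
  p_scipy    = stats.expon.sf(2000.0, scale=b)    # agrees with the textbook value

  print(p_textbook, p_misread, p_scipy)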
 

ACCURATE USE OF STATISTICAL TABLES

A typical binomial-distribution exercise (reducing to "what is the probability of success on exactly 7 of 10 independent trials, given that p [probability of success on any one trial] = 0.9?") provides an opening for yet another sermon of practical importance.  Alert students notice that the back-of-the-book answer for this exercise is incorrect, but would be correct if the question referred to 8 instead of 7 successes.  The wise instructor now presents the wrong and right ways to read tables such as a typical binomial-probability table:
 Wrong:  While walking down the aisle and conferring with a co-worker, open
     the book to the table, run an index finger approximately horizontally
     across the table, and use the number near it.
 Right:     Sit in a chair next to a desk or table.  Open the book flat on the table.
     Using a ruler or the straight edge of a blank sheet of paper as a "mask,"
     read the needed entry from precisely the row and column of the table
     pertinent to the problem.
Similar cautions are in order for other tables.  For example, before using a normal probability table, the practicing statistician checks its legend -- does the shaded area begin at x = minus infinity or at x = 0?
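Statistical software also provides a convenient cross-check on manual table lookups.  For the binomial exercise above, a two-line Python check exposes the 7-versus-8 discrepancy:

  # Exact binomial probabilities for n = 10 trials with p = 0.9.
  from scipy import stats

  print(stats.binom.pmf(7, n=10, p=0.9))    # about 0.057 -- the correct answer
  print(stats.binom.pmf(8, n=10, p=0.9))    # about 0.194 -- the back-of-the-book value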
 

BEWARE THE "HAWTHORNE EFFECT"

A typical exercise might present a list of numbers representing cycle times of an assembly-line worker performing a manual operation, and request the mean and standard deviation of those numbers.  How effortlessly such data appear between the covers of a textbook! However, in practice, the statistician needs to ask "How were these data collected?"  A typical answer is "The process engineer, clipboard and stopwatch in hand, stood beside the worker." Enter the Hawthorne effect -- the very act of gathering the data affects them (Thurkow 1996). Very likely, the worker, consciously or unconsciously, performed the operation at atypical speed.  To explain the Hawthorne effect to students, a physical analogy is useful.  Visualize measuring the air pressure in a tire -- the very act of pressing the tire gauge against the valve stem allows air to escape from the tire, slightly lowering the pressure.  Hence the working statistician needs to intervene before the operational data are collected, asking "How can these data be collected unobtrusively?" and explaining that the best statistical analyses, applied to atypical data, will lead to misleading conclusions.
 

WORKING WITHIN ORGANIZATIONAL REALITY

The most rigorously correct statistical results are useless, and the time spent to obtain them wasted, if they are disbelieved and consequently lead to no beneficial action.  As an example, consider the following canonical chi-square exercise:
 
  Production Line         A      B      C      D
  Parts produced        100    400    300    150
  Number defective        3     45      5      4
  Does the proportion of defective parts differ among the lines?

The answer in the back of the book will certainly be "Yes," with a suitably high value of the chi-square statistic.  In workaday practice, a statistician who presented this result to the supervisor of Line B would quickly hear "Of course you come to see me.  Of course we had more defectives, since we produced more parts.  It just goes to show ‘You can prove anything with enough statistics.’"  The expert statistician, wise by experience, would not only know that the chi-square test works with the fixed marginal totals -- that is, it compares the proportion defective on each line, accounting for the differing production volumes -- but would also be able to explain the value of that approach to a non-statistician.  This situation is typical of many in which the "street-smart" statistical analyst aims not for theoretical perfection, but for the best business outcome given the organizational constraints sometimes imposed on direct application of statistical theory and methods.
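For the record, the analysis itself is routine; a short Python sketch runs the chi-square test of homogeneity on the full table of defective and non-defective counts:

  # Chi-square test: does the proportion defective differ among the lines?
  from scipy.stats import chi2_contingency

  produced  = [100, 400, 300, 150]
  defective = [3, 45, 5, 4]
  table = [[d, n - d] for d, n in zip(defective, produced)]   # defective vs. good

  chi2, p, dof, expected = chi2_contingency(table)
  print(round(chi2, 1), p, "with", dof, "degrees of freedom")
  # Line B's rate (45/400 = 11.25%) stands out against roughly 3% or less elsewhere.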
 
 
SUMMARY AND CONCLUSIONS

All of the above are examples, actually met in practical work, of what the competent statistical analyst or consultant must do to extend the benefits of statistics from the textbook or research journal to the office, store, factory, laboratory, clinic, or warehouse.  As such, they provide useful guidance to the statistician, manager, educator, or the recipient of statistical results and concurrent consultation.
 

ACKNOWLEDGMENTS

Robert M. Czech and John M. Dennis, of Quality & Product Information Systems, Ford Motor Company, made valuable criticisms to improve the clarity and presentation of this paper.
 

AUTHOR BIOGRAPHIES

Edward J. Williams holds bachelor’s and master’s degrees in mathematics (Michigan State University, 1967; University of Wisconsin, 1968).  From 1969 to 1971, he did statistical programming and analysis of biomedical data at Walter Reed Army Hospital, Washington, D.C.  He joined Ford in 1972, where he works as a computer software analyst supporting statistical and simulation software.  Since 1980, he has also taught evening classes at the University of Michigan, including both undergraduate and graduate statistics classes (supporting software Excel™, Minitab™, SAS™, and SPSS™) and simulation classes (using GPSS/H™, SLAM II™, or SIMAN™).  He is a member of the Association for Computing Machinery [ACM] and its Simulation Special Interest Group [SIGSIM], the Institute of Electrical and Electronics Engineers [IEEE], the Society for Computer Simulation [SCS], the Society for Manufacturing Engineers [SME], the Institute of Industrial Engineers [IIE], and the American Statistical Association [ASA].  He serves on the editorial board of the International Journal of Industrial Engineering – Applications and Practice.

John Harder earned Master’s degrees in mathematics from Wichita State University (1987) and in statistics from Kansas State University (1992).  He taught high school mathematics in Peabody, Kansas (1981-85), and at the college level at Wichita State (1985-87), Tarkio College, Missouri (1987-88), and Kansas State (1988-92).  At KSU, he was Instructor of Statistics and served as Graduate Student Teaching Coordinator.  In July 1992 he joined the Materials, Fasteners and Corrosion Protection Engineering Department at Ford Motor Company in Dearborn, Michigan as statistical analyst.  His major responsibilities include planning and leadership of corrosion field surveys and subsequent data analysis and report writing, warranty data analysis, and assisting engineers with DOE’s.  At Ford he has also had a few months’ experience with each of the following activities:  Analytical Software Support (under Edward Williams), Auto Safety Office, Body and Assembly Quality Office, and Body Engineering Reliability and Statistical Methods.  He is a member of the American Statistical Association.  John lives in Windsor, Ontario with wife Julie and two pre-teenage daughters.

BIBLIOGRAPHY

Fecso, Ronald S., William D. Kalsbeek, Sharon L. Lohr, Richard L. Scheaffer, Fritz J. Scheuren, and Elizabeth A. Stasny.  1996.  "Teaching Survey Sampling."  The American Statistician 50(4):328-340.

Hines, William W., and Douglas C. Montgomery.  1980.  Probability and Statistics in Engineering and Management Science, second edition.  New York, New York:  John Wiley & Sons.

Hooke, Robert.  1983.  How to Tell the Liars from the Statisticians.  New York, New York: Marcel Dekker.

Kotz, Samuel, and Donna F. Stroup.  1983.  Educated Guessing.  New York, New York: Marcel Dekker.

Mendenhall, William, and Terry Sincich.  1992.  Statistics for Engineering and the Sciences, third edition.  San Francisco, California:  Dellen.

Mendenhall, William, and Terry Sincich.  1995.  Statistics for Engineering and the Sciences, fourth edition.  Englewood Cliffs, New Jersey:  Prentice-Hall, Incorporated.

Paulos, John Allen.  1990.  Innumeracy.  New York, New York:  Vintage Books.

Schervish, Mark J.  1996.  "P Values:  What They Are and What They Are Not."  The American Statistician 50(3):203-206.

Schmid, Calvin F.  1983.  Statistical Graphics.  New York, New York:  John Wiley & Sons, Incorporated.

Thurkow, Niki M.  1996.  "The Use of Security Video Film in Video Work Sampling."  In Proceedings of the 1st Annual International Conference on Industrial Engineering Applications and Practice, eds. Jacob Jen-Gwo Chen and Anil Mital, 1145-1149.