The method you choose to analyze your data can lead you to incorrect conclusions, causing extra data collection, waste, and frustration. Most statistical analyses use parametric tests (t-tests, ANOVA, Pearson’s correlation, etc.), but these tests have some limitations. More often than not, nonparametric tests (Mann-Whitney, Kruskal-Wallis, Kendall’s tau, etc.) may be the more appropriate and more powerful choice, with less risk, even if the data fits a normal distribution or has been transformed to fit one.
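Nonparametric tests generally work with the ranks of the data rather than the raw values, so they compare medians (or, more precisely, whether one group tends to produce larger values than another) instead of means. They do not rely on the sample data belonging to any particular probability distribution. Parametric tests are much more popular, but they usually require that the data fits a normal (Gaussian) distribution.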
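To make the rank idea concrete, here is a minimal Python sketch, assuming NumPy and SciPy are installed, that computes the Mann-Whitney U statistic by hand from the pooled ranks and compares it with `scipy.stats.mannwhitneyu`. The two sites and their values are made up for illustration. Notice that the extreme value 30.5 simply receives the top rank, so it cannot dominate the result the way it would dominate a mean.

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

# Made-up measurements for two hypothetical sites (illustration only)
site_a = np.array([1.2, 3.4, 0.8, 2.2, 15.0, 1.9])
site_b = np.array([4.1, 5.6, 3.9, 7.2, 6.0, 30.5])


def rank_sum_u(x, y):
    """Mann-Whitney U for sample x, computed from the ranks of the pooled data."""
    ranks = rankdata(np.concatenate([x, y]))  # ties would get average ranks
    rank_sum_x = ranks[: len(x)].sum()        # Wilcoxon rank sum for x
    return rank_sum_x - len(x) * (len(x) + 1) / 2


u_manual = rank_sum_u(site_a, site_b)
result = mannwhitneyu(site_a, site_b, alternative="two-sided")
print(f"U by hand = {u_manual}, SciPy U = {result.statistic}, p = {result.pvalue:.3f}")
```

In SciPy 1.7 and later, `mannwhitneyu` reports U for the first sample, which is the quantity `rank_sum_u` computes.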
If you’ve been through Black Belt training, remember your statistics class from college, or have used statistics recently, you probably learned about these parametric tests. In fact, most Six Sigma training and college classes cover only parametric tests. When the data is non-normal, we usually tell students to let us know so we can help them transform it to meet the parametric test requirements.
The problem, especially with environmental data, is that the data being analyzed is often non-normal, so it gets transformed and manipulated to fit instead of being analyzed with the more appropriate nonparametric test.
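One reason transformation is often unnecessary: a rank-based test gives the same answer whether or not you apply a strictly increasing transformation such as a log, because the ordering of the values (and therefore their ranks) does not change. The sketch below assumes NumPy and SciPy and uses simulated lognormal data for two hypothetical sites. It shows that the rank-sum p-value is identical on the raw and log-transformed data, while the t-test p-value depends on which scale you pick.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

# Simulated, right-skewed concentrations for two hypothetical sites (assumption)
rng = np.random.default_rng(42)
upstream = rng.lognormal(mean=0.0, sigma=1.0, size=12)
downstream = rng.lognormal(mean=0.8, sigma=1.0, size=12)

# Rank-sum test on the raw scale and on the log scale: the ranks are identical,
# so the p-values are identical too.
p_rank_raw = mannwhitneyu(upstream, downstream, alternative="two-sided").pvalue
p_rank_log = mannwhitneyu(np.log(upstream), np.log(downstream),
                          alternative="two-sided").pvalue

# The t-test compares means, so its answer changes with the transformation.
p_t_raw = ttest_ind(upstream, downstream).pvalue
p_t_log = ttest_ind(np.log(upstream), np.log(downstream)).pvalue

print(f"rank-sum p: raw = {p_rank_raw:.4f}, log = {p_rank_log:.4f}")
print(f"t-test p:   raw = {p_t_raw:.4f}, log = {p_t_log:.4f}")
```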
The risk of using a parametric test instead of a nonparametric test is that, for many tests, the p-value will be higher than it should be when the data is skewed or contains outliers. That makes it harder to detect significant differences between factors that might actually be present.
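A small Monte Carlo simulation is one way to check this claim on your own terms. The sketch below is an illustration assuming NumPy and SciPy, not a result from the class. It draws skewed lognormal samples for two groups that really do differ, then counts how often a two-sample t-test and a rank-sum test each detect the difference at α = 0.05. The sample size, effect size, and seed are arbitrary assumptions; change them to resemble your own data.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind


def estimate_power(n=15, shift=0.7, sigma=1.0, reps=1000, alpha=0.05, seed=1):
    """Estimate how often each test rejects H0 when the groups truly differ.

    Assumed setup (illustrative only): both groups are lognormal, and group B's
    log-scale mean is `shift` higher than group A's, so H0 is false.
    """
    rng = np.random.default_rng(seed)
    t_rejections = 0
    rank_rejections = 0
    for _ in range(reps):
        a = rng.lognormal(mean=0.0, sigma=sigma, size=n)
        b = rng.lognormal(mean=shift, sigma=sigma, size=n)
        t_rejections += int(ttest_ind(a, b).pvalue < alpha)
        rank_rejections += int(
            mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha
        )
    # Rejection rate under a false H0 = estimated power
    return t_rejections / reps, rank_rejections / reps


t_power, rank_power = estimate_power()
print(f"t-test power ~ {t_power:.2f}, rank-sum power ~ {rank_power:.2f}")
```

With strongly skewed data you will typically see the rank-sum test reject more often, but the exact numbers depend on the settings you choose.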
This oversight has been a mistake, and we need to change how we teach statistics. We are doing a disservice to organizations and industries by NOT teaching nonparametric tests.
This problem was highlighted during a week-long training class I took from Dr. Dennis Helsel and Dr. Ed Gilroy of PracticalStats.com. The class was called Applied Environmental Statistics (AES). Here is a link to the outline.
One of the biggest takeaways was that nonparametric tests are not just for non-normal data; they should be used more often than parametric tests, even when the data appears to fit a normal distribution. Especially with small sample sizes, most normality tests do not have enough “power” to determine whether the distribution is normal, so we should not assume normality by default.

There is a perception that nonparametric tests are not as powerful as parametric tests. In fact, they are less powerful only when the data truly follows a normal distribution, which is unlikely in practice, and even then the parametric test’s advantage is small: asymptotically, the rank-sum test needs only about 5% more observations than the t-test to match its power (its asymptotic relative efficiency under normality is 3/π ≈ 0.955). For skewed data with outliers, nonparametric methods have a large power advantage, and parametric tests can really get you into trouble by encouraging you to remove outliers or by leading you to incorrect conclusions. In addition, traditional parametric analysis may require you to collect larger samples than you need, which wastes time and adds cost. That is even more reason to make nonparametric tests the default for analyzing your data.
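The point about normality tests lacking power at small sample sizes can also be checked by simulation. This sketch assumes SciPy’s Shapiro-Wilk test (`scipy.stats.shapiro`) and a right-skewed gamma distribution chosen purely for illustration. It estimates how often the test correctly flags non-normal data at several sample sizes. A low rejection rate at small n means that “the normality test passed” is weak evidence that the data is actually normal.

```python
import numpy as np
from scipy.stats import shapiro


def normality_rejection_rate(n, reps=1000, alpha=0.05, seed=2):
    """Fraction of truly non-normal samples that Shapiro-Wilk flags as non-normal.

    Assumption for illustration: the data come from a right-skewed gamma
    distribution (shape 3), so every rejection is a correct detection.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        sample = rng.gamma(shape=3.0, scale=1.0, size=n)
        rejections += int(shapiro(sample).pvalue < alpha)
    return rejections / reps


rates = {n: normality_rejection_rate(n) for n in (8, 15, 50)}
for n, rate in rates.items():
    print(f"n = {n:>2}: non-normality detected in {rate:.0%} of samples")
```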
What tests should we be using?
The following table lists some popular parametric tests and the nonparametric alternatives that should be used instead.
| Parametric Test | Nonparametric Alternative |
|---|---|
| Two-sample t-test | Wilcoxon rank-sum test (Mann-Whitney) |
| Analysis of variance (ANOVA) | Kruskal-Wallis test |
| Levene’s test | Squared-ranks test |
A more complete list of nonparametric tests can be found on Wikipedia.
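Each pair in the table can be run on the same data with SciPy, as in the sketch below (simulated data, assuming NumPy and SciPy). SciPy provides `ttest_ind`, `mannwhitneyu`, `f_oneway`, `kruskal`, and `levene`; note that `levene` centers on the median by default, which is the Brown-Forsythe variant. SciPy does not appear to include Conover’s squared-ranks test, so `squared_ranks_test` here is a hypothetical helper written from the large-sample textbook formula: rank the absolute deviations from each group mean, square the ranks, and compare the statistic with a chi-square distribution on k - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats


def squared_ranks_test(*groups):
    """Conover's squared-ranks test for equal variances (hypothetical helper).

    Sketch of the large-sample version: rank the absolute deviations from each
    group's mean, square the ranks, and compare the statistic T with a
    chi-square distribution on k - 1 degrees of freedom.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    deviations = np.concatenate([np.abs(g - g.mean()) for g in groups])
    sq_ranks = stats.rankdata(deviations) ** 2  # average ranks for ties, then squared
    sizes = np.array([len(g) for g in groups])
    n_total = sizes.sum()
    # Sum of squared ranks belonging to each group (groups were concatenated in order)
    edges = np.concatenate([[0], np.cumsum(sizes)])
    group_sums = np.array([sq_ranks[lo:hi].sum() for lo, hi in zip(edges[:-1], edges[1:])])
    mean_sq = sq_ranks.mean()
    d2 = (np.sum(sq_ranks ** 2) - n_total * mean_sq ** 2) / (n_total - 1)
    t_stat = (np.sum(group_sums ** 2 / sizes) - n_total * mean_sq ** 2) / d2
    p_value = stats.chi2.sf(t_stat, df=len(groups) - 1)
    return t_stat, p_value


# Simulated skewed data for three hypothetical groups (assumption)
rng = np.random.default_rng(7)
g1 = rng.lognormal(0.0, 0.5, 20)
g2 = rng.lognormal(0.3, 0.5, 20)
g3 = rng.lognormal(0.0, 1.0, 20)

comparisons = {
    "t-test vs rank-sum": (
        stats.ttest_ind(g1, g2).pvalue,
        stats.mannwhitneyu(g1, g2, alternative="two-sided").pvalue,
    ),
    "ANOVA vs Kruskal-Wallis": (
        stats.f_oneway(g1, g2, g3).pvalue,
        stats.kruskal(g1, g2, g3).pvalue,
    ),
    "Levene vs squared ranks": (
        stats.levene(g1, g2, g3).pvalue,
        squared_ranks_test(g1, g2, g3)[1],
    ),
}
for name, (p_param, p_nonparam) in comparisons.items():
    print(f"{name}: parametric p = {p_param:.3f}, nonparametric p = {p_nonparam:.3f}")
```

For small samples, Conover’s book gives exact tables for the squared-ranks statistic; the chi-square approximation used here is only a large-sample shortcut.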
How do I learn more about nonparametric tests?
The AES class was very intense (4.5 days), but it was a great class, and I learned many useful techniques, along with the statistical software R! There was also training on plots other than histograms, such as normal quantile (probability) plots and boxplots.
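If you want to try these plots yourself, the sketch below (assuming NumPy and SciPy, with a simulated skewed sample) computes the numbers behind them. `scipy.stats.probplot` returns the points of a normal quantile plot plus the correlation of the fitted line, and the quartiles with Tukey’s 1.5 × IQR fences are what a standard boxplot draws. Passing `plot=matplotlib.pyplot` to `probplot`, or calling `matplotlib.pyplot.boxplot(data)`, would draw the charts if matplotlib is installed.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample (assumption for illustration)
rng = np.random.default_rng(3)
data = rng.lognormal(0.0, 1.0, 25)

# Normal quantile (probability) plot: ordered data vs. theoretical normal quantiles.
# r is the correlation of the least-squares line; values near 1 mean a straight line.
(theoretical_q, ordered), (slope, intercept, r) = stats.probplot(data, dist="norm")
print(f"probability-plot correlation r = {r:.3f}")

# Boxplot ingredients: quartiles and Tukey's 1.5 * IQR fences
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
flagged = data[(data < lower_fence) | (data > upper_fence)]
print(f"median = {median:.2f}, IQR = {iqr:.2f}, points beyond fences: {len(flagged)}")
```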
If you are a Black Belt, you definitely need to take this class. Not only will it give you a good refresher on parametric tests, but you will also learn about numerous nonparametric tests that are probably more applicable to your data than you think. In fact, nonparametric tests are easier to understand and calculate, so it’s a good class for Green Belts as well.
If you deal with environmental data, you also need to take this class. As we mentioned, environmental data is rarely normally distributed, so these techniques are essential in your studies and important to know when reviewing other people’s analyses and reports.
If you don’t want to wait for the class, sign up for their newsletter, and review the documentation on their website, such as the “Top 12 Tips in Environmental Statistics” and “What test should I use?”
Contact them directly to find out when the next class will take place. They also offer courses in Multivariate Analysis, Non-detect data, and Time Series Analysis.
If you have an immediate question or have a question about the class, contact us and we’ll try to help you out.
In summary, you should default to nonparametric tests, not just use them when you think the data is non-normal.
What has been your experience with nonparametric tests? Have you been taught these techniques, or even heard of them?