Stop testing for normality
I see a lot of data scientists using tests such as the Shapiro-Wilk test and the Kolmogorov–Smirnov test to check for normality. Stop doing this. Just stop. If you're not yet convinced (and I don't blame you!), let me show you why these tests are a waste of your time.
Why do we care about normality?
We should care about normality. It's an important assumption underpinning a wide variety of statistical procedures, such as t-tests, ANOVA and inference in linear regression. We should always be clear about our assumptions and make the effort to check that they hold. However, normality tests are not the right way to do this.
That said, in large samples (as a rough rule of thumb, n > 30), which is what most of our work as data scientists is based on, the Central Limit Theorem usually applies: the sampling distribution of the mean is approximately normal even when the data themselves are not, so we need not worry about the normality of our data. But for cases where it does not apply, let's consider how we can check for normality in samples of different sizes.
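What the Central Limit Theorem buys us can be seen with a quick simulation. The sketch below uses only Python's standard library, and everything in it is an assumption chosen purely for illustration (an exponential distribution, samples of size 50, a fixed seed, and the helper names `skewness`, `draw_exponential` and `sample_means`): it compares the skewness of heavily skewed raw data with the skewness of many sample means drawn from the same distribution.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (no small-sample correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def draw_exponential(rng):
    """One draw from an exponential distribution with rate 1 (true skewness = 2)."""
    return rng.expovariate(1.0)


def sample_means(n, reps, rng):
    """Means of `reps` independent samples, each of size `n`."""
    return [statistics.fmean(draw_exponential(rng) for _ in range(n)) for _ in range(reps)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
raw = [draw_exponential(rng) for _ in range(20_000)]
means = sample_means(n=50, reps=2_000, rng=rng)

print(f"skewness of raw data:       {skewness(raw):.2f}")    # near 2
print(f"skewness of means (n = 50): {skewness(means):.2f}")  # near 2 / sqrt(50), about 0.28
```

The raw data stay strongly skewed, while the sample means come out far closer to symmetric. That is why, for procedures built on means, the shape of the data itself matters less once n is reasonably large.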
Normality testing in small samples
First, let's consider a small sample, say n = 10. Let's look at the histogram for this data.
Histogram of x (n=10). (Image by author)
Is this normally distributed? It doesn't really look like it, does it? Hopefully you're with me and accept that this isn't normally distributed. Now let's perform the Shapiro-Wilk test on this data.
Oh. p = 0.53. No evidence to suggest that x is not normally distributed. Hmm. What do you conclude, then? Well, of course, a lack of evidence that x is not normally distributed does not mean that x is normally distributed. What's actually happening is that in small samples these tests are underpowered: they have little ability to detect deviations from normality, even sizeable ones.
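The "underpowered" claim can be checked by simulation. A minimal sketch, assuming NumPy and SciPy are installed: it draws many samples of size 10 from an exponential distribution (a clearly non-normal, right-skewed distribution chosen purely for illustration; it is not the data behind the histogram above) and counts how often `scipy.stats.shapiro` rejects normality at α = 0.05. The helper names `exponential` and `rejection_rate` are hypothetical.

```python
import numpy as np
from scipy import stats


def exponential(rng, n):
    """Hypothetical non-normal sampler: n draws from a right-skewed exponential distribution."""
    return rng.exponential(scale=1.0, size=n)


def rejection_rate(sampler, n, reps=2_000, alpha=0.05, seed=0):
    """Fraction of simulated samples of size n in which Shapiro-Wilk rejects normality."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        _, p_value = stats.shapiro(sampler(rng, n))
        if p_value < alpha:
            rejections += 1
    return rejections / reps


power_n10 = rejection_rate(exponential, n=10)
print(f"Shapiro-Wilk rejected normality in {power_n10:.0%} of non-normal samples (n = 10)")
```

Because every simulated sample really is non-normal, the printed fraction estimates the test's power. At n = 10 it falls well short of 100%, so a non-significant p-value on its own tells you little. Raising `n` in the call shows the power climbing quickly.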
Normal Q-Q plot of x (n=10). (Image by author)
The best way to assess normality is with a quantile-quantile plot, or Q-Q plot for short, which plots the sorted data against the quantiles we would expect from a normal distribution. If the data is normally distributed, we would expect the points to fall on a straight line. This data shows some deviation from normality: the line is not very straight, and there appear to be some issues in the tail. Admittedly, without more data it is hard to say.
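To make the idea concrete, here is a minimal standard-library sketch of what a normal Q-Q plot actually plots: the sorted sample against standard normal quantiles at the plotting positions (i - 0.5)/n. The function names are hypothetical, the plotting-position convention is one common choice among several, and the correlation of the points is included only as a rough numerical summary of straightness, not as a substitute for looking at the plot.

```python
import random
from statistics import NormalDist, fmean


def normal_qq_points(data):
    """Pair each sorted observation with the matching standard normal quantile."""
    observed = sorted(data)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def qq_straightness(points):
    """Pearson correlation of the Q-Q points; 1.0 means a perfectly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical example data: 10 draws from a right-skewed (exponential) distribution.
rng = random.Random(7)
sample = [rng.expovariate(1.0) for _ in range(10)]
points = normal_qq_points(sample)
for theo, obs in points:
    print(f"{theo:6.2f}  {obs:6.2f}")
print(f"straightness r = {qq_straightness(points):.3f}")
```

To draw the plot itself, pass the two coordinates to any scatter-plot function, for example `plt.scatter(*zip(*points))` with matplotlib.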
With this data, I would have concerns about assuming normality, as there appears to be some deviation in both the Q-Q plot and the histogram. But if we had relied on the normality test alone, we wouldn't have picked this up, because the test is underpowered in small samples.
Normality testing in large samples
Now let's look at normality testing in a large sample (n = 5000), starting with a histogram.
Histogram of x (n=5000). (Image by author)
I hope you'd all agree that this looks normally distributed. Okay, so what does the Shapiro-Wilk test say? Bazinga! p = 0.001. There's very strong evidence that x is not normally distributed. Oh dear. Well, let's take a quick look at our Q-Q plot, just to double-check.
Normal Q-Q plot of x (n=5000). (Image by author)
Wow. This looks normally distributed. In fact, there shouldn't be any doubt that it is. But the Shapiro-Wilk test says it isn't.
What's going on here? Well, the Shapiro-Wilk test (like other normality tests) is designed to test for exact theoretical normality (i.e., the perfect bell curve). A test's power grows with sample size, but the question we actually care about, whether the deviation is large enough to matter for our analysis, does not. In small samples, these tests are underpowered to detect quite major deviations from normality that can easily be spotted with graphical methods. In large samples, they will detect even extremely minor deviations from theoretical normality that are of no practical concern.
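The large-sample side can be reproduced in a few lines. A minimal sketch, assuming NumPy and SciPy: it draws 5,000 values from a gamma distribution with shape 50, which is bell-shaped and only mildly right-skewed (theoretical skewness 2/sqrt(50), roughly 0.28). This distribution and seed are assumptions for illustration, not the data behind the plots above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5_000
# Gamma(shape=50): looks like a bell curve but is not exactly normal (slight right skew).
x = rng.gamma(shape=50.0, scale=1.0, size=n)

w_stat, p_value = stats.shapiro(x)
print(f"Shapiro-Wilk: W = {w_stat:.4f}, p = {p_value:.2g}")
print(f"sample skewness = {stats.skew(x):.2f}")  # small, near the theoretical ~0.28
```

With this many observations, even a skew this slight is typically enough to produce a very small p-value, although a histogram of the same data looks very close to a normal bell curve. Note also that SciPy warns that Shapiro-Wilk p-values may not be accurate for samples larger than 5,000.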
Conclusion
Hopefully, I have shown you that normality tests are of little practical use to data scientists. Don't use them. Forget about them. At best, they are useless; at worst, they are misleading. If you want to assess the normality of some data, use Q-Q plots and histograms. They'll give you a much clearer picture of the normality of your data.
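For the graphical workflow recommended here, a small helper that puts both plots side by side is handy. A minimal sketch, assuming matplotlib, NumPy and SciPy are installed; the function name `normality_diagnostics` and the example data are hypothetical. It relies on `scipy.stats.probplot`, which accepts a matplotlib Axes through its `plot` argument and draws the Q-Q points together with a least-squares reference line.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for saving to file; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def normality_diagnostics(x, bins="auto"):
    """Draw a histogram and a normal Q-Q plot of `x` side by side and return the figure."""
    x = np.asarray(x, dtype=float)
    fig, (ax_hist, ax_qq) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(x, bins=bins, edgecolor="black")
    ax_hist.set_title(f"Histogram of x (n={x.size})")
    # probplot plots the ordered values against normal quantiles, plus a fitted line.
    stats.probplot(x, dist="norm", plot=ax_qq)
    ax_qq.set_title(f"Normal Q-Q plot of x (n={x.size})")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fig = normality_diagnostics(rng.normal(size=500))  # hypothetical example data
    fig.savefig("normality_diagnostics.png", dpi=100)
```

Looking at both views together makes it easier to tell a genuine problem in the tails from the noise you expect in any finite sample.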
Source: https://towardsdatascience.com/stop-testing-for-normality-dba96bb73f90