Different Types of Missing Values and Approaches to Deal with Them
Before handling missing values, we need to know which types of missing data exist in the data science world. Most sources on the web list 3 types, but some core research papers describe one more. Let me introduce all of them very briefly-
Structurally Missing Data- Take an example: we have the results of a university's students for a particular semester, and some of the result values are missing. This can happen when students dropped out before the exams or were absent. Such a value is structurally missing. In this case, the best possible solution is to fill those missing places with 0.
MCAR (Missing Completely at Random)- Missing values are distributed randomly over the entire dataset. MCAR holds when the missingness is related neither to the values of the variable in question nor to the values of any other variable under analysis. For example, data are missing for respondents whose questionnaires were lost. Say you have complete data on 15 questions and incomplete data on 10. We can compare these two sets of data with a test such as a t-test; if we find no difference in means between the two samples, we can assume the data to be MCAR.
MAR (Missing at Random)- Data are not missing randomly across the entire dataset, but are missing randomly within subsamples of the data. Data are MAR when the probability of a missing value on a variable is related to some other measured variable in the model, but not to the value of the variable itself. For example, in an IQ dataset, only older people have missing values, so the probability of missing IQ data is related to age. MAR is also difficult to assume with confidence, because there is no way of testing it.
NMAR (Not Missing at Random)- When the missingness depends on the unobserved values themselves, so that the data are neither MCAR nor MAR, we can't treat it as missing at random. In such cases we may not be able to draw conclusions about the missing values from the observed data.
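The t-test comparison mentioned under MCAR can be sketched roughly as follows. This is a minimal illustration, not a formal MCAR test: it uses Welch's variant of the t statistic, borrows the age/IQ setting from the MAR example, and the `records` data and the `welch_t` helper are hypothetical names invented for this sketch. It splits the rows by whether IQ is missing and compares the mean age of the two groups.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical records: (age, iq); None marks a missing IQ score.
records = [
    (23, 101), (31, 98), (45, 110), (52, None), (38, 95),
    (61, None), (29, 104), (47, 99), (35, None), (58, 107),
]

# Split the fully observed variable (age) by whether IQ is missing.
age_when_iq_observed = [age for age, iq in records if iq is not None]
age_when_iq_missing = [age for age, iq in records if iq is None]

t, df = welch_t(age_when_iq_missing, age_when_iq_observed)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t, judged against the t distribution with `df` degrees of freedom, suggests that missingness is related to age, which argues against MCAR. A small t is consistent with MCAR but does not prove it.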
Some common approaches to dealing with these types of missing data:
1. The simple one: drop the corresponding column/row-
If your dataset is large and the count of missing values in the affected columns/rows is comparatively low, you can use this approach.
2. Imputation- This fills each missing value with some number. The imputed value won't be exactly right in most cases, but it usually leads to more accurate models than you would get from dropping the column/row entirely. Some of the imputation techniques are:
a) Mean/Median Imputation: As the name suggests, we replace the missing values with the mean or median of the observed values of that variable. We use this approach when the number of missing observations is low.
b) Multivariate Imputation by Chained Equations (MICE): It assumes that the missing data are Missing at Random (MAR). It imputes data on a variable-by-variable basis by specifying an imputation model per variable. It uses all the variables in the data for predictions.
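Approach 1 (dropping) and approach 2(a) (mean/median imputation) are simple enough to sketch in plain Python. The `rows` data and the helper names `drop_incomplete_rows` and `impute_column` below are hypothetical, made up for this illustration; in practice a data-frame library would usually do this work.

```python
from statistics import mean, median

# Hypothetical dataset: one dict per row, None marks a missing value.
rows = [
    {"hours": 10, "score": 72},
    {"hours": 14, "score": 85},
    {"hours": None, "score": 90},
    {"hours": 8, "score": None},
    {"hours": 12, "score": 78},
]


def drop_incomplete_rows(rows):
    """Approach 1: keep only the rows that have no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_column(rows, column, strategy=mean):
    """Approach 2(a): fill missing entries of one column with the
    mean (or median) of that column's observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


complete = drop_incomplete_rows(rows)
mean_filled = impute_column(impute_column(rows, "hours"), "score")
median_filled = impute_column(impute_column(rows, "hours", median), "score", median)

print(len(rows), "rows before dropping,", len(complete), "after")
print(mean_filled)
print(median_filled)
```

Dropping loses two of the five rows here, while imputation keeps every row at the cost of filling in values that are only estimates.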
3. Random Forest- This is also a non-parametric imputation method, and it is reported to work well both with data that are missing at random and with data that are not missing at random. It uses multiple decision trees to estimate the missing values and outputs OOB (out-of-bag) imputation error estimates.
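The variable-by-variable idea behind chained equations can be sketched as below. This is a deliberately simplified, deterministic toy, not the full MICE procedure: real MICE draws imputations with random noise and produces several imputed datasets, whereas this sketch assumes exactly two numeric columns, uses a straight-line fit as the per-variable model, and returns a single filled-in dataset. The names `fit_line`, `chained_impute`, and `data` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def chained_impute(columns, n_iter=10):
    """Toy chained-equations imputation for exactly two columns.

    columns maps a column name to a list in which None marks a missing value.
    """
    first, second = list(columns)
    missing = {n: [v is None for v in columns[n]] for n in columns}

    # Step 1: start from mean imputation.
    filled = {}
    for name, values in columns.items():
        col_mean = mean(v for v in values if v is not None)
        filled[name] = [col_mean if v is None else v for v in values]

    # Step 2: cycle through the columns, re-predicting each column's
    # missing entries from the current values of the other column.
    for _ in range(n_iter):
        for target, other in ((first, second), (second, first)):
            train = [
                (filled[other][i], filled[target][i])
                for i, is_missing in enumerate(missing[target])
                if not is_missing
            ]
            slope, intercept = fit_line([p for p, _ in train], [q for _, q in train])
            for i, is_missing in enumerate(missing[target]):
                if is_missing:
                    filled[target][i] = intercept + slope * filled[other][i]
    return filled


# Hypothetical data that follows y = 2x + 1 wherever both values are observed.
data = {"x": [1, 2, 3, 4, None, 6], "y": [3, 5, None, 9, 11, 13]}
result = chained_impute(data)
print(result)
```

Swapping the straight-line fit for a random forest as the per-variable model gives, roughly, the random-forest style of imputation described in item 3.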
There are various other efficient methods for handling missing values, depending on the scenario and the type of data. I have discussed the most common ones here. Hope it was helpful, thanks for reading! Good luck! Be safe!
Translated from: https://medium.com/analytics-vidhya/different-types-of-missing-values-approaches-to-deal-with-them-1f67c617374c