Tired of data quality discussions?
I have been in tons of meetings where data and the results of some analysis were presented. Most of them have one thing in common: the data quality is challenged, and most of the meeting time goes to discussing potential data quality issues. The number one follow-up is to verify the open questions, and then we start all over again. Sound familiar?
It can be different. There are meetings where these discussions don’t take place, or where they start but are immediately taken care of. I have seen and been involved in a few. And there is ONE difference between these two types of meetings that I have seen over and over again: in the meetings that derail, the person presenting the data was not on top of their data, was not anticipating, and was not thinking a step further.
In fact, many of these data quality discussions are not about data quality at all but about understanding what the data means: for example its hierarchy or structure, or how the metrics should be interpreted. It is very easy to blame the data quality when you don’t understand something, but usually the issue lies somewhere else.
Let’s assume you are doing some exploratory data analysis to get started with AI. The key to success is to really understand the data you are working with. If the quality is not up to standard, bring it up to standard or find a way to work with the data nonetheless. Be proactive; it goes a long way.
1. Start small
The key here, as with so many things, is to start small. If you are looking at a handful of features, you can actually dig into what these features mean. If you start off with hundreds, it will be more difficult. Let’s look at the number of products per customer as an example, which is clearly small.
2. Make sure you understand your data
Because you started small, you are able to dig deep. Do your correlation plots, look at the frequencies, and read the documentation on these features.
Because you started small, you are able to dig deep and truly understand the data
In our example, we basically have two features to look at, customer and product, and both have a large potential for discussion. I once spent about three months defining what is meant by “customer”, an especially difficult question in a B2B environment. Depending on the company you work in, products may be grouped at different levels, each of interest to a different role. A product manager can be interested in a different level of the hierarchy than the regional head of sales.
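To make the point about definitions concrete, here is a minimal sketch of how the frequency of “products per customer” can shift depending on what counts as a customer. The records, the account and parent-company columns, and the function names are all hypothetical and only serve as an illustration; real data will need its own definition of both “customer” and “product”.

```python
from collections import Counter, defaultdict

# Hypothetical order records: (account_id, parent_company, product).
# All names and values are made up for illustration.
ORDERS = [
    ("A-1", "Acme Group", "Basic"),
    ("A-1", "Acme Group", "Premium"),
    ("A-2", "Acme Group", "Basic"),
    ("B-1", "Bolt Ltd", "Basic"),
    ("C-1", "Cobalt AG", "Premium"),
    ("C-2", "Cobalt AG", "Analytics"),
]


def products_per_customer(orders, key_index):
    """Count distinct products per customer, where the customer is
    identified by the column at key_index (0 = account, 1 = parent)."""
    products = defaultdict(set)
    for row in orders:
        products[row[key_index]].add(row[2])
    return {customer: len(items) for customer, items in products.items()}


def frequency(counts):
    """Return how many customers have 1, 2, 3, ... distinct products."""
    return dict(sorted(Counter(counts.values()).items()))


by_account = products_per_customer(ORDERS, 0)
by_parent = products_per_customer(ORDERS, 1)

print("per account:", frequency(by_account))
print("per parent: ", frequency(by_parent))
```

With this toy data, counting per account shows mostly single-product customers, while counting per parent company shows mostly two-product customers. The data is identical; only the definition of “customer” changed.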
3. Verify the data quality
There may already be standard ways in which the data quality is checked, and you should understand and be able to explain these. I recommend going a step beyond the usual checks. Check for inconsistencies from a business perspective: is the job title of most of your customers “Accountant”? Think again, it may simply be the top selection in the drop-down list. Another typical quality issue is inconsistency between systems. Be sure you know these inconsistencies, what drives them, and their implications.
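The “Accountant” check can be sketched in a few lines. This is only an illustration under assumptions: the list of drop-down options, the sample job titles, and the 50% threshold are invented here, and a suspicious result is a prompt to ask how the field is filled in, not proof of a defect.

```python
from collections import Counter


def dominant_value(values, threshold=0.5):
    """Return (value, share) if a single value accounts for more than
    `threshold` of all records, otherwise None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    share = count / len(values)
    return (value, share) if share > threshold else None


# Hypothetical order of the options as shown in the entry form.
JOB_OPTIONS = ["Accountant", "Analyst", "Engineer", "Manager"]

# Hypothetical job titles as recorded for ten customers.
jobs = ["Accountant"] * 7 + ["Engineer", "Manager", "Analyst"]

hit = dominant_value(jobs)
if hit is not None:
    value, share = hit
    is_first_option = value == JOB_OPTIONS[0]
    print(f"{value!r} covers {share:.0%} of records; "
          f"first option in the list: {is_first_option}")
```

If the dominant value is also the first option in the form, it is worth checking whether people really chose it or simply never changed the default.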
4. Anticipate the issues
You can anticipate quite a few questions and issues. What are the questions you typically get? What KPIs have been reported to your audience? What discussions have taken place in the past? Which words are used in daily discussions? The last one, for example, should give you a good sense of the product split your audience is looking at (spoiler alert: it may well be none of the splits in your data). Make sure you understand the different levels, why they are used, and how.
Anticipating the issues will allow you to divert from the data quality discussion
In my example, there were different product hierarchies (from different systems) used by different audiences. I built both hierarchies into my dashboard and was able to explain the overlap and the differences between the two.
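One way to prepare for that explanation is to compare the two hierarchies before the meeting. The following is a rough sketch, not the dashboard described above: the system names, product IDs, and group names are hypothetical, and each hierarchy is simplified to a flat mapping from product to group.

```python
# Hypothetical product-to-group mappings from two systems.
CRM_HIERARCHY = {"P1": "Software", "P2": "Software", "P3": "Services", "P4": "Hardware"}
ERP_HIERARCHY = {"P1": "Licenses", "P2": "Maintenance", "P3": "Licenses", "P5": "Hardware"}


def compare_hierarchies(left, right):
    """Summarize the overlap and the differences between two
    product-to-group mappings."""
    shared = sorted(left.keys() & right.keys())
    crosswalk = {}
    for product in shared:
        # For each group in `left`, collect the groups it maps to in `right`.
        crosswalk.setdefault(left[product], set()).add(right[product])
    return {
        "shared": shared,
        "only_left": sorted(left.keys() - right.keys()),
        "only_right": sorted(right.keys() - left.keys()),
        "crosswalk": {group: sorted(targets) for group, targets in crosswalk.items()},
    }


report = compare_hierarchies(CRM_HIERARCHY, ERP_HIERARCHY)
for key, value in report.items():
    print(f"{key}: {value}")
```

Products that exist in only one system, and groups that split into several groups in the other system, are the places where totals will not match and where questions are most likely to come up.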
Find out which systems your audience is using and what data they typically see. Have an upfront discussion with someone you trust, going through the data and results to weed out all possible flaws.
5. Know the issues and work around them
Once you know the issues that are there, it’s time to work around them. One way is to tackle the issue at the source. It may not be your job, but it is potentially critical for a follow-up project in which these features are going to be used.
If you are still in the exploratory phase, you could instead make the issues and assumptions clear. The key will be that you are able to explain the issues and their implications to gain the trust of your audience.
Do you think this is a lot of work? Well, think again: once this is sorted, you can actually do your job, start creating actionable insights, and take action.
About me: I am an Analytics Consultant and Director of Studies for “AI Management” at a local business school. I am on a mission to help organizations generate business value with AI and create an environment in which Data Scientists can thrive. Sign up to my newsletter for new articles, insights, and offerings on AI Management.
Translated from: https://towardsdatascience.com/tired-of-data-quality-discussions-654106ce2e00