First Data Science Job: Coronavirus Doctor
Background
3 years ago, I had just finished medical school and started working full-time as a doctor in the UK’s National Health Service (NHS). Now, I work full-time as a data scientist at dunnhumby, writing code for “Big Data” analytics with Python and Spark.
More and more people are making the transition towards data science, or related technical roles, from a variety of disciplines. So in this article I’m going to share my experiences and advice for making a (perhaps) unconventional career transition into a technical role. I can break these down into five main learnings:
(1) Find technical friends
Coming from a medical background, I didn’t have the first clue about how to develop coding skills or data science understanding. And neither did anybody around me.
This made it really important for me to branch out and find people who did. I quickly saw the benefits of doing so.
Early inefficiencies
When I initially started out, I’d have to resort to Google or StackOverflow to try and solve my problems. These are great resources, but it’s hard to find what you want when you don’t really know what you’re looking for.
One of the top skills a developer needs is knowing what to search for to solve the problem at hand. Without that skill, I would spend ages stuck at a relatively simple hurdle, like trying to manipulate a pandas dataframe in a particular way, or working out how to install and import the package I needed.
I’d heard that it’s best to think of your own projects as a means to learn. However, without insight into what it takes to build a project, and what’s possible, I would typically come up with over-ambitious projects with too many moving parts. I remember an early project idea was to build a chatbot patient for doctors to practice with, and I even started collecting transcripts from real conversations to help make this. In hindsight, this type of task was way too ambitious for someone of my technical level at that time (having just completed an Intro to Python course).
Getting by with a little help from some friends
Having technical friends is great for overcoming both of these sources of inefficiency.
If you have a relatively simple technical issue, but don’t know where to go to solve it, a technical friend can point you in the right direction pretty quickly. This saves a lot of time and frustration.
Likewise, if you come up with a project idea, you can run it by a technical friend. They’ll be able to break it down into stages and ultimately advise you whether it makes sense to do and how best to go about it. This can save you a lot of time spent barking up the wrong tree.
How to make technical friends
I don’t think there’s a “right” way to find technical friends and establish a relationship where you can ask them for advice. Here are a few principles that I find helpful.
Firstly, being open and honest about my intentions (“I’m learning to code and would love someone I could ping a message to when I get stuck”).
Secondly, being respectful of their time. I pushed myself to only ask for help if I’d truly searched for the solution and spent time trying to solve it myself. (To be honest, I think you also learn better this way.)
This wasn’t always easy. Sometimes I’d hit the initial wall of frustration and feel an urge to send off multiple messages, hoping for a quick solution. I tried to have a good crack myself first, but I’ll admit I sometimes caved in.
It was also helpful to have multiple people I could go to. I didn’t have to keep bugging the same person, reducing the risk of annoying them.
I personally met these friends in multiple places: from attending events that interested me (such as data science and machine learning meet-ups), from working on projects together (more on that in Sections 2 and 3) and from a smattering of formal ‘networking’, friends-of-friends and random LinkedIn messages.
At some point, when it felt comfortable, I’d reach out for advice on a specific problem or project I was working on. Sometimes the problems were simple or the project ideas were bad, so I had to put my pride to the side and seek out the constructive criticism.
If you’re starting out, and looking for a technical friend to help you get started, feel free to reach out (email at hi@chrislovejoy.me, Twitter @ChrisLovejoy_).
(2) Build a portfolio: the rule of 3
One of the first pieces of advice I received from a mentor is one that has stuck with me since:
Have a portfolio of three great projects.
Initially, for me, this just meant striving to get three projects under my belt. I went to hackathons, interned at companies and designed my own.
Once I achieved that, I kept looking at how I could build on them or replace them with new and cooler projects.
It’s so simple, but I’ve found it a useful framing. You’re only as good as your top three projects.
To this day, the lower right-hand side of my CV is dedicated to my top three projects at that moment in time.
If a round of job applications comes up, or if somebody asks me about work I’ve done, I know what to talk about. These three projects are always in mind.
Getting projects
I’m a big advocate of designing our own projects, both for learning and for potentially contributing to the community. But it’s not always easy, particularly when starting out.
A good source of ‘ready-made’ projects to work on is Kaggle. They provide a dataset and often specific challenges. You can also see other people’s solutions, which is a great source of learning.
A great way to devise a project within a team is to attend hackathons. These are typically weekend sprints to develop a solution to a problem and are held in most major cities around the world.
One thing I found really helpful was attending a project-based course. So much so that I’ll devote the next section to it.
(3) Go on a project-based course or bootcamp if you can
Even with all the motivation in the world and a great team of technical mentors, it can still be challenging to build great projects and learn new skills off your own back. There’s a lot to be said for being in the right environment, and having good projects being defined for you.
The best place for this would be to get a full-time job in a technical role. However, it’s not always possible to jump straight into this.
A really great intermediate step can be to go on a project-based course.
These typically last from around 5 weeks to a few months and are centred around a group project that produces a tangible output. There’s typically a partnership with a commercial client who has a genuine interest in what you are building.
I went on the “Science to Data Science” (S2DS) virtual course. I found it really helpful having a defined project and having responsive technical mentors to go to for any problems that arose.
I learnt a huge amount during the project. In particular, I learnt how to structure source code, became more familiar with GitHub, gained a better understanding of regression performance metrics and learnt the PEP 8 Python coding guidelines. (I’ll be sharing a full post on this in future.)
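To make "regression performance metrics" a little more concrete, here is a minimal sketch in plain Python of three common ones: mean absolute error (MAE), root mean squared error (RMSE) and R². The function name and the toy numbers are assumptions made up for illustration and are not taken from the S2DS project. In day-to-day work you would more likely reach for the equivalents in scikit-learn's `sklearn.metrics` module.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average size of the miss, in the target's units.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalises large misses more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the variance in y_true explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical toy data, purely for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

MAE and RMSE are both in the units of the target, which makes them easy to explain to a non-technical audience, while R² is unitless and easier to compare across problems.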
Another course I’ve heard good things about is the ‘ASI Data Science’ course (who I think have now re-branded as ‘faculty.ai’).
Note: I’m based in the UK. S2DS is international. I’m not sure about ASI. I’m sure there are programs abroad that I’m not familiar with.
One word of warning is that there are a lot of courses which, in my opinion, over-charge. This is a reflection of the area having become popular, but it isn’t necessarily a reflection of the value that courses offer. The S2DS course I attended cost £800, which felt like fantastic value after attending, but I’ve seen many courses in the £5,000+ range.
(4) Nail down understanding of core concepts
Data science is a really big (and expanding) field. There’s a huge amount that you could learn and it’s easy to be overwhelmed when starting out.
My approach has been to work towards a solid understanding of (i) core concepts and (ii) my specific areas of interest.
So I guess the question is: what constitutes a ‘core concept’?
I can’t claim to be an authority on what you should know, but these pages appear useful:
Essential Math for Data Science
10 Essential Skills You Need to Know to Start Doing Data Science
If I were to suggest core concepts and skills, it would be something like:
PROGRAMMING:
- comfortable with Python, pandas, NumPy and scikit-learn
- familiar with GitHub
- familiar with the command line
- familiar with installing packages
 
THEORY:
- classification algorithms (SVMs, random forest, logistic regression, AdaBoost): higher-level principles of how they work
- regression algorithms (linear/OLS regression, lasso and ridge regression, regression trees): higher-level principles of how they work, and common considerations
- performance measures for classification and for regression algorithms
- a familiarity with neural networks and deep learning
- clustering algorithms: k-means and hierarchical clustering
- familiar with dimensionality reduction techniques such as PCA
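
As one small illustration of the "performance measures for classification" item above, the sketch below computes accuracy, precision, recall and F1 for binary labels in plain Python. The function name, the `positive` parameter and the toy labels are assumptions made up for this example. scikit-learn's `sklearn.metrics` module offers ready-made versions of the same measures.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))

    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Precision: of everything predicted positive, how much really was?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything really positive, how much did we find?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, purely for illustration.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
scores = classification_metrics(labels, predictions)
print(scores)
```

Which measure matters most depends on the problem. In a screening setting, for example, a missed positive (low recall) is often costlier than a false alarm, so accuracy alone can be misleading.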
 
I’d suggest using online courses to build up your understanding in each of these key areas. Ones I found helpful were Brilliant.org and Khan Academy for principles and maths, plus various courses on Coursera and Udemy for the more technical aspects.
As for the skills on top of this, I’d argue it depends on the industry of interest and type of work you’ll be doing. If working with Big Data, Apache Spark will be helpful. If working with time series, familiarity with ARIMA models will be helpful.
(5) A Master’s Degree (or other formal qualification) isn’t essential, but can help
There are a lot of data science roles that don’t specify a master’s degree as a formal requirement, and I think it’s possible to get a job without one.
However, coming from an unconventional background, I found it hard to be taken seriously until I started working towards one.
I think if my bachelor’s degree had been more directly relevant to data science (rather than Medicine), I might not have needed one. A good bachelor’s degree plus proof of practical experience is sufficient in many cases.
I ended up choosing a master’s degree in Data Science and Machine Learning at UCL in London. Even before starting it, I was pretty confident that I had a good grasp of key concepts from my own self-study.
But a lot of job applications led to straight-out rejection, so I never had the chance to prove myself. And I can completely understand why. If you have a lot of applications for a role, the one that says “I’m a full-time doctor, but have done loads of data science self-study” is a pretty easy one to remove from the pile.
The master’s didn’t guarantee I’d progress to an interview, but I definitely felt it helped me get a foot in the door.
(Whether or not I’m fully behind paying X thousand for a master’s degree in our current climate of remote courses is a matter for another day, however…)
Final thoughts
I’m absolutely loving work as a data scientist. It’s really satisfying to see the hard work pay off, even if the journey was tough at times. It feels great to have two skillsets in my portfolio (as both a doctor and a data scientist) and to be one step closer to a more ‘portfolio’ approach to work in future. I hope that the experiences and suggestions I’ve shared here help you with your transition, too.
Best of luck! :)
Many thanks to Abdel Mahmoud and Luke Harries for reviewing this article.
Translated from: https://towardsdatascience.com/first-data-science-job-coronavirus-doctor-b8cf074bae96