How to Become a Machine Learning Engineer in 2020
Machine Learning Engineering
The title of “Machine Learning Engineer” is quickly becoming more popular, and with that comes significant interest from people trying to enter the Data Science field. What kind of career path is this, and what skill set does a Machine Learning Engineer need? Is it possible to define the steps to take in order to become an ML Engineer? Can you follow online training and get certified? I figured I’d write up my ideas on the state of the field and how viable it is for those looking to pursue a career in it.
The Machine Learning Engineer
Let’s get one point out of the way first. Some might look at the job title and expect it to be a Data Scientist who purely focuses on model building, and that’s it. This is a big no-no, if only because most ML Engineering work starts after the initial model is built. While it’s often part of the job, a Machine Learning Engineer does not purely build models. And honestly, that part will only take up 5 to 10% of the job.
Look at this image of all the components that are involved in the model ecosystem. The black square at the center? That’s the actual ML code.
So what kind of creature is the Machine Learning Engineer then, and where does it fit into the grand scheme of things? I prefer to fall back on a part of Tomasz Dudek’s definition from 2018:
…A person called a machine learning engineer asserts that all production tasks are working properly in terms of actual execution and scheduling, abuses machine learning libraries to their extremes, often adding new functionalities. (They) ensure that data science code is maintainable, scalable and debuggable, automating and abstracting away different repeatable routines that are present in most machine learning tasks. They bring the best software development practices to the data science team and help them speed up their work…
— Tomasz Dudek in “But what is this “machine learning engineer” actually doing?”
Essentially, an ML Engineer is then some kind of wizard who brings models to production in a sensible way, is able to improve the Data Scientist’s models, and is also partly an architect who paves the road for the Data Science team. This sounds incredibly like some kind of senior engineering role, and yet it doesn’t have to be.
Common ML Engineering backgrounds
Most of the other ML Engineers I’ve met fall into one of two categories. The first group is highly educated, with most having a master’s or even a PhD in Computer Science, Artificial Intelligence, Data Science or Software Engineering. Surprisingly many are relatively new grads, with 1–3 years of experience under their belt when they became ML Engineers. There’s also a second group that consists of more experienced developers that transitioned into this role from neighboring fields such as Software Engineering or Data Engineering, and of course Data Science.
This indicates that being an ML Engineer requires a level of proficiency that could come from either of the two directions that make up the role. You could be a great software engineer, or a fantastic machine learning virtuoso. Maybe both! If you are one already, this might be the field for you. If you are not, it might be a viable direction to develop yourself towards.
But do not make the mistake of assuming that Software Engineers or Data Scientists automatically make good ML Engineers. I come from a software background myself and I can vouch that most ML concepts and APIs are absolutely alien to Software Engineers. I remember the intense struggles I had getting to know TensorFlow and Theano years ago. Even though I started coding in my teens, I had never seen anything like it. The experience was humbling.
A beginner-level ML Engineer is not a beginner programmer. This is a journey that is almost always traveled with some experience. Is it then impossible to land an ML Engineering job without experience or training?
Of course not. However, the odds are against you. It is far easier to get into this niche when you have a similar background. There is some light on the horizon, however.
Remember that back when Data Science started becoming popular we said the same thing about Data Scientists because the people doing Data Science at that time were some of the brightest and most highly-educated people in the world. Since then Data Science has become more accessible and in truth, nowadays you can be a great Data Scientist without needing a PhD. Whether the same will fully apply for ML Engineering I am not sure, but I hope that as our field matures the barriers to entry will become lower.
Data Science. Software Engineering. Probably some linear algebra too. These were the ingredients chosen to create the perfect ML Engineer. Whiteboard creation by author.
The toolbelt of the ML Engineer is not simply the lovechild of an intense affair between a Software Engineer’s IDE and a Data Scientist’s Jupyter Lab. It has many tools and techniques that are intrinsic to the field. Which brings me to the next section…
The Machine Learning Engineer’s Skills
Skills lists become outdated soon after being written, and often take on a life of their own. And yet I am here to draft up a non-exhaustive list of skills and topics to study! The tool landscape is so broad that it’s unlikely any ML Engineer will have proficiency with every language, tool and concept out there. Please don’t look upon this as some kind of list of items you need to cross off on your ML Engineering journey like so many online resources will instruct you to. Rather, take note and look at these as themes within the ML Engineering field.
I’ll try to discuss concepts more than specific tools. That way most of this will remain relevant in a couple of months or years.
Data Science
Python. Look into coding standards and some of the cool stuff in the recent versions of Python. Having a basic understanding of R is also useful and your Data Scientists will thank you for it.
Statistics.
Model optimization.
Model validation.
ML frameworks such as scikit-learn.
Deep learning frameworks such as TensorFlow and PyTorch.
ML applications such as NLP, computer vision and time series analysis.
Mathematics. Implicitly, you’ll use a lot of linear algebra and calculus.
The reason why I would take Python over R or any other language is mainly the production aspect. While you can do a lot with R, it is often not supported as well as Python is. There’s also the time aspect that plays a role here: it is often far faster to productionize code in Python than in R.
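To make the “model validation” item in the list above a bit more concrete, here is a minimal sketch of k-fold cross-validation in plain Python. The helper names (`k_fold_indices`, `cross_validate`) and the toy “predict the training mean” model are hypothetical and only there for illustration; in practice you would most likely lean on a library such as scikit-learn rather than roll your own.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(targets, k):
    """Score a toy 'predict the training mean' model with k-fold cross-validation.

    Returns the mean absolute error measured on each held-out fold.
    """
    scores = []
    for test_idx in k_fold_indices(len(targets), k):
        held_out = set(test_idx)
        # "Fit" the toy model on the training folds only, never on the held-out fold.
        train = [y for i, y in enumerate(targets) if i not in held_out]
        prediction = mean(train)
        errors = [abs(targets[i] - prediction) for i in test_idx]
        scores.append(mean(errors))
    return scores


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(cross_validate(y, k=3))
```

The point of the exercise is that the model never sees the fold it is scored on, which is the habit that matters far more than the specific tool you use to enforce it.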
Software engineering
Experience in a second programming language besides Python, such as Java, C++, or JavaScript.
Cloud offerings. More on that later.
Distributed computing
System design and software architecture
Data structures and algorithms
Databases and the query languages that come with them
Containerization (e.g. Docker, Kubeflow)
Functional programming concepts
Design patterns
Big O
API development
Version control: git
Testing
Project management. Probably the most underrated element in any SE curriculum.
CI/CD
MLOps
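As a small illustration of the “Testing” item in the list above, here is a sketch of how a piece of data science code might be covered by tests. The `min_max_scale` helper and the test names are hypothetical, made up for this example; the tests are plain functions with assertions, a style that a runner such as pytest can typically collect as well.

```python
def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)  # raises ValueError on empty input
    if hi == lo:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_scales_to_unit_range():
    assert min_max_scale([10, 20, 30]) == [0.0, 0.5, 1.0]


def test_constant_input_does_not_divide_by_zero():
    assert min_max_scale([7, 7, 7]) == [0.0, 0.0, 0.0]


def test_empty_input_raises():
    try:
        min_max_scale([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")


if __name__ == "__main__":
    for test in (
        test_scales_to_unit_range,
        test_constant_input_does_not_divide_by_zero,
        test_empty_input_raises,
    ):
        test()
    print("all tests passed")
```

Edge cases such as constant columns and empty inputs are exactly the kind of thing that works fine in a notebook and then breaks in production, which is why they are worth pinning down in tests.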
So how do you learn about all of these if not on the job? Courses and online training can be great, but they won’t teach you how to apply what you learn in a real-life setting. For things like statistics that doesn’t matter much, but for technical subjects knowing “about” something is only half of the mastery. It doesn’t take more than a quick glance at Reddit’s r/learnprogramming to see that many people struggle to make the jump from coding in the protected IDE of an online course to coding their own projects on their own machine.
My experience is that it might be better to get started on a project of your own to learn a new skill, and to supplement your knowledge with online training once you already have some applied knowledge. Instead of all-in-one training programs, there are many tutorials online to help you with that, from building your own clock or calculator to a complete web app. Be wary of any course that promises you can go from zero to hero in a couple of weeks or months.
Certifications are a similar beast. A certification can be particularly valuable if you’re in consulting and want to signal to clients that your skills meet certain standards. Having a certification that corresponds to a client’s tech stack immediately puts you at the front of the pack. However, a certification is worthless without the skills to back this up in the first place. Consider now that you can obtain many certifications without having to code for them and you’ll see where I’m headed. Often, the time spent getting a certification would be better spent just building applications.
That said, there are some certifications that do carry some merit for ML Engineers, particularly for cloud vendors. Often these require a couple of years of experience deploying applications on their respective platforms, but anyone can pay $100–300 and register for a certification examination. As of 2020, there are three cloud vendors worth mentioning: Azure (Microsoft), GCP (Google), and AWS (Amazon). Here’s a list of certifications they offer that are in the sphere of interest of the ML Engineer.
Microsoft Azure:
Microsoft offers associate-level certifications for both Data Scientists and AI Engineers, as well as about a dozen other certifications. Some certifications actually require multiple exams, but this is not (yet?) the case for either the Data Scientist or the AI Engineer cert. The certification topics are a little bit superficial, but the exam should not be underestimated.
Microsoft Certified: Azure AI Fundamentals
Azure Data Scientist Associate
Azure AI Engineer Associate
Google Cloud Platform:
Google is the challenger when it comes to cloud services and the state of their certification reflects that. At the moment the ML Engineer exam is in beta and no certifications have been awarded yet. The exam takes four (!) hours, but its syllabus is an incredibly comprehensive list of what an ML Engineer’s job is all about. Prior to this certification being introduced, some ML topics fell under the Data Engineer certification, so many ML Engineers, myself included, actually took the Data Engineering certification track.
You could also look at the Google Cloud Architect, Developer or DevOps certifications, but these barely touch upon ML and might add a little bit of noise to your resume that lines you up for different gigs. I say that as a certified Cloud Architect myself who learned this from experience. On the other hand, it could make your profile a little bit more appealing.
Google Cloud Certified Professional Data Engineer
Google Cloud Certified Professional Machine Learning Engineer (currently in beta)
AWS:
Amazon has specific paths for both analytic roles and ML roles. Given that the data analytics certification is almost entirely focused on data processing and reporting, I would propose that only the ML Specialty is of interest to the ML Engineers. Their Machine Learning Specialty’s syllabus covers a lot of ML Engineering topics, though it is not as exhaustive as the Google certification.
AWS Certified Machine Learning — Specialty
AWS Certified Developer Associate
So which ones should you get?
At the moment Amazon is the market leader, with around 60% market share. Azure sits at 30% and GCP at 10%. While the overall market is growing a lot, AWS is slowly losing market share to Google and Microsoft. Google might look like an underdog, but they have a pretty strong track record with AI innovations and their ownership of TensorFlow. Speaking of which, there’s also a certificate for TF. If an employer does not force you to use one cloud vendor over another, I would advise testing out all three with trial accounts by deploying a pet project. Figure out which one you like and also look at what kind of companies use these cloud vendors.
Why do you need cloud tech at all? Well, eventually Data Science work makes it to production, and most of the time it is deployed on a cloud platform. You don’t need to rival the skills of a Cloud Engineer, but you should know how to implement ML projects on your chosen platform. Navigating {vendor} cloud consoles is not something often taught at a formal place like a university.
There is a downside to learning a cloud platform: literally not a week goes by without a new cloud product being announced by one of these giants. Keeping up to date with cloud offerings is hard. Having a wide range of certifications also brings into question whether you’re actually up to snuff with all of them.
You might have noticed that, so far, I have not linked any courses or tutorials. There are already many resources out there for that. Furthermore, my main point is that the road to becoming an ML Engineer is traveled by doing projects and getting experience in the field, as it is not an entry-level job.
Variation In What Constitutes ML Work
When you’ve done all that, you should know that there is a world of difference between doing ML at a small company and doing ML at FAANG. Likewise, there is a lot of variation between working at ‘product’ companies and at consultancies. Similarly, a bank and a start-up are worlds apart in terms of technology adoption.
You’re much more likely to be a jack of all trades at small companies, e.g. being asked to do data engineering, visualization and data science-y work as part of your day-to-day activities. Larger companies are more likely to hire staff who focus on specific parts of the ML chain, and might even have different types of ML Engineers running around. If you’re at a company that does many different projects you might never go deep into any framework, but you get to experience many kinds of tools and domains. These are specific considerations to keep in mind when you’re going to look for your first ML job.
Be Prepared To Fail
There’s a slightly toxic notion in computer science that good developers hardly ever make mistakes and good code is without bugs. This is complete nonsense and it has led to an epidemic of imposter syndrome. Code is almost never written correctly the first time, and its quality depends heavily on the time and money thrown at it. You grow a lot as an engineer in this field, but at the same time the field will grow faster than you. Being an ML Engineer means continuously having to learn new things on the job, and you give up some proficiency in coding by virtue of being in an interdisciplinary field.
I regularly look back at code written a while ago and find that I made mistakes. Sometimes I rewrite it with the knowledge I have right now, or see if I can update an old model to a new version of an API. Some developers swear by completely tossing out dense code and rewriting it from scratch. Over time you develop a feeling to retroactively detect code smells and this is what engineering is about.
Don’t be afraid to make mistakes. Failing is natural, particularly with something as new as ML.
A Reading List for the Machine Learning Engineer
A Machine Learning Engineer’s bookcase. Notice the worn-down copy of Linear Algebra and Its Applications. Photo by author.
Although this is by no means meant as a complete list, here are some resources that I feel would benefit those who want to break into this field.
Books:
Clean Code by Robert C. Martin
Machine Learning Yearning by Andrew Ng
Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
The Pragmatic Programmer by David Thomas and Andrew Hunt
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann
Posts / blogs:
How to read scientific papers by Christoph Schmidl
Everything you need to know about becoming a self-taught ML Engineer by Jason Benn offers some excellent motivation.
Become a Data Scientist in 2020 with these 10 resources by Rahul Agarwal will appeal to those looking for lists of online resources.
OpenAI’s blog
Papers:
Hidden Technical Debt in Machine Learning Systems
Software Engineering for Machine Learning: A Case Study
Conferences, meetups and the ML Engineering Scene
Some people hate them, others love them. Some use them as a way to promote themselves to the max, while other people share absolutely brilliant ideas. I like meetups for the simple reason that they allow you to cherry-pick the topics you want to learn about. Do you like Scala? AI in Healthcare? Meetups. More of a fan of Bayesian optimization? There’s probably a meetup for that. Most of the meetups have gone fully online due to the coronavirus pandemic, and I expect this to continue through to the first half of 2021 at least.
Protip: They’re great for networking if you’re looking to get an internship, job, or a mentor.
Other Careers
There are a couple of other careers that should be mentioned to those considering moving into this field.
Data Engineering: Anything that touches data needs to be able to handle scale and complex transforms. You’re the one specialized in connecting various elements of the data pipeline. There is often a significantly higher demand for your services than there is for those of the Machine Learning Engineer.
Data Scientist: Analysis, Storytelling, Statistics, Machine Learning and presenting it to the CEO. You got it all. Usually this job is more diverse and involves less programming, but it really depends where you end up — there are so many flavors of data science it is hard to define it as a single role and some data scientists run a complete Data & Analytics department by themselves.
Cloud Engineering: Specialized in integrating different applications and moving workflows to the cloud, you’re pretty good friends with the Data and ML Engineers.
Conclusion
A Machine Learning Engineer has a broad range of topics to understand from both Machine Learning and Software Development. Courses and certifications alone won’t get you there as of 2020. Formal training or experience in the field is still desirable, but I expect that the role will become more accessible over time, similar to how Data Science became more open to newcomers. With that in mind, I feel that the best path for those looking to become ML Engineers without formal training is to enter Data Science or Software Engineering, and transfer from there while picking up the elements that make up ML Engineering.
Closing words
As the field moves fast, I would like to focus on the ML Engineering skill set and tool landscape in a future post by scraping job postings and doing some magic on that data, in order to come up with a more statistically sound analysis of what an ML Engineer could know. Wondering whether you should learn TensorFlow over PyTorch? Stay tuned for that :)
With this post, I added to the growing body of Data Science articles. I hope you found it useful. I wanted to write something accessible to those not currently in the field and I hope my two cents will help you in figuring out whether this niche of ML and Engineering is for you.
Translated from: https://towardsdatascience.com/how-to-become-a-machine-learning-engineer-in-2020-1161aa29261e