Netflix's Polynote Is a New Open Source Framework to Build Better Data Science Notebooks
Notebooks are a data scientist's best friend, and they can also be a nightmare to work with. For someone accustomed to modern integrated development environments (IDEs), working with notebooks feels like going back decades. Furthermore, modern notebook environments are mostly constrained to Python programs and lack first-class support for other programming languages. A few days ago, Netflix open sourced Polynote, a new notebook environment that addresses some of those challenges.
Polynote was born out of the need to accelerate data science experimentation at Netflix. Over the years, Netflix has built a world-class machine learning platform based mostly on JVM languages such as Scala. Support for those languages in mainstream notebook technologies such as Jupyter is rudimentary, so the team needed a better solution. Polynote started from that basic requirement, but it also incorporates the lessons learned from building one of the most ambitious notebook-based experimentation platforms in the data science world.
Inside Netflix's Notebook-Driven Architecture
Over the last few years, Netflix has transformed its use of data science notebooks from an experimentation artifact into a key component of the lifecycle of machine learning solutions. Initially, Netflix adopted Jupyter Notebooks as a data exploration and analysis tool. However, the engineering team quickly realized that Jupyter offered tangible advantages in terms of runtime abstraction, extensibility, interpretability of the code and debugging, and that these could have a major impact on data science workloads if used correctly. In order to expand the use of Jupyter as a data science runtime, the Netflix team needed to solve a few major challenges:
· The Code-Output Mismatch: Notebooks are changed frequently and, many times, the output you see in the environment does not correspond to the current code.
· The Server Requirement: Notebooks typically require a notebook server runtime to run, which represents an architectural challenge when adopted at scale.
· Scheduling: Most data science models need to be executed on a periodic basis, but the tools for scheduling notebooks are still fairly limited.
· Parametrizing: Notebooks are fairly static code environments, and passing input parameters to them is far from trivial.
· Integration Testing: Notebooks are isolated code environments that are notoriously difficult to integrate with other notebooks. As a result, tasks like integration testing become a nightmare when using notebooks.
To address those requirements, Netflix built a very ambitious architecture that enables the operationalization of Jupyter notebooks. The initial implementation included technologies such as Papermill, which enables the parametrization of notebooks.
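To make the parametrization idea concrete, here is a minimal, standard-library-only sketch of the general technique: a notebook keeps its defaults in a cell tagged `parameters`, and a runner inserts an override cell right after it before executing. This is an illustration written for this article, not Papermill's implementation and not material from polynote.org. The helper names `inject_parameters` and `code_cell`, the sample parameters, and the fallback of placing overrides first when no tagged cell exists are all assumptions of the sketch.

```python
import copy


def code_cell(source, tags=()):
    """Build a minimal code cell in the shape of a Jupyter (nbformat 4) notebook cell."""
    return {
        "cell_type": "code",
        "metadata": {"tags": list(tags)},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with an override cell placed after the defaults cell."""
    result = copy.deepcopy(notebook)
    # Render each parameter as a Python assignment; repr() is adequate for simple literals.
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = code_cell(source, tags=["injected-parameters"])
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            result["cells"].insert(index + 1, injected)
            break
    else:
        # Assumption of this sketch: with no tagged cell, overrides go first.
        result["cells"].insert(0, injected)
    return result


# A hypothetical two-cell notebook: defaults, then the code that uses them.
template = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        code_cell("region = 'US'\ndays = 7", tags=["parameters"]),
        code_cell("summary = f'{region}:{days}'"),
    ],
}

run = inject_parameters(template, {"region": "BR", "days": 30})

# Stand-in for a real kernel: execute the code cells top to bottom in one namespace.
namespace = {}
for cell in run["cells"]:
    exec(cell["source"], namespace)

print(namespace["summary"])  # prints BR:30
```

Because the template is copied rather than mutated, the same notebook can be run many times with different inputs, which is what makes scheduled and parameterized runs practical.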
Figure source: https://polynote.org/
While the initial notebook architecture at Netflix was certainly ambitious, it was also constrained to Python programs. Now it was time to expand.
Entering Polynote
Polynote is a multi-language notebook experimentation environment. In addition to Python, the current release supports SQL, Vega (for visualizations) and, of course, Scala. The platform is also integrated with data science infrastructure such as Apache Spark. At its core, Polynote includes the following capabilities:
a) Improved Editing Experience: Polynote tries to enable an editing experience closer to that of modern IDEs.
b) Multi-Language Support: Polynote introduces first-class support for Scala and other languages used in data science environments.
c) Data Visualization Improvements: Polynote integrates data visualization natively, so the datasets in a notebook can be explored without adding a lot of code.
d) Configuration and Dependency Management: Programs written in languages like Scala often carry complex package dependencies. Polynote saves the package dependency configuration within the notebook itself, addressing some of the common challenges that JVM developers experience in this area.
e) Reproducibility: The combination of code, data and execution results in a single document makes notebooks powerful, but also difficult to reproduce. Polynote includes reproducibility as a first-class capability of the framework.
Improved Editing Experience
Polynote includes features common in IDEs, such as code auto-completion and syntax error highlighting, which improve the experience for data scientists and researchers building notebooks. Many of the editing capabilities are powered by the Monaco editor, which also powers Visual Studio Code.
Figure source: https://polynote.org/
Multi-Language Support
Polynote not only supports multiple languages, it also allows those languages to be combined in a single program. In Polynote, every cell can be based on a different language. When a cell is run, the kernel provides the available typed input values to the cell's language interpreter. In turn, the interpreter provides the resulting typed output values back to the kernel. This allows cells in Polynote notebooks to operate within the same context. The example in the accompanying figure uses a Python library to compute an isotonic regression of a dataset generated with Scala.
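The isotonic regression figure from the original article is not reproduced here. Separately, the value exchange described above can be sketched as a toy model: a kernel keeps a table of typed values, hands a copy to whichever interpreter runs the cell, and merges the typed results back. This is an illustration written for this article, not Polynote's code. The names `TypedValue`, `Kernel`, `python_interpreter` and `stats_interpreter` are hypothetical, and the second "language" is a made-up mini-language with lines of the form `target = op input`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class TypedValue:
    """A named value together with the name of its runtime type."""
    name: str
    type_name: str
    value: Any


Interpreter = Callable[[str, Dict[str, TypedValue]], Dict[str, TypedValue]]


def python_interpreter(source, inputs):
    """Run Python source with the kernel's values in scope; return new or rebound names."""
    namespace = {name: typed.value for name, typed in inputs.items()}
    exec(source, namespace)
    outputs = {}
    for name, value in namespace.items():
        if name.startswith("__"):  # skip entries such as __builtins__ added by exec
            continue
        if name not in inputs or inputs[name].value is not value:
            outputs[name] = TypedValue(name, type(value).__name__, value)
    return outputs


STATS_OPS = {"sum": sum, "max": max, "min": min, "count": len}


def stats_interpreter(source, inputs):
    """Interpret a made-up mini-language: one `target = op input` statement per line."""
    outputs = {}
    for line in source.strip().splitlines():
        target, expression = (part.strip() for part in line.split("="))
        op, argument = expression.split()
        value = STATS_OPS[op](inputs[argument].value)
        outputs[target] = TypedValue(target, type(value).__name__, value)
    return outputs


@dataclass
class Kernel:
    """Holds the shared symbol table and dispatches each cell to its interpreter."""
    interpreters: Dict[str, Interpreter]
    symbols: Dict[str, TypedValue] = field(default_factory=dict)

    def run_cell(self, language, source):
        outputs = self.interpreters[language](source, dict(self.symbols))
        self.symbols.update(outputs)
        return outputs


kernel = Kernel({"python": python_interpreter, "stats": stats_interpreter})
kernel.run_cell("python", "samples = [3, 1, 4, 1, 5]")
stats_out = kernel.run_cell("stats", "total = sum samples\nlargest = max samples")
kernel.run_cell("python", "mean = total / len(samples)")

print(kernel.symbols["mean"])
```

The third cell is Python again, yet it reads `total`, which was produced by the other interpreter. That is the sense in which cells written in different languages share one context.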
Figure source: https://polynote.org/
Data Visualization Improvements
Data visualizations are a common component of most notebook environments. However, Polynote takes visualization a step further by making it a native component of the platform, so developers do not need to write any code in order to visually explore a dataset.
Figure source: https://polynote.org/
Configuration and Dependency Management
Most of the time, data scientists working on notebooks can enjoy the efficiency of Python's package management model to handle the dependencies of a program. However, in JVM languages like Scala, dependency management can become a total nightmare. Polynote addresses that challenge by storing the configuration and dependency information directly in the notebook itself, rather than relying on external files. Additionally, Polynote provides a user-friendly Configuration section where users can set dependencies for each notebook.
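As a rough illustration of what "dependencies stored in the notebook" can look like, the sketch below embeds a dependency section in a notebook's metadata and reads it back. It is written for this article and is not taken from Polynote or polynote.org. The layout (`metadata.config.dependencies`, keyed by language), the helper `declared_dependencies` and the listed packages are assumptions chosen for illustration, not a statement of Polynote's actual file format.

```python
import json

# Hypothetical notebook file contents: the dependency list travels with the notebook.
notebook_text = json.dumps(
    {
        "metadata": {
            "config": {
                "dependencies": {
                    "scala": ["org.typelevel::cats-core:2.0.0"],
                    "python": ["scikit-learn"],
                },
                "repositories": [],
            }
        },
        "cells": [],
    }
)


def declared_dependencies(raw_notebook):
    """Return {language: [coordinates]} declared in the notebook, or {} if none."""
    notebook = json.loads(raw_notebook)
    config = notebook.get("metadata", {}).get("config", {})
    return {
        language: list(coordinates)
        for language, coordinates in config.get("dependencies", {}).items()
    }


dependencies = declared_dependencies(notebook_text)
for language, coordinates in dependencies.items():
    print(language, coordinates)
```

The point of the design is that anyone who opens the notebook gets the dependency list with it, instead of having to locate a separate build file.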
Figure source: https://polynote.org/
Reproducibility
With Polynote, Netflix introduces a new code interpretation model instead of relying on a REPL as traditional notebooks do. One of the key capabilities of the new interpretation model is that it removes hidden state, which allows data scientists to copy cells within a notebook without carrying over any state from the previous position.
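One way to picture the difference is the toy comparison below, written for this article rather than taken from Polynote or polynote.org. In the REPL-style runner, every cell writes into one long-lived namespace, so a deleted cell keeps influencing later runs. In the positional runner, a cell sees only the names defined by the cells currently above it. The function names are hypothetical, and re-executing the upstream cells on every run is a simplification; a real implementation would more plausibly keep each cell's results and reuse them.

```python
def run_repl_style(namespace, source):
    """REPL model: every cell mutates one shared, long-lived namespace."""
    exec(source, namespace)


def run_positional(cells, index):
    """Positional model: the cell at `index` sees only what the cells above it define."""
    namespace = {}
    for source in cells[:index]:
        exec(source, namespace)  # simplification: replay the upstream cells
    exec(cells[index], namespace)
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


# REPL style: run two cells, then "delete" the first one from the notebook.
shared = {}
run_repl_style(shared, "x = 1")
run_repl_style(shared, "y = x + 1")
# The cell defining x is gone from the document, but x still lives in the namespace,
# so re-running the remaining cell succeeds on hidden state.
run_repl_style(shared, "y = x + 1")
repl_y = shared["y"]

# Positional style: the same notebook after deletion has nothing above the cell.
try:
    run_positional(["y = x + 1"], 0)
    positional_error = None
except NameError as error:
    positional_error = str(error)

# With the defining cell still in place, the result depends only on what is above.
intact = run_positional(["x = 1", "y = x + 1"], 1)

print(repl_y, positional_error, intact)
```

In the REPL case the notebook on disk no longer explains its own output. In the positional case the failure appears immediately, which is the property that makes a notebook reproducible from its text alone.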
Figure source: https://polynote.org/
Polynote is a new entrant in the competitive space of data science notebooks, but one that stands on its own merits. The support for JVM-based languages could make Polynote a favorite of developers working on Spark infrastructure. The editing and reproducibility capabilities are also welcome enhancements to traditional notebook environments. Polynote is available on GitHub, and you can also follow the project's website.
Translated from: https://medium.com/dataseries/netflixs-polynote-is-a-new-open-source-framework-to-build-better-data-science-notebooks-4bdab6b8d0ae