How I use Vagrant and Docker in consultancy projects
By Doug Ashton – Data Scientist, UK
Just like you, I like to try out all the latest tech. If there’s a new feature in Shiny, I’ll download the latest version without thinking. I currently have 4 versions of R on my laptop, 270 packages, 2 versions of Java, and a number of other open source tools. While being on the cutting edge is part of my job, it conflicts with the strict audit and reproducibility requirements we have for project work.
One problem with R is that, because CRAN changes so quickly, it can be difficult to keep a consistent combination of packages across your team and production servers. The R community has responded to this problem with a number of noteworthy packages for managing package libraries, such as packrat, checkpoint, switchr and our own pkgsnap. Another approach is to use the MRAN mirror to freeze CRAN to a particular date.
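To make the "freeze CRAN to a date" idea concrete, here is a minimal sketch in Python that builds the repository setting an R session would need. It assumes a date-based snapshot mirror whose URLs end in an ISO date; the base URL below follows the MRAN layout as I remember it and may not match the mirror you use, so treat it as a placeholder. The helper names are made up for illustration.

```python
from datetime import date

# Assumption: base URL of a date-based CRAN snapshot mirror (MRAN-style layout).
# Replace it with whatever snapshot mirror your team actually uses.
SNAPSHOT_BASE = "https://mran.microsoft.com/snapshot"


def snapshot_repo_url(snapshot: date, base: str = SNAPSHOT_BASE) -> str:
    """Return the repository URL for packages as they were on `snapshot`."""
    return f"{base.rstrip('/')}/{snapshot.isoformat()}"


def rprofile_line(snapshot: date, base: str = SNAPSHOT_BASE) -> str:
    """Return an R statement that points install.packages() at the snapshot."""
    url = snapshot_repo_url(snapshot, base)
    return f'options(repos = c(CRAN = "{url}"))'


if __name__ == "__main__":
    # Print a line that could be placed in a project-level .Rprofile
    print(rprofile_line(date(2015, 12, 1)))
```

Putting the generated line in a project's .Rprofile means everyone on the team installs packages from the same dated snapshot rather than from whatever CRAN holds that day.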
A bigger problem is how R interacts with the various system dependencies you have installed. This is why, at Mango, we use continuous integration and unit testing to make sure our results are reproducible on dedicated build servers. Even then, you can be left scratching your head when test results don’t match.
All this led us to look for a better way of working. We needed an environment that was easily reproducible, and more in line with the production environment we are deploying to. We’ve already been using Docker for some time so this was the natural choice.
Docker
As described in a previous post, Docker is designed to provide an isolated, portable and repeatable wrapper around your applications. We use this in a number of ways:
1. Reproducible environments
Each project can run inside its own container, completely sandboxed from the rest of your system. We have a number of base images, each built on a specific R version and provisioned with a standard set of packages (using our pkgsnap package) and RStudio Server. Each project can build on one of these images, adding any project-specific package dependencies. The recipe to build this image is stored in a Dockerfile, which can be saved in the project directory. An example project Dockerfile is shown in this demonstration.
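As a rough illustration of what such a project recipe might contain, the Python sketch below renders a small Dockerfile that starts from a base image and installs extra R packages. This is not our actual Dockerfile: the image name `example/r-base:3.2.2`, the package list, the `/project` path and the function name are all hypothetical, and it installs from a plain CRAN mirror rather than through pkgsnap.

```python
# Sketch only: the image name, package list, and paths are hypothetical examples.

def render_dockerfile(base_image, packages, repo="https://cloud.r-project.org",
                      project_dir="/project"):
    """Return Dockerfile text that layers project packages on a base image."""
    lines = [f"FROM {base_image}"]
    if packages:
        # Build an R call such as install.packages(c('a', 'b'), repos = '...')
        quoted = ", ".join(f"'{name}'" for name in packages)
        install = f"install.packages(c({quoted}), repos = '{repo}')"
        lines.append(f'RUN Rscript -e "{install}"')
    # Copy the project files into the image and work from that directory
    lines.append(f"COPY . {project_dir}")
    lines.append(f"WORKDIR {project_dir}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("example/r-base:3.2.2", ["dplyr", "ggplot2"]), end="")
```

Saving the output as `Dockerfile` in the project directory and running `docker build -t my-project .` would then build the image; the tag `my-project` is just an example.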
2. System dependencies
If there are system dependencies, such as database connections or external libraries, then building an image with these installed makes it much easier to distribute the project to others. This also makes Docker a great way of trying a new technology without the pain of installing it on your system. For example, the excellent jupyter/all-spark-notebook image has everything you need to get started with Spark from R, Python or Scala.
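Trying out an image like this usually comes down to a single `docker run`. The sketch below assembles that command in Python so the pieces are easy to see. It assumes the notebook server listens on Jupyter's default port 8888 inside the container; the helper name and its parameters are made up for illustration.

```python
import shlex


def docker_run_command(image, host_port=8888, container_port=8888, remove=True):
    """Build the argument list for running `image` with one published port."""
    cmd = ["docker", "run"]
    if remove:
        # --rm deletes the container when it exits, keeping the host tidy
        cmd.append("--rm")
    # -p HOST:CONTAINER publishes the container port on the host
    cmd += ["-p", f"{host_port}:{container_port}", image]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Docker
    print(shlex.join(docker_run_command("jupyter/all-spark-notebook")))
```

Passing the returned list to `subprocess.run` would start the container on a machine with Docker installed.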
3. Scalability
Once you’re used to working in containers, the barrier to scaling up compute power when needed is much lower. Your container will work just the same on your laptop as on a 32-core EC2 instance. You just spin up a node, pull the image and deploy your application. Multiple containers from the same image can be spawned across a grid in seconds, and a small-scale Spark cluster can be swapped out for a much larger one.
Vagrant
For larger software development projects, we also use Vagrant as a tool for reproducible development environments. As described in an earlier post, Vagrant is a set of command line tools for managing virtual machines (VMs). It creates a dedicated VM for each project that is consistent across the development team, while adding only a small file (the Vagrantfile) to version control.
More resources
Source: https://www.pybloggers.com/2015/12/how-i-use-vagrant-and-docker-in-consultancy-projects/