How We Built an Inexpensive, Scalable Architecture to Cartoonize the World!
A lot of people were interested in the architecture behind Cartoonizer. So Tejas and I (Niraj) have tried explaining the process we employed to make it work. Kudos to Algorithmia for powering our video inference pipeline. 😇
In today's fast-paced world, ML experts are expected to wear multiple hats across the ML workflow. One of the critical tasks in the workflow is serving the models in production! Despite its importance, this piece of the pipeline tends to get overlooked, and the model then fails to deliver value to customers.
The engineering discipline clearly can’t exist without the work of the (data) scientists — MLE (Machine Learning Engineering) is built on the work of data science — but the engineering is how the science gets applied to the world. — Caleb Kaiser
This article is going to explain our attempt to not only serve a computationally intensive GAN model in production inexpensively but also scale it horizontally.
ML Woes 😅
If you are familiar with hosting a REST API, you know it calls for a few basic things (a minimal sketch of such a service follows this list):
- A GCP or AWS instance
- System dependencies as well as Python-specific dependencies (pip)
- A proxy server
- Multiple workers to scale horizontally
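For concreteness, here is a minimal sketch of such a service — not the actual Cartoonizer app (that code is in the GitHub repo linked at the end), just an illustration of the shape of the API, with the GAN inference stubbed out:

```python
import io
from flask import Flask, request, send_file

app = Flask(__name__)

def cartoonize_image(image_bytes: bytes) -> bytes:
    """Stub for the GAN inference step; here it just echoes the input."""
    return image_bytes

@app.route("/cartoonize", methods=["POST"])
def cartoonize():
    # Accept a multipart upload under the "image" field, return the styled result.
    result = cartoonize_image(request.files["image"].read())
    return send_file(io.BytesIO(result), mimetype="image/jpeg")
```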
As an ML engineer, the second point is tedious and less than satisfactory in terms of scalability and server costs. Gone are the days when the responsibility of maintaining servers rested on your shoulders! I am talking about outsourcing and automating that second point completely. Enter Google Cloud Run!
Run Cloud Run!
Before I go into how the architecture works, I would like to share some user statistics and give you a feel for the traffic we could cater to at minimal cost!
Traffic we received 😮
Since we launched our demo web app on 26th July, we have had around 12,000 users in less than 3 weeks! 6,000 of those came in the first 4 days — most of the traffic coming from our Reddit post and the TheNextWeb article, which was then picked up by other blogs from various countries as well.
At any given point during this peak time, we had around 50+ users requesting our image and video services.
Users over a period of over 2 weeks

Traffic we are ready for 💪
Out of the box, Cloud Run lets us spawn up to 1,000 instances based on incoming traffic. It defaults to a maximum of 80 concurrent requests per container instance. So ideally we can cater to 80,000 concurrent requests!
BUT since the cartoonization process was already heavy, we limited our program to 8 workers per instance. That means one instance was limited to 8 concurrent requests; the 9th request, if any, would be routed to a second instance. So essentially we can cater to 8,000 concurrent requests!
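Cloud Run exposes both knobs directly (the instance cap and the per-instance concurrency, e.g. gcloud's --concurrency flag). To make the arithmetic above explicit:

```python
# Back-of-the-envelope Cloud Run capacity math from the numbers above.
MAX_INSTANCES = 1000      # Cloud Run's instance cap
DEFAULT_CONCURRENCY = 80  # default concurrent requests per container instance
OUR_CONCURRENCY = 8       # capped to match our 8 Gunicorn workers

print(MAX_INSTANCES * DEFAULT_CONCURRENCY)  # 80,000 concurrent requests (ideal)
print(MAX_INSTANCES * OUR_CONCURRENCY)      # 8,000 with our per-instance cap
```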
Video processing on CPU or GPU? 🎥
Our unique selling point was putting together an architecture which would allow us to serve videos along with images at minimal cost. Videos are nothing but a collection of images (frames) and we have to cartoonize each frame.
Cartoonized at 30 fps with 720p resolution

On an 8-core i7 CPU, it takes around 1 second to cartoonize a 1080p image. Unfortunately, Google Cloud Run provides a maximum of only 2 vCPUs, which pushes the time up to 3 seconds/image! You can imagine the horror of processing a video on that kind of compute! A 10-second video at 30 frames per second (fps) would take 15 minutes! 😱
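As a sanity check, the 15-minute figure is just frames times per-frame latency; a tiny helper makes the arithmetic reproducible (we reuse it below for the other configurations):

```python
def video_processing_seconds(duration_s: float, fps: float, secs_per_frame: float) -> float:
    """Total inference time = number of frames x per-frame latency."""
    return duration_s * fps * secs_per_frame

print(video_processing_seconds(10, 30, 3.0))  # 900 s, i.e. 15 minutes on 2 vCPUs
```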
We employed 2 techniques to bring down the video inference time.
Reduce the image resolution to 480p: This essentially lessened the load per frame without any conspicuous change in quality. It helped us reach an inference time of 1 second/image.
Decrease the frame rate of the video: We dropped it from 30 fps to 15 fps, which drastically reduced our video computation time (a sketch of both preprocessing steps follows below).
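Purely as an illustration — the post doesn't say how the preprocessing was implemented, and ffmpeg is our assumption here — both steps collapse into a single filter chain driven from Python:

```python
import subprocess

def preprocess(video_in: str, video_out: str) -> None:
    """Downscale to 480p and halve the frame rate to 15 fps before inference."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_in,
         # scale=-2:480 keeps the aspect ratio (even width); fps=15 drops frames
         "-vf", "scale=-2:480,fps=15",
         video_out],
        check=True,
    )
```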
We experimented with TensorFlow Lite weight quantization to speed up the inference pipeline, but we faced issues serving models with dynamic input image sizes. While it worked for a fixed image size, we didn't find the resulting latency and computation tradeoff justified.
Even after lowering the resolution and reducing the frames per second, video cartoonization was taking 2.5 minutes for a 10-second video. That was still too long from a user-experience standpoint. Hence, converting a video into a cartoon required some additional artillery.
Speed advantage on GPU?
Using a GPU slashed the per-image inference time, bringing it down to 50 ms/image. That meant we could cartoonize and serve a 10-second video in 7 seconds! Now we are in business. 😉
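Plugging the new settings into the earlier estimate helper reproduces both figures:

```python
print(video_processing_seconds(10, 15, 1.0))   # 150 s ≈ 2.5 min on CPU (480p, 15 fps)
print(video_processing_seconds(10, 15, 0.05))  # 7.5 s ≈ the ~7 s GPU figure above
```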
Or so we thought. There were two questions that haunted us -
One way would have been to deploy the model on a Google Compute Engine instance as an API, but that defeated the purpose of Cloud Run scaling. All the concurrent requests would queue up and the GPU would become a bottleneck in our pipeline. Also, running an instance 24/7 is not cheap 💰
How to scale using a GPU (inexpensively)?
Cloud Run, being a managed stateless container service, cannot provide GPU support. Hence, we outsourced our GPU computation to a service called Algorithmia instead of renting an expensive Google Compute Engine server. The reason is two-fold -
First of all, it boasts the ability to scale deployed Deep Learning models in production! It can handle 24 concurrent requests per GPU-computation instance. Additionally, it can automatically scale to 80 instances at any given point in time.
This meant we could satisfy around 1500+ video requests concurrently AND comparatively inexpensively!
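Calling the hosted model from our Cloud Run container takes only a few lines with Algorithmia's Python client. The sketch below uses a made-up algorithm path and payload, not the actual Cartoonizer endpoint:

```python
import Algorithmia  # pip install algorithmia

client = Algorithmia.client("YOUR_API_KEY")
# Hypothetical algorithm path and input shape, for illustration only.
algo = client.algo("demo_user/CartoonizeVideo/1.0.0")
result = algo.pipe({"video_url": "data://demo_user/inputs/clip.mp4"}).result
```

(24 concurrent requests per instance × 80 instances ≈ 1,920 videos in flight, which is where the 1500+ headroom above comes from.)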
學(xué)問 (Learnings)
80 requests on one instance! 🤔
Our Flask API was coded to handle 8 concurrent requests by spawning 8 workers using Gunicorn, BUT we didn't change the default setting of 80 requests per instance in Cloud Run.
This meant only one instance was spawned the whole time, and user requests likely queued up on our Flask server. The downside: users had to wait longer to get their cartoonized images and videos.
The upside: we were billed for only one instance. The lower the number of requests per instance, the greater the number of instances spawned, and thus the higher your billable instance time. But rerouting requests to separate instances means better and faster user satisfaction. 😉
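For reference, the fix is keeping the two knobs in sync: a Gunicorn config like the sketch below (our guess at its shape, not the project's actual config) together with Cloud Run's per-instance concurrency set to the same number via the --concurrency flag.

```python
# gunicorn.conf.py — a sketch of the worker setup described above.
workers = 8            # matches the per-instance concurrency we wanted
bind = "0.0.0.0:8080"  # Cloud Run sends traffic to port 8080 by default
timeout = 0            # let Cloud Run, not Gunicorn, enforce request timeouts
```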
Highest number of requests per second (2 req/sec)

Future scope
We envision this being used for the following -

- Churn out quick prototypes or sprites for anime, cartoons and games
- Since it subdues facial features and information in general, it can be used to generate minimal art
- Games can import short cutscenes very easily without using motion capture
- Can be modelled as an assistant to graphic designers or animators
If you have something interesting to demo, hit us up!
Code for the webapp demo is available on GitHub! Try the demo here!
Translated from: https://towardsdatascience.com/how-we-built-an-inexpensive-scalable-architecture-to-cartoonize-the-world-8610050f90a0