How to Use Machine Learning Models to Make Predictions Directly from Snowflake
Often we are faced with a scenario (as I was recently) where the model a data scientist deployed runs on a schedule, whether that’s once an hour, once a day, or once a week… you get the point. However, there are times when out-of-schedule results are required to make decisions for a meeting or analysis.
With that being said, there are a few ways to get out-of-schedule predictions…
Getting Out-of-schedule Predictions
Consequently, even though the data scientist and co. could implement a batch prediction application for others to use in out-of-schedule cases, it would be more intuitive to bring non-technical users closer to the model itself and give them the power to run predictions from SQL.
Bridging the gap between running predictions and SQL on Snowflake
Inspired by Amazon Aurora Machine Learning, I spent a couple of days thinking about how to bridge this gap, and put together an architecture and build that allows non-technical users to perform batch prediction from the comfort of SQL. It all lives within Snowflake, using Stored Procedures, Snowpipe, Streams and Tasks, together with SageMaker’s batch prediction job (Batch Transform), to create a batch inference data pipeline.
Snowflake Machine Learning - Architectural Design
Architectural diagram of the build
Unloading onto S3 — Use of Stored Procedure
Flow diagram of Unloading onto S3
Creating the input table
In order to make a call to Batch Transform, the user needs to create an input table that contains the data for the model plus the mandatory fields: predictionid, a uuid for the job; record_seq, a unique identifier for each input row; and a prediction column of NULLs, which is the target of interest.
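As a rough sketch, the input table could look like the following; only the three mandatory fields come from the pipeline's contract, while the feature columns here are hypothetical stand-ins for the real model inputs:

```sql
-- A minimal sketch of the input table. Only predictionid, record_seq and
-- prediction are mandated by the pipeline; the feature columns are
-- hypothetical stand-ins for the real model inputs.
CREATE OR REPLACE TABLE hotel_cancellation (
    predictionid VARCHAR,  -- uuid shared by every row of this prediction job
    record_seq   NUMBER,   -- unique identifier for each input row
    lead_time    NUMBER,   -- example feature (hypothetical)
    adr          FLOAT,    -- example feature (hypothetical)
    prediction   VARCHAR   -- target of interest, NULL until results come back
);
```

When populating it, a single uuid (e.g. from UUID_STRING()) is shared across all rows of the job, while record_seq can come from a ROW_NUMBER() over the source data.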
Input Data: hotel_cancellation
Unloading onto S3
The call_ml_prediction Stored Procedure takes in a user-defined job name and an input table name. Calling it unloads the file (using the predictionid as the file name) onto the S3 bucket under the /input path and creates an entry in the prediction_status table. From there, Batch Transform is called to predict on the input data.
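To make the flow concrete, here is what the call could look like; the job name is user-defined, and the prediction_status column layout shown is an assumption based on the description:

```sql
-- Kick off a batch prediction job from SQL (job name is user-defined).
CALL call_ml_prediction('hotel_cancellation_job', 'hotel_cancellation');

-- The procedure records the job in prediction_status, e.g. (columns assumed):
-- predictionid  | job_name               | status
-- 5fe01c5e-...  | hotel_cancellation_job | Submitted
```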
To ensure there aren’t multiple requests being submitted, only one job is able to run at a time. For simplicity, I also ensured only a single file is unloaded onto S3, but Batch Transform can handle multiple input files.
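Under the hood, the unload itself could be a single COPY INTO onto an external stage; @ml_s3_stage is an assumed stage name, and SINGLE = TRUE enforces the one-file constraint just described:

```sql
-- Unload the input table as one CSV named after the job's predictionid
-- (@ml_s3_stage is an assumed external stage over the S3 bucket).
COPY INTO @ml_s3_stage/input/5fe01c5e.csv
FROM (SELECT * FROM hotel_cancellation)
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
SINGLE = TRUE
OVERWRITE = TRUE;
```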
Prediction status table
Prediction — Use of SageMaker Batch Transform
Flow diagram of Triggering Batch Transform
Triggering SageMaker Batch Transform
Once the data is unloaded onto the S3 bucket’s /input path, a Lambda is fired which makes a call to SageMaker Batch Transform to read in the input data and output inferences to the /sagemaker path.
If you’re familiar with Batch Transform, you can set the input_filter, join and output_filter to your liking for the output prediction file.
Batch Transform Output
Once Batch Transform completes, it outputs the result as a .csv.out file in the /sagemaker path. Another Lambda is fired which copies and renames the file as a .csv to the /snowflake path, where SQS is set up for Snowpipe auto-ingest.
The Result — Use of Snowpipe, Stream and Task
Flow diagram of piping the data into Snowflake
Ingestion through Snowpipe
Once the data is dropped onto the /snowflake path, it is inserted into the prediction_result table via Snowpipe. For simplicity, since SageMaker Batch Transform maintains the order of the predictions, the row number was used as the identifier to join back to the input table. You could also do this postprocessing step within Batch Transform itself.
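A minimal Snowpipe definition for this step could look like the following sketch (the stage name and CSV format are assumptions); the pipe’s notification channel is the SQS queue that the S3 event notification on /snowflake points at:

```sql
-- Auto-ingest pipe: load any new result file under /snowflake into
-- prediction_result (stage name and file format are assumptions).
CREATE OR REPLACE PIPE prediction_result_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO prediction_result
FROM @ml_s3_stage/snowflake/
FILE_FORMAT = (TYPE = CSV);

-- SHOW PIPES exposes the notification_channel (an SQS ARN) to configure
-- on the S3 bucket's event notification.
SHOW PIPES LIKE 'prediction_result_pipe';
```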
Streaming the data and triggering Tasks
A stream is created on the prediction_result table which populates prediction_result_stream after Snowpipe delivers the data. This stream, specifically system$stream_has_data('prediction_result_stream'), is used by the scheduled task populate_prediction_result to call the stored procedure populate_prediction_result, which populates the prediction data on the hotel_cancellation table, but only when the stream has data. The unique identifier, predictionid, is also set as a task session variable.
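A sketch of the stream and the gating task, with the warehouse name and schedule as assumptions:

```sql
-- Stream that records new rows landing in prediction_result via Snowpipe.
CREATE OR REPLACE STREAM prediction_result_stream
  ON TABLE prediction_result;

-- Scheduled task that only fires when the stream has data, calling the
-- stored procedure of the same name (warehouse and schedule are assumed).
CREATE OR REPLACE TASK populate_prediction_result
  WAREHOUSE = ml_wh
  SCHEDULE  = '1 MINUTE'
WHEN
  SYSTEM$STREAM_HAS_DATA('PREDICTION_RESULT_STREAM')
AS
  CALL populate_prediction_result();

ALTER TASK populate_prediction_result RESUME;  -- tasks start out suspended
```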
The Result from the Batch Transform
Completing the job
At the end of the job, after populate_prediction_result completes, the next task update_prediction_status uses the task session variable to update the prediction status from Submitted to Completed. This concludes the entire “Using SQL to run Batch Prediction” pipeline.
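The status update can then hang off the first task as a child task; referencing the session variable with $predictionid is one way to mirror the description above, not necessarily the exact mechanism used in the original build:

```sql
-- Child task: runs after populate_prediction_result succeeds and closes
-- out the job using the predictionid session variable set upstream.
CREATE OR REPLACE TASK update_prediction_status
  WAREHOUSE = ml_wh
  AFTER populate_prediction_result
AS
  UPDATE prediction_status
     SET status = 'Completed'
   WHERE predictionid = $predictionid;

ALTER TASK update_prediction_status RESUME;
```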
Updated prediction status
Doing it better
Snowflake provides a lot of power through Snowpipe, Streams, Stored Procedures and Tasks to create data pipelines that can be used for different applications. When combined with SageMaker, users are able to send inputs directly from Snowflake and interact with the prediction results.
Nonetheless, there are a few wishlist items that would improve the whole experience.
I hope you find this article useful and enjoyed the read.
About Me
I love writing Medium articles and sharing my ideas and learnings with everyone. My day-to-day job involves helping businesses build scalable cloud and data solutions, and trying new food recipes. Feel free to connect with me for a casual chat; just let me know you’re from Medium.
— Jeno Yamma
Translated from: https://towardsdatascience.com/using-machine-learning-models-to-make-prediction-directly-from-snowflake-2471b2f71b68