Develop, Train, and Deploy TensorFlow Models Using Google Cloud AI Platform

Published: 2023/12/15 · 豆豆

A Practical Guide

The TensorFlow ecosystem has become very popular for developing deep learning applications. One reason is its strong community and the many tools built around the core library to support developers. In this tutorial, I will guide you through prototyping models in Google Colab, training them on Google Cloud AI Platform, and deploying the finalized model on Google Cloud AI Platform for production. I include working Google Colab notebooks so you can recreate the work.

Google Colab is a free resource for prototyping TensorFlow models and comes with various runtimes. Preparing your own machine with a GPU or TPU can be costly, but users can start with a free GPU on Google Colab. Bear in mind that Colab has limited resources and might not be suitable for fully training models that require large compute resources. Nonetheless, Colab is a perfect tool for prototyping your models and running initial experiments.

Figure: A block diagram visualizing the workflow

Training the Model on Google Cloud Platform

Once you are satisfied with your model pipeline, it is time to train the model with the proper number of epochs and the full dataset. Training deep learning models takes a long time and requires a large cluster of CPUs, GPUs, or TPUs. One option is to set up your own computing cluster, which is usually costly and time-consuming. Another option is to train the model with cloud computing and pay as you go. The TensorFlow team has released a package called TensorFlow Cloud that lets users train models on Google Cloud Platform without much hassle. I followed the steps in the blog post "Train your TensorFlow model on Google Cloud using TensorFlow Cloud" and will share some issues I faced while making it work. The prerequisites for submitting a training job to GCP, as defined in the project guidelines, are:

Python >= 3.5
A Google Cloud project
An authenticated GCP account
Google AI Platform APIs enabled for your GCP account. We use AI Platform to deploy Docker images on GCP.
Either a working Docker installation, if you want to build the image with a local Docker process, or a Cloud Storage bucket for Google Cloud Build to use when building and publishing the Docker image.
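
The locally checkable items in this list can be verified from Python before going further. The sketch below is a hypothetical helper (`check_prerequisites` is not part of any library); it cannot confirm the cloud-side items such as the project, account authentication, or enabled APIs.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 5)):
    """Return a list of problems found locally; an empty list means none.

    Only the Python version and a local Docker binary are checked here.
    The GCP project, authentication, and enabled APIs are NOT verified.
    """
    problems = []
    current = sys.version_info[:2]
    if current < tuple(min_python):
        problems.append(
            "Python {}.{}+ required, found {}.{}".format(*min_python, *current)
        )
    # Docker is only needed for local image builds; without it, a Cloud
    # Storage bucket plus Google Cloud Build can build the image instead.
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH; use a Cloud Build bucket instead")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Warning:", problem)
```

A missing Docker binary is reported as a warning rather than an error, since the Cloud Build route in the last bullet works without it.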

After creating the GCP project, follow the steps below to configure the environment with Google Cloud authentication.

# Authenticate
from google.colab import auth
auth.authenticate_user()

Set PROJECT_ID in the environment

import os
os.environ['PROJECT_ID'] = 'gcpessentials-rz'
!gcloud config set project $PROJECT_ID

Create the service account and grant the permissions the tensorflow-cloud package needs. Download the service account key and set it as the GOOGLE_APPLICATION_CREDENTIALS environment variable. These credentials are needed to submit the job to Google Cloud Platform.

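import os

# Project and service-account names used throughout the notebook
os.environ['PROJECT_ID'] = 'gcpessentials-rz'
!gcloud config set project $PROJECT_ID
os.environ['SA_NAME'] = 'gcpessentials-rz'

# Create the service account and grant it the Editor role on the project
!gcloud iam service-accounts create $SA_NAME
!gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
    --role 'roles/editor'

# Download a key for the service account and point Google client libraries at it
!gcloud iam service-accounts keys create key.json \
    --iam-account $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'key.json'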
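
Before submitting a job, it can help to confirm that GOOGLE_APPLICATION_CREDENTIALS really points at the service-account key created above. A minimal sketch using only the standard library; `service_account_email` and `check_credentials` are hypothetical helper names. It relies on two facts: a user-created service account's e-mail has the form `NAME@PROJECT_ID.iam.gserviceaccount.com`, and a downloaded JSON key contains `"type": "service_account"` and a `client_email` field. A relative path such as `key.json` is resolved against the notebook's current working directory.

```python
import json
import os


def service_account_email(sa_name, project_id):
    """Build the e-mail address of a user-created service account."""
    return "{}@{}.iam.gserviceaccount.com".format(sa_name, project_id)


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the client_email from the key file that env_var points to.

    Raises a descriptive error if the variable is unset, the file is
    missing, or the JSON does not look like a service-account key.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError("{} is not set".format(env_var))
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    with open(path) as fh:
        key = json.load(fh)
    if key.get("type") != "service_account":
        raise ValueError("{} is not a service-account key".format(path))
    return key["client_email"]
```

After the cell above has run, `check_credentials() == service_account_email(os.environ['SA_NAME'], os.environ['PROJECT_ID'])` should hold.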

Even after following the above steps, I had to enable the Service Accounts and Cloud Build status under the Cloud Build settings. Below is a snapshot of my project settings.

The next step is to create a Cloud Storage bucket (GCP_BUCKET) to store the data needed for submitting the job to Google Cloud.

BUCKET = 'tf2-model-training'
!gsutil mb gs://$BUCKET
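
Bucket names are global across Cloud Storage, so `gsutil mb` fails if the name is already taken or malformed. The sketch below checks a simplified version of the naming rules (it ignores dotted, domain-style names and other edge cases) and builds `gs://` paths under the bucket, for example for where the trained model will be saved. `gcs_uri` and the `saved_model` path are hypothetical.

```python
import re

# Simplified Cloud Storage rule for names without dots: 3-63 characters of
# lowercase letters, digits, dashes and underscores, starting and ending
# with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def gcs_uri(bucket, *parts):
    """Build a gs:// URI such as gs://tf2-model-training/saved_model."""
    if not _BUCKET_RE.match(bucket):
        raise ValueError("invalid bucket name: {!r}".format(bucket))
    # Strip stray slashes so parts join with exactly one "/" between them.
    path = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return "gs://{}/{}".format(bucket, path) if path else "gs://{}".format(bucket)
```

TensorFlow can generally write a SavedModel directly to such a path, e.g. `model.save(gcs_uri(BUCKET, 'saved_model'))`, provided the credentials above have access to the bucket.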

Once we have set up the environment in Google Colab and created GCP_BUCKET, we have to prepare the notebook that trains the model. Below are some key points to consider while preparing the notebook.

Test the code in a notebook on Google Colab with a small number of epochs.

Make sure there are no errors in the notebook, and remove any unnecessary code.

Prepare requirements.txt and upload it to the Google Colab environment.

Save the trained model in GCP_BUCKET (it will be used for deployment).
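
requirements.txt is a plain pip requirements file: one package specifier per line, listing the additional dependencies to install in the Docker image that tensorflow-cloud builds. A small sketch for writing it from the notebook; `write_requirements` is a hypothetical helper.

```python
from pathlib import Path


def write_requirements(packages, path="requirements.txt"):
    """Write one pip requirement specifier per line and return the Path.

    Blank entries are skipped and duplicates dropped (first occurrence
    wins), so the same package is not listed twice for the image build.
    """
    unique = []
    for pkg in packages:
        pkg = pkg.strip()
        if pkg and pkg not in unique:
            unique.append(pkg)
    out = Path(path)
    out.write_text("\n".join(unique) + "\n")
    return out
```

For example, `write_requirements(['pandas', 'scikit-learn'])` would suit a pipeline that uses pandas and scikit-learn preprocessing; list whatever your training notebook actually imports.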

Once the notebook is ready, we submit the training job to Google Cloud Platform using the tensorflow-cloud package.

import tensorflow_cloud as tfc

BUCKET = 'tf2-model-training'
labels = {
    'phase': 'test',
    'owner': 'raza',
}
tfc.run(
    requirements_txt="requirements.txt",
    distribution_strategy="auto",
    chief_config='auto',
    docker_image_bucket_name=BUCKET,
    job_labels=labels,
)

The above code converts the notebook into a Python script (here, identifier-Fg45-a.py) and submits the training to Google Cloud Platform as a Docker image. After you submit the job, you will see a message like the one in the figure below.

Figure: Status of the training job submission to GCP

It may take a few minutes before the job actually starts training.

Clicking the provided link opens a page like the one in the figure below.

Figure: Link to view training progress on AI Platform

You can view the logs by clicking View Logs; they look like the figure below. Logs help you see which exceptions occurred in your code.

Figure: Training logs

Before submitting the full job, test your pipeline submission script with a few epochs, then submit the complete job.
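
One way to keep a single notebook for both the smoke test and the full run is to choose the epoch count from where the code runs. tensorflow-cloud provides `tfc.remote()`, which returns True when the code is running in the cloud job, for this kind of switch. The helper below is a hypothetical sketch that takes that flag as an argument so the logic can be tested without a cloud job; the epoch counts are placeholders.

```python
def pick_epochs(running_remotely, smoke_test, full_epochs=100, smoke_epochs=2):
    """Choose the number of training epochs.

    Local (Colab) runs always use the short setting. Remote runs use it too
    while smoke_test is True, so the whole submission pipeline can be
    checked cheaply before paying for the full job.
    """
    if not running_remotely or smoke_test:
        return smoke_epochs
    return full_epochs
```

In the notebook this might read `epochs = pick_epochs(tfc.remote(), smoke_test=True)`, with `smoke_test` flipped to False once the short remote job finishes cleanly.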

If all goes well, your model will be trained on Google Cloud AI Platform and saved in GCP_BUCKET along with other resources.

Google Colab Notebook for Training

Below is the working notebook to run the training on Google AI Platform using the TensorFlow-Cloud package.

Deploying the Trained Model on Google Cloud Platform

Once the model is trained and finalized, you will want to deploy it on scalable infrastructure. Google Cloud also provides the infrastructure needed to deploy TensorFlow models on its platform without many modifications. I will show you how to deploy the model on Google Cloud Platform. At the end of training, we saved the finalized model in a Google Cloud Storage bucket. The steps to deploy the model are:

Configure model serving using AI Platform
Perform predictions on the deployed model

Configure the model for deployment from within Google Colab. I have explained in detail how to deploy a model on Google Cloud in "Model with TensorFlow and Serve on Google Cloud Platform".

I configured the model to be served as version v2, following the same steps as in that tutorial.
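
As a rough sketch, the two `gcloud ai-platform` calls that create a model resource and a version from a SavedModel in Cloud Storage can be assembled in Python and run with `subprocess`. The model name `earnings_prediction` and version `v2` come from this article; the region, runtime version, Python version, and the `saved_model` path are assumptions to adjust for your project, and `deployment_commands`/`deploy` are hypothetical helpers.

```python
import shlex
import subprocess


def deployment_commands(model, version, origin, region="us-central1",
                        runtime_version="2.1", python_version="3.7"):
    """Return gcloud argument lists that create a model resource and a
    version serving the SavedModel stored at `origin` (a gs:// directory).

    region, runtime_version and python_version are placeholder defaults.
    """
    create_model = [
        "gcloud", "ai-platform", "models", "create", model,
        "--regions", region,
    ]
    create_version = [
        "gcloud", "ai-platform", "versions", "create", version,
        "--model", model,
        "--origin", origin,
        "--runtime-version", runtime_version,
        "--framework", "tensorflow",
        "--python-version", python_version,
    ]
    return [create_model, create_version]


def deploy(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

`deploy(deployment_commands("earnings_prediction", "v2", "gs://tf2-model-training/saved_model"))` only prints the commands; pass `dry_run=False` to run them. If the model resource already exists (for example when adding v2 next to an earlier version), run only the second command.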

Getting Predictions from the Deployed Model

Once the model is deployed and you see the green check mark, it is time to test the model's predictions. Some important points to consider:

Test the model's predictions using the Test & Use tab under your version or model.
The input JSON data depends on how you defined your model.
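
What "depends on how you define your model" means in practice: the documented online-prediction request body is a JSON object whose `instances` field is a list, and when the serving signature has a named input, each instance can be an object keyed by that name. The names `dense_input` (input) and `dense_3` (output) below are taken from the prediction code in this article; they come from the Keras layer names, so yours may differ. The helpers are hypothetical and the feature values are illustrative only.

```python
import json


def build_request_body(rows, input_name="dense_input"):
    """Wrap feature rows in the {"instances": [...]} envelope, one
    instance per row, keyed by the model's input name."""
    return {"instances": [{input_name: list(row)} for row in rows]}


def first_output(response, output_name="dense_3"):
    """Return the first value of the named output in the first prediction."""
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"][0][output_name][0]


# Illustrative feature values, not real data.
body = build_request_body([[0.5, 0.1, 0.0]])
print(json.dumps(body))  # → {"instances": [{"dense_input": [0.5, 0.1, 0.0]}]}
```

Pasting the printed JSON into the Test & Use tab is a quick way to check the input format before calling the API from code.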

# [Reference](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/ml_engine/online_prediction/predict.py)
import googleapiclient.discovery


def predict_json(project, model, instances, version=None):
    """Send an online prediction request to a model deployed on AI Platform."""
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)
    if version is not None:
        name += '/versions/{}'.format(version)
    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']

Load the unseen data for predictions

import pandas as pd

# data_path and CLOUD_PROJECT are assumed to be defined earlier in the notebook
df_new_products = pd.read_csv(data_path + 'proposed_new_product.csv')
tdata_instances = {'dense_input': df_new_products.values[0].tolist()}
predictions_gcloud = predict_json(CLOUD_PROJECT, 'earnings_prediction', tdata_instances, version='v2')

# Undo the scaling applied to the earnings target during preprocessing
predictions_gcloud = predictions_gcloud[0]['dense_3'][0] + 0.1159
predictions_gcloud = predictions_gcloud / 0.0000036968
print('Earnings predictions for Proposed product - ${}'.format(predictions_gcloud))
# Output: Earnings predictions for Proposed product - $259671.15201209177
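
The two constants in the snippet above map the model's scaled output back to dollars. The reference list points to scikit-learn's preprocessing module; its MinMaxScaler transforms with `X * scale_ + min_`, so the inverse is `(X_scaled - min_) / scale_`. Reading the snippet as `min_ = -0.1159` and `scale_ = 0.0000036968` for the earnings column is an assumption consistent with the code, not something the article states. A sketch of both directions:

```python
# Values read off the prediction snippet. Treating them as the
# MinMaxScaler min_ and scale_ of the earnings target is an assumption.
EARNINGS_SCALE = 0.0000036968
EARNINGS_MIN = -0.1159


def to_scaled(earnings, scale_=EARNINGS_SCALE, min_=EARNINGS_MIN):
    """Forward min-max transform, as MinMaxScaler applies it: x * scale_ + min_."""
    return earnings * scale_ + min_


def from_scaled(scaled, scale_=EARNINGS_SCALE, min_=EARNINGS_MIN):
    """Inverse transform: (x - min_) / scale_, i.e. the snippet's
    (prediction + 0.1159) / 0.0000036968."""
    return (scaled - min_) / scale_
```

If the fitted scaler object is still available, its `inverse_transform` method does the same thing and avoids hard-coding rounded constants.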

Google Colab Notebook for Prediction

Below is the working notebook to perform prediction on the deployed model

Conclusions

In this guide, we learned how to train deep learning models with TensorFlow 2.3.0 on Google Cloud AI Platform with the help of the tensorflow-cloud package, and how to deploy the trained model on Google Cloud AI Platform. Some key lessons:

Use Google Colab to build TensorFlow models
Train the model on Google Cloud AI Platform using TensorFlow Cloud
Save the trained model in a GCP bucket after training
Set up model serving on GCP
Make predictions using the model deployed in the cloud

References (Readings/Links)

https://blog.tensorflow.org/2020/04/how-to-deploy-tensorflow-2-models-on-cloud-ai-platform.html

https://blog.tensorflow.org/2020/08/train-your-tensorflow-model-on-google.html

https://scikit-learn.org/stable/modules/preprocessing.html

Source: https://towardsdatascience.com/develop-train-and-deploy-tensorflow-models-using-google-cloud-ai-platform-32b47095878b