DeepR: Training TensorFlow Models for Production

Published: 2023/12/15

Authors: Guillaume Genthial, Romain Beaumont, Denis Kuzin, Amine Benhalloum

Links: GitHub, Docs, Quickstart, MovieLens Example

Introduction

At Criteo, we build recommendation engines. We make millions of recommendations every second, with millisecond latencies, from a catalog of billions of products.

While scalability and speed are hard requirements, we also need to optimize for multiple, often changing, criteria: Maximize the number of sales, clicks, etc. Concretely, this translates into dozens of models trained every day, quickly evolving with new losses, architectures, and approaches.

DeepR is a Python library to build complex pipelines as easily as possible on top of TensorFlow. It combines ease-of-use with performance and production-oriented capabilities.

At Criteo, we use DeepR to define multi-step training pipelines, preprocess data, design custom models and losses, submit jobs to remote clusters, integrate with logging platforms such as MLflow and Graphite, and prepare graphs for Ahead-Of-Time XLA compilation and inference.

DeepR and TensorFlow

TensorFlow provides great production-oriented capabilities, thanks in part to the ability to translate graphs defined in Python into self-contained protobufs that can be reloaded by other services with more efficient backends. More importantly, this removes the need to rely on Python for serving and simplifies model deployment.

The switch to eager execution with TF2 opens new possibilities in terms of control flow. However, the static graph approach taken by TF1 and the Estimator API has some advantages in terms of serving capabilities and distributed training support. For legacy and maturity reasons, we chose to work with TF1 for now, waiting for TF2 to stabilize.

While pure Deep Learning libraries like TensorFlow, PyTorch, Jax, etc. typically focus on defining computational graphs, providing (relatively) low-level tools, DeepR is a collection of higher-level functionality to help with both the model definition and everything that goes around it.

The main pain points are usually job scheduling, configuration, logging, code flexibility, and reuse, rather than the implementation of new layers. As we keep adding more and more models, we want them to coexist in one consistent codebase, limiting backward compatibility issues.

What are DeepR’s strengths?

DeepR comes with a config system built for Machine Learning. One of the biggest problems DeepR addresses is configuration, a major challenge in machine learning codebases, as parameters of any function may have to be exposed as hyper-parameters.

When a parameter sits far down the call stack, exposing it can be particularly tricky and cause all sorts of issues. For example, defaults might be difficult to change.

More importantly, it impacts how jobs are launched, and it is not uncommon to end up with long commands with hundreds of parameters, half of them not even being used but kept for backward compatibility.

DeepR comes with a config system, most similar to Thinc / Gin-config, that makes it possible to configure arbitrarily nested trees of objects, interfacing nicely with json.

The DeepR config system relies on dictionaries. Using the import string in the “type” key, the config system resolves the class to instantiate.
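
To make the idea concrete, here is a minimal sketch of how a dictionary with a "type" import string can be turned into an object. It is not DeepR's implementation: the `instantiate` helper is hypothetical, and standard-library classes stand in for real jobs and layers so that the snippet runs on its own.

```python
import importlib


def instantiate(config):
    """Recursively build objects from a nested config (hypothetical helper).

    A dict with a "type" key is treated as "import this class and call it
    with the remaining keys as keyword arguments".
    """
    if isinstance(config, list):
        return [instantiate(item) for item in config]
    if not isinstance(config, dict):
        return config
    # Instantiate nested configs first, so dependencies are built bottom-up.
    params = {key: instantiate(value) for key, value in config.items() if key != "type"}
    if "type" not in config:
        return params
    module_name, _, attribute = config["type"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), attribute)
    return cls(**params)


# Standard-library classes are used here purely as stand-ins.
config = {
    "type": "collections.OrderedDict",
    "window": {"type": "datetime.timedelta", "days": 1, "hours": 12},
    "name": "example",
}
obj = instantiate(config)
print(type(obj).__name__, obj["window"])
```

Because the nested "window" entry is itself a config, the dependency between the two objects is declared in the dictionary rather than in code.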

The config system, combined with Python-fire, results in a flexible and powerful Command Line Interface at no additional cost. This lets us submit jobs on a remote cluster, as sending a config file along with a command is usually the easiest way to interact with a job scheduler.

Parse and instantiate a config into an object using a config and macros
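
As a rough illustration of what macros add, the sketch below replaces placeholder strings in a config with values taken from a separate macros dictionary. The `"$macro:param"` placeholder syntax, the `fill_macros` helper, and the `my_package` names are assumptions made for this example, not a description of DeepR's parser.

```python
def fill_macros(item, macros):
    """Replace every "$macro:param" string in a nested config (hypothetical helper)."""
    if isinstance(item, dict):
        return {key: fill_macros(value, macros) for key, value in item.items()}
    if isinstance(item, list):
        return [fill_macros(value, macros) for value in item]
    if isinstance(item, str) and item.startswith("$") and ":" in item:
        macro, _, param = item[1:].partition(":")
        return macros[macro][param]
    return item


# "my_package.jobs.Train" is a made-up import string; nothing is imported here.
config = {
    "type": "my_package.jobs.Train",
    "path_model": "$paths:path_model",
    "optimizer": {"learning_rate": "$params:learning_rate"},
}
macros = {
    "paths": {"path_model": "/tmp/model"},
    "params": {"learning_rate": 0.01},
}
parsed = fill_macros(config, macros)
print(parsed)
```

Keeping environment-specific values (paths, hyper-parameters) in the macros file means the same config can be reused across runs without editing it.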

In other words, we allow the code to stay the same by moving “breaking” changes to the configs. Also, by being able to define the object’s dependencies at the config level, we do not need “assemblers” and other dependency injection mechanisms in the code, simplifying the overall design and maintainability. Finally, because parsing .json files is straightforward, we can reload and update a config programmatically, which is something we need to do for hyper-params search or scheduling.
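
The programmatic update mentioned above can be sketched with the standard `json` module. The config keys, the `my_package` import strings, and the `set_nested` helper are illustrative assumptions; the point is only that a grid of candidate configs can be generated without touching the training code.

```python
import copy
import itertools
import json

# A made-up training config; in practice this would be read from a .json file.
base_config = json.loads("""
{
    "type": "my_package.jobs.Train",
    "optimizer": {"type": "my_package.optimizers.Adam", "learning_rate": 0.01},
    "batch_size": 128
}
""")


def set_nested(config, path, value):
    """Return a copy of config with the value at the dotted path replaced."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = path.split(".")
    for key in parents:
        node = node[key]
    node[leaf] = value
    return updated


# Every combination of the listed values becomes one candidate config.
grid = {"optimizer.learning_rate": [0.1, 0.01], "batch_size": [64, 128]}
candidates = []
for values in itertools.product(*grid.values()):
    candidate = base_config
    for path, value in zip(grid, values):
        candidate = set_nested(candidate, path, value)
    candidates.append(candidate)

for candidate in candidates:
    print(json.dumps(candidate))
```

Each candidate is a plain dictionary, so it can be written back to disk and submitted like any other config.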

DeepR encourages you to write jobs

A client submits jobs to Yarn. tf-yarn distributes training to other workers, while we can use Spark to speed up inference and validation.

Another problem that DeepR addresses is pipelining.

Training a model is just another ETL, where the input usually is a dataset and the output a protobuf with the model’s graph and weights.

Calling model.train() is merely one of multiple steps. In general, we need to preprocess the data, initialize checkpoints, select the best model, compute predictions, export the model in different formats, etc.

DeepR adopts an approach similar to Spark with the Job abstraction, which stays flexible while encouraging modular logic and code reuse.

As a bonus, jobs can be run on different machines with different hardware requirements: preprocessing probably needs a machine with good I/O, while training would be faster on a GPU.

And it’s worth the effort: we were able to reduce our memory footprint by a factor of 4 and speed up training by a factor of 2, not only saving infrastructure costs but also lifting limitations that existed because of memory.

A pipeline made of a build and a training job
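
The Job idea can be sketched in a few lines of plain Python. The class names below (`Job`, `Build`, `Train`, `Pipeline`) and their fields are illustrative assumptions rather than DeepR's actual classes; the jobs only record what they would do, so the example runs without TensorFlow.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class Job(ABC):
    """A unit of work with a single entry point (illustrative interface)."""

    @abstractmethod
    def run(self) -> None:
        ...


@dataclass
class Build(Job):
    path_dataset: str
    log: List[str]

    def run(self) -> None:
        # A real job would preprocess data and write it to path_dataset.
        self.log.append(f"build dataset at {self.path_dataset}")


@dataclass
class Train(Job):
    path_dataset: str
    path_model: str
    log: List[str]

    def run(self) -> None:
        # A real job would train a model and export it to path_model.
        self.log.append(f"train on {self.path_dataset}, export to {self.path_model}")


@dataclass
class Pipeline(Job):
    jobs: List[Job]

    def run(self) -> None:
        # A pipeline is itself a job that runs its steps in order.
        for job in self.jobs:
            job.run()


log: List[str] = []
pipeline = Pipeline(jobs=[
    Build(path_dataset="/tmp/data", log=log),
    Train(path_dataset="/tmp/data", path_model="/tmp/model", log=log),
])
pipeline.run()
print(log)
```

Since a pipeline is also a job, pipelines can be nested, and each step can in principle be dispatched to a different machine.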

DeepR works with Hadoop (HDFS, Yarn), MLflow and Graphite

One of DeepR’s strengths is its tight integration with Hadoop, especially HDFS and Yarn, thanks in part to:

tf-yarn (a library to train Estimators on Yarn, also created at Criteo)
pex (a library to generate Python executables)
pyarrow (Apache Arrow binding)

In practice, this means that training a model on a Yarn cluster requires no additional development compared to training it locally.

Package the Python environment as a pex on HDFS, upload the config to MLflow, and run the job on Yarn.

DeepR also provides a suite of tools to use MLflow for logging metrics and parameters, with support for distributed training and job scheduling on remote clusters.

Add the ability to save config files as artifacts, and we get full reproducibility, an easy way to track and compare experiments, and a centralized place for all config files, ready for deployment!

DeepR can also help with model definition for Estimators

DeepR also adopts a functional approach to model and layer definition, similar to Trax, Thinc, or the Keras functional API.

While TensorFlow and PyTorch provide a low-level declarative approach to graph definition, the Estimator API around which DeepR is built works better with functional programming (especially true with TF1 variable management), and we found it easier to manipulate higher-level logic blocks (layers) as functions, chaining them in Directed Acyclic Graphs.

A model made of one embedding layer and a Transformer.
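
The functional style can be illustrated without TensorFlow. In the sketch below, plain Python lists stand in for tensors, and the `layer` and `chain` helpers are hypothetical names, not DeepR's Layer API: each layer reads named inputs, writes a named output, and layers are chained into a small Directed Acyclic Graph.

```python
from typing import Callable, Dict, List, Sequence

Tensors = Dict[str, List[float]]


def layer(inputs: Sequence[str], output: str, fn: Callable[..., List[float]]):
    """Wrap fn as a layer that reads named inputs and writes a named output."""

    def apply(tensors: Tensors) -> Tensors:
        result = fn(*(tensors[name] for name in inputs))
        # Return a new dict so that layers stay side-effect free.
        return {**tensors, output: result}

    return apply


def chain(*layers):
    """Compose layers in order; names wire outputs to later inputs."""

    def apply(tensors: Tensors) -> Tensors:
        for current in layers:
            tensors = current(tensors)
        return tensors

    return apply


# Two branches read "x" and a third layer merges them: a small DAG.
model = chain(
    layer(["x"], "scaled", lambda x: [2.0 * v for v in x]),
    layer(["x"], "shifted", lambda x: [v + 1.0 for v in x]),
    layer(["scaled", "shifted"], "output", lambda a, b: [u + v for u, v in zip(a, b)]),
)
outputs = model({"x": [1.0, 2.0]})
print(outputs["output"])
```

Treating layers as functions over named values makes it easy to reuse a block in several models or to swap one branch for another.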

In that way, the Layer abstraction provided by DeepR can be seen as a simple way to define graphs for the Estimator API. However, note that it provides this capability as a bonus, since the rest of the code base makes no assumption on how graphs are created.

DeepR comes with a suite of tools to manipulate TF objects

Finally, DeepR comes with some custom hooks, readers, predictors, jobs, preprocessors, etc. that bundle TensorFlow code. It is very similar to the legacy tf.contrib module, as a collection of missing higher-level tools to manipulate native types.

ToExample and TFRecordWriter, used to write tfrecords

Get started

You can use DeepR as a simple Python library, reusing only a subset of its concepts, or build your extension as a standalone Python package that depends on deepr. DeepR includes a few pre-made layers and preprocessors, as well as jobs to train models on Yarn.

For a short introduction on DeepR, have a look at the quickstart (on Colab).

The examples submodule of deepr illustrates what packages built on top of DeepR would look like. It defines custom jobs, layers, preprocessors, and macros, as well as configs. Once your custom components are packaged in a library, you can easily run any pipeline, locally or on Yarn, with:

deepr run config.json macros.json

A word about recommender systems at Criteo

Some of our recommendation systems adopt the following approach. Given a timeline of items represented as vectors, a model predicts another vector meant to capture the user’s interests in the same space. Recommendations are the nearest neighbors of that user’s embedding.

At training time, the model contains a lot of parameters: one embedding for each item as well as model parameters.

At inference time, scoring a user embedding against all possible products is not realistic. Using fast neighbor search algorithms like HNSW, we delegate the ranking step to another service. The model only predicts the user embedding.

At inference time, the model sends a user embedding to an HNSW index instead of computing a Softmax over all possible movies.
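
To show what the lookup computes, the sketch below performs an exact, brute-force nearest-neighbor search by inner product. It is only a stand-in: an HNSW index returns approximately the same neighbors far faster on large catalogs. The item names, vectors, and the `nearest_neighbors` helper are made up for illustration.

```python
import heapq


def nearest_neighbors(user, item_embeddings, k):
    """Return the k item ids with the highest inner product with the user vector."""

    def score(item_id):
        return sum(u * v for u, v in zip(user, item_embeddings[item_id]))

    # Exact search over every item; an approximate index avoids this full scan.
    return heapq.nlargest(k, item_embeddings, key=score)


item_embeddings = {
    "board_game": [0.9, 0.1],
    "card_game": [0.8, 0.3],
    "garden_hose": [-0.5, 0.7],
}
user_embedding = [1.0, 0.0]
print(nearest_neighbors(user_embedding, item_embeddings, k=2))
```

The model's only job at inference time is to produce `user_embedding`; the search itself can live in a separate service that holds the item embeddings.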

This use case illustrates how much more complex real-life machine learning pipelines are. Not only do we need to define a graph and train the model, but we also have to support a different behavior at inference time, export some of the variables to other services (in this example, the item embeddings), etc.

Such a formulation has the advantage of transparency, as you can easily retrieve similar products with a nearest neighbor search. In this example, you can see that picking a very specific product from one Criteo partner (board games) returns very similar products from a completely different partner, unlike other more standard approaches (Best Ofs, etc.).

Comparison of different recommendations

Using DeepR on the MovieLens dataset

MovieLens is a standard dataset for recommendation tasks. It consists of movie ratings, anonymously aggregated by users. For a given user with some viewing history, the goal is to make the best movie recommendations.

We implement a simple baseline. Each movie is associated with an embedding and a bias. Given a user, a representation is computed as the average of the embeddings of movies seen in the past. The score of any recommendable movie is the inner product of the user embedding with the movie’s embedding, plus the movie’s bias.
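
The scoring rule described above fits in a few lines. This is a plain-Python sketch with made-up embeddings and biases, not the AverageModel code from the repository; `user_embedding` and `score` are hypothetical helper names.

```python
def user_embedding(history, embeddings):
    """Average the embeddings of the movies the user has already seen."""
    dim = len(next(iter(embeddings.values())))
    return [sum(embeddings[movie][d] for movie in history) / len(history) for d in range(dim)]


def score(user, movie, embeddings, biases):
    """Inner product of the user and movie embeddings, plus the movie's bias."""
    return sum(u * v for u, v in zip(user, embeddings[movie])) + biases[movie]


# Toy parameters; in the real model these are learned during training.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
biases = {"a": 0.0, "b": 0.0, "c": 0.5}

user = user_embedding(["a", "b"], embeddings)
for movie in embeddings:
    print(movie, score(user, movie, embeddings, biases))
```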

During training, we train the embeddings and the biases, optimizing a BPR loss that encourages “good” recommendations to get better scores than “bad” recommendations.
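
For reference, the usual form of the BPR (Bayesian Personalized Ranking) loss for one pair of scores is the negative log-sigmoid of their difference. The sketch below assumes that standard pairwise form; the BPRLoss implementation in the repository may differ in details such as masking or averaging over negatives.

```python
import math


def bpr_loss(positive_score, negative_score):
    """Pairwise BPR loss: -log(sigmoid(positive_score - negative_score))."""
    margin = positive_score - negative_score
    # Equivalent to softplus(-margin), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The loss shrinks as the "good" item outscores the "bad" one by a wider margin.
print(bpr_loss(2.0, 0.0))
print(bpr_loss(0.0, 0.0))
print(bpr_loss(0.0, 2.0))
```

Only the difference between the two scores matters, which is why the loss encourages a ranking rather than calibrated scores.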

It is also possible to use fancier models, like a Transformer, but we found it to be relatively unstable to train and not necessarily worth the effort from a production perspective.

You can have a look at the AverageModel as well as the corresponding BPRLoss implementations on GitHub, or train your model using either the config files or the Notebook on Google Colab.

The pipeline is made of 4 steps:

Step 1: Given the MovieLens ratings.csv file, create tfrecords for the training, evaluation, and test sets.
Step 2: Train an AverageModel (optionally, use tf-yarn to distribute training and evaluation on a cluster) and export the embeddings as well as the graph.
Step 3: Write predictions, i.e. for all the test timelines, compute the user representation.
Step 4: Evaluate predictions, i.e. measure how similar the recommendations provided by the model are to the actual movies seen by the user in the “future”.

Parameters and metrics (screenshot from MLflow): recall@50 is 37%
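
A metric such as recall@50 can be computed per user as the share of the movies actually seen in the "future" that appear among the top 50 recommendations. The sketch below assumes that common definition; the evaluation job in the repository may compute or average it differently, and `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(recommended, seen, k):
    """Fraction of the movies in `seen` that appear in the top-k recommendations."""
    relevant = set(seen)
    if not relevant:
        return 0.0
    top_k = set(recommended[:k])
    return len(top_k & relevant) / len(relevant)


# Toy example: one of the three "future" movies is in the top 2.
recommended = ["a", "b", "c", "d"]
seen_in_future = ["b", "d", "e"]
print(recall_at_k(recommended, seen_in_future, k=2))
```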

We run the pipeline on Yarn, monitoring progress with MLflow, and then reload the embeddings to visualize their properties.

Five most similar movies

Conclusion

Go ahead, start playing with the notebooks, and please report any feedback on issues!
