The Great API Gateway Migration
Background
We did not always have a standalone API Gateway.
The first version of the application that we launched in 2015 was a .NET monolith that served all the borrowers and investors on our platform. Authentication, authorization, and business logic resided in the same place. This made sense at that point because we had a really small team of engineers and this set-up enabled us to be nimble and move fast.
However, over the course of the subsequent year, more engineers joined our team in both Singapore and Indonesia. By mid-2016, it became clear to us, given the size of the team and the product roadmap at hand, that the monolith was quickly becoming a serious bottleneck. It was getting hard to orchestrate the deployment of new features especially because coordination could only happen asynchronously over Slack, considering the distributed nature of the team.
To tackle this problem, we decided to move to a service-oriented architecture. The core components were rewritten as NodeJS web services, and many other ancillary services were written in either Python or Java. It took us around six months to migrate the whole system. It was in early 2017 that we launched our micro-services backed production system. It also marked the debut of our own in-house API Gateway built on top of the Spring framework.
High-level view of our architecture with our in-house API Gateway

With this new architecture, our engineers could now work more efficiently. We had smaller teams, each responsible for one or more of the new micro-services. They could develop and deploy new features themselves whenever they wanted, as long as they kept the API contracts intact. At this point, our public API set was still small. We were mostly focusing on rewriting and optimising our business logic. There were not many requests to add new endpoints to the API Gateway or update the configuration of existing ones. Therefore, we felt that we were in good shape.
At the end of 2018 however, our API Gateway had become the new bottleneck. Our core systems were now stable after months of work but we now had a backlog of feature requests from Business & Product teams to take on. Many of these features, which were targeted at either external users or internal staff members, required the addition of new APIs to the Gateway. It was at this juncture that the limitations of the custom API Gateway became apparent.
I must clarify first that I don’t think there is anything wrong with the API Gateway pattern. The problem was with the way we had built our in-house API Gateway. It was not designed to be extensible or configurable. There was no Admin API available for either run-time operations or inspection. To add a new API endpoint or to enable a new behaviour (e.g., rate-limiting) for an existing endpoint, we had to add some glue code, then rebuild and re-deploy the service, which was a slow and painful process.
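In practice, exposing a new route meant glue code roughly like the following minimal sketch, assuming a Spring MVC controller that simply proxies to the owning backend micro-service (the controller name, path, and backend URL here are hypothetical stand-ins rather than the actual code):

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

// Hypothetical glue code for exposing one new route on the old Gateway.
// The controller name, path, and backend URL are illustrative only.
@RestController
public class LoanRouteController {

    private final RestTemplate restTemplate = new RestTemplate();

    // Every new public endpoint needed a handler like this, plus a rebuild
    // and re-deploy of the whole Gateway before it went live.
    @GetMapping("/api/v1/loans/{loanId}")
    public ResponseEntity<String> getLoan(@PathVariable("loanId") String loanId) {
        // Forward the request to the micro-service that owns loan data.
        return restTemplate.getForEntity(
                "http://loan-service.internal/loans/{loanId}", String.class, loanId);
    }
}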
The code snippet above illustrates how a new route was exposed on the Gateway

While the design had seemed reasonable in early 2017, we realised that we had outgrown it when we observed the following limitations:
It was not easy to develop on
Even though we had a micro-services architecture, the custom Java Gateway project was still shared across the team. If a feature required new API endpoints to be added to the Gateway, developers had to branch from the master branch; add in their code to expose the API endpoints; and open a pull request. It was not a lot of work but was still cumbersome. There were also times when we had to cherry-pick or deal with merge conflicts.
It was not easy to test with
We had a single QA environment where only one version of the API Gateway could run. Since each feature that had API endpoints to expose required its own code branch and an equivalent build of the Gateway, the QA engineers were forced to test features in a sequential manner, which was not ideal.
It was not easy to inspect
Our Product Security team wanted to run automated tests periodically against the Gateway configuration of all Internet-facing API endpoints to identify misconfigured access control/security rules. While all these endpoints were consolidated in the custom API Gateway, the registry could not be programmatically fetched and analysed very easily. We had to write a program that used Java Reflection to extract the information that we needed but this approach had its limitations.
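A rough, self-contained sketch of that kind of reflection-based extraction is shown below; the @Route annotation and the GatewayRoutes class are hypothetical stand-ins for however the real Gateway declared its endpoints.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation standing in for whatever the real Gateway used to declare routes.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Route {
    String method();
    String path();
    boolean secured() default true;
}

// Hypothetical class whose methods declare Gateway routes.
class GatewayRoutes {
    @Route(method = "GET", path = "/v1/loans", secured = true)
    void listLoans() {}

    @Route(method = "GET", path = "/v1/health", secured = false)
    void health() {}
}

public class RouteExtractor {
    public static void main(String[] args) {
        // Walk the declared methods and dump every @Route found, which is
        // roughly what our reflection-based extraction had to do.
        for (Method m : GatewayRoutes.class.getDeclaredMethods()) {
            Route route = m.getAnnotation(Route.class);
            if (route != null) {
                System.out.printf("%s %s -> %s (secured=%b)%n",
                        route.method(), route.path(), m.getName(), route.secured());
            }
        }
    }
}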
It was not easy to observe
The custom Gateway produced a lot of application logs, but they were not available in a structured format (e.g., JSON). Hence, our DevOps team had to spend a lot of time and effort to extract the data to build dashboards and add required alerts. However, the set-up was brittle and error-prone as there were many edge cases that weren’t accounted for.
With all these challenges in mind, we decided that it was time to upgrade our standalone API Gateway. We could either rewrite it or look for a suitable solution externally (by adopting a Proudly Found Elsewhere mindset). As the API Gateway is an important component in our architecture, we knew that replacing it would require a lot of human collaboration. We were also aware that we had already spent a lot of effort in building our custom API Gateway and it was one of the most stable (albeit inflexible) components of our system. However, we all agreed that moving to an extensible, community-maintained solution could boost our productivity and allow us to focus more on our core competencies. Hence, we recruited senior team members from the different squads and kickstarted the Open Source Gateway project internally, codename OHGEE!.
Market Research
The first step was to enumerate all the requirements that we wanted to have in the new solution. Here are some of them:
- Support for our existing authentication mechanism and previously issued access tokens.
- Support for delegated authentication (e.g., Google), which would be required for the Gateway used by our internal applications.
- Extensibility through plugins that we can write.
- Support for structured logging (e.g., JSON) for requests & responses.
- Configurability via an Admin API, a configuration file (declarative), or the database.
- Strong community support.
At that time (late 2018), these were some of the good open-source API Gateway projects that we found:
- Kong
- Zuul
- KrakenD
- Spring Cloud Gateway
We evaluated each of these solutions against our requirements above and came up with the following matrix:
Feature Matrix (numbers collected in November 2018)

At this point, Kong clearly stood out. While we were impressed by KrakenD, it was still a fairly young project for us to adopt. We wanted to give it more time to see how it would go on to mature.
Meanwhile, we decided to set up a Kong cluster to try out its features and evaluate its performance. We agreed that we would consider moving to Kong only if it passed our performance evaluation.
Performance Evaluation
Our goal was to measure the overhead that Kong was going to add to the response latency of requests to the backend micro-services, and also to gauge the median throughput that we could expect to achieve.
Components
- Resource server: nginx web server configured to serve a static file
- Kong server (located on the same virtual subnet as the resource server)
- Load test client: wrk accessing Kong over the public Internet
Test Scenarios
- Base scenario: Client calls the resource server directly
- Test scenario 1: Client calls a public API (no authorization) of the resource server via Kong
- Test scenario 2: Client calls a private API (with authorization) of the resource server via Kong
Results
# Base Case
$ wrk -c 400 -d60s -t200 --latency .../test.json
...
200 threads and 400 connections
...
Latency Distribution
... 90% 583.05ms
99% 1.38s
...
Requests/sec: 2894.53
...
# Scenario 1
$ wrk -c 400 -d60s -t200 --latency https://*****/test.json
...
200 threads and 400 connections
...
Latency Distribution
... 90% 531.99ms
99% 1.29s
...
Requests/sec: 2793.61
# Scenario 2
$ wrk -c 400 -d60s -t200 --latency \
-H 'Authorization: Bearer ************' \
https://*****/test_secured.json
...
200 threads and 400 connections
...
Latency Distribution
... 90% 598.01ms
99% 1.36s
...
Requests/sec: 2590.24
The latency distribution looked good to us at both the 90th and 99th percentiles for the three scenarios tested, so we decided to proceed with the migration to Kong.
Deployment
For the deployment, we had some non-negotiable requirements:
- Canary Release: Ability to selectively roll out to some users or cohorts, and to stagger the roll-out as a percentage of requests received (e.g., 20% of all requests going to Kong). This threshold had to be configurable.
- Easy Rollback: Ability to revert quickly in case of any unforeseen problems.
- No Downtime: Users should ideally not be aware of this change happening behind the scenes.
To achieve these, we decided to go with the following architecture:
Deployment architecture

We used two Kong clusters: one (Kong 1) with just a custom routing plugin that could help us perform the canary roll-out. The other (Kong 2) was our primary cluster with all our routes registered and configured with custom plugins. Kong 1 would route a given request to either the custom Java Gateway or Kong 2, depending on the configuration of its routing plugin. We could update its behaviour dynamically using Kong’s nifty plugin configuration API.
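Dialling the canary up or down was then just a matter of patching the routing plugin’s configuration over Kong’s Admin API. The sketch below is illustrative only: PATCH /plugins/{plugin-id} is standard Kong Admin API, but the Admin API address, the plugin id placeholder, and the percentage config field are assumptions standing in for our custom plugin’s actual schema.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: bump the share of traffic that the routing plugin sends to Kong 2.
// PATCH /plugins/{id} is Kong's standard Admin API; the host, plugin id, and
// the "percentage" field are assumed values for our custom routing plugin.
public class CanaryDial {
    public static void main(String[] args) throws Exception {
        String adminApi = "http://localhost:8001";            // assumed Admin API address
        String routingPluginId = "<routing-plugin-id>";       // id of the custom routing plugin
        String body = "{\"config\": {\"percentage\": 40}}";   // e.g. send 40% of requests to Kong 2

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(adminApi + "/plugins/" + routingPluginId))
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}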
We started our canary roll-out by only enabling Kong 2 for our internal team members. Issues that were highlighted on Slack were investigated and resolved. We slowly increased the percentage of requests going to Kong 2 by 20% every 2-3 days. While doing this, we kept a close eye on Customer Support tickets coming in to ensure that any issues that were being faced by our customers were not inadvertently caused by the ongoing migration. After two weeks, all the requests were routed to Kong 2. We still kept the set-up running for another two weeks until we were confident that Kong 2 was working well. Then, we executed another switch on Cloudflare by pointing the DNS record to the load balancer that proxied to Kong 2 directly. Finally, we removed all the extraneous components in the migration path and killed the custom Java gateway.
Key Takeaways
It is almost always a bad idea to accept the first solution that you come across. While it may be tempting to do so as a symptom of a ‘Get things done’ mindset, it is best to not immediately act on that impulse. Instead, list all the possible solution candidates; set the evaluation criteria; evaluate; and then pick the best solution.
Planning is key to the success of a migration. ‘Failing to plan is planning to fail’, as they say.
And that concludes the story of how we planned and executed our migration to Kong. Hope you found this useful.
I would like to use this opportunity to thank my colleagues: Quang, Yubo, and Nikolay for working with me during the migration and for helping with the ongoing maintenance of the Kong cluster and our custom plugins.
Giving Back
We are also proud to have been able to contribute back to the Kong community in a small way: Nikolay found an issue with the way the multipart plugin processes requests with multiple files. He went on to address this in a pull request which got merged and will likely be released along with Kong 2.0.5.
Do check out Nikolay’s post on how we set up a seamless workflow for our engineers to control the registration of Kong routes.
Source: https://medium.com/fsmk-engineering/the-great-api-gateway-migration-b8e32a9843b4