Rapido: Using Data to Improve Ride Dispatch

Published: 2023/11/29

Given our last blog post in this series, which can be found here:

We thought it would be helpful to explain how we implemented all of the above in an on-ground experiment. In that post, we mentioned how the lack of a logical time-based control group forced us to pivot to geo-temporal control formation. I would like to take this opportunity to talk about an experiment we ran as part of the Dispatch team at Rapido.

What is a Ride Dispatch?

‘Dispatch’ is the system that decides which order request (created when you tap the Request Rapido button, aka the Book my Ride button, in your app) should be sent to which Captain(s), so that the Captain reaches the customer in the quickest and most efficient way possible. The name is an homage to the days of old, when taxi services were run over the telephone: a customer who had called in for a pickup would be patched through to an agent, who would find a willing cabbie (often after multiple calls), and that driver was “dispatched” for the order.

Dispatch is one of the key levers of a ride-hailing marketplace. It is one of those systems that EVERY ride request has to pass through, so the room for error is small and the stakes are very high.

One of the first questions we had to answer, before even thinking of a product to build, was: “What metrics do we look at to see if marketplace conditions are being improved?” Is the ETA the gold metric for this system, or do we look at other things like Matching Time, Distance Driven by the captain to get to the customer, and cancellations from both the demand and supply sides? We definitely had to be cognizant of these metrics while evaluating any changes to our system.

What was Dispatch @ Rapido like before we started rebuilding it?

Going into the rebuilding process, the existing dispatch system was a simple radial system: a customer requests a ride on the app, and the system draws a circle of radius, say, 2 km, looks at all the captains in that area, calculates each captain's straight-line (crow-flying) distance to the customer, and propagates the ping in that order.
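The radial approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the captain ids and the coordinates are all made up, and the haversine formula stands in for whatever crow-flying distance the production system actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (crow-flying) distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radial_dispatch_order(pickup, captains, radius_km=2.0):
    """Return ids of captains within radius_km of the pickup, nearest first."""
    in_range = []
    for captain_id, (lat, lon) in captains.items():
        distance = haversine_km(pickup[0], pickup[1], lat, lon)
        if distance <= radius_km:
            in_range.append((distance, captain_id))
    return [captain_id for _, captain_id in sorted(in_range)]


# Hypothetical pickup point and captain positions (illustrative only).
pickup = (16.5062, 80.6480)
captains = {
    "c1": (16.5100, 80.6480),  # roughly 0.4 km north of the pickup
    "c2": (16.5062, 80.6600),  # roughly 1.3 km east of the pickup
    "c3": (16.6000, 80.6480),  # roughly 10 km away, outside the radius
}
order = radial_dispatch_order(pickup, captains)
print(order)  # captains inside the 2 km circle, nearest first
```

Note that nothing in this ordering knows about roads, which is exactly the weakness discussed next.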

As a first solution, this is fine, but discerning data enthusiasts can probably find many issues with this system: how do we choose the optimal radius, and what happens if there is a huge divider, like a ring road or a railway crossing, that results in a short Euclidean distance but a long route-based distance? The latter case is a sub-optimal match: the captain now has to spend more time driving empty kilometers to reach the customer, and the customer gets frustrated about being matched to a captain who looks close by but takes twice as long to reach the pickup location.

This specific use case can be reduced to a higher-level question: for a given pickup location, is there a nearby area that should be geo-fenced off rather than treated as part of the “dispatch radius”?

Furthermore, is there a location that is further away in a Euclidean sense, but closer in terms of driving time?

Won’t this problem be alleviated by paying for a maps API?

It is too expensive at a per-request level. Right now, even though we are at 20% of our pre-COVID levels (and recovering every week!), servicing each request via the Google Maps API would be prohibitively expensive for a growing startup like Rapido, especially in these times, when innovation is warranted. The goal was to deploy a smart solution, without breaking the bank, that would still have a high impact on the ground.

Building the driving time estimates

The most crucial component of a smart Dispatch system is having reliable driving time estimates. We build these by leveraging the huge store of data available to us from our historical rides. As part of our internal logging, we record the time taken from:

The captain to the customer, aka the ETA

The customer’s pickup to the customer’s drop, aka the Ridetime

Each of these sources adds to our coverage of point-to-point driving times within a city. The ETA gives us short-distance coverage, and the Ridetime gives us longer-distance coverage. We combine the two sources of data, group them at a time-of-day and day-of-week level, remove outliers, filter out buckets that have fewer than a minimum number of rides to be considered valid, and store the output in a dataset to be consumed by any concerned team.
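A minimal sketch of this aggregation, using only the Python standard library, might look like the following. Several details are assumptions made for illustration and are not stated in the post: trips are keyed by origin and destination hex, outliers are dropped with a 1.5 × IQR rule, the bucket statistic is the median, and `MIN_RIDES` is a made-up threshold.

```python
from collections import defaultdict
from statistics import median, quantiles

MIN_RIDES = 5  # hypothetical minimum rides for a bucket to count as valid


def build_driving_time_map(eta_legs, ride_legs, min_rides=MIN_RIDES):
    """Combine ETA legs and ride legs into per-bucket driving times.

    Each leg is (origin_hex, dest_hex, day_of_week, time_of_day, minutes).
    Returns {(origin_hex, dest_hex, day_of_week, time_of_day): minutes}.
    """
    buckets = defaultdict(list)
    for origin, dest, dow, tod, minutes in list(eta_legs) + list(ride_legs):
        buckets[(origin, dest, dow, tod)].append(minutes)

    driving_time = {}
    for key, times in buckets.items():
        # Drop sparse buckets (quantiles() also needs at least 2 points).
        if len(times) < max(min_rides, 2):
            continue
        # Assumed outlier rule: keep values within 1.5 * IQR of the quartiles.
        q1, _, q3 = quantiles(times, n=4)
        iqr = q3 - q1
        kept = [t for t in times if q1 - 1.5 * iqr <= t <= q3 + 1.5 * iqr]
        driving_time[key] = median(kept)
    return driving_time


# Synthetic legs: six trips A -> B (one extreme value), two trips A -> C.
bucket_ab = ("A", "B", "Mon", "morning")
eta_legs = [bucket_ab + (10,), bucket_ab + (11,), bucket_ab + (9,)]
ride_legs = [
    bucket_ab + (10,), bucket_ab + (12,), bucket_ab + (60,),
    ("A", "C", "Mon", "morning", 25), ("A", "C", "Mon", "morning", 27),
]
driving_time = build_driving_time_map(eta_legs, ride_legs)
print(driving_time)
```

In this toy input the 60-minute trip is discarded as an outlier, and the A to C bucket is dropped because it has too few rides.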

Designing the experiment

Once we have a pickup-to-drop driving time map, at a time-of-day and day-of-week level, we get to the dirty work of actually designing an experiment. The first step was to answer the question: “For a pickup location, can we find a close-by area that has a worse driving time to the source than an area further away?” I will use this as a segue to introduce some of the terminology we use in this regard:

source_hex : the Uber H3-derived hex8 in which the ride request originates

bad_hex : the Uber H3-derived hex8 that is closer to the source_hex geometrically, but not in driving time

good_hex : the Uber H3-derived hex8 that is further away from the source_hex geometrically, but has a faster driving time to it than the bad_hex

We do this analysis at a time_of_day and day_of_week level. A trio of HexA, HexB and HexC could be mapped as source_hex -> HexA, bad_hex -> HexB, good_hex -> HexC on a Monday morning, but on a Sunday evening the relative driving times of HexB and HexC to HexA will not necessarily be the same. We were careful not to make too many dangerous assumptions here.
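To make the three definitions concrete, here is a rough sketch of how such trios could be found per day and time bucket. It assumes two precomputed inputs that the post does not spell out: a driving-time map keyed by (candidate_hex, source_hex, day_of_week, time_of_day) and a table of geometric distances from each source hex (for example, the number of hexagon rings between cells). All names and numbers are hypothetical.

```python
from collections import defaultdict


def find_hex_trios(driving_time, grid_distance):
    """Return (source_hex, bad_hex, good_hex, day_of_week, time_of_day) tuples.

    driving_time: {(candidate_hex, source_hex, dow, tod): minutes to source}
    grid_distance: {(source_hex, candidate_hex): geometric distance}
    """
    by_source = defaultdict(list)
    for (candidate, source, dow, tod), minutes in driving_time.items():
        by_source[(source, dow, tod)].append((candidate, minutes))

    trios = []
    for (source, dow, tod), candidates in by_source.items():
        for bad, bad_minutes in candidates:
            for good, good_minutes in candidates:
                geometrically_closer = (
                    grid_distance[(source, bad)] < grid_distance[(source, good)]
                )
                # bad_hex is nearer on the map, yet slower to drive from.
                if geometrically_closer and good_minutes < bad_minutes:
                    trios.append((source, bad, good, dow, tod))
    return trios


# Synthetic example: HexB is one ring away from HexA, HexC is two rings away.
grid_distance = {("HexA", "HexB"): 1, ("HexA", "HexC"): 2}
driving_time = {
    ("HexB", "HexA", "Mon", "morning"): 12,  # nearby but slow
    ("HexC", "HexA", "Mon", "morning"): 7,   # further but fast
    ("HexB", "HexA", "Sun", "evening"): 6,   # ordering flips on Sunday evening
    ("HexC", "HexA", "Sun", "evening"): 9,
}
trios = find_hex_trios(driving_time, grid_distance)
print(trios)
```

With this input, the trio only exists on Monday morning, which mirrors the point that a mapping valid in one time bucket need not hold in another.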

An example of a source_hex, bad_hex and good_hex

Here the brown hex is the source_hex, the yellow hex is the bad-level-1-hex and the orange one is the good-level-2-hex. From this map alone, it is not clear why the ride time from yellow to brown is longer than from orange to brown. But when we look at the Google Maps view, it becomes evident:

We see that the brown and yellow hex8s are separated by a huge railway track (Vijayawada is one of the biggest railway junctions in the country and regularly reports trains crossing roads). On the other hand, the orange hex8 has clear, unfettered access to the source_hex.

Once we have the universe of such hex trios, we are back to the problem of how to do a test-control split. Given that time-based control is not an option, we took features (relevant to dispatch) of each hex trio and passed them through a vector-similarity measure to calculate a similarity score for each pair of trios. A pair is only scored if both trios are valid on the same day and time period (that is, both the test and the control source hexes have bad and good hexes on the same day and time period).

Example of a test group
Example of a control group

It doesn’t make a lot of sense to say HexA on a Monday morning is similar to HexB on a Wednesday afternoon. So we only do the split if HexA and HexB are both source_hexes on the same day and time period.
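The pairing rule can be sketched as below. The post does not name the similarity measure or the features, so cosine similarity and the three features used here (orders per hour, median ETA in minutes, cancellation rate) are assumptions chosen purely for illustration; in practice the features would likely be scaled before comparison.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_pairs(trio_features):
    """Score pairs of source hexes that share the same day and time bucket.

    trio_features: {(source_hex, day_of_week, time_of_day): feature_vector}
    Returns [(similarity, source_a, source_b, dow, tod)], best match first.
    """
    scores = []
    for (key_a, feat_a), (key_b, feat_b) in combinations(
            sorted(trio_features.items()), 2):
        same_bucket = key_a[1:] == key_b[1:]
        if same_bucket and key_a[0] != key_b[0]:
            similarity = cosine_similarity(feat_a, feat_b)
            scores.append((similarity, key_a[0], key_b[0], key_a[1], key_a[2]))
    return sorted(scores, reverse=True)


# Hypothetical features: [orders per hour, median ETA (min), cancel rate].
trio_features = {
    ("A", "Mon", "morning"): [120.0, 6.5, 0.10],
    ("B", "Mon", "morning"): [118.0, 6.8, 0.11],
    ("C", "Mon", "morning"): [15.0, 11.0, 0.30],
    ("D", "Wed", "afternoon"): [120.0, 6.5, 0.10],  # never paired with Monday
}
scores = score_pairs(trio_features)
for row in scores:
    print(row)
```

Hex D has the same features as hex A but is never paired with it, because the two are valid in different day and time buckets.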

Once the above is done for each pair in the universe, we build the test-control split so that no hex in the test group also appears in the control group through some other mapping, as this would contaminate the experiment results.
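One simple way to enforce this constraint is a greedy pass over the scored pairs, sketched below for a single day and time bucket. The greedy strategy, the function name and the choice to let each hex be used at most once are assumptions made for the sake of the example; the post does not describe the exact procedure.

```python
def build_split(scored_pairs, trio_hexes):
    """Greedily assign the most similar pairs to test and control.

    scored_pairs: [(similarity, source_a, source_b)] in any order
    trio_hexes: {source_hex: set of all hexes in its trio}
    Returns (test_sources, control_sources); no hex is used twice.
    """
    test_sources, control_sources = [], []
    used_hexes = set()
    for _, source_a, source_b in sorted(scored_pairs, reverse=True):
        hexes_a, hexes_b = trio_hexes[source_a], trio_hexes[source_b]
        if hexes_a & hexes_b:
            continue  # the two trios share a hex, so they cannot be split
        if (hexes_a | hexes_b) & used_hexes:
            continue  # a hex is already assigned through another mapping
        test_sources.append(source_a)
        control_sources.append(source_b)
        used_hexes |= hexes_a | hexes_b
    return test_sources, control_sources


# Synthetic trios: C's bad_hex (A1) also belongs to A's trio.
trio_hexes = {
    "A": {"A", "A1", "A2"},
    "B": {"B", "B1", "B2"},
    "C": {"C", "A1", "C2"},
    "D": {"D", "D1", "D2"},
    "E": {"E", "E1", "E2"},
}
scored_pairs = [
    (0.99, "A", "B"),
    (0.95, "C", "D"),  # skipped: A1 is already used by A's trio
    (0.90, "B", "D"),  # skipped: B is already in the control group
    (0.80, "D", "E"),
]
test_sources, control_sources = build_split(scored_pairs, trio_hexes)
print(test_sources, control_sources)
```

A greedy pass is easy to reason about but can discard usable pairs; a matching algorithm over the pair graph would be a natural refinement.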

With the test-control split in place, the treatment is as follows: for test-group source_hexes, the good_hex is included and the bad_hex is excluded when creating the “dispatch radius”, whereas for control-group source_hexes neither change is made. Given that everything else remains the same, the test group should show a reduced ETA compared to the control group post-experiment.
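In code, the treatment amounts to a small adjustment of the candidate area for test-group sources. The sketch below assumes the dispatch radius has already been converted into a set of hexes and that the swaps for the current day and time bucket are available as a lookup; both representations are hypothetical.

```python
def dispatch_hexes(source_hex, radius_hexes, swaps, group):
    """Return the hexes to search for captains for one ride request.

    radius_hexes: hexes covered by the default dispatch radius
    swaps: {source_hex: (bad_hex, good_hex)} for the current time bucket
    group: "test" or "control"
    """
    hexes = set(radius_hexes)
    if group == "test" and source_hex in swaps:
        bad_hex, good_hex = swaps[source_hex]
        hexes.discard(bad_hex)  # geo-fence the slow-to-reach hex out
        hexes.add(good_hex)     # pull the fast-to-reach hex in
    return hexes


# Synthetic example for a single source hex.
default_area = {"HexA", "HexB", "HexX"}
swaps = {"HexA": ("HexB", "HexC")}
test_area = dispatch_hexes("HexA", default_area, swaps, "test")
control_area = dispatch_hexes("HexA", default_area, swaps, "control")
print(sorted(test_area), sorted(control_area))
```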

We then ran this experiment for 2 weeks and tried to get 1000+ orders cumulatively in both the test group and the control group, so that we would not suffer from data sparsity while analysing what happened.

Experiment Results

We ran this experiment in Hyderabad, where the test group’s ETA was lower than the control group’s by around 9% when comparing median ETAs and by almost 13% when comparing mean ETAs. Pre-experiment, the test and control groups differed by only about 3% on both mean and median ETAs, which shows that the changes we made actually added value to on-ground ETAs.
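The comparison above boils down to a relative gap between the two groups, measured before and during the experiment. The sketch below shows one way to compute it; the ETA values are synthetic and are not the experiment's data, and subtracting the pre-experiment gap from the post-experiment gap is an assumed form of adjustment.

```python
from statistics import mean, median


def relative_gap(test_etas, control_etas, stat=mean):
    """(test - control) / control; a negative value means test is faster."""
    control_value = stat(control_etas)
    return (stat(test_etas) - control_value) / control_value


# Synthetic ETAs in minutes (illustrative only).
pre_test, pre_control = [10.3] * 4, [10.0] * 4
post_test, post_control = [9.0, 9.5, 8.5, 9.0], [10.0, 12.0, 8.0, 10.0]

pre_gap = relative_gap(pre_test, pre_control)
post_gap = relative_gap(post_test, post_control)
adjusted_gap = post_gap - pre_gap  # change relative to the pre-existing gap
median_post_gap = relative_gap(post_test, post_control, stat=median)

print(f"pre: {pre_gap:+.1%}, post: {post_gap:+.1%}, adjusted: {adjusted_gap:+.1%}")
```

Reporting both the mean and the median gap, as the post does, helps show whether an improvement comes from the bulk of rides or from a few very long ETAs.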

We know that no experiment can be called successful without statistical tests of significance, so we went into the experiment having defined our hypotheses as follows:

H0 (Null Hypothesis) : Hex-based swaps have no effect on realized ETAs

H1 (Alternative Hypothesis) : Hex-based swaps DO have an effect on realized ETAs

Rejecting the null hypothesis at a confidence level above 95% (that is, with p < 0.05) is the gold standard that we were striving for, and we are happy to report that we achieved statistically significant results at a level of around 98%, with p-values in the 0.01 range across a few statistical tests of significance.
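The post does not say which significance tests were used. As one possible illustration, the sketch below runs a two-sided permutation test on the difference in mean ETA, which needs only the standard library and makes no distributional assumptions. The data, function name and parameters are all made up for the example.

```python
import random


def permutation_p_value(test, control, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Under H0 the group labels are exchangeable, so we shuffle them and
    count how often a gap at least as large as the observed one appears.
    """
    def avg(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(avg(test) - avg(control))
    pooled = list(test) + list(control)
    n_test = len(test)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        gap = abs(avg(pooled[:n_test]) - avg(pooled[n_test:]))
        if gap >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Synthetic ETAs in minutes: the test group is about 2 minutes faster.
control_etas = [10 + 0.1 * i for i in range(20)]
test_etas = [8 + 0.1 * i for i in range(20)]

p_shifted = permutation_p_value(test_etas, control_etas)
p_identical = permutation_p_value(control_etas, control_etas)
print(f"shifted groups: p = {p_shifted:.4f}; identical groups: p = {p_identical:.4f}")
```

H0 would be rejected at the 95% level whenever the returned p-value is below 0.05. A t-test or a Mann-Whitney U test are common alternatives for the same comparison.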

Visually, what we got was something similar to this:

Test vs Control group change visualized: the blue vertical line represents the mean of the test group, and the purple vertical line represents the mean of the control group

What this image tells us is that, when viewed on a relative scale and after adjusting for the pre-experiment ETA delta, the center of the test group’s ETA distribution has shifted lower compared to the control group’s, showing that our changes lowered ETAs as we expected.

Conclusion

The high-level goal, as mentioned at the start, was to improve a key aspect of dispatch: ETA. We wanted to add a good amount of value not by doing something cost-intensive, but by leveraging the technology and information we already had. This is the hallmark of any data-science team: using common sense and best practices to uncover hidden insights with as simple an approach as possible.

If you enjoyed this blog post, check out what we’ve posted so far over here, and keep an eye on the same space for some really cool upcoming blogs in the near future. If you have any questions about the problems we face as Data Scientists at Rapido, about transitioning to a start-up after a few years in a different field, or about anything else, please reach out to me on LinkedIn or at siddharth.p@rapido.bike. I look forward to answering any questions!

Translated from: https://medium.com/rapido-labs/improving-dispatch-with-data-6a307dab7ecc