The Legendary Life of an IO (8) -- The Elevator Subsystem

Published: 2025/5/22


Introduction to the Elevator Subsystem

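The elevator subsystem is a very important part of the IO path. As analyzed earlier in this series, the elevator implements several types of scheduler to meet the needs of different applications. Viewed from the whole IO path, the elevator layer mainly addresses IO QoS, and it typically has to solve the following two problems: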

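1) Bio merging. The question is whether a bio can be merged with a request already held in the scheduler. From the disk's point of view, adjacent requests should be merged and IO should be issued as sequentially as possible, so that the head keeps moving in one direction instead of seeking back and forth at random.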

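2) Request scheduling. When can a request be taken out of the scheduler and handed to the underlying driver for further processing? Different applications may need different amounts of bandwidth, and the bandwidth and latency targets for reads and writes may differ as well. Request scheduling therefore has to be handled explicitly so that IO QoS can be controlled more precisely.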
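The adjacency test behind point 1 can be sketched in a few lines of C. This is a simplified illustration rather than kernel code: `sketch_request`, `sketch_bio`, `sketch_try_merge` and `sketch_merge_bio` are hypothetical names, and a real elevator also checks data direction, size limits and request flags before it merges anything.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct request and struct bio. */
struct sketch_request {
    uint64_t start;   /* first sector covered by the request */
    uint32_t nsect;   /* length in sectors */
};

struct sketch_bio {
    uint64_t start;
    uint32_t nsect;
};

enum sketch_merge { SKETCH_NO_MERGE, SKETCH_FRONT_MERGE, SKETCH_BACK_MERGE };

/* Classify how bio relates to rq: a back merge appends the bio to the end
 * of the request, a front merge prepends it to the start. */
enum sketch_merge sketch_try_merge(const struct sketch_request *rq,
                                   const struct sketch_bio *bio)
{
    assert(rq->nsect > 0 && bio->nsect > 0);
    if (rq->start + rq->nsect == bio->start)
        return SKETCH_BACK_MERGE;
    if (bio->start + bio->nsect == rq->start)
        return SKETCH_FRONT_MERGE;
    return SKETCH_NO_MERGE;
}

/* Apply the decision by growing the request in place.
 * Returns 1 if the bio was merged, 0 otherwise. */
int sketch_merge_bio(struct sketch_request *rq, const struct sketch_bio *bio)
{
    switch (sketch_try_merge(rq, bio)) {
    case SKETCH_BACK_MERGE:
        rq->nsect += bio->nsect;
        return 1;
    case SKETCH_FRONT_MERGE:
        rq->start = bio->start;
        rq->nsect += bio->nsect;
        return 1;
    default:
        return 0;
    }
}
```

A request covering sectors 100 to 107 absorbs a bio that starts at sector 108 as a back merge, and a bio covering sectors 92 to 99 as a front merge. A bio at sector 500 is not adjacent and would need a request of its own.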

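After being processed by the block device layer, an IO finally arrives at the elevator layer. It is well known that a request is placed on the per-device request queue before it is sent to the device. In fact, an IO passes through several request queues in the elevator layer, each with its own role, and it reaches the per-device request queue only after being scheduled through these layers of queues. The relationship among the request queues in Linux is shown in the figure below:

[Figure: relationship among the request queues in Linux]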

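Linux 3.2 already uses a new unplug mechanism that unplugs requests in batches. Compared with the 2.6.23 kernel, this is an additional layer. The old kernel had no such layer: requests went directly into the elevator, and an in-kernel unplug timer then unplugged the requests held in the elevator. In the new kernel, each thread can unplug its own requests. For example, the ext3 writeback thread can actively unplug its own requests. This application-aware approach keeps request processing latency as low as possible.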

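As the figure above shows, an IO request first enters the unplug request queue that belongs to the submitting thread. If the thread has no unplug request queue, the IO request is sent directly to the elevator. Requests waiting in the unplug request queue are moved into the elevator's request queue when the queue is unplugged. Each device can use a different type of IO scheduler, so the way IO is classified inside the elevator varies. The elevator types here are the familiar Noop, deadline and CFQ schedulers. Finally, the requests in the elevator are moved into the per-device request queue under the control of the scheduler's policy. This structure shows that whoever controls the elevator's scheduler controls the priority of IO on each device, and can thereby achieve IO QoS.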
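To make the three stages concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions: `toy_queue` and the `toy_*` functions are hypothetical, requests are reduced to a start sector, and plain ascending sector order stands in for whatever policy a real scheduler such as deadline or CFQ applies.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QMAX 16

/* Hypothetical fixed-size queue of start sectors; not a kernel structure. */
struct toy_queue {
    unsigned long sector[TOY_QMAX];
    size_t count;
};

/* Append at the tail (FIFO), as the plug list and the device queue do here. */
void toy_push_tail(struct toy_queue *q, unsigned long sector)
{
    assert(q->count < TOY_QMAX);
    q->sector[q->count++] = sector;
}

/* Insert in ascending sector order, standing in for the elevator's sorting. */
void toy_insert_sorted(struct toy_queue *q, unsigned long sector)
{
    size_t i;

    assert(q->count < TOY_QMAX);
    i = q->count++;
    while (i > 0 && q->sector[i - 1] > sector) {
        q->sector[i] = q->sector[i - 1];
        i--;
    }
    q->sector[i] = sector;
}

/* Stage 1 -> 2: unplug, moving every plugged request into the elevator. */
void toy_flush_plug(struct toy_queue *plug, struct toy_queue *elv)
{
    size_t i;

    for (i = 0; i < plug->count; i++)
        toy_insert_sorted(elv, plug->sector[i]);
    plug->count = 0;
}

/* Stage 2 -> 3: move up to max requests, lowest sector first, to the
 * device queue. Returns the number of requests moved. */
size_t toy_dispatch(struct toy_queue *elv, struct toy_queue *dev, size_t max)
{
    size_t n = elv->count < max ? elv->count : max;
    size_t i;

    for (i = 0; i < n; i++)
        toy_push_tail(dev, elv->sector[i]);
    for (i = n; i < elv->count; i++)
        elv->sector[i - n] = elv->sector[i];
    elv->count -= n;
    return n;
}
```

Plugging sectors 300, 100 and 200 and then flushing leaves the elevator holding 100, 200, 300. Dispatching two requests moves 100 and 200 to the device queue and keeps 300 in the elevator for a later dispatch.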

From the analysis above, we know that a request is scheduled through three kinds of request queue. The main points at which it is processed are as follows:

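In normal request processing, a request is created and attached to the unplug request queue, and the flush request method then moves it from the unplug request queue to the elevator request queue. When a new bio needs to be handled, it may be merged into a request in either the unplug request queue or the elevator request queue. When requests need to be sent to the underlying device, the run_queue method moves the requests that the elevator has classified into the device request queue, and scsi_dispatch_cmd finally sends them to the HBA.

Two issues arise in this process: the underlying device may fail, and the HBA's processing queue has a limited length. How to keep scheduling the device request queue, and how to reschedule a request, therefore has to be considered. In Linux, when the SCSI layer needs to reschedule a request, it can call blk_requeue_request, which puts the request back on the device request queue. In addition, the callback invoked when a request completes has to call scsi_run_queue to schedule the requests remaining in the device request queue. This ensures that the device request queue is processed in batches and that the HBA keeps running at its maximum queue depth.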
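The interplay of queue depth, requeueing and completion-driven rescheduling can be modeled with three counters. The sketch below is a toy under stated assumptions, not the SCSI midlayer: `toy_hba`, `toy_run_queue`, `toy_requeue` and `toy_complete` are hypothetical names that only mirror the roles the text assigns to scsi_run_queue, blk_requeue_request and the completion callback.

```c
#include <assert.h>

/* Hypothetical model of a device request queue feeding an HBA. */
struct toy_hba {
    int queue_depth;  /* maximum commands the HBA accepts at once */
    int in_flight;    /* commands currently owned by the HBA */
    int pending;      /* requests waiting in the device request queue */
};

/* Run the queue: issue requests until the HBA is full or the device
 * request queue is empty. Returns the number of requests issued. */
int toy_run_queue(struct toy_hba *h)
{
    int issued = 0;

    while (h->pending > 0 && h->in_flight < h->queue_depth) {
        h->pending--;
        h->in_flight++;
        issued++;
    }
    return issued;
}

/* Requeue: a command could not be processed, so it leaves the HBA and
 * goes back onto the device request queue. */
void toy_requeue(struct toy_hba *h)
{
    assert(h->in_flight > 0);
    h->in_flight--;
    h->pending++;
}

/* Completion callback: free one HBA slot, then run the queue again so the
 * HBA stays at its queue depth. Returns the number of requests issued. */
int toy_complete(struct toy_hba *h)
{
    assert(h->in_flight > 0);
    h->in_flight--;
    return toy_run_queue(h);
}
```

With a queue depth of 2 and five pending requests, the first run issues two requests and a second run issues none. Each completion then frees a slot and immediately refills it, which is why the completion path has to run the queue again.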

Analysis of Key Functions in the Elevator Layer

elv_merge

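When an IO leaves the block device layer and needs to be sent to the underlying device, the first step is to determine whether it can be merged with a request that is already waiting to be processed. This is mainly done by elv_merge(). Note that before elv_merge is called, the unplug request queue is checked for a possible merge first. Only if no merge is possible there is elv_merge called to attempt a merge in the elevator request queue. Once the bio finds a request it can merge with, it is merged into that request; otherwise a new request has to be created and placed on the unplug request queue.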

The bio merge function provided by the elevator layer is analyzed below:

int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
{
    struct elevator_queue *e = q->elevator;
    struct request *__rq;
    int ret;

    /*
     * Levels of merges:
     *   nomerges:  No merges at all attempted
     *   noxmerges: Only simple one-hit cache try
     *   merges:    All merge tries attempted
     */
    if (blk_queue_nomerges(q))
        return ELEVATOR_NO_MERGE;

    /*
     * First try one-hit cache.
     * Try to merge with the most recently merged request.
     */
    if (q->last_merge) {
        ret = elv_try_merge(q->last_merge, bio);
        if (ret != ELEVATOR_NO_MERGE) {
            /* The bio can be merged with last_merge */
            *req = q->last_merge;
            return ret;
        }
    }

    if (blk_queue_noxmerges(q))
        return ELEVATOR_NO_MERGE;

    /*
     * See if our hash lookup can find a potential backmerge.
     * Search the elevator's back-merge hash table for a mergeable request.
     */
    __rq = elv_rqhash_find(q, bio->bi_sector);
    if (__rq && elv_rq_merge_ok(__rq, bio)) {
        *req = __rq;
        return ELEVATOR_BACK_MERGE;
    }

    /* Ask the scheduler whether a front merge is possible and, if so, do it */
    if (e->ops->elevator_merge_fn)
        return e->ops->elevator_merge_fn(q, req, bio);

    return ELEVATOR_NO_MERGE;
}
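The order of the checks in elv_merge matters as much as the checks themselves, and it can be tried out in isolation with a small user-space model. Everything named `mini_*` below is hypothetical. A linear scan over an array replaces the kernel's hash table keyed on the request's end sector, and the scheduler's own merge hook is left out.

```c
#include <assert.h>
#include <stddef.h>

enum mini_result { MINI_NO_MERGE, MINI_FRONT_MERGE, MINI_BACK_MERGE };

struct mini_rq {
    unsigned long start;  /* first sector */
    unsigned long nsect;  /* length in sectors */
};

/* Hypothetical queue: a flat array replaces the kernel's hash table. */
struct mini_queue {
    const struct mini_rq *rqs;
    size_t nr;
    int nomerges;     /* never attempt a merge */
    int noxmerges;    /* only try the one-hit cache */
    int last_merge;   /* index of the cached request, or -1 */
};

/* Adjacency test between one request and a bio given as (start, nsect). */
enum mini_result mini_try(const struct mini_rq *rq,
                          unsigned long start, unsigned long nsect)
{
    if (rq->start + rq->nsect == start)
        return MINI_BACK_MERGE;
    if (start + nsect == rq->start)
        return MINI_FRONT_MERGE;
    return MINI_NO_MERGE;
}

/* Same order of checks as elv_merge: flags, one-hit cache, then a lookup
 * by end sector that can only yield back merges. On success *idx is set
 * to the index of the request to merge with. */
enum mini_result mini_elv_merge(const struct mini_queue *q,
                                unsigned long start, unsigned long nsect,
                                int *idx)
{
    size_t i;

    if (q->nomerges)
        return MINI_NO_MERGE;

    if (q->last_merge >= 0) {
        enum mini_result r = mini_try(&q->rqs[q->last_merge], start, nsect);

        if (r != MINI_NO_MERGE) {
            *idx = q->last_merge;
            return r;
        }
    }

    if (q->noxmerges)
        return MINI_NO_MERGE;

    for (i = 0; i < q->nr; i++) {
        if (q->rqs[i].start + q->rqs[i].nsect == start) {
            *idx = (int)i;
            return MINI_BACK_MERGE;
        }
    }
    return MINI_NO_MERGE;  /* scheduler front-merge hook omitted */
}
```

With requests at sectors 100, 200 and 300 (8 sectors each) and the last one cached, a bio at sector 308 hits the cache, while a bio at sector 108 is found only by the full lookup. Setting `noxmerges` keeps the cache hit but makes the second bio fall through to no merge.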

__elv_add_request

When a request needs to be added to a request queue, __elv_add_request can be called. It can add the request to either the elevator request queue or the device request queue. The function is implemented as follows:

void __elv_add_request(struct request_queue *q, struct request *rq, int where)
{
    trace_block_rq_insert(q, rq);

    rq->q = q;

    if (rq->cmd_flags & REQ_SOFTBARRIER) {
        /* barriers are scheduling boundary, update end_sector */
        if (rq->cmd_type == REQ_TYPE_FS ||
            (rq->cmd_flags & REQ_DISCARD)) {
            q->end_sector = rq_end_sector(rq);
            q->boundary_rq = rq;
        }
    } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
               (where == ELEVATOR_INSERT_SORT ||
                where == ELEVATOR_INSERT_SORT_MERGE))
        where = ELEVATOR_INSERT_BACK;

    switch (where) {
    case ELEVATOR_INSERT_REQUEUE:
    case ELEVATOR_INSERT_FRONT:
        /* Add the request at the head of the device request queue */
        rq->cmd_flags |= REQ_SOFTBARRIER;
        list_add(&rq->queuelist, &q->queue_head);
        break;

    case ELEVATOR_INSERT_BACK:
        /* Add the request at the tail of the device request queue */
        rq->cmd_flags |= REQ_SOFTBARRIER;
        elv_drain_elevator(q);
        list_add_tail(&rq->queuelist, &q->queue_head);
        /*
         * We kick the queue here for the following reasons.
         * - The elevator might have returned NULL previously
         *   to delay requests and returned them now.  As the
         *   queue wasn't empty before this request, ll_rw_blk
         *   won't run the queue on return, resulting in hang.
         * - Usually, back inserted requests won't be merged
         *   with anything.  There's no point in delaying queue
         *   processing.
         */
        __blk_run_queue(q);
        break;

    case ELEVATOR_INSERT_SORT_MERGE:
        /*
         * Try to merge the request; if it cannot be merged, fall
         * through and add it to the elevator request queue.
         */
        /*
         * If we succeed in merging this request with one in the
         * queue already, we are done - rq has now been freed,
         * so no need to do anything further.
         */
        if (elv_attempt_insert_merge(q, rq))
            break;
    case ELEVATOR_INSERT_SORT:
        /* Add the request to the elevator request queue */
        BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
               !(rq->cmd_flags & REQ_DISCARD));
        rq->cmd_flags |= REQ_SORTED;
        q->nr_sorted++;
        if (rq_mergeable(rq)) {
            elv_rqhash_add(q, rq);
            if (!q->last_merge)
                q->last_merge = rq;
        }

        /*
         * Some ioscheds (cfq) run q->request_fn directly, so
         * rq cannot be accessed after calling
         * elevator_add_req_fn.
         */
        q->elevator->ops->elevator_add_req_fn(q, rq);
        break;

    case ELEVATOR_INSERT_FLUSH:
        rq->cmd_flags |= REQ_SOFTBARRIER;
        blk_insert_flush(rq);
        break;

    default:
        printk(KERN_ERR "%s: bad insertion point %d\n",
               __func__, where);
        BUG();
    }
}

elv_dispatch_sort

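When a request in the elevator request queue needs to be moved to the device request queue, elv_dispatch_sort can be called. It sorts the request into the appropriate position in the device request queue. The function is implemented as follows: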

void elv_dispatch_sort(struct request_queue *q, struct request *rq)
{
    sector_t boundary;
    struct list_head *entry;
    int stop_flags;

    if (q->last_merge == rq)
        q->last_merge = NULL;

    elv_rqhash_del(q, rq);

    q->nr_sorted--;

    boundary = q->end_sector;
    stop_flags = REQ_SOFTBARRIER | REQ_STARTED;
    list_for_each_prev(entry, &q->queue_head) {
        struct request *pos = list_entry_rq(entry);

        if ((rq->cmd_flags & REQ_DISCARD) !=
            (pos->cmd_flags & REQ_DISCARD))
            break;
        if (rq_data_dir(rq) != rq_data_dir(pos))
            break;
        if (pos->cmd_flags & stop_flags)
            break;
        if (blk_rq_pos(rq) >= boundary) {
            if (blk_rq_pos(pos) < boundary)
                continue;
        } else {
            if (blk_rq_pos(pos) >= boundary)
                break;
        }
        if (blk_rq_pos(rq) >= blk_rq_pos(pos))
            break;
    }

    list_add(&rq->queuelist, entry);
}
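The boundary comparison in the loop is easier to follow when it is separated from the list handling. The sketch below reproduces only that ordering rule on a plain array. `disp_before` and `disp_insert` are hypothetical names, and the stop conditions on discard, data direction and the REQ_SOFTBARRIER/REQ_STARTED flags are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero when sector a should be served before sector b.
 * Sectors at or past the boundary come first in ascending order,
 * followed by the sectors below the boundary, also ascending. */
int disp_before(unsigned long a, unsigned long b, unsigned long boundary)
{
    int a_wrapped = a < boundary;
    int b_wrapped = b < boundary;

    if (a_wrapped != b_wrapped)
        return b_wrapped;  /* the sector not below the boundary goes first */
    return a < b;
}

/* Insert sector into q[0..*count), scanning from the tail as
 * elv_dispatch_sort does and stopping at the first entry that must stay
 * ahead of the new one. Equal sectors keep their arrival order. */
void disp_insert(unsigned long *q, size_t *count, size_t cap,
                 unsigned long sector, unsigned long boundary)
{
    size_t i;

    assert(*count < cap);
    i = *count;
    while (i > 0 && disp_before(sector, q[i - 1], boundary)) {
        q[i] = q[i - 1];
        i--;
    }
    q[i] = sector;
    (*count)++;
}
```

With a boundary of 100, inserting sectors 50, 150, 120 and 10 in that order produces the queue 120, 150, 10, 50. The sweep continues upward from the boundary first and picks up the lower sectors afterward, which is the one-directional head movement the elevator is named after.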

Summary of the Elevator Subsystem

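The elevator subsystem is the framework that implements IO scheduling, and schedulers with different functions can be plugged into it as elevator types. Anyone who wants to design and implement a custom scheduler therefore first has to understand the elevator subsystem.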

Reposted from: https://blog.51cto.com/alanwu/1391156
