Redis:缩容、扩容、渐进式rehash
目錄
1、縮容 擴(kuò)容
2、漸進(jìn)式rehash
1、縮容 擴(kuò)容
隨著redis的操作的不斷執(zhí)行,哈希表保存的鍵值會(huì)逐漸地增多或者減少,為了讓哈希表的負(fù)載因子(ratio)維持在一個(gè)合理的范圍之內(nèi),當(dāng)哈希表保存的鍵值對(duì)數(shù)量太多或者太少時(shí),程序需要對(duì)哈希表的大小進(jìn)行相應(yīng)的擴(kuò)展或者收縮。
ratio?= ht[0].used / ht[0].size
比如,hash表的size為4,如果已經(jīng)插入了4個(gè)k-v的話,則ratio為1。
redis的默認(rèn)負(fù)載因子為1,負(fù)載因子最大可以達(dá)到5(持久化的時(shí)候,需要fork操作,這個(gè)時(shí)候不會(huì)分配內(nèi)存,所以redis源碼中有判斷,如果大于數(shù)據(jù)長(zhǎng)度的5倍(5*used),則馬上擴(kuò)容)。
擴(kuò)展和收縮哈希表的工作可以執(zhí)行rehash(重新散列)操作來(lái)完成,Redis對(duì)字典的哈希表執(zhí)行rehash的策略如下:
1、如果ratio小于0.1,則會(huì)對(duì)hash表進(jìn)行收縮操作
#define HASHTABLE_MIN_FILL 10 /* Minimal hash table fill 10% *//* Expand or create the hash table,* when malloc_failed is non-NULL, it'll avoid panic if malloc fails (in which case it'll be set to 1).* Returns DICT_OK if expand was performed, and DICT_ERR if skipped. */ int _dictExpand(dict *d, unsigned long size, int* malloc_failed) {if (malloc_failed) *malloc_failed = 0;/* the size is invalid if it is smaller than the number of* elements already inside the hash table */if (dictIsRehashing(d) || d->ht[0].used > size)return DICT_ERR;dictht n; /* the new hash table */unsigned long realsize = _dictNextPower(size);/* Rehashing to the same table size is not useful. */if (realsize == d->ht[0].size) return DICT_ERR;/* Allocate the new hash table and initialize all pointers to NULL */n.size = realsize;n.sizemask = realsize-1;if (malloc_failed) {n.table = ztrycalloc(realsize*sizeof(dictEntry*));*malloc_failed = n.table == NULL;if (*malloc_failed)return DICT_ERR;} elsen.table = zcalloc(realsize*sizeof(dictEntry*));n.used = 0;/* Is this the first initialization? If so it's not really a rehashing* we just set the first hash table so that it can accept keys. */if (d->ht[0].table == NULL) {d->ht[0] = n;return DICT_OK;}/* Prepare a second hash table for incremental rehashing */d->ht[1] = n;d->rehashidx = 0;return DICT_OK; }/* return DICT_ERR if expand was not performed */ int dictExpand(dict *d, unsigned long size) {return _dictExpand(d, size, NULL); }/* Resize the table to the minimal size that contains all the elements,* but with the invariant of a USED/BUCKETS ratio near to <= 1 */ int dictResize(dict *d) {unsigned long minimal;if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;minimal = d->ht[0].used;if (minimal < DICT_HT_INITIAL_SIZE)minimal = DICT_HT_INITIAL_SIZE;return dictExpand(d, minimal); }int htNeedsResize(dict *dict) {long long size, used;size = dictSlots(dict);used = dictSize(dict);return (size > DICT_HT_INITIAL_SIZE &&(used*100/size < HASHTABLE_MIN_FILL)); }/* 當(dāng)HashTable的使用率為10%的時(shí)候,開始縮容,進(jìn)行rehash操作。 */ /* If the percentage of used slots in the HT reaches HASHTABLE_MIN_FILL* we resize the hash table to save memory */ void tryResizeHashTables(int dbid) {if (htNeedsResize(server.db[dbid].dict))dictResize(server.db[dbid].dict);if (htNeedsResize(server.db[dbid].expires))dictResize(server.db[dbid].expires); }2、服務(wù)器目前沒(méi)有在執(zhí)行BGSAVE命令或者BGREWRITEAOF命令,并且哈希表的負(fù)載因子大于等于1,則哈希表擴(kuò)容,擴(kuò)容大小為當(dāng)前ht[0].used*2
3、服務(wù)器目前正在執(zhí)行BGSAVE命令或者BGREWRITEAOF命令,并且哈希表的負(fù)載因子大于等于5,則擴(kuò)容hash表,并且擴(kuò)容為當(dāng)前ht[0].used*2
上面的說(shuō)法稍微有點(diǎn)偏頗,實(shí)際上雖然傳進(jìn)去的參數(shù)是這樣,比如你的ht[0].used為5,傳進(jìn)去就是10,但是擴(kuò)容會(huì)是2^4 = 16,即實(shí)際擴(kuò)容量為2^n。
/* Our hash table capability is a power of two */ static unsigned long _dictNextPower(unsigned long size) {unsigned long i = DICT_HT_INITIAL_SIZE;if (size >= LONG_MAX) return LONG_MAX + 1LU;while(1) {if (i >= size)return i;i *= 2; //體現(xiàn)了擴(kuò)容的大小不是傳入的size(也就是ht[0].used*2),而是距離這個(gè)size最近的2^n。} } #define dictIsRehashing(d) ((d)->rehashidx != -1)/* Because we may need to allocate huge memory chunk at once when dict* expands, we will check this allocation is allowed or not if the dict* type has expandAllowed member function. */ static int dictTypeExpandAllowed(dict *d) {if (d->type->expandAllowed == NULL) return 1;return d->type->expandAllowed(_dictNextPower(d->ht[0].used + 1) * sizeof(dictEntry*),(double)d->ht[0].used / d->ht[0].size); }/* Expand the hash table if needed */ static int _dictExpandIfNeeded(dict *d) {/* Incremental rehashing already in progress. Return. */if (dictIsRehashing(d)) return DICT_OK;/* If the hash table is empty expand it to the initial size. */if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);/* If we reached the 1:1 ratio, and we are allowed to resize the hash* table (global setting) or we should avoid it but the ratio between* elements/buckets is over the "safe" threshold, we resize doubling* the number of buckets. */if (d->ht[0].used >= d->ht[0].size &&(dict_can_resize ||d->ht[0].used/d->ht[0].size > dict_force_resize_ratio) &&dictTypeExpandAllowed(d)){return dictExpand(d, d->ht[0].used + 1);}return DICT_OK; }擴(kuò)容的步驟如下:
1、為字典ht[1]哈希表分配合適的空間;
2、將ht[0]中所有的鍵值對(duì)rehash到ht[1]:rehash 指的是重新計(jì)算鍵的哈希值和索引值, 然后將鍵值對(duì) 放置到 ht[1] 哈希表的指定位置上;
3、當(dāng) ht[0] 包含的所有鍵值對(duì)都遷移到了 ht[1] 之后 (ht[0] 變?yōu)榭毡?#xff09;, 釋放 ht[0] , 將 ht[1] 設(shè)置 為 ht[0] , 并在 ht[1] 新創(chuàng)建?個(gè)空?哈希表, 為下?次 rehash 做準(zhǔn)備。
?
2、漸進(jìn)式rehash
擴(kuò)展或收縮哈希表需要將ht[0]里面的所有鍵值對(duì)rehash到ht[1]里面,但是,這個(gè)rehash動(dòng)作并不是一次性、集中式地完成的,而是分多次、漸進(jìn)式地完成的。
這樣做的原因在于,如果ht[0]里保存著四個(gè)鍵值對(duì),那么服務(wù)器可以在瞬間就將這些鍵值對(duì)全部rehash到ht[1];但是,如果hash表里保存的鍵值對(duì)不是四個(gè),而是四百萬(wàn)、四千萬(wàn)甚至四億個(gè)鍵值對(duì),那么要一次性將這些鍵值對(duì)全部rehash到ht[1]的話,龐大的計(jì)算量可能會(huì)導(dǎo)致服務(wù)器在一段時(shí)間內(nèi)停止服務(wù)。
因此,為了避免rehash對(duì)服務(wù)器性能造成影響,服務(wù)器不是一次性將對(duì)ht[0]里面所有鍵值對(duì)全部rehash到ht[1],而是分多次、漸進(jìn)式地將ht[0]里面的鍵值對(duì)慢慢地rehash到ht[1]。
以下是哈希漸進(jìn)式rehash的詳細(xì)步驟:
1、為ht[1]分配空間,讓字典同時(shí)持有ht[0]和ht[1]兩個(gè)哈希表。
2、在字典中維持一個(gè)索引計(jì)數(shù)器變量rehashidx,并將它的指設(shè)置為0,表示rehash工作正式開始。
3、在rehash進(jìn)行期間,每次對(duì)字典執(zhí)行添加、刪除、查找或者更新操作時(shí),程序除了執(zhí)行指定的操作以外,還會(huì)順帶將ht[0]哈希表在rehashidx索引上的所有鍵值對(duì)rehash到ht[1],當(dāng)rehash工作完成之后,程序?qū)ehashidx屬性的值一。
4、隨著字典操作的不斷執(zhí)行,最終在某個(gè)時(shí)間點(diǎn),ht[0]的所有鍵值對(duì)都會(huì)被rehash至ht[1],這時(shí)程序?qū)ehashidx屬性設(shè)置為-1,表示rehash已經(jīng)操作完成
漸進(jìn)式rehash的好處在于它采取分而治之的方式,將rehash鍵值對(duì)所需的計(jì)算工作均攤到對(duì)字典的每個(gè)crud操作上,甚至是后臺(tái)啟動(dòng)一個(gè)定時(shí)器,每次時(shí)間循環(huán)時(shí)只工作一毫秒,從而避免了集中式rehash而帶來(lái)的龐大計(jì)算量。
我們以dictAddRaw為例:
#define dictIsRehashing(d) ((d)->rehashidx != -1)/* Low level add or find:* This function adds the entry but instead of setting a value returns the* dictEntry structure to the user, that will make sure to fill the value* field as they wish.** This function is also directly exposed to the user API to be called* mainly in order to store non-pointers inside the hash value, example:** entry = dictAddRaw(dict,mykey,NULL);* if (entry != NULL) dictSetSignedIntegerVal(entry,1000);** Return values:** If key already exists NULL is returned, and "*existing" is populated* with the existing entry if existing is not NULL.** If key was added, the hash entry is returned to be manipulated by the caller.*/ dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing) {long index;dictEntry *entry;dictht *ht;if (dictIsRehashing(d)) _dictRehashStep(d); //如果rehash操作未完成,進(jìn)行一次rehash操作/* Get the index of the new element, or -1 if* the element already exists. */if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)return NULL;/* Allocate the memory and store the new entry.* Insert the element in top, with the assumption that in a database* system it is more likely that recently added entries are accessed* more frequently. */ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];entry = zmalloc(sizeof(*entry)); //為ht[1]開辟新的空間entry->next = ht->table[index];ht->table[index] = entry;ht->used++;/* Set the hash entry fields. */dictSetKey(d, entry, key);return entry; }可以看到在上面添加過(guò)程中,會(huì)先判斷一下還有沒(méi)有有需要執(zhí)行的rehash操作,如果有,那就順帶進(jìn)行rehash操作,那每次rehash都搬運(yùn)多少個(gè)元素呢?
/* This function performs just a step of rehashing, and only if hashing has* not been paused for our hash table. When we have iterators in the* middle of a rehashing we can't mess with the two hash tables otherwise* some element can be missed or duplicated.** This function is called by common lookup or update operations in the* dictionary so that the hash table automatically migrates from H1 to H2* while it is actively used. */ static void _dictRehashStep(dict *d) {if (d->pauserehash == 0) dictRehash(d,1); }/* Performs N steps of incremental rehashing. Returns 1 if there are still* keys to move from the old to the new hash table, otherwise 0 is returned.** Note that a rehashing step consists in moving a bucket (that may have more* than one key as we use chaining) from the old to the new hash table, however* since part of the hash table may be composed of empty spaces, it is not* guaranteed that this function will rehash even a single bucket, since it* will visit at max N*10 empty buckets in total, otherwise the amount of* work it does would be unbound and the function may block for a long time. */ int dictRehash(dict *d, int n) {int empty_visits = n*10; /* Max number of empty buckets to visit. */if (!dictIsRehashing(d)) return 0;while(n-- && d->ht[0].used != 0) {dictEntry *de, *nextde;/* Note that rehashidx can't overflow as we are sure there are more* elements because ht[0].used != 0 */assert(d->ht[0].size > (unsigned long)d->rehashidx);while(d->ht[0].table[d->rehashidx] == NULL) {d->rehashidx++;if (--empty_visits == 0) return 1;}de = d->ht[0].table[d->rehashidx];/* Move all the keys in this bucket from the old to the new hash HT */while(de) {uint64_t h;nextde = de->next;/* Get the index in the new hash table */h = dictHashKey(d, de->key) & d->ht[1].sizemask;de->next = d->ht[1].table[h];d->ht[1].table[h] = de;d->ht[0].used--;d->ht[1].used++;de = nextde;}d->ht[0].table[d->rehashidx] = NULL;d->rehashidx++;}/* Check if we already rehashed the whole table... */if (d->ht[0].used == 0) {zfree(d->ht[0].table); //當(dāng)ht[0]的元素全部rehash到ht[1]的時(shí)候釋放ht[0]的空間d->ht[0] = d->ht[1]; //將原來(lái)的ht[1]設(shè)置為ht[0]_dictReset(&d->ht[1]); //ht[1] 設(shè)置為NULLd->rehashidx = -1;return 0;}/* More to rehash... */return 1; }根據(jù)代碼顯然我們可以看出每次最多rehash10個(gè)元素。
總結(jié)
以上是生活随笔為你收集整理的Redis:缩容、扩容、渐进式rehash的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: windows 彻底删除360文件 36
- 下一篇: 实时环境映射贴图技术(Real-time