當(dāng)前位置：首頁 > 运维知识 > 数据库 >内容正文

数据库

Redis深入浅出—hash、set

發(fā)布時間：2024/3/13 数据库 29 豆豆

生活随笔收集整理的這篇文章主要介紹了 Redis深入浅出—hash、set 小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

一、Hash

1.1 介紹

Redis中的字典采用哈希表作為底層實現(xiàn)，一個哈希表有多個節(jié)點，每個節(jié)點保存一個鍵值對。

在Redis源碼文件中，字典的實現(xiàn)代碼在dict.c和dict.h文件中。

Redis的數(shù)據(jù)庫就是使用字典作為底層實現(xiàn)的，通過key和value的鍵值對形式，代表了數(shù)據(jù)庫中全部數(shù)據(jù)。而且，所有對數(shù)據(jù)庫的增、刪、查、改的命令，都是建立在對字典的操作上。

同時，字典還是Redis中哈希鍵的底層實現(xiàn)，當(dāng)一個哈希鍵包含的鍵值對比較多，或者鍵值對中的元素都是比較長的字符串時，Redis就會使用字典作為哈希鍵的底層實現(xiàn)。

當(dāng) hash 移除了最后一個元素之后，該數(shù)據(jù)結(jié)構(gòu)自動被刪除，內(nèi)存被回收。

字典又稱為符號表（symbol table）、關(guān)聯(lián)數(shù)組（associative array）或映射（map），是一種用于保存鍵值對（key-value pair）的抽象數(shù)據(jù)結(jié)構(gòu)。

1.2 命令的用法

hset(key, field, value)：向名稱為key的hash中添加元素field<—>value
hget(key, field)：返回名稱為key的hash中field對應(yīng)的value
hmget(key, field1, …,field N)：返回名稱為key的hash中field i對應(yīng)的value
hmset(key, field1, value1,…,field N, value N)：向名稱為key的hash中添加元素field i<—>value i
hincrby(key, field, integer)：將名稱為key的hash中field的value增加integer
hexists(key, field)：名稱為key的hash中是否存在鍵為field的域
hdel(key, field)：刪除名稱為key的hash中鍵為field的域
hlen(key)：返回名稱為key的hash中元素個數(shù)
hkeys(key)：返回名稱為key的hash中所有鍵
hvals(key)：返回名稱為key的hash中所有鍵對應(yīng)的value
hgetall(key)：返回名稱為key的hash中所有的鍵（field）及其對應(yīng)的value

1.3 源碼分析

hash 結(jié)構(gòu)的數(shù)據(jù)主要用到的是字典結(jié)構(gòu)。

其實除了hash會使用到字典，整個 Redis 數(shù)據(jù)庫的所有 key 和 value 也組成了一個全局字典，還有帶過期時間的 key 集合也是一個字典。

//該結(jié)構(gòu)體在server.h中定義 typedef struct redisDb {dict *dict; /*整個數(shù)據(jù)庫的字典 The keyspace for this DB */dict *expires; /*有過期時間的字典 Timeout of keys with a timeout set */dict *blocking_keys; /* Keys with clients waiting for data (BLPOP)*/dict *ready_keys; /* Blocked keys that received a PUSH */dict *watched_keys; /* WATCHED keys for MULTI/EXEC CAS */int id; /* Database ID */long long avg_ttl; /* Average TTL, just for stats */ } redisDb;

zset 集合中存儲 value 和 score 值的映射關(guān)系也是通過 dict 結(jié)構(gòu)實現(xiàn)的。

//該結(jié)構(gòu)體在server.h中定義 typedef struct zset {dict *dict;zskiplist *zsl; } zset;

Redis定義了dictEntry(哈希表結(jié)點)，dictType(字典類型函數(shù))，dictht(哈希表)和dict(字典)四個結(jié)構(gòu)體來實現(xiàn)字典結(jié)構(gòu)，下面來分別介紹這四個結(jié)構(gòu)體。

//哈希表的table指向的數(shù)組存放這dictEntry類型的地址。定義在dict.h/dictEntryt中 typedef struct dictEntry {//字典的節(jié)點void *key;union {//使用的聯(lián)合體void *val;uint64_t u64;//這兩個參數(shù)很有用int64_t s64;} v;struct dictEntry *next;//指向下一個hash節(jié)點，用來解決hash鍵沖突（collision） } dictEntry;//dictType類型保存著操作字典不同類型key和value的方法的指針 typedef struct dictType {unsigned int (*hashFunction)(const void *key); //計算hash值的函數(shù)void *(*keyDup)(void *privdata, const void *key); //復(fù)制key的函數(shù)void *(*valDup)(void *privdata, const void *obj); //復(fù)制value的函數(shù)int (*keyCompare)(void *privdata, const void *key1, const void *key2); //比較key的函數(shù)void (*keyDestructor)(void *privdata, void *key); //銷毀key的析構(gòu)函數(shù)void (*valDestructor)(void *privdata, void *obj); //銷毀val的析構(gòu)函數(shù) } dictType;//redis中哈希表定義dict.h/dictht typedef struct dictht { //哈希表dictEntry **table; //存放一個數(shù)組的地址，數(shù)組存放著哈希表節(jié)點dictEntry的地址。unsigned long size; //哈希表table的大小，初始化大小為4unsigned long sizemask; //用于將哈希值映射到table的位置索引。它的值總是等于(size-1)。unsigned long used; //記錄哈希表已有的節(jié)點（鍵值對）數(shù)量。 } dictht;//字典結(jié)構(gòu)定義在dict.h/dict typedef struct dict {dictType *type; //指向dictType結(jié)構(gòu)，dictType結(jié)構(gòu)中包含自定義的函數(shù)，這些函數(shù)使得key和value能夠存儲任何類型的數(shù)據(jù)。void *privdata; //私有數(shù)據(jù)，保存著dictType結(jié)構(gòu)中函數(shù)的參數(shù)。dictht ht[2]; //兩張哈希表。long rehashidx; //rehash的標(biāo)記，rehashidx==-1，表示沒在進行rehashint iterators; //正在迭代的迭代器數(shù)量 } dict;

從源碼中可以看出，dict 結(jié)構(gòu)內(nèi)部包含兩個 hashtable，通常情況下只有一個 hashtable 是有值的。

但是在 dict 擴容縮容時，需要分配新的 hashtable，然后進行漸進式搬遷，這時候兩個 hashtable 存儲的分別是舊的 hashtable 和新的 hashtable。待搬遷結(jié)束后，舊的 hashtable 被刪除，新的 hashtable 取而代之。

字典數(shù)據(jù)結(jié)構(gòu)的精華就落在了dictht所表示的 hashtable 結(jié)構(gòu)上了。hashtable 的結(jié)構(gòu)和 Java 的 HashMap 幾乎是一樣的，都是通過分桶的方式解決 hash 沖突。第一維是數(shù)組，第二維是鏈表。數(shù)組中存儲的是第二維鏈表的第一個元素的指針。

1.4 哈希算法

當(dāng)往字典中添加鍵值對時，需要根據(jù)鍵的大小計算出哈希值和索引值，然后再根據(jù)索引值，將包含新鍵值對的哈希表節(jié)點放到哈希表數(shù)組的指定索引上面。

// 計算哈希值 h = dictHashKey(d, key); // 調(diào)用哈希算法計算哈希值 #define dictHashKey(d, key) (d)->type->hashFunction(key)

Redis提供了三種計算哈希值的函數(shù)，其分別是：

Thomas Wang’s 32 bit Mix函數(shù)，對一個整數(shù)進行哈希，該方法在dictIntHashFunction中實現(xiàn)

unsigned int dictIntHashFunction(unsigned int key) //用于計算int整型哈希值的哈希函數(shù) {key += ~(key << 15);key ^= (key >> 10);key += (key << 3);key ^= (key >> 6);key += ~(key << 11);key ^= (key >> 16);return key; }

使用MurmurHash2哈希算法對字符串進行哈希，該方法在dictGenHashFunction中實現(xiàn)

unsigned int dictGenHashFunction(const void *key, int len) { //用于計算字符串的哈希值的哈希函數(shù)/* 'm' and 'r' are mixing constants generated offline.They're not really 'magic', they just happen to work well. *///m和r這兩個值用于計算哈希值，只是因為效果好。uint32_t seed = dict_hash_function_seed;const uint32_t m = 0x5bd1e995;const int r = 24;/* Initialize the hash to a 'random' value */uint32_t h = seed ^ len; //初始化/* Mix 4 bytes at a time into the hash */const unsigned char *data = (const unsigned char *)key;//將字符串key每四個一組看成uint32_t類型，進行運算的到hwhile(len >= 4) {uint32_t k = *(uint32_t*)data;k *= m;k ^= k >> r;k *= m;h *= m;h ^= k;data += 4;len -= 4;}/* Handle the last few bytes of the input array */switch(len) {case 3: h ^= data[2] << 16;case 2: h ^= data[1] << 8;case 1: h ^= data[0]; h *= m;};/* Do a few final mixes of the hash to ensure the last few* bytes are well-incorporated. */h ^= h >> 13;h *= m;h ^= h >> 15;return (unsigned int)h; }

使用基于djb哈希的一種簡單的哈希算法，該方法在dictGenCaseHashFunction中實現(xiàn)。

unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) { //用于計算字符串的哈希值的哈希函數(shù)unsigned int hash = (unsigned int)dict_hash_function_seed;while (len--)hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c */return hash; }

計算出哈希值之后，需要計算其索引。Redis采用下列算式來計算索引值。

// 舉例：h為5，哈希表的大小初始化為4，sizemask則為size-1， // 于是h&sizemask = 2， // 所以該鍵值對就存放在索引為2的位置 idx = h & d->ht[table].sizemask;

1.4 rehash

當(dāng)哈希表的大小不能滿足需求，就可能會有兩個或者以上數(shù)量的鍵被分配到了哈希表數(shù)組上的同一個索引上，于是就發(fā)生沖突（collision）.

在Redis中解決沖突的辦法是鏈接法（separate chaining）。但是需要盡可能避免沖突，希望哈希表的負載因子（load factor），維持在一個合理的范圍之內(nèi)，就需要對哈希表進行擴展或收縮。

擴容：當(dāng) hash 表中元素的個數(shù)等于第一維數(shù)組的長度時，就會開始擴容，擴容的新數(shù)組是原數(shù)組大小的 2 倍。不過如果 Redis 正在做 bgsave，為了減少內(nèi)存頁的過多分離 (Copy On Write)，Redis 盡量不去擴容 (dict_can_resize)，但是如果 hash 表已經(jīng)非常滿了，元素的個數(shù)已經(jīng)達到了第一維數(shù)組長度的 5 倍 (dict_force_resize_ratio)，說明 hash 表已經(jīng)過于擁擠了，這個時候就會強制擴容。

縮容：當(dāng) hash 表因為元素的逐漸刪除變得越來越稀疏時，，Redis 會對 hash 表進行縮容來減少 hash 表的第一維數(shù)組空間占用。縮容的條件是元素個數(shù)低于數(shù)組長度的 10%。縮容不會考慮 Redis 是否正在做 bgsave。

大字典的擴容是比較耗時間的，需要重新申請新的數(shù)組，然后將舊字典所有鏈表中的元素重新掛接到新的數(shù)組下面，這是一個O(n)級別的操作.

擴容操作具體源碼：

static int _dictExpandIfNeeded(dict *d) //擴展d字典，并初始化 {if (dictIsRehashing(d)) return DICT_OK; //正在進行rehash，直接返回if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE); //如果字典（的 0 號哈希表）為空，那么創(chuàng)建并返回初始化大小的 0 號哈希表//1. 字典的總元素個數(shù)和字典的數(shù)組大小之間的比率接近 1：1//2. 能夠擴展的標(biāo)志為真//3. 已使用節(jié)點數(shù)和字典大小之間的比率超過 dict_force_resize_ratioif (d->ht[0].used >= d->ht[0].size && (dict_can_resize || d->ht[0].used/d->ht[0].size > dict_force_resize_ratio)) {return dictExpand(d, d->ht[0].used*2); //擴展為節(jié)點個數(shù)的2倍}return DICT_OK; }

收縮操作具體源碼：

int dictResize(dict *d) //縮小字典d {int minimal;//如果dict_can_resize被設(shè)置成0，表示不能進行rehash，或正在進行rehash，返回出錯標(biāo)志DICT_ERRif (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;minimal = d->ht[0].used; //獲得已經(jīng)有的節(jié)點數(shù)量作為最小限度minimalif (minimal < DICT_HT_INITIAL_SIZE)//但是minimal不能小于最低值DICT_HT_INITIAL_SIZE（4）minimal = DICT_HT_INITIAL_SIZE;return dictExpand(d, minimal); //用minimal調(diào)整字典d的大小 }

擴展和收縮操作都調(diào)用了dictExpand()函數(shù)，該函數(shù)通過計算傳入的第二個大小參數(shù)進行計算，算出一個最接近2n的realsize，然后進行擴展或收縮，dictExpand()函數(shù)源碼如下：

int dictExpand(dict *d, unsigned long size) //根據(jù)size調(diào)整或創(chuàng)建字典d的哈希表 {dictht n; unsigned long realsize = _dictNextPower(size); //獲得一個最接近2^n的realsizeif (dictIsRehashing(d) || d->ht[0].used > size) //正在rehash或size不夠大返回出錯標(biāo)志return DICT_ERR;if (realsize == d->ht[0].size) return DICT_ERR; //如果新的realsize和原本的size一樣則返回出錯標(biāo)志/* Allocate the new hash table and initialize all pointers to NULL *///初始化新的哈希表的成員n.size = realsize;n.sizemask = realsize-1;n.table = zcalloc(realsize*sizeof(dictEntry*));n.used = 0;/* Is this the first initialization? If so it's not really a rehashing* we just set the first hash table so that it can accept keys. */if (d->ht[0].table == NULL) { //如果ht[0]哈希表為空，則將新的哈希表n設(shè)置為ht[0]d->ht[0] = n;return DICT_OK;}d->ht[1] = n; //如果ht[0]非空，則需要rehashd->rehashidx = 0; //設(shè)置rehash標(biāo)志位為0，開始漸進式rehash（incremental rehashing）return DICT_OK; }

收縮或者擴展哈希表需要將ht[0]表中的所有鍵全部rehash到ht[1]中，但是rehash操作不是一次性、集中式完成的，而是分多次，漸進式，斷續(xù)進行的，這樣才不會對服務(wù)器性能造成影響。

1.5 漸進式rehash(incremental rehashing)

漸進式rehash的關(guān)鍵：

字典結(jié)構(gòu)dict中的一個成員rehashidx，當(dāng)rehashidx為-1時表示不進行rehash，當(dāng)rehashidx值為0時，表示開始進行rehash。

在rehash期間，每次對字典的添加、刪除、查找、或更新操作時，都會判斷是否正在進行rehash操作，如果是，則順帶進行單步rehash，并將rehashidx+1。

當(dāng)rehash時進行完成時，將rehashidx置為-1，表示完成rehash。

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing) { long index; dictEntry *entry; dictht *ht; // 這里進行小步搬遷 if (dictIsRehashing(d)) _dictRehashStep(d); if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1) return NULL; // 如果字典處于搬遷過程中，要將新的元素掛接到新的數(shù)組下面 ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0]; entry = zmalloc(sizeof(*entry)); entry->next = ht->table[index]; ht->table[index] = entry; ht->used++; dictSetKey(d, entry, key); return entry; }

搬遷操作埋伏在當(dāng)前字典的后續(xù)指令中(來自客戶端的hset/hdel指令等)，但是有可能客戶端閑下來了，沒有了后續(xù)指令來觸發(fā)這個搬遷，那么Redis還會在定時任務(wù)中對字典進行主動搬遷。

// 服務(wù)器定時任務(wù) void databaseCron() {... if (server.activerehashing) { for (j = 0; j < dbs_per_call; j++) {int work_done = incrementallyRehash(rehash_db); if (work_done) {break; } else {rehash_db++; rehash_db %= server.dbnum; } }} } static void _dictRehashStep(dict *d) { //單步rehashif (d->iterators == 0) dictRehash(d,1); //當(dāng)?shù)鲾?shù)量不為0，才能進行1步rehash }int dictRehash(dict *d, int n) { //n步進行rehashint empty_visits = n*10; /* Max number of empty buckets to visit. */if (!dictIsRehashing(d)) return 0; //只有rehashidx不等于-1時，才表示正在進行rehash，否則返回0while(n-- && d->ht[0].used != 0) { //分n步，而且ht[0]上還有沒有移動的節(jié)點dictEntry *de, *nextde;/* Note that rehashidx can't overflow as we are sure there are more* elements because ht[0].used != 0 *///確保rehashidx沒有越界，因為rehashidx是從-1開始，0表示已經(jīng)移動1個節(jié)點，它總是小于hash表的size的assert(d->ht[0].size > (unsigned long)d->rehashidx);//第一個循環(huán)用來更新 rehashidx 的值，因為有些桶為空，所以 rehashidx并非每次都比原來前進一個位置，而是有可能前進幾個位置，但最多不超過 10。//將rehashidx移動到ht[0]有節(jié)點的下標(biāo)，也就是table[d->rehashidx]非空while(d->ht[0].table[d->rehashidx] == NULL) {d->rehashidx++;if (--empty_visits == 0) return 1;}de = d->ht[0].table[d->rehashidx]; //ht[0]下標(biāo)為rehashidx有節(jié)點，得到該節(jié)點的地址/* Move all the keys in this bucket from the old to the new hash HT *///第二個循環(huán)用來將ht[0]表中每次找到的非空桶中的鏈表（或者就是單個節(jié)點）拷貝到ht[1]中while(de) {unsigned int h;nextde = de->next; //備份下一個節(jié)點的地址/* Get the index in the new hash table */h = dictHashKey(d, de->key) & d->ht[1].sizemask; //獲得計算哈希值并得到哈希表中的下標(biāo)h//將該節(jié)點插入到下標(biāo)為h的位置de->next = d->ht[1].table[h];d->ht[1].table[h] = de;//更新兩個表節(jié)點數(shù)目計數(shù)器d->ht[0].used--;d->ht[1].used++;//將de指向以一個處理的節(jié)點de = nextde;}d->ht[0].table[d->rehashidx] = NULL; //遷移過后將該下標(biāo)的指針置為空d->rehashidx++; //更新rehashidx}/* Check if we already rehashed the whole table... */if (d->ht[0].used == 0) { //ht[0]上已經(jīng)沒有節(jié)點了，說明已經(jīng)遷移完成zfree(d->ht[0].table); //釋放hash表內(nèi)存d->ht[0] = d->ht[1]; //將遷移過的1號哈希表設(shè)置為0號哈希表_dictReset(&d->ht[1]); //重置ht[1]哈希表d->rehashidx = -1; //rehash標(biāo)志關(guān)閉return 0; //表示前已完成}/* More to rehash... */return 1; //表示還有節(jié)點等待遷移 }

1.6 hash總結(jié)

結(jié)構(gòu)：一個字典對象代表了一個dict的實例，一個dict中包含了兩個dictht組成的哈希表素組和一個指向dicType的指針。定義兩個dictht的作用主要是為了擴容過程中，能夠保證讀取數(shù)據(jù)的一致性，dicType中定于了一系列的函數(shù)指針。每個dictht中，存在一個dicEntry變量，可以看做是字典數(shù)組，也就是俗稱中的bucket桶，存放數(shù)據(jù)的主要容器。每個dictEntry中，除了包括key和value的鍵值對以外，還包括指向下一個dictEntry對象的指針。

哈希算法：redis主要提供了三種Hash算法

rehash：rehash發(fā)生在擴容或縮容階段，擴容是發(fā)生在元素的個數(shù)等于哈希表數(shù)組的長度時，進行2倍的擴容；縮容發(fā)生在當(dāng)元素個數(shù)為數(shù)組長度的10%時，進行縮容。

漸進式rehash：

二、set

2.1 介紹

Redis 的set集合類似于 Java 語言里面的 HashSet，它內(nèi)部的鍵值對是無序的唯一的。它的內(nèi)部實現(xiàn)相當(dāng)于一個特殊的字典，字典中所有的 value 都是一個值NULL。

當(dāng)集合中最后一個元素移除之后，數(shù)據(jù)結(jié)構(gòu)自動刪除，內(nèi)存被回收。

set結(jié)構(gòu)是字典的衍生結(jié)構(gòu)，而且它具有去重的功能，能夠保證每個key只出現(xiàn)一次。

2.2 使用

sadd(key, member)：向名稱為key的set中添加元素member
srem(key, member) ：刪除名稱為key的set中的元素member
spop(key) ：隨機返回并刪除名稱為key的set中一個元素
smove(srckey, dstkey, member) ：將member元素從名稱為srckey的集合移到名稱為dstkey的集合
scard(key) ：返回名稱為key的set的基數(shù)
sismember(key, member) ：測試member是否是名稱為key的set的元素
sinter(key1, key2,…key N) ：求交集
sinterstore(dstkey, key1, key2,…key N) ：求交集并將交集保存到dstkey的集合
sunion(key1, key2,…key N) ：求并集
sunionstore(dstkey, key1, key2,…key N) ：求并集并將并集保存到dstkey的集合
sdiff(key1, key2,…key N) ：求差集
sdiffstore(dstkey, key1, key2,…key N) ：求差集并將差集保存到dstkey的集合
smembers(key) ：返回名稱為key的set的所有元素
srandmember(key) ：隨機返回名稱為key的set的一個元素

hash和set的基本介紹就到此就告一段落了。

總結(jié)

以上是生活随笔為你收集整理的Redis深入浅出—hash、set的全部內(nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯，歡迎將生活随笔推薦給好友。

上一篇：转换接头PL8000V-B 0-70MP
下一篇：契约介绍