Hash Values and Hash Codes: What Is Hashing? How Hash Codes Work, with Examples

Published: 2024/1/1

Introduction to hashing

Hashing is designed to solve the problem of efficiently finding or storing an item in a collection.

For example, if we have a list of 10,000 English words and we want to check whether a given word is in the list, it would be inefficient to compare the word with each of the 10,000 items in turn until we find a match. Even if the list of words is lexicographically sorted, like in a dictionary, you will still need some time to find the word you are looking for.

Hashing is a technique that makes this lookup more efficient by narrowing down the search at the outset.

What is hashing?

Hashing means using a function or algorithm to map object data to a representative integer value.

This so-called hash code (or simply hash) can then be used to narrow down our search when looking for the item in the map.

Generally, these hash codes are used to generate an index at which the value is stored.
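As a rough illustration of the two steps just described, the sketch below maps a string to an integer and then reduces that integer to an index. The function names, the character-sum hash, and the table size of 8 are assumptions made for this example only; they are not taken from the article.

```python
def hash_code(key: str) -> int:
    """Toy hash (assumed for illustration): sum of the key's character codes."""
    return sum(ord(ch) for ch in key)


def index_for(key: str, table_size: int) -> int:
    """Reduce the hash code to a valid index in a table with table_size slots."""
    return hash_code(key) % table_size


TABLE_SIZE = 8  # assumed size, chosen only for this example

print(hash_code("Cuba"))              # 67 + 117 + 98 + 97 = 379
print(index_for("Cuba", TABLE_SIZE))  # 379 % 8 = 3
```

Taking the remainder is one common way to fit an arbitrary integer into a fixed number of slots; other reductions are possible.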

How hashing works

In a hash table, you store data as key-value pairs. The key, which identifies the data, is given as input to the hash function. The resulting hash code, which is an integer, is then mapped to an index within the fixed size of the table.

Hash tables have to support three operations:

insert (key, value)
get (key)
delete (key)

Purely as an example to help us grasp the concept, let us suppose that we want to map a list of string keys to string values (for example, map a list of countries to their capital cities).

So let's say we want to store the data in the table in the map.

And let us suppose that our hash function simply takes the length of the string.

For simplicity, we will have two arrays: one for our keys and one for the values. To put an item in the hash table, we compute its hash code (in this case, simply count the number of characters), then put the key and value in the arrays at the corresponding index.

For example, Cuba has a hash code (length) of 4. So we store Cuba at index 4 of the keys array, and Havana at index 4 of the values array, and so on. And we end up with the following:

Now, in this specific example things work quite well. Our array needs to be big enough to accommodate the longest string, but in this case that's only 11 slots. We do waste a bit of space because, for example, there are no 1-letter keys in our data, nor keys between 8 and 10 letters.

But in this case, the wasted space isn't so bad either. Taking the length of a string is nice and fast, and so is the process of finding the value associated with a given key (certainly faster than doing up to five string comparisons).
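The two-array scheme described above can be sketched as follows. This is a minimal illustration, not the article's own code: only the Cuba/Havana pair comes from the text, the "Spain" entry is a hypothetical stand-in for the rest of the table, and the array size of 12 is an assumption so that index 11 exists with zero-based indexing.

```python
TABLE_SIZE = 12  # assumption: indices 0..11, enough for keys of up to 11 characters

keys = [None] * TABLE_SIZE
values = [None] * TABLE_SIZE


def length_hash(key: str) -> int:
    """The toy hash function from the text: the length of the string."""
    return len(key)


def insert(key: str, value: str) -> None:
    index = length_hash(key)
    keys[index] = key      # note: silently overwrites on a collision
    values[index] = value


def get(key: str):
    index = length_hash(key)
    # Compare the stored key so a different key of the same length is not matched.
    return values[index] if keys[index] == key else None


def delete(key: str) -> None:
    index = length_hash(key)
    if keys[index] == key:
        keys[index] = None
        values[index] = None


insert("Cuba", "Havana")   # from the text: stored at index 4
insert("Spain", "Madrid")  # hypothetical entry, not from the original table

print(keys.index("Cuba"))  # 4
print(get("Cuba"))         # Havana
print(get("Peru"))         # None: same length as "Cuba", but a different key
```

Looking up a key costs one length computation and one string comparison, which is the speed-up the text refers to.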

But what do we do if our dataset has a string with more than 11 characters? And what if we have another word with 5 characters, "India", and try assigning it to an index using our hash function? Since index 5 is already occupied, we have to decide what to do with it. This is called a collision.

If our dataset had a string with a thousand characters, and you made an array of a thousand indices to store the data, most of that space would be wasted. If our keys were random English words, where so many words have the same length, using length as a hash function would be fairly useless.
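To see how quickly a length-based hash collides, the sketch below groups a handful of words by their length. The word list is made up for this example (only "India" and "Cuba" appear in the text); any list of English words would show the same effect.

```python
from collections import defaultdict

# A small, made-up word list used only for illustration.
words = ["apple", "grape", "lemon", "mango", "peach", "India", "Cuba", "kiwi"]

by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)  # len(word) plays the role of the hash code

for length, bucket in sorted(by_length.items()):
    print(length, bucket)
# 4 ['Cuba', 'kiwi']
# 5 ['apple', 'grape', 'lemon', 'mango', 'peach', 'India']
```

Eight keys end up sharing just two hash codes, so almost every insertion would be a collision.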

Collision Handling

Two basic methods are used to handle collisions:

Separate Chaining

Open Addressing

Separate Chaining

Separate chaining handles hash collisions by storing an additional data structure in each bucket, preferably a linked list because it can be allocated dynamically. In our example, when we add India to the dataset, it is appended to the linked list stored at index 5, and our table would then look like this.

To find an item, we first go to the bucket and then compare keys. This is a popular method, and if a linked list is used the hash table never fills up. The cost of get(k) is on average O(n), where n is the number of keys in the bucket and N is the total number of keys.

The problem with separate chaining is that the data structure can grow without bound.
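A minimal sketch of separate chaining is shown below. It makes several assumptions that are not in the article: the class name `ChainedHashTable` is invented, a Python list stands in for the linked list in each bucket, the table has 12 buckets, and "Spain" is a hypothetical key already sitting at index 5. The hash is the length hash from the running example.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket is a Python list."""

    def __init__(self, size: int = 12):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: str) -> int:
        # Length hash from the running example, reduced modulo the table size.
        return len(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: update it
                return
        bucket.append((key, value))       # new key: append to the chain

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        index = self._index(key)
        self.buckets[index] = [
            (k, v) for k, v in self.buckets[index] if k != key
        ]


table = ChainedHashTable()
table.insert("Spain", "Madrid")     # hypothetical 5-letter key already at index 5
table.insert("India", "New Delhi")  # collides with it and joins the same chain

print(table.buckets[5])    # [('Spain', 'Madrid'), ('India', 'New Delhi')]
print(table.get("India"))  # New Delhi
```

Each lookup scans only the chain in one bucket, which is why the cost depends on the number of keys in that bucket rather than on the total number of keys.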

Open Addressing

Open addressing does not introduce any new data structure. If a collision occurs, we look for a free slot at the next position generated by a probing algorithm. Open addressing is generally used where storage space is restricted, e.g. on embedded processors. Open addressing is not necessarily faster than separate chaining.

Methods for Open Addressing

Linear Probing
Quadratic Probing
Double Hashing
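Of the methods listed above, linear probing is the simplest to sketch: on a collision, try the next slot, wrapping around at the end of the array. The class below is an illustrative assumption rather than the article's code; the name `LinearProbingTable`, the size of 12, and the "Spain" entry are invented, and deletion is left out because it needs extra bookkeeping (such as tombstone markers) in open addressing.

```python
class LinearProbingTable:
    """Open addressing with linear probing; no deletion support in this sketch."""

    def __init__(self, size: int = 12):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size

    def _probe(self, key: str):
        # Start at the home slot (length hash) and step one slot at a time,
        # wrapping around, until every slot has been visited once.
        start = len(key) % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key, value):
        for index in self._probe(key):
            if self.keys[index] is None or self.keys[index] == key:
                self.keys[index] = key
                self.values[index] = value
                return index
        raise RuntimeError("hash table is full")

    def get(self, key):
        for index in self._probe(key):
            if self.keys[index] is None:
                return None  # empty slot reached: the key was never inserted
            if self.keys[index] == key:
                return self.values[index]
        return None


table = LinearProbingTable()
print(table.insert("Spain", "Madrid"))     # 5 (hypothetical key in its home slot)
print(table.insert("India", "New Delhi"))  # 6 (slot 5 is taken, so probe onward)
print(table.get("India"))                  # New Delhi
```

The other two methods differ only in how the next slot is chosen: quadratic probing typically uses an offset that grows with the square of the step count, and double hashing uses a step size computed by a second hash function.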

How to use hashing in your code

Python

# Some languages, such as Python and Ruby, come with built-in hash table support.

# Declaration
my_hash_table = {}
my_hash_table = dict()

# Example key and value used below
key, value = "Cuba", "Havana"

# Insertion
my_hash_table[key] = value

# Look up
value = my_hash_table.get(key)  # returns None if the key is not present
value = my_hash_table[key]      # raises a KeyError if the key is not present

# Deletion
del my_hash_table[key]  # raises a KeyError if the key is not present

# Getting all keys and values stored in the dictionary
keys = my_hash_table.keys()
values = my_hash_table.values()


Java

// HashMap is part of the standard library, but it has to be imported from the java.util package.
import java.util.HashMap;

public class HashMapExample {
    public static void main(String[] args) {
        // Example key and value used below
        int key = 1;
        int value = 100;

        // Declaration
        HashMap<Integer, Integer> myHashTable = new HashMap<>(); // declares an empty map

        // Insertion
        myHashTable.put(key, value);

        // Deletion
        myHashTable.remove(key);

        // Look up
        myHashTable.get(key);         // returns null if the key is not present
        myHashTable.containsKey(key); // returns a boolean indicating whether the key is present

        // Number of key-value pairs in the hash table
        myHashTable.size();
    }
}