So what exactly is a hotspot?
Note: this is not about "hot spot"...
It is a concept in HBase: hotspotting.
The official HBase documentation explains it clearly in its section on row key design (original text):
Hotspotting
Rows in HBase are sorted lexicographically by row key. This design optimizes for scans, allowing you to store related rows, or rows that will be read together, near each other. However, poorly designed row keys are a common source of hotspotting. Hotspotting occurs when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. The traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability. This can also have adverse effects on other regions hosted by the same region server, as that host is unable to service the requested load. It is important to design data access patterns such that the cluster is fully and evenly utilized.
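A toy model can make the definition concrete. The sketch below is not HBase code: it assumes a hypothetical table pre-split at the start keys `""`, `"2"`, `"4"`, `"6"`, `"8"` and routes each row to the region whose key range contains it, mirroring how rows are assigned to regions by row key range. Time-ordered keys, such as epoch seconds, all share a leading digit and pile into a single region.

```python
import bisect

# Hypothetical split points: region i holds keys in [START_KEYS[i], START_KEYS[i+1]).
START_KEYS = ["", "2", "4", "6", "8"]


def region_for(row_key: str) -> int:
    """Index of the region whose (lexicographic) key range contains row_key."""
    return bisect.bisect_right(START_KEYS, row_key) - 1


def region_histogram(row_keys) -> list:
    """Count how many of the given row keys each region would receive."""
    counts = [0] * len(START_KEYS)
    for key in row_keys:
        counts[region_for(key)] += 1
    return counts


# Monotonically increasing keys (e.g. epoch seconds) all start with "1".
time_ordered = [str(1_700_000_000 + n) for n in range(1000)]
# Keys whose leading characters are spread evenly over "0".."9".
spread = [f"{n:03d}" for n in range(1000)]

print(region_histogram(time_ordered))  # [1000, 0, 0, 0, 0]
print(region_histogram(spread))        # [200, 200, 200, 200, 200]
```

Every write of the time-ordered workload hits region 0, while the evenly spread keys load all five regions equally.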
To prevent hotspotting on writes, design your row keys so that rows that truly need to be in the same region are, while in the bigger picture data is written to multiple regions across the cluster rather than to one region at a time. Some common techniques for avoiding hotspotting are described below, along with some of their advantages and drawbacks.
Salting
Salting in this sense has nothing to do with cryptography, but refers to adding random data to the start of a row key: a randomly assigned prefix that causes the row to sort differently than it otherwise would. The number of possible prefixes corresponds to the number of regions you want to spread the data across. Salting can be helpful if you have a few "hot" row key patterns which come up over and over amongst other more evenly distributed rows. Consider the following example, which shows that salting can spread write load across multiple RegionServers, and illustrates some of the negative implications for reads.
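A minimal sketch of the write side of salting, assuming the four salts `a` through `d` used in Example 11 and a `-` separator (both illustrative choices). The salt is drawn at random for each write, so the prefix carries no information about the key; the generator is seeded only to make the demo repeatable.

```python
import random
from collections import Counter

SALTS = ["a", "b", "c", "d"]  # one prefix per target region (illustrative)


def salted_key(row_key: str, rng: random.Random) -> str:
    """Prepend a randomly chosen salt; repeated calls may pick different salts."""
    return f"{rng.choice(SALTS)}-{row_key}"


rng = random.Random(42)
keys = [salted_key(f"foo{n:04d}", rng) for n in range(1, 10_001)]
per_salt = Counter(k.split("-", 1)[0] for k in keys)
print(per_salt)  # roughly 2,500 writes per salt, i.e. per region
```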
Example 11. Salting Example
Suppose you have the following list of row keys, and your table is split such that there is one region for each letter of the alphabet. Prefix 'a' is one region, prefix 'b' is another. In this table, all rows starting with 'f' are in the same region. This example focuses on rows with keys like the following:
foo0001
foo0002
foo0003
foo0004
Now, imagine that you would like to spread these across four different regions. You decide to use four different salts: a, b, c, and d. In this scenario, each of these letter prefixes will be on a different region. After applying the salts, you have the following rowkeys instead. Since you can now write to four separate regions, you theoretically have four times the throughput when writing that you would have if all the writes were going to the same region.
a-foo0003
b-foo0001
c-foo0004
d-foo0002
Then, if you add another row, it will randomly be assigned one of the four possible salt values and end up near one of the existing rows.
a-foo0003
b-foo0001
c-foo0003
c-foo0004
d-foo0002
Since this assignment will be random, you will need to do more work if you want to retrieve the rows in lexicographic order. In this way, salting attempts to increase throughput on writes, but has a cost during reads.
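The extra read-side work can be sketched as a fan-out: issue one range scan per salt, strip the prefixes, and merge the sorted streams. Here `scan_bucket` is a hypothetical in-memory stand-in for a per-region scan, not an HBase API, and the table holds the salted keys from the example above.

```python
import heapq

SALTS = ["a", "b", "c", "d"]


def scan_bucket(table, salt, start, stop):
    """Hypothetical stand-in for a range scan over one salt: sorted keys in [salt-start, salt-stop)."""
    lo, hi = f"{salt}-{start}", f"{salt}-{stop}"
    return [k for k in sorted(table) if lo <= k < hi]


def ordered_scan(table, start, stop):
    """Scan every salt bucket, drop the prefix, and merge into original key order."""
    streams = [
        [k.split("-", 1)[1] for k in scan_bucket(table, salt, start, stop)]
        for salt in SALTS
    ]
    # Each stream is already sorted, so a k-way merge restores global order.
    return list(heapq.merge(*streams))


table = {"a-foo0003", "b-foo0001", "c-foo0003", "c-foo0004", "d-foo0002"}
print(ordered_scan(table, "foo0000", "foo9999"))
# ['foo0001', 'foo0002', 'foo0003', 'foo0003', 'foo0004']
```

Four scans replace one, and `foo0003` appears twice because the example stores it under two salts. A point lookup of a single key likewise has to try every salt, since the salt that was used is not recorded anywhere.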
Hashing
Instead of a random assignment, you could use a one-way hash that would cause a given row to always be "salted" with the same prefix, in a way that would spread the load across the RegionServers, but allow for predictability during reads. Using a deterministic hash allows the client to reconstruct the complete rowkey and use a Get operation to retrieve that row as normal.
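A minimal sketch of deterministic hash prefixing, assuming MD5 (via Python's `hashlib`) reduced modulo the number of prefixes. The hash function, the four prefixes, and the `-` separator are illustrative choices, and MD5 is used here only to spread keys, not for security. Python's built-in `hash()` is avoided because it is randomized per process for strings, which would break the predictability this approach relies on.

```python
import hashlib

PREFIXES = ["a", "b", "c", "d"]  # illustrative; one per target region


def hashed_key(row_key: str) -> str:
    """Deterministically prefix row_key: the same input always gets the same prefix."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(PREFIXES)
    return f"{PREFIXES[bucket]}-{row_key}"


# Writer and reader compute the same full row key independently, so a reader
# can rebuild it from "foo0003" alone and issue a single Get for that key.
print(hashed_key("foo0003"))
```

Which prefix `foo0003` receives depends on the hash, so with this particular function it will not necessarily be `a` as in Example 12.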
Example 12. Hashing Example
Given the same situation in the salting example above, you could instead apply a one-way hash that would cause the row with key foo0003 to always, and predictably, receive the a prefix. Then, to retrieve that row, you would already know the key. You could also optimize things so that certain pairs of keys were always in the same region, for instance.
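One way to get the "certain pairs of keys in the same region" behavior is to hash only part of the key. The sketch assumes a hypothetical `<group>#<suffix>` key layout and hashes just the group, so every row of a group shares one prefix. That keeps them together only if each prefix maps to a single region, and a very busy group can still become a hotspot of its own.

```python
import hashlib

PREFIXES = ["a", "b", "c", "d"]  # illustrative


def prefix_for(group: str) -> str:
    """Deterministic prefix derived from the group part only."""
    digest = hashlib.md5(group.encode("utf-8")).digest()
    return PREFIXES[int.from_bytes(digest[:4], "big") % len(PREFIXES)]


def grouped_key(group: str, suffix: str) -> str:
    """Build '<prefix>-<group>#<suffix>'; all rows of one group share '<prefix>-<group>#'."""
    return f"{prefix_for(group)}-{group}#{suffix}"


profile = grouped_key("user42", "profile")
settings = grouped_key("user42", "settings")
print(profile, settings)  # same prefix, so both rows fall in one key range
```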
Reversing the Key
A third common trick for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first. This effectively randomizes row keys, but sacrifices row ordering properties.
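A minimal sketch of key reversal, assuming numeric IDs zero-padded to a fixed width of 10 digits (the width is an illustrative choice). Consecutive IDs then differ in their first character, which spreads them across key ranges, and reversing again recovers the ID; the cost is that numeric order no longer matches key order.

```python
WIDTH = 10  # illustrative fixed width


def reversed_key(n: int) -> str:
    """Zero-pad n to WIDTH digits, then reverse so the least significant digit leads."""
    return f"{n:0{WIDTH}d}"[::-1]


def original_id(key: str) -> int:
    """Undo the reversal to recover the numeric ID."""
    return int(key[::-1])


keys = [reversed_key(n) for n in range(1000, 1010)]
print([k[0] for k in keys])  # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
print(sorted(reversed_key(n) for n in (9, 10, 11)))
# ['0100000000', '1100000000', '9000000000'] -> IDs 10, 11, 9: numeric order is lost
```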
See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, the article on Salted Tables from the Phoenix project, and the discussion in the comments of HBASE-11682 for more information about avoiding hotspotting.