kaldi中hashlist阅读总结
生活随笔
收集整理的這篇文章主要介紹了
kaldi中hashlist阅读总结
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
kaldi中的解碼算法里,需要記錄很多的令牌(token)。每個令牌,都是一條路徑的“頭”,通過這個令牌回溯,就可以得到一條完整的路徑。如果解碼到最后一幀,從所有的令牌中,找到得分最優(yōu)的那個的令牌,回溯得到路徑,其路徑上的輸出,就是識別結果。(one-bese結果)
在解碼過程中,會產生很多的令牌。需要設計一種數據結構和相關算法,用來保存和更新令牌。其設計要求可以簡單概括如下。
? 1、可以快速判斷某個令牌是否已經存在
? 2、可以快速插入令牌(令牌不存在的時候)
? 3、可以很容易地遍歷所有的令牌
kaldi的作者設計了一種HashList結果用來保存令牌。HashList跟一般數據結構書籍里介紹的hash表類似,但比它要復雜一些。
這里先介紹一下通常數據結構書籍里描述的hash表。
hash表,是人為設計的一種數據結構。通過一個hash函數,將某個集合中的元素,映射到一個hash表(一般用數組之類的來實現)中。通過這個hash表,可以判斷某個元素是否已經存在,并且可以快速插入新數據。
hash函數常用的一種方法是除留余數法,就是對某個數(一般是hash表的大小)做模運算,得到的余數作為索引index,依據index來存放數據。兩個數據通過hash函數運算之后,可以得到相同的索引值,就會產生沖突。hash表解決沖突的方法有鏈表法和開放地址法。
下面舉例說明下hash表
? 1、hash表大小為5。采用除留余數法作為hash函數。采用鏈地址法解決沖突。
? 2、依次插入的數據分布是1, 8, 3, 10, 15, 5, 21, 18。
通過上面的描述,我們可以得到下面的hash表。
kaldi中HashList也是一種hash表。它跟上面例子中介紹的鏈地址法hash表類似,但也有不一樣的地方。假如還是使用上面例子中的條件,則HashList產生的hash表,畫出來如下圖所示。
可以看出,HashList的結構,跟普通的鏈表hash不同的主要有以下幾點
1、HashList中hash項指向的元素,是鏈表的最后一個元素(last_elem)。
2、HashList中hash項有個特定的域(pre_bucket,上面圖片中紅色的數字),通過這個可以找到上一個hash項,然后找到本鏈表的頭結點
3、有一個額外的頭結點(list_head),通過它可以訪問所有的結點。
如果想要查找某個數是否在hash表中,通過取模得到余數,也就是索引index。如果對應的hash項為空,表示未找到。如果hash項不為空,則通過上面的1和2兩點,可以確定鏈表的頭和尾巴,在鏈表中搜索即可。
類的實現代碼 // util/hash-list-inl.h// Copyright 2009-2011 Microsoft Corporation // 2013 Johns Hopkins University (author: Daniel Povey)// See ../../COPYING for clarification regarding multiple authors // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE, // MERCHANTABLITY OR NON-INFRINGEMENT. // See the Apache 2 License for the specific language governing permissions and // limitations under the License.#ifndef KALDI_UTIL_HASH_LIST_INL_H_ #define KALDI_UTIL_HASH_LIST_INL_H_// Do not include this file directly. It is included by fast-hash.hnamespace kaldi {template<class I, class T> HashList<I, T>::HashList() {list_head_ = NULL;bucket_list_tail_ = static_cast<size_t>(-1); // invalid.hash_size_ = 0;freed_head_ = NULL; }template<class I, class T> void HashList<I, T>::SetSize(size_t size) {hash_size_ = size;KALDI_ASSERT(list_head_ == NULL &&bucket_list_tail_ == static_cast<size_t>(-1)); // make sure empty.if (size > buckets_.size())buckets_.resize(size, HashBucket(0, NULL)); }template<class I, class T> typename HashList<I, T>::Elem* HashList<I, T>::Clear() {// Clears the hashtable and gives ownership of the currently contained list// to the user.for (size_t cur_bucket = bucket_list_tail_;cur_bucket != static_cast<size_t>(-1);cur_bucket = buckets_[cur_bucket].prev_bucket) {buckets_[cur_bucket].last_elem = NULL; // this is how we indicate "empty".}bucket_list_tail_ = static_cast<size_t>(-1);Elem *ans = list_head_;list_head_ = NULL;return ans; }template<class I, class T> const typename HashList<I, T>::Elem* HashList<I, T>::GetList() const {return list_head_; }template<class I, class T> inline void HashList<I, T>::Delete(Elem *e) {e->tail = freed_head_;freed_head_ = e; }template<class I, class T> inline typename HashList<I, T>::Elem* HashList<I, T>::Find(I key) {size_t index = (static_cast<size_t>(key) % hash_size_);HashBucket &bucket = buckets_[index];if (bucket.last_elem == NULL) {return NULL; // empty bucket.} else {Elem *head = (bucket.prev_bucket == static_cast<size_t>(-1) ?list_head_ :buckets_[bucket.prev_bucket].last_elem->tail),*tail = bucket.last_elem->tail;for (Elem *e = head; e != tail; e = e->tail)if (e->key == key) return e;return NULL; // Not found.} }template<class I, class T> inline typename HashList<I, T>::Elem* HashList<I, T>::New() {if (freed_head_) {Elem *ans = freed_head_;freed_head_ = freed_head_->tail;return ans;} else {Elem *tmp = new Elem[allocate_block_size_];for (size_t i = 0; i+1 < allocate_block_size_; i++)tmp[i].tail = tmp+i+1;tmp[allocate_block_size_-1].tail = NULL;freed_head_ = tmp;allocated_.push_back(tmp);return this->New();} }template<class I, class T> HashList<I, T>::~HashList() {// First test whether we had any memory leak within the// HashList, i.e. things for which the user did not call Delete().size_t num_in_list = 0, num_allocated = 0;for (Elem *e = freed_head_; e != NULL; e = e->tail)num_in_list++;for (size_t i = 0; i < allocated_.size(); i++) {num_allocated += allocate_block_size_;delete[] allocated_[i];}if (num_in_list != num_allocated) {KALDI_WARN << "Possible memory leak: " << num_in_list<< " != " << num_allocated<< ": you might have forgotten to call Delete on "<< "some Elems";} }template<class I, class T> void HashList<I, T>::Insert(I key, T val) {size_t index = (static_cast<size_t>(key) % hash_size_);HashBucket &bucket = buckets_[index];Elem *elem = New();elem->key = key;elem->val = val;if (bucket.last_elem == NULL) { // Unoccupied bucket. Insert at// head of bucket list (which is tail of regular list, they go in// opposite directions).if (bucket_list_tail_ == static_cast<size_t>(-1)) {// list was empty so this is the first elem.KALDI_ASSERT(list_head_ == NULL);list_head_ = elem;} else {// link in to the chain of Elemsbuckets_[bucket_list_tail_].last_elem->tail = elem;}elem->tail = NULL;bucket.last_elem = elem;bucket.prev_bucket = bucket_list_tail_;bucket_list_tail_ = index;} else {// Already-occupied bucket. Insert at tail of list of elements within// the bucket.elem->tail = bucket.last_elem->tail;bucket.last_elem->tail = elem;bucket.last_elem = elem;} }template<class I, class T> void HashList<I, T>::InsertMore(I key, T val) {size_t index = (static_cast<size_t>(key) % hash_size_);HashBucket &bucket = buckets_[index];Elem *elem = New();elem->key = key;elem->val = val;KALDI_ASSERT(bucket.last_elem != NULL); // assume one element is already hereif (bucket.last_elem->key == key) { // standard behavior: add as last elementelem->tail = bucket.last_elem->tail;bucket.last_elem->tail = elem;bucket.last_elem = elem;return;}Elem *e = (bucket.prev_bucket == static_cast<size_t>(-1) ?list_head_ : buckets_[bucket.prev_bucket].last_elem->tail);// find place to insert in linked listwhile (e != bucket.last_elem->tail && e->key != key) e = e->tail;KALDI_ASSERT(e->key == key); // not found? - should not happenelem->tail = e->tail;e->tail = elem; }} // end namespace kaldi#endif // KALDI_UTIL_HASH_LIST_INL_H_
在解碼過程中,會產生很多的令牌。需要設計一種數據結構和相關算法,用來保存和更新令牌。其設計要求可以簡單概括如下。
? 1、可以快速判斷某個令牌是否已經存在
? 2、可以快速插入令牌(令牌不存在的時候)
? 3、可以很容易地遍歷所有的令牌
kaldi的作者設計了一種HashList結果用來保存令牌。HashList跟一般數據結構書籍里介紹的hash表類似,但比它要復雜一些。
這里先介紹一下通常數據結構書籍里描述的hash表。
hash表,是人為設計的一種數據結構。通過一個hash函數,將某個集合中的元素,映射到一個hash表(一般用數組之類的來實現)中。通過這個hash表,可以判斷某個元素是否已經存在,并且可以快速插入新數據。
hash函數常用的一種方法是除留余數法,就是對某個數(一般是hash表的大小)做模運算,得到的余數作為索引index,依據index來存放數據。兩個數據通過hash函數運算之后,可以得到相同的索引值,就會產生沖突。hash表解決沖突的方法有鏈表法和開放地址法。
下面舉例說明下hash表
? 1、hash表大小為5。采用除留余數法作為hash函數。采用鏈地址法解決沖突。
? 2、依次插入的數據分布是1, 8, 3, 10, 15, 5, 21, 18。
通過上面的描述,我們可以得到下面的hash表。
kaldi中HashList也是一種hash表。它跟上面例子中介紹的鏈地址法hash表類似,但也有不一樣的地方。假如還是使用上面例子中的條件,則HashList產生的hash表,畫出來如下圖所示。
可以看出,HashList的結構,跟普通的鏈表hash不同的主要有以下幾點
1、HashList中hash項指向的元素,是鏈表的最后一個元素(last_elem)。
2、HashList中hash項有個特定的域(pre_bucket,上面圖片中紅色的數字),通過這個可以找到上一個hash項,然后找到本鏈表的頭結點
3、有一個額外的頭結點(list_head),通過它可以訪問所有的結點。
如果想要查找某個數是否在hash表中,通過取模得到余數,也就是索引index。如果對應的hash項為空,表示未找到。如果hash項不為空,則通過上面的1和2兩點,可以確定鏈表的頭和尾巴,在鏈表中搜索即可。
kaldi中每計算一幀數據,都伴隨著HashList中元素的新建和銷毀。如果調用系統的new和delete操作,會帶來很大的影響。HashList中會預先申請一塊內存(一個數組),然后new和delete元素,就是在這些數組上操作的。HashList中的變量allocated_和freed_head_是跟內存相關的變量;New()和Delete()是跟分配和回收相關的操作。
下面是hashlist的原代碼,大家可以仔細閱讀體會。
// util/hash-list.h// Copyright 2009-2011 Microsoft Corporation // 2013 Johns Hopkins University (author: Daniel Povey)// See ../../COPYING for clarification regarding multiple authors // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE, // MERCHANTABLITY OR NON-INFRINGEMENT. // See the Apache 2 License for the specific language governing permissions and // limitations under the License.#ifndef KALDI_UTIL_HASH_LIST_H_ #define KALDI_UTIL_HASH_LIST_H_ #include <vector> #include <set> #include <algorithm> #include <limits> #include <cassert> #include "util/stl-utils.h"/* This header provides utilities for a structure that's used in a decoder (butis quite generic in nature so we implement and test it separately).Basically it's a singly-linked list, but implemented in such a way that wecan quickly search for elements in the list. We give it a slightly richerinterface than just a hash and a list. The idea is that we want to separatethe hash part and the list part: basically, in the decoder, we want to have asingle hash for the current frame and the next frame, because by the time weneed to access the hash for the next frame we no longer need the hash for theprevious frame. So we have an operation that clears the hash but leaves thelist structure intact. We also control memory management inside this object,to avoid repeated new's/deletes.See hash-list-test.cc for an example of how to use this object. */namespace kaldi {template<class I, class T> class HashList {public:struct Elem {I key;T val;Elem *tail;};/// Constructor takes no arguments./// Call SetSize to inform it of the likely size.HashList();/// Clears the hash and gives the head of the current list to the user;/// ownership is transferred to the user (the user must call Delete()/// for each element in the list, at his/her leisure).Elem *Clear();/// Gives the head of the current list to the user. Ownership retained in the/// class. Caution: in December 2013 the return type was changed to const/// Elem* and this function was made const. You may need to change some types/// of local Elem* variables to const if this produces compilation errors.const Elem *GetList() const;/// Think of this like delete(). It is to be called for each Elem in turn/// after you "obtained ownership" by doing Clear(). This is not the opposite/// of. Insert, it is the opposite of New. It's really a memory operation.inline void Delete(Elem *e);/// This should probably not be needed to be called directly by the user./// Think of it as opposite/// to Delete();inline Elem *New();/// Find tries to find this element in the current list using the hashtable./// It returns NULL if not present. The Elem it returns is not owned by the/// user, it is part of the internal list owned by this object, but the user/// is free to modify the "val" element.inline Elem *Find(I key);/// Insert inserts a new element into the hashtable/stored list. By calling/// this,/// the user asserts that it is not already present (e.g. Find was called and/// returned NULL). With current code, calling this if an element already/// exists will result in duplicate elements in the structure, and Find()/// will find the first one that was added./// [but we don't guarantee this behavior].inline void Insert(I key, T val);/// Insert inserts another element with same key into the hashtable//// stored list./// By calling this, the user asserts that one element with that key is/// already present./// We insert it that way, that all elements with the same key/// follow each other./// Find() will return the first one of the elements with the same key.inline void InsertMore(I key, T val);/// SetSize tells the object how many hash buckets to allocate (should/// typically be at least twice the number of objects we expect to go in the/// structure, for fastest performance). It must be called while the hash/// is empty (e.g. after Clear() or after initializing the object, but before/// adding anything to the hash.void SetSize(size_t sz);/// Returns current number of hash buckets.inline size_t Size() { return hash_size_; }~HashList();private:struct HashBucket {size_t prev_bucket; // index to next bucket (-1 if list tail). Note:// list of buckets goes in opposite direction to list of Elems.Elem *last_elem; // pointer to last element in this bucket (NULL if empty)inline HashBucket(size_t i, Elem *e): prev_bucket(i), last_elem(e) {}};Elem *list_head_; // head of currently stored list.size_t bucket_list_tail_; // tail of list of active hash buckets.size_t hash_size_; // number of hash buckets.std::vector<HashBucket> buckets_;Elem *freed_head_; // head of list of currently freed elements. [ready for// allocation]std::vector<Elem*> allocated_; // list of allocated blocks.static const size_t allocate_block_size_ = 1024; // Number of Elements to// allocate in one block. Must be largish so storing allocated_ doesn't// become a problem. };} // end namespace kaldi#include "util/hash-list-inl.h"#endif // KALDI_UTIL_HASH_LIST_H_類的實現代碼 // util/hash-list-inl.h// Copyright 2009-2011 Microsoft Corporation // 2013 Johns Hopkins University (author: Daniel Povey)// See ../../COPYING for clarification regarding multiple authors // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE, // MERCHANTABLITY OR NON-INFRINGEMENT. // See the Apache 2 License for the specific language governing permissions and // limitations under the License.#ifndef KALDI_UTIL_HASH_LIST_INL_H_ #define KALDI_UTIL_HASH_LIST_INL_H_// Do not include this file directly. It is included by fast-hash.hnamespace kaldi {template<class I, class T> HashList<I, T>::HashList() {list_head_ = NULL;bucket_list_tail_ = static_cast<size_t>(-1); // invalid.hash_size_ = 0;freed_head_ = NULL; }template<class I, class T> void HashList<I, T>::SetSize(size_t size) {hash_size_ = size;KALDI_ASSERT(list_head_ == NULL &&bucket_list_tail_ == static_cast<size_t>(-1)); // make sure empty.if (size > buckets_.size())buckets_.resize(size, HashBucket(0, NULL)); }template<class I, class T> typename HashList<I, T>::Elem* HashList<I, T>::Clear() {// Clears the hashtable and gives ownership of the currently contained list// to the user.for (size_t cur_bucket = bucket_list_tail_;cur_bucket != static_cast<size_t>(-1);cur_bucket = buckets_[cur_bucket].prev_bucket) {buckets_[cur_bucket].last_elem = NULL; // this is how we indicate "empty".}bucket_list_tail_ = static_cast<size_t>(-1);Elem *ans = list_head_;list_head_ = NULL;return ans; }template<class I, class T> const typename HashList<I, T>::Elem* HashList<I, T>::GetList() const {return list_head_; }template<class I, class T> inline void HashList<I, T>::Delete(Elem *e) {e->tail = freed_head_;freed_head_ = e; }template<class I, class T> inline typename HashList<I, T>::Elem* HashList<I, T>::Find(I key) {size_t index = (static_cast<size_t>(key) % hash_size_);HashBucket &bucket = buckets_[index];if (bucket.last_elem == NULL) {return NULL; // empty bucket.} else {Elem *head = (bucket.prev_bucket == static_cast<size_t>(-1) ?list_head_ :buckets_[bucket.prev_bucket].last_elem->tail),*tail = bucket.last_elem->tail;for (Elem *e = head; e != tail; e = e->tail)if (e->key == key) return e;return NULL; // Not found.} }template<class I, class T> inline typename HashList<I, T>::Elem* HashList<I, T>::New() {if (freed_head_) {Elem *ans = freed_head_;freed_head_ = freed_head_->tail;return ans;} else {Elem *tmp = new Elem[allocate_block_size_];for (size_t i = 0; i+1 < allocate_block_size_; i++)tmp[i].tail = tmp+i+1;tmp[allocate_block_size_-1].tail = NULL;freed_head_ = tmp;allocated_.push_back(tmp);return this->New();} }template<class I, class T> HashList<I, T>::~HashList() {// First test whether we had any memory leak within the// HashList, i.e. things for which the user did not call Delete().size_t num_in_list = 0, num_allocated = 0;for (Elem *e = freed_head_; e != NULL; e = e->tail)num_in_list++;for (size_t i = 0; i < allocated_.size(); i++) {num_allocated += allocate_block_size_;delete[] allocated_[i];}if (num_in_list != num_allocated) {KALDI_WARN << "Possible memory leak: " << num_in_list<< " != " << num_allocated<< ": you might have forgotten to call Delete on "<< "some Elems";} }template<class I, class T> void HashList<I, T>::Insert(I key, T val) {size_t index = (static_cast<size_t>(key) % hash_size_);HashBucket &bucket = buckets_[index];Elem *elem = New();elem->key = key;elem->val = val;if (bucket.last_elem == NULL) { // Unoccupied bucket. Insert at// head of bucket list (which is tail of regular list, they go in// opposite directions).if (bucket_list_tail_ == static_cast<size_t>(-1)) {// list was empty so this is the first elem.KALDI_ASSERT(list_head_ == NULL);list_head_ = elem;} else {// link in to the chain of Elemsbuckets_[bucket_list_tail_].last_elem->tail = elem;}elem->tail = NULL;bucket.last_elem = elem;bucket.prev_bucket = bucket_list_tail_;bucket_list_tail_ = index;} else {// Already-occupied bucket. Insert at tail of list of elements within// the bucket.elem->tail = bucket.last_elem->tail;bucket.last_elem->tail = elem;bucket.last_elem = elem;} }template<class I, class T> void HashList<I, T>::InsertMore(I key, T val) {size_t index = (static_cast<size_t>(key) % hash_size_);HashBucket &bucket = buckets_[index];Elem *elem = New();elem->key = key;elem->val = val;KALDI_ASSERT(bucket.last_elem != NULL); // assume one element is already hereif (bucket.last_elem->key == key) { // standard behavior: add as last elementelem->tail = bucket.last_elem->tail;bucket.last_elem->tail = elem;bucket.last_elem = elem;return;}Elem *e = (bucket.prev_bucket == static_cast<size_t>(-1) ?list_head_ : buckets_[bucket.prev_bucket].last_elem->tail);// find place to insert in linked listwhile (e != bucket.last_elem->tail && e->key != key) e = e->tail;KALDI_ASSERT(e->key == key); // not found? - should not happenelem->tail = e->tail;e->tail = elem; }} // end namespace kaldi#endif // KALDI_UTIL_HASH_LIST_INL_H_
總結
以上是生活随笔為你收集整理的kaldi中hashlist阅读总结的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: ArcGIS教程:地理处理服务示例(裁剪
- 下一篇: Single-stage目标检测网络YO