TRIE - Data Structure
Introduction
介紹
Trie,又稱單詞查找樹,是一種樹形結構,用于保存大量的字符串。它的優點是:利用字符串的公共前綴來節約存儲空間。
Trie is an ordered tree data structure that uses strings as keys. Unlike Binary Trees, Tries do not store keys associated with the node. The key is actually determined based on the position of the node on the tree. Any descendants of a node shares a common prefix of the key string associated with that node. Hence, trie is also called as Prefix Tree. The word "trie" comes from Retrieval, and it is pronounced as "try".
Trie是一種以字符串為鍵的有序樹狀數據結構,與二叉樹不同的是, Trie并不會存儲節點鍵值,節點鍵值是被節點在樹狀數據結構中的位置所決定的。一個特定節點的所有子孫的鍵值都有一個公共的前綴字符,所有Trie也被稱為前綴樹。 單詞“Trie”來自于英文單詞“Retrieve”,其發音與單詞“Try”相同。
Since this data structure is a prefix tree, trie is commonly used in Dictionaries, Phone Directories and matching algorithms. Trie is best-suited for phone directory (any matching application for that matter) because it is very efficient in matching strings.
因為這個數據結構是一個前綴樹,Trie通常被用在字典算法中,尤其電話號碼目錄的匹配算法中。在字符串匹配中Trie是非常高效的,對于電話號碼目錄是非常適合的。
So I have decided to implement Trie myself in C#. I have created three classes:
所以我決定使用C#來實現一個Trie數據結構,定義了以下一些類:
- Node: Represents a single tree node;
- Node:樹節點類;
- NodeCollection: Represents the children of a node;
- NodeCollection:樹節點結合類;
- Trie: Trie implementation to insert and search nodes.
- Trie:Trie類實現了節點的查找以及插入操作。
Implementation
實現
Node: Node represents a basic tree node. Node implements both Depth First and Breadth First algorithms to search its children. It also contains its Parent node and Children node. Node has a key and a Value. Key contains the single character and the value has the actual value of the node. The actual key of the node will be determined by suffixing the single character to its parent's key. Node has a special property called IsTerminal. This property is set to true if the key or a value represents a complete string.
Node:Node類代表了一個基本的節點類,該類同時實現了深度優先和廣度優先的子節點查找算法,該類包含了對父節點以及子節點的引用。Node類有兩個屬性Key和Value,Key是一個單個字符,節點類真正的鍵值是以其Key屬性為后綴,然后依次加上其父節點的Key屬性。另外Node還有一個屬性叫IsTerminal,來表明其Value或者Key是否表示了一個完整的字符串。
See the picture below:
參見下面的圖片:
?
NodeCollection: This class is a simple collection class which implements the IEnumerable interface for iteration operations. This class contains an internal List class of type Node. NodeCollection implements the standard list methods such as Add( ), Contains( ), Remove( ) etc.
NodeCollection:該類是一個簡單的集合類型,實現了用于枚舉的IEnumerable接口,其內部包含了一個List<Node>類型,支持一些集合類的標準操作例如添加,刪除以及判斷是否包含指定節點等等操作。
Trie: This class implements the Trie algorithms such as Insert and Find methods.
Trie:該類實現了Trie算法,實現了插入和查找操作。
示例代碼:
//Inserts?Names?into?the?Trie?data?structurepublic?static?Node?InsertNode(string?name,?Node?root)
{
????//Is?name?null?
????if?(string.IsNullOrEmpty(name))
????????throw?new?ArgumentNullException("Null?Key");
????//set?the?index,?start?inserting?characters
????int?index?=?1;
????//key
????string?key;
????//start?with?the?root?node
????Node?currentNode?=?root;
????//loop?for?all?charecters?in?the?name
????while?(index?<=?name.Length)
????{
????????//get?the?key?character
????????key?=?name[index?-?1].ToString();
????????//does?the?node?with?same?key?already?exist?
????????Node?resultNode?=?currentNode.Children.GetNodeByKey(key);
????????//No,?this?is?a?new?key
????????if?(resultNode?==?null)
????????{
????????????//Add?a?node
????????????Node?newNode?=?new?Node(key,?name.Substring(0,?index));
????????????//If?reached?the?last?charaecter,?this?is?a?valid?full?name
????????????if?(index?==?name.Length)
????????????????newNode.IsTerminal?=?true;
????????????//add?the?node?to?currentNode(i.e.?Root?node?for?the?first?time)
????????????currentNode.Children.Add(newNode);
????????????//set?as?the?current?node
????????????currentNode?=?newNode;
????????}
????????else
????????{
????????????//node?already?exist,?set?as?tghe?current?node
????????????//and?move?to?the?next?character?in?the?name
????????????currentNode?=?resultNode;
????????}
????????//move?to?the?next?character?in?the?name
????????index++;
????}
????//all?done,?return?root?node
????return?root;
}
The Insert method inserts the string as one character at a time. It starts with the first character; if the first character doesn't already exist in the root node it adds a new node with the new character and returns the new node. Otherwise it returns the node with the fist character for adding remaining characters. It loops until it adds the entire string. Once it reaches the last character, it marks that node as a terminal node because this node represents a complete string in the tree hierarchy.
插入操作再插入一個字符串的時候,從第一個字符開始,每次只處理一個字符;如果第一個字符在根節點的子節點中沒有存在,那么會使用該字符添加一個新的節點然后返回,否則返回已經存在的節點,然后依次循環后面的字符串。一旦到達最后一個字符串,就會標識該節點為一個終止節點(IsTerminal為True),因為在整個樹結構上其表示了一個完整的字符串。
The Find methods is implemented by Depth First search algorithm. The tree is searched until the complete string is found. Below is the code.
查找方法實現了深度優先的查找算法,整個樹形數據結構將被查找直至該字符串被找到。下面是示例代碼:
//Find?a?node?given?the?key("Jo")
public?static?bool?Find(Node?node,?string?key){????
????//Is?key?empty
????if?(string.IsNullOrEmpty(key))
????????return?true;//terminal?Node
????//get?the?first?character
????string?first?=?key.Substring(0,?1);
????//get?the?tail:?key?-?first?character
????string?tail?=?key.Substring(1);
????Node?curNode?=?node.Children.GetNodeByKey(first);
????//loop?until?you?locate?the?key?i.e.?"Jo"
????if?(curNode?!=?null)
????{
????????return?Find(curNode,?tail);
????}
????else
????{
????????//not?found,?return?false
????????return?false;
????}
}
?
?
I've attached the entire source code above. The source code contains the Trie class library and a console application to test the Trie library. The console application loads a set of names (stored in names.txt in debug folder) in to the tree and provides options to run Depth First & Breadth First algorithm. The application also provides options for Directory Look-Up and Find option.
我已經將源代碼添加在附件中了,源代碼中包含了Trie算法類庫一個Console測試程序。Console測試程序會加載一些字符串(存儲在Debug文件夾下的names.txt文件中)到Trie樹上,并且可以在深度優先以及廣度優先切換算法。
The class library can be further used to develop a web based phone directory. The data can also be stored on the client (it is too small) and the Trie can be implemented in JavaScript.
這個類庫可以被進一步開發成為一個基于Web的電話目錄,數據可以存儲在客戶端,然后使用JavaScript來實現Trie算法。
Happy Coding!
編程快樂!
原貼地址:http://www.codeproject.com/KB/recipes/PhoneDirectory.aspx
?
轉載于:https://www.cnblogs.com/tedzhao/archive/2008/10/19/1314665.html
總結
以上是生活随笔為你收集整理的TRIE - Data Structure的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: wmsys.WM_CONCAT
- 下一篇: 深入剖析C#继承机制