Prefix Operations and Tries
Last updated
Last updated
Tries is an ordered tree data structure used to store keys which can be broken into "characters" and share prefixes with other keys.
Every node stores only one letter.
Nodes can be shared by multiple keys.
The last character of the key is marked.
To check if the trie contains a key, walk down the tree from the root along the correct nodes. If the character does not exist or the final node is not marked, the key does not exist.
For example, the following trie contains ["sam", "sad", "sap", "same", "a", "awls"]
.
contains("a)
: true, the final node is marked
contains("sa")
: false, the final node is not marked
contains("saq")
: false, fell off tree
The DataIndexedCharMap
class is efficient implementation for a map which takes in character keys. The value R
represents the number of possible characters (128 for ASCII).
When building a trie, the DataIndexedCharMap
class provides a map to all of a nodes' children. However, the DataIndexedCharMap
object of a node with relatively few children will have mostly null links, which is wasting excess memories.
Since each link corresponds to a character if and only if that character exists, the character variable of each node can be removed. The value of the character is its position in the parent.
The problem of the implementation above is that it wastes excess memories with lots of null spots that don't contain any children. However, alternative ways to track children is available.
Hash-Table based Trie: Initialize the default value and resize the array only when necessary with the load factor.
BST based Trie: Create children pointers when necessary and store them in a BST. Runtime is not efficient.
DataIndexedCharMap
Space: 128 links per node
Runtime: Θ(1)
BST
Space: C links per node
Runtime: O(log R)
Hash Table
Space: C links per node
Runtime: O(R)
(C is the number of children. R is the size of the alphabet.)
If a Trie has N keys, the runtime for our Map/Set operations are as follows:
add: Θ(1)
contains: Θ(1)
The runtime is independent from the number of the keys, since the operations only traverse the length of one key in the worst case.
Tries could efficiently support specific string operations like prefix matching.
The following pseudocode of collect
operation could return all keys in a trie.
The following pseudocode of keyWithPrefix
operation could return all keys with specific prefix. (The collect
operation begins from the node at the end of the prefix.)
The search suggestion of Google is a kind of autocomplete, which could be implemented with a trie.
Values of nodes will represent the weight of the string.
Trie could store billions of strings efficiently since they share nodes.
When a user types a query, keysWithPrefix
operation will return several strings with the highest value.