is a data structure for storing key/value pairs; that is, an implementation
of the DictionaryDataStructure
. Hash tables can be startlingly efficient; insertion and
retrieval take expected O(1) time (at least for some sorts of HashTable
). The downside:
that word "expected" (if you are unlucky, everything can become O(N) instead), and highish
overhead cost (so that for very small tables, something much simpler may work better).
The basic idea behind hash tables is very simple. Imagine for a moment that all your
keys are small integers: say, none larger than 1000. Then you can just build an array
with 1000 elements, and insertion and retrieval just go via direct lookup, obviously
O(1). Unfortunately, in practice your keys might be strings or 1000-digit numbers or
something worse, so you define a hash function
which maps each key to a small integer;
then, to look up a given key, you feed it through the hash function and then do a
direct table lookup as before.
The difference between a hash table and a plain lookup table, then, is that
(1) you have to apply the hash function to the key before using it (adding
some overhead cost), and then (2) multiple keys might be sent to the same
hash value by the hash function (so you need a way to deal with that).
Let's think about #2. There are two main strategies. Open hashing
called open addressing, says:
when the table entry you need for a new key/value pair is already occupied,
find another unused entry somehow and put it there. Closed hashing
each entry in the table is a secondary data structure (usually a linked list,
but there are other possibilities) containing the actual data, and this
data structure can be extended without limit. [Caution
: some people use
the term "open hashing" to mean what I've called "closed hashing" here!
The usage here is in accordance with that in TheArtOfComputerProgramming
, both of which are recommended references if you
want to know more about hash tables.]
With open hashing, a new problem can arise: perhaps there isn't an unused
entry anywhere in the table. The table can fill up. This isn't an issue with
closed hashing, but that has a different problem as the number of items stored
in the table increases: those secondary data structures can start to get large,
so that insertion and lookup times are no longer O(1). The usual answer is the same
in both cases: allow the table to grow. If you always double its size (or,
more generally, always increase its size by at least a fixed constant factor)
then the cost of doing this is O(1) amortized.
Depending on the strategy used for finding unused entries, an open-addressing
hash table may become slow as the table gets close to being full. Typically,
unsuccessful searches are particularly bad. So it is wise to grow the table
before it is completely full.
There are other sorts of DictionaryDataStructure
s: notably, the many varieties
of a BinaryTree
. They may be preferable if you need a guarantee on the worst-case
access time, or if you want operations such as "find all keys between a and b"
to be efficient. When hash tables are possible at all, though, their typical
performance is better than that of tree-based data structures.
A hash table is a data structure for storing key/value pairs. A dictionary data structure is often made using a hash table. Don't confuse the two. Usually, dictionaries are made using a hash table, but not always. Sometimes, a BinaryTree
or other DataStructures
may be used. The runtime-performance of a hash table is fast. Insert and remove operations are expected to be O(1). A hash table is almost the only data structure where adding and looking up takes order 1.
Also see: http://www.nist.gov/dads/HTML/hashtab.html