Hash Table#
A hash table (or hash map) is a data structure that provides access to values through an associated keys. Python’s dictionary class uses a hash table behind the scenes.
Recommended Video (4 minutes)#
Concept#
A hash table used hash function to map keys to indices in a fixed-size array. The data is stored in the corresponding elements of the array.
The hash table contains:
a single fixed-size array of length m
Note
In practice this array is often re-allocated to a larger array as the input data grows. Python’s dictionary class for example will increase the size of this array by powers of two when more space is required.
When looking up, inserting, or deleting a value:
A hash is calculated from the provided
keyThis hash is translated to an index. A common method is a modulo of the array size e.g.
index = hash(k) % m
The value is retrieved, stored or removed from the array using
index
In the image below, names are mapped to phone numbers in a hash table.
Collision Resolution#
Most hash functions do not guarantee that different inputs will generate different values. Furthermore, a hash table has a fixed-size array to store values in. This means that there is a reasonable chance that there are collisions, which are situations where different keys are mapped to the same array index.
To resolve collisions there are two main methods:
Separate chaining
Open addressing
Separate Chaining
Instead of storing a single value as the array elements, we store a linked list containing multiple elements. These elements must contain both the original key and the value.
When performing a lookup for example we:
calculate the array index
search through the linked list at that position until an element with the matching key is found
Open Addressing
In open addressing, the array elements contain single key-value pairs. The collisions are resolved as follows
Insertion: the key-value pair is inserted at the next available position in the array
Retrieval and Deletion: the index is calculated then the array is searched from this point onwards until a matching key is found.
Note
In both separate chaining and open addressing we are storing more than just a value. Therefore the array positions are often called “buckets”.
Time Complexity#
By mapping through a hash function, all operations are \(O(1)\) complexity assuming that the hash table is not overloaded i.e. the linked-lists in separate chaining are short.
Space Complexity#
The hash table only requires \(n\) elements in its array therefore the space complexity is \(O(n)\).
Code Challenge: Hash Table
Using the provided PairNode class, complete the HashTable class. This class will only support insertion and retrieval.
Note
This exercise does not require collision resolution. Any collisions are resolved by overwriting the existing Node.
HashTable Specifications
Attributes
n_buckets, the size of the listbuckets, the fixed-size list, pre-populated withNone
Methods
insert(self, key, value), to add a new key-value pair into the hash tableretrieve(self, key), return the associated value for akey, returnNoneif the key hasn’t been inserted.
Both insert and retrieve use modulo indexing into the array based on the object’s hash value.
Instructions
Note
Look for the TODO items in the provided code!
Check the new
PairNodeclass, it haskeyandvalueattributes.Check the provided
insertmethod to understand how it worksComplete the
retrievemethod by:Calculating the bucket index
Retrieving the node at that index
Returning the node’s value
Usage Example
>>> table = HashTable()
>>> table.insert("Steve Jobs", "Apple")
>>> table.insert("Bill Gates", "Microsoft")
>>> table.retrieve("Steve Jobs")
'Apple'
Here is some code for you to start with:
class PairNode:
def __init__(self, key, value):
self.key = key
self.value = value
def __str__(self):
return f"{repr(self.key)}: {repr(self.value)}"
def __repr__(self):
return str(self)
def __eq__(self, o):
return self.key == o.key and self.value == o.value
class HashTable:
def __init__(self):
self.n_buckets = 2**3
self.buckets = [None]*self.n_buckets
def insert(self, key, value):
bucket_index = hash(key) % self.n_buckets
self.buckets[bucket_index] = PairNode(key, value)
def retrieve(self, key):
# TODO
table = HashTable()
table.insert("a", 1)
table.insert("b", 2)
table.insert("c", 3)
table.insert("d", 4)
print(table.retrieve("a"))
Solution
Solution is locked
Extension
Adjust the
n_bucketsto try and cause collisions.Add a
remove(self, key)method to yourHashTableto remove items given a key.
Code Challenge: Hash Table with Collision Resolution
Let’s improve our HashTable by resolving collisions with separate chaining.
Note that the buckets attribute has been initialised to contain empty sub-lists.
HashTable Specifications
Attributes
n_buckets, the size of the listbuckets, the fixed-size list, pre-populated with[]at each element
Methods
insert(self, key, value), to add a new key-value pair into the hash tableretrieve(self, key), return the associated value for akey, returnNoneif the key hasn’t been inserted.
Both insert and retrieve use modulo indexing into the array based on the object’s hash value.
Instructions
Note
Look for the TODO items in the provided code!
Complete the
insertmethod by:Calculating the bucket index
Checking if a node in the bucket list has the same key and updating its value
Complete the
retrievemethod by:Calculating the bucket index
Searching through the bucket list to see if a node contains the requested
key
Usage Example
>>> table = HashTable()
>>> table.insert("Steve Jobs", "Apple")
>>> table.insert("Bill Gates", "Microsoft")
>>> table.retrieve("Steve Jobs")
'Apple'
Here is some code for you to start with:
class PairNode:
def __init__(self, key, value):
self.key = key
self.value = value
def __str__(self):
return f"{repr(self.key)}: {repr(self.value)}"
def __repr__(self):
return str(self)
def __eq__(self, o):
return self.key == o.key and self.value == o.value
class HashTable:
def __init__(self):
self.n_buckets = 2**3
# Buckets contain an empty list
self.buckets = []
for i in range(self.n_buckets):
self.buckets.append( [] )
def insert(self, key, value):
# TODO
def retrieve(self, key):
# TODO
table = HashTable()
table.insert("a", 1)
table.insert("b", 2)
table.insert("c", 3)
table.insert("d", 4)
print(table.retrieve("a"))
Solution
Solution is locked