Hash Table#

A hash table (or hash map) is a data structure that provides access to values through an associated keys. Python’s dictionary class uses a hash table behind the scenes.

Concept#

A hash table used hash function to map keys to indices in a fixed-size array. The data is stored in the corresponding elements of the array.

The hash table contains:

  • a single fixed-size array of length m

Note

In practice this array is often re-allocated to a larger array as the input data grows. Python’s dictionary class for example will increase the size of this array by powers of two when more space is required.

When looking up, inserting, or deleting a value:

  1. A hash is calculated from the provided key

  2. This hash is translated to an index. A common method is a modulo of the array size e.g.

index = hash(k) % m
  1. The value is retrieved, stored or removed from the array using index

In the image below, names are mapped to phone numbers in a hash table.

../../_images/hash-table.png

Collision Resolution#

Most hash functions do not guarantee that different inputs will generate different values. Furthermore, a hash table has a fixed-size array to store values in. This means that there is a reasonable chance that there are collisions, which are situations where different keys are mapped to the same array index.

To resolve collisions there are two main methods:

  1. Separate chaining

  2. Open addressing

Separate Chaining

Instead of storing a single value as the array elements, we store a linked list containing multiple elements. These elements must contain both the original key and the value.

When performing a lookup for example we:

  1. calculate the array index

  2. search through the linked list at that position until an element with the matching key is found

Open Addressing

In open addressing, the array elements contain single key-value pairs. The collisions are resolved as follows

  1. Insertion: the key-value pair is inserted at the next available position in the array

  2. Retrieval and Deletion: the index is calculated then the array is searched from this point onwards until a matching key is found.

Note

In both separate chaining and open addressing we are storing more than just a value. Therefore the array positions are often called “buckets”.

Time Complexity#

By mapping through a hash function, all operations are \(O(1)\) complexity assuming that the hash table is not overloaded i.e. the linked-lists in separate chaining are short.

Space Complexity#

The hash table only requires \(n\) elements in its array therefore the space complexity is \(O(n)\).

Code Challenge: Hash Table

Using the provided PairNode class, complete the HashTable class. This class will only support insertion and retrieval.

Note

This exercise does not require collision resolution. Any collisions are resolved by overwriting the existing Node.

HashTable Specifications

Attributes

  • n_buckets, the size of the list

  • buckets, the fixed-size list, pre-populated with None

Methods

  • insert(self, key, value), to add a new key-value pair into the hash table

  • retrieve(self, key), return the associated value for a key, return None if the key hasn’t been inserted.

Both insert and retrieve use modulo indexing into the array based on the object’s hash value.

Instructions

Note

Look for the TODO items in the provided code!

  1. Check the new PairNode class, it has key and value attributes.

  2. Check the provided insert method to understand how it works

  3. Complete the retrieve method by:

    • Calculating the bucket index

    • Retrieving the node at that index

    • Returning the node’s value

Usage Example

>>> table = HashTable()
>>> table.insert("Steve Jobs", "Apple")
>>> table.insert("Bill Gates", "Microsoft")
>>> table.retrieve("Steve Jobs")
'Apple'

Here is some code for you to start with:

class PairNode:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __str__(self):
        return f"{repr(self.key)}: {repr(self.value)}"

    def __repr__(self):
        return str(self)

    def __eq__(self, o):
        return self.key == o.key and self.value == o.value

class HashTable:

    def __init__(self):
        self.n_buckets = 2**3
        self.buckets = [None]*self.n_buckets

    def insert(self, key, value):
        bucket_index = hash(key) % self.n_buckets
        self.buckets[bucket_index] = PairNode(key, value)

    def retrieve(self, key):

        # TODO



table = HashTable()

table.insert("a", 1)
table.insert("b", 2)
table.insert("c", 3)
table.insert("d", 4)

print(table.retrieve("a"))
Solution

Solution is locked

Extension
  1. Adjust the n_buckets to try and cause collisions.

  2. Add a remove(self, key) method to your HashTable to remove items given a key.

Code Challenge: Hash Table with Collision Resolution

Let’s improve our HashTable by resolving collisions with separate chaining.

Note that the buckets attribute has been initialised to contain empty sub-lists.

HashTable Specifications

Attributes

  • n_buckets, the size of the list

  • buckets, the fixed-size list, pre-populated with [] at each element

Methods

  • insert(self, key, value), to add a new key-value pair into the hash table

  • retrieve(self, key), return the associated value for a key, return None if the key hasn’t been inserted.

Both insert and retrieve use modulo indexing into the array based on the object’s hash value.

Instructions

Note

Look for the TODO items in the provided code!

  1. Complete the insert method by:

    1. Calculating the bucket index

    2. Checking if a node in the bucket list has the same key and updating its value

  2. Complete the retrieve method by:

    1. Calculating the bucket index

    2. Searching through the bucket list to see if a node contains the requested key

Usage Example

>>> table = HashTable()
>>> table.insert("Steve Jobs", "Apple")
>>> table.insert("Bill Gates", "Microsoft")
>>> table.retrieve("Steve Jobs")
'Apple'

Here is some code for you to start with:

class PairNode:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __str__(self):
        return f"{repr(self.key)}: {repr(self.value)}"

    def __repr__(self):
        return str(self)

    def __eq__(self, o):
        return self.key == o.key and self.value == o.value

class HashTable:

    def __init__(self):
        self.n_buckets = 2**3
        # Buckets contain an empty list
        self.buckets = []
        for i in range(self.n_buckets):
            self.buckets.append( [] )

    def insert(self, key, value):

        # TODO

    def retrieve(self, key):

        # TODO

table = HashTable()

table.insert("a", 1)
table.insert("b", 2)
table.insert("c", 3)
table.insert("d", 4)

print(table.retrieve("a"))
Solution

Solution is locked