Efficiency#
Previously we discussed data structures as just another way to store data in our programs. However the choice of data structure affects how efficiently our programs run and as a result, how much data they can handle. It is critical as programmers that we understand the time and memory requirements of our programs. A program may need to complete in an acceptable time to a user or you might be writing code for a microcontroller that has very limited memory.
This is increasingly relevant in the world of portable computing devices such as phones, tablets, laptops and even wearables such as watches. These devices have lower power processors, limited memory and often very small batteries.
As programmers, we need to understand the underlying implementation of a data structure so that we can make the right choice for a given programming problem and context.
Example#
Consider the problem of checking if elements exist in a list.
Naive (Slow)#
The naive solution is to operate on the original list and check using Python’s
in operator. This is relatively slow because the speed of in` is
proportional to the length of the list. The longer the list the longer it
takes to evaluate in.
The code below demonstrates this by:
Generating a list of
Nrandom integersStarting a timer
Repeating:
Generating a random integer
Checking if the integer is in the list
Ending a timer
Printing the time elapsed
import time, random
random.seed(0)
N = 10000 # Length of the list, increase N to increase time taken
A = [random.randint(0, 100000) for _ in range(N)]
start = time.time()
for i in range(1000): # Repeat 1000 times for reliable estimate
present = random.randint(0, 100000) in A # Check if random element is in the list
print("Time:", time.time() - start)
Output
Time: 0.12879061698913574
Smart (Fast)#
If we convert our list of numbers to a more appropriate data structure then the in operator can be much faster. In the example below we’ve used a set to hold the numbers, which means that the speed on in is a small constant. No matter how big N gets, the time to check if an element is present remains the same.
import time, random
random.seed(0)
N = 10000 # Length of the list, increase N to increase time taken
A = [random.randint(0, 100000) for _ in range(N)]
setA = set(A)
start = time.time()
for i in range(1000): # Repeat 1000 times for reliable estimate
present = random.randint(0, 100000) in setA # Check if random element is in the set
print("Time:", time.time() - start)
Output
Time: 0.0003714561462402344