Big O Notation
Big O Notation is an important concept for Data Scientists to consider. Our job deals with large amounts of data on a daily basis, and the larger the data, the longer our programs will presumably take to run. When we are mindful of Big O Notation, we can often find more optimal solutions that make better use of our code's time.
What is Big O Notation?
If you could not guess from the paragraph above, here is the technical definition:
‘Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.’ — as famously defined by Wikipedia.
In less mathematical terms, this notation is used to characterize the time complexity of our code. It is important to always consider the worst-case scenario when describing programs in terms of Big O, because that tells us the maximum amount of 'time' the program can take to complete.
In my experience, the three most common types I have encountered are:
O(1)
This is referred to as constant time. The 1 signifies that there is only one step to completing the program. An example of this in code would be:
#retrieving a value from a dictionary
favorite_colors = {'Jared': 'orange', 'Zach': 'blue', 'Jamie': 'pink', 'Brianne': 'purple'}
print(favorite_colors['Zach'])
It is important to note that this is why dictionaries are such great structures in Python: the majority of dictionary operations have an average time complexity of O(1).
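To make that concrete, here is a small sketch (with an illustrative dictionary) of the common operations that each run in average O(1) time, no matter how many entries the dictionary holds:

```python
# Common dictionary operations, each average-case O(1)
favorite_colors = {'Jared': 'orange', 'Zach': 'blue'}

favorite_colors['Brianne'] = 'purple'   # insertion: O(1)
color = favorite_colors['Zach']         # lookup by key: O(1)
has_jamie = 'Jamie' in favorite_colors  # key membership test: O(1)

print(color, has_jamie)
```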
O(n)
This is referred to as linear time. The n signifies the number of items to go through, the length of the structure.
#searching for a number in a list
array = [1, 2, 3]
print(2 in array)
The program goes through the list searching for the number 2, checking each value to see if it equals 2; in the worst case it has to check every element. As a rule of thumb, a single 'for' loop over a structure signals to the coder that O(n) is present.
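The `in` operator above hides that loop. Written out explicitly (a sketch, with `linear_search` as an illustrative helper name rather than a built-in), the same linear scan looks like this:

```python
# Explicit linear search: worst case checks all n elements, so O(n)
def linear_search(items, target):
    for item in items:        # up to n iterations
        if item == target:
            return True
    return False

print(linear_search([1, 2, 3], 2))  # behaves like `2 in [1, 2, 3]`
```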
O(n²)
This is what is known as quadratic time. The best way to explain it is to show by example:
#nested for loops
array1 = [1, 2, 3, 3]
for i in array1:
    for j in array1:
        print(i == j)
Whenever you see a loop inside a loop, get ready to declare your time complexity in the name of O(n²). This is where things really start adding up: the inner loop runs in full for every iteration of the outer loop, so n items produce n × n comparisons. If you have a lot of data to get through, this can really take some time.
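The n × n growth can be checked directly with a small counting sketch (an illustrative helper, not part of the example above):

```python
# Count how many times the inner loop body runs for a list of length n
def count_pairwise_comparisons(items):
    count = 0
    for i in items:
        for j in items:   # inner loop runs len(items) times per outer iteration
            count += 1
    return count

print(count_pairwise_comparisons([1, 2, 3, 3]))  # 4 items -> 16 comparisons
```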
How do we improve our time complexity?
In the three examples posted above, the ranking of time complexity is as follows: O(1) < O(n) < O(n²). Of course there are other complexities greater than these, and even some in between. To show an example of how to go about improving time complexity, I will take a classic Python task from O(n²) to O(n). Here is the task:
Write a function that finds the indices of the pairs in a list whose sum is equal to the target.
i.e. [2, 7, 11, 15], 9 → [0, 1]
#O(N^2)
def two_sum(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):  #start j after i so a number is not paired with itself
            if nums[i] + nums[j] == target:
                return [i, j]

#O(N)
def two_sum(nums, target):
    pair_index = {}  #maps each number seen so far to its index
    for i in range(len(nums)):
        if target - nums[i] in pair_index:  #O(1) dictionary lookup
            return [pair_index[target - nums[i]], i]
        pair_index[nums[i]] = i
By using a dictionary instead of a double for loop, we decrease the time complexity and make our code much more efficient. We should always solve the problem at hand first; then we can take the time to look for a more efficient solution with respect to time complexity.
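To see the difference in practice, here is a rough timing sketch (illustrative only; the two versions are re-declared locally under distinct names, and exact timings will vary by machine):

```python
import timeit

def two_sum_quadratic(nums, target):
    # O(n^2): compare every pair of indices
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]

def two_sum_linear(nums, target):
    # O(n): one pass, with a dictionary of values seen so far
    pair_index = {}
    for i, num in enumerate(nums):
        if target - num in pair_index:
            return [pair_index[target - num], i]
        pair_index[num] = i

nums = list(range(2000))
target = nums[-1] + nums[-2]  # worst case: the matching pair is found last

slow = timeit.timeit(lambda: two_sum_quadratic(nums, target), number=3)
fast = timeit.timeit(lambda: two_sum_linear(nums, target), number=3)
print(f"quadratic: {slow:.4f}s, linear: {fast:.4f}s")
```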
References:
Wikipedia — https://en.wikipedia.org/wiki/Time_complexity
Understanding Time Complexity — https://www.geeksforgeeks.org/understanding-time-complexity-simple-examples/
TwoSum Problem — https://leetcode.com/problems/two-sum/