Default Dictionaries

Zachary Greenberg
Oct 13, 2021

For my very first post on Medium, I wrote a blog called Collections of Knowledge. It was about the collections module, which gives us access to several specialized container datatypes. Interestingly, when I wrote it, I did not cover the default dictionary (defaultdict). Now that I know more about it, I will say it is incredible and can be extremely useful.

‘Defaultdict is a container like dictionaries present in the module collections. Defaultdict is a sub-class of the dictionary class that returns a dictionary-like object. The functionality of both dictionaries and defaultdict are almost same except for the fact that defaultdict never raises a KeyError. It provides a default value for the key that does not exists.’ (as defined by GeeksforGeeks)

So there you have it. A defaultdict is a special dictionary-like container. It lets us pair items together as keys and values, just like a regular dictionary. What makes it unique is that it does not raise a KeyError, as long as you give it a default factory. When you look up an item in a dictionary, you use its key to obtain its value. With a regular dictionary, your code raises a KeyError and breaks if the key you are looking for is not there. A default dictionary instead creates a value of your choosing, stores that key in the dictionary with that value, and returns it. Note that this only happens with square-bracket lookups like d[key]; checking membership with in or calling .get() does not add anything to the dictionary.
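A minimal sketch of the difference, using only the standard library. The names regular and dd are just for illustration:

```python
from collections import defaultdict

regular = {"a": 1}
dd = defaultdict(lambda: -1, {"a": 1})  # default factory returns -1

# A regular dict raises KeyError for a missing key...
try:
    regular["b"]
except KeyError:
    print("regular dict: KeyError")

# ...while a defaultdict calls its default factory and stores the result.
print(dd["b"])       # -1
print("b" in dd)     # True: the lookup above inserted the key

# Membership tests and .get() do not trigger the default or insert anything.
print("c" in dd)     # False
print(dd.get("c"))   # None
print(sorted(dd))    # ['a', 'b']
```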

So, last week I explained how to use pivot tables. I gave the example problem of a chef who wants to figure out the elapsed time between when an order is placed and when it is delivered, and I came up with the following pivot table:

I am now going to finish off this problem using a default dictionary. It will prove especially useful when we get to order #3. I will paste the code here and then explain it:

from collections import defaultdict
import numpy as np

# times is the pivot table from last week's post: timestamps as the index,
# order numbers in the 'in' and 'out' columns
# I am turning NaN values to -1 to make them easier to work with
times.replace({np.nan: -1.0}, inplace=True)
# turn these two columns into NumPy arrays
order_in = times['in'].values
order_out = times['out'].values
# this will be our list of times
elapsed_time = []
# the default value we are setting here is -1
orders = defaultdict(lambda: -1)
for index, number in enumerate(order_in):
    # -1.0 is our NaN value, so if the number is not -1.0, add it to the dictionary
    # (key: order number, value: row index where the order was placed)
    if number != -1.0:
        orders[number] = index
    # if the order number is in the dictionary and its value is not -1
    if (order_out[index] in orders) and (orders[order_out[index]] != -1):
        # perform the time difference
        elapsed_time.append(times.index[index] - times.index[orders[order_out[index]]])
        # change the value back to the default so this order cannot be matched again
        orders[order_out[index]] = -1
    # otherwise
    else:
        elapsed_time.append(-1)
print(elapsed_time)

First, I turned the NaN values into -1 to make them easier to deal with. The same convention applies to the elapsed_time list: if the time difference does not apply to a row, that row gets a value of -1, which you can later replace with NaN if you like.

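Second, I created the defaultdict and set its default with (lambda: -1). A defaultdict takes a function that produces the default rather than the value itself, so the lambda returns -1 whenever a missing key is looked up.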
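The first argument to defaultdict must be callable (or None), which is why the -1 is wrapped in a lambda. A short sketch with a few common factories; the variable names are illustrative only:

```python
from collections import defaultdict

# The factory is called with no arguments each time a missing key is read.
minus_one = defaultdict(lambda: -1)
counts = defaultdict(int)   # int() returns 0
groups = defaultdict(list)  # list() returns a new empty list

print(minus_one["missing"])  # -1
print(counts["missing"])     # 0
print(groups["missing"])     # []

# Passing the value itself fails, because -1 is not callable.
try:
    defaultdict(-1)
except TypeError as err:
    print("TypeError:", err)
```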

Third, I enumerated through the order_in array (essentially the ‘in’ column of the pivoted DataFrame). If the value is NOT -1 (that is, not NaN), I stored it in the dictionary as a key, with its row index as the value. Keep in mind these values are actually the order numbers! Next, if the order_out value for the same row is in the dictionary and its stored index is not -1, I took the time difference between the two rows' timestamps and added it to the list. *IT IS CRUCIAL AFTER THIS TO SET THAT VALUE TO -1 TO AVOID INACCURACY*, because otherwise a later row with the same order number in the out column would be matched to an order that has already been delivered. Finally, if the order_out value is not in the dictionary, or its stored value is -1, I added -1 to the list. The final output of the code above looks like this:
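To see the loop on something small, here is a sketch that runs the same logic on plain Python lists instead of the pandas pivot table. The timestamps and order numbers are made-up toy data, -1 stands in for NaN as in the post, and the helper name elapsed_times is hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def elapsed_times(stamps, order_in, order_out):
    """Same matching loop as the post, on plain lists.

    stamps plays the role of times.index; -1 marks an empty (NaN) cell.
    """
    orders = defaultdict(lambda: -1)  # order number -> row where it was placed
    elapsed = []
    for index, number in enumerate(order_in):
        if number != -1:
            orders[number] = index
        out = order_out[index]
        if out in orders and orders[out] != -1:
            elapsed.append(stamps[index] - stamps[orders[out]])
            orders[out] = -1  # reset so a second delivery of this order cannot reuse it
        else:
            elapsed.append(-1)
    return elapsed

# Hypothetical toy data: one row every 5 minutes.
start = datetime(2021, 10, 13, 12, 0)
stamps = [start + timedelta(minutes=5 * i) for i in range(6)]
order_in = [1, 2, -1, 3, -1, -1]
order_out = [-1, -1, 1, -1, 3, 1]

result = elapsed_times(stamps, order_in, order_out)
print(result)
```

With this data, order 1 comes out as 10 minutes and order 3 as 5 minutes, while the second appearance of order 1 in the last row gets -1 because its entry was reset after the first match.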

Now you have a list that is the same length as the DataFrame. You can even tell which order each difference belongs to, because the differences line up with the values in the out column.
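Because the list lines up row by row with the out column, zipping the two is enough to label each time with its order. A small sketch with hypothetical values (the durations are made up and stored as plain minutes for brevity):

```python
# Hypothetical values: one entry per row, -1 meaning "no time difference here".
order_out = [-1, -1, 1, -1, 3, 1]
elapsed_minutes = [-1, -1, 10, -1, 5, -1]

# Keep only the rows where a delivery was actually matched to its order.
per_order = {
    order: minutes
    for order, minutes in zip(order_out, elapsed_minutes)
    if minutes != -1
}
print(per_order)  # {1: 10, 3: 5}
```

With pandas, the same list could also be stored back on the table as a new column, for example times['elapsed'] = elapsed_time, since it has exactly one entry per row.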

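So, you see, default dictionaries can prove to be a very useful tool when manipulating or transforming data. They make it simple to look up a key and get a value back, even when the key is missing. Dictionary lookups also take constant time on average, so each delivered order is matched to the row where it was placed in a single lookup rather than by searching back through earlier rows, which helps keep computation time down.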
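In the code above, the in check runs before every lookup, so the -1 default is never actually consulted there. One possible simplification, sketched here as an alternative rather than the post's method, is to let the default answer the question: a missing order number simply reads as -1. The dictionary contents and function names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical state: order 1 was placed in row 0, order 2 in row 1,
# and order 3 was already delivered (reset to -1).
orders = defaultdict(lambda: -1, {1: 0, 2: 1, 3: -1})

def placed_row_with_check(number):
    # The post's approach: test membership first, then read the value.
    if number in orders and orders[number] != -1:
        return orders[number]
    return -1

def placed_row_with_default(number):
    # Leaning on the default: unknown numbers come back as -1 with no KeyError.
    return orders[number]

for number in [1, 2, 3, 7]:
    print(number, placed_row_with_check(number), placed_row_with_default(number))
```

The trade-off is that the default-based lookup inserts every unseen number into the dictionary with the value -1. That is harmless here, since -1 already means "no open order", but the membership check keeps the dictionary smaller.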

References:

Definition of defaultdict: geeksforgeeks.org/defaultdict-in-python/

Collections documentation: https://docs.python.org/3/library/collections.html#collections.defaultdict
