When learning Python for Data Science, you start with datatypes and then move on to data structures. Data structures are fundamental in this object-oriented programming language because they help you organize your information. What if I told you there are more options than the basic built-in datatypes you're learning about? Take a look at the collections module and you will find them.
The collections module gives us access to additional “container datatypes” that make storing information easier. Counter, namedtuple, and deque are three of them, and they are the ones I am going to talk about here.
Intuitively, you can roughly figure out what the first two do from their names and their similarity to existing Python datatypes. First, let's talk about Counter.
Counter does exactly what its name says: it counts the elements of an iterable, such as a list, and creates a dictionary-like object (a dict subclass) that maps each element to its count. We've all seen the Python exercises where you are asked to count the individual words in a text. After splitting the text string into a list, you would have to iterate through the list and build a dictionary of counts over several statements. With Counter, the counting takes a single line of code. Here is a simple example with a list of integers.
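The original example did not survive, so here is a minimal sketch of the idea; the list of integers is illustrative data. It also shows the manual dictionary loop the paragraph describes, so you can compare the two approaches side by side.

```python
from collections import Counter

# Illustrative list of integers with repeated values
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# One line does all the counting: each element maps to how often it appears
counts = Counter(numbers)
print(counts)      # Counter({4: 4, 3: 3, 2: 2, 1: 1})
print(counts[3])   # 3
print(counts[99])  # 0: a missing element reports zero instead of raising KeyError

# The manual approach, for comparison: several statements to get the same result
manual = {}
for n in numbers:
    manual[n] = manual.get(n, 0) + 1

print(dict(counts) == manual)  # True
```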
The result can easily be converted to a plain dictionary if you ever need one, or left as is. Either way, you can pass it straight into a Pandas Series. It works for counting numbers, words, or even mixed datatypes, as long as every element is hashable. Not to mention it is a nice break for your fingers, since it's much quicker to type.
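A short sketch of those points, assuming a made-up sentence as the text to count: word counting after split(), converting to a plain dict, picking the top counts with most_common(), and counting a list of mixed hashable types.

```python
from collections import Counter

# Illustrative text; split() breaks it into words on whitespace
text = "the cat and the hat and the bat"
word_counts = Counter(text.split())

# Counter is a dict subclass, so dict() returns a plain dictionary
as_dict = dict(word_counts)

# most_common(n) returns the n highest counts as (element, count) pairs
top_two = word_counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]

# Mixed datatypes work as long as every element is hashable
mixed = Counter([1, "a", 1, (2, 3), "a", 1])
print(mixed)    # Counter({1: 3, 'a': 2, (2, 3): 1})
```

Because a Counter is a dictionary, it should also be accepted anywhere a dict is, including the `pandas.Series` constructor.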
Next we have the namedtuple. Real quick: a tuple is an immutable datatype, which is valuable in Data Science because it can hold static information, e.g., the red, green, and blue values of a color in the RGB model used for image recognition. The namedtuple() factory function creates a tuple subclass with named fields, so you can access each value by its field name instead of only by its position. That sounds like a dictionary with key-value pairs, but unlike a dictionary, a namedtuple is still a tuple: its fields cannot be reassigned after it is created. This is how you create one and access an attribute.
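Here is a minimal sketch using the RGB example; the type name `Color` and its field names are chosen for illustration. It shows access by name and by position, the error you get when trying to reassign a field, and `_replace()`, which builds a new instance instead.

```python
from collections import namedtuple

# namedtuple() builds a new tuple subclass; "Color" and its fields are illustrative
Color = namedtuple("Color", ["red", "green", "blue"])

teal = Color(red=0, green=128, blue=128)

print(teal.green)  # 128, accessed by field name
print(teal[1])     # 128, positional indexing still works because it is a tuple
print(teal)        # Color(red=0, green=128, blue=128)

# Fields cannot be reassigned; attempting it raises AttributeError
try:
    teal.red = 255
except AttributeError:
    print("fields are read-only")

# _replace() returns a new Color and leaves the original unchanged
brighter = teal._replace(red=255)
print(brighter)    # Color(red=255, green=128, blue=128)
```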
With a regular tuple, you have no context for what each value refers to; a namedtuple's field names supply exactly that context. One more thing: as Data Scientists, we often build Pandas DataFrames manually from dictionaries. With the .from_records() method, we can build a DataFrame directly from a list of namedtuples instead.
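A sketch of that workflow, assuming pandas is installed; the `Pixel` type and its rows are hypothetical data. `from_records` takes a sequence of records, and passing the namedtuple's `_fields` as `columns` makes the column names explicit rather than relying on inference.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical record type and rows, for illustration only
Pixel = namedtuple("Pixel", ["x", "y", "red", "green", "blue"])
rows = [
    Pixel(0, 0, 255, 0, 0),
    Pixel(0, 1, 0, 255, 0),
    Pixel(1, 0, 0, 0, 255),
]

# Each namedtuple becomes one row; its field names become the column labels
df = pd.DataFrame.from_records(rows, columns=list(Pixel._fields))
print(df)
```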
Finally, on to the deque. At first the name doesn't seem to help with figuring out what it does, but after some research I learned that it stands for “double-ended queue,” a list-like container, which is an easier way to understand it. In Python, you sometimes want to manipulate both ends of a list. Say, for example, you had to add a value to the end of a list and then realized you forgot to include some values at the beginning. It is easy to .append() and .extend() on the right end of a list, but lists have no matching methods for the left end; list.insert(0, x) works, but it has to shift every existing element. A deque adds appendleft() and extendleft(), and adding or removing items at either end takes roughly constant time. Here is an example.
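A minimal sketch with illustrative numbers, showing additions and removals at both ends. Note that `extendleft()` adds items one at a time, so they end up in reverse order.

```python
from collections import deque

# Illustrative starting values
d = deque([2, 3, 4])

d.append(5)            # add to the right end, like list.append
d.appendleft(1)        # add to the left end
print(d)               # deque([1, 2, 3, 4, 5])

d.extend([6, 7])       # extend on the right, like list.extend
d.extendleft([0, -1])  # extend on the left; items go in one by one, so they appear reversed
print(d)               # deque([-1, 0, 1, 2, 3, 4, 5, 6, 7])

# Both ends can be popped, too
left = d.popleft()     # -1
right = d.pop()        # 7

# Convert back to a list when other code expects one
as_list = list(d)      # [0, 1, 2, 3, 4, 5, 6]

# For comparison: the list way to add on the left shifts every existing element
nums = [2, 3, 4]
nums.insert(0, 1)      # [1, 2, 3, 4]
```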
So, with a deque, adding a number to the left end was just as easy as adding one to the right. It is great for making quick adjustments at either end of a sequence.
The collections module has plenty more to offer, such as defaultdict, OrderedDict, and ChainMap. These tools can make your Python coding journey easier, clearer, and more efficient. I highly suggest checking out the module's documentation; it has certainly helped me.
Collections — https://docs.python.org/3/library/collections.html