Pivot Tables

Zachary Greenberg
4 min readOct 6, 2021
Image Source

Pivot tables are useful tools in both Pandas and Excel. For todays blog, I am going to be talking about them with Pandas in mind. When working with a DataFrame (or Sheet if you are in Excel), at times it can be difficult to see the information based on how the columns are formed. The beauty of a pivot table is that it literally ‘pivots’ your data so that you can see things from a different point of view. Honestly, I have never had to use a pivot table until recently. Now that I have, I can 100% see its merit. In my typical blog fashion, let me try to find a formal definition to clarify what a pivot table is:

‘Pivot tables are a technique in data processing. They arrange and rearrange (or “pivot”) statistics in order to draw attention to useful information. This leads to finding figures and facts quickly making them integral to data analysis. This ultimately leads to helping businesses or individuals make educated decisions.’ — as defined by Wikipedia.

Like Wikipedia says, pivot tables can be a great way to draw attention to useful information. From my experience I have seen that it can help you find information quickly, and can definitely help businesses find what they are searching for. The best way to talk about them, honestly, is by showing a practical example.

Let’s say that a head chef/restaurant owner wants to examine the amount of time it takes for customers to receive their food. When you’re working in a kitchen, you typically receive a ticket for the order and then you call over the waiter when it is ready. So in order to figure out this time difference you’d subtract the time the waiter picks up the food to the time the waiter delivers the order ticket to the kitchen. This is all dandy, but lets say your dataset looks like this:

We have information on the orders, whether the order has been placed or completed (aka in or out), and when the action was done. Because of the status column it can be a little confusing as to how to deal with this. With the status column not being in a nice easy to work with pattern, we cannot simply perform a difference from all the times in the row. Pivot tables in this case can be a savior. They can help you see this data in a better way. With a pivot table, you can specify the rows, columns, and even the values you’d like, making it so easy to see your data. I would do something like this:

#restaurant is the name of the DataFrame
pd.pivot_table(restaurant, values='order_num', index = restaurant['time'].dt.time, columns = 'status')
#the index is formatted this way so that we can extract the time.

This pivot table now looks like this:

So, now we can see the data from a whole different point of view. Notice the time, now in the index, has not changed (except for the fact that it is now only including the time because the date was irrelevant). The status is now the columns, and the order numbers naturally have taken the values. Personally I think this is a much easier way to see the data because you have a more clear distinction of the two statuses.

I think learning how to take the next step in this process is a whole different ball game. I will stop here for now, because my point of all of this was to see the beauty and functionality of a pivot table. I will however, continue this problem in a blog for another time.

To sum up, pivot tables are very useful tools to know whether you are working in Pandas or Excel. They can help you to rearrange your data in a way that makes things clearer and most of the time easier to work with.

References:

Pivot table definition — https://en.wikipedia.org/wiki/Pivot_table

Pandas pivot documentation — https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html

--

--