
How to Rename Columns in Pandas — A Quick Guide

A short guide on multiple options for renaming columns in a pandas dataframe

Ensuring that dataframe columns are appropriately named is essential to understanding what data is contained within, especially when we pass our data on to others. In this short article, we will cover a number of ways to rename columns in a pandas dataframe.

But first, what is Pandas? Pandas is a powerful, fast, and commonly used Python library for data analysis. The name is often expanded as “Python Data Analysis Library”, although according to Wikipedia it originates from the term “panel data”. It allows data to be loaded from a number of file formats (CSV, XLS, XLSX, Pickle, etc.) and stored within table-like structures. These tables (dataframes) can be manipulated, analyzed, and visualized using a variety of functions that are available within pandas.

If you are interested in learning about other popular Python libraries then you may be interested in this article.

Library Imports and Data Creation

The first steps involve importing the pandas library and creating some dummy data that we can use to illustrate the process of column renaming.

import pandas as pd

We can create the dummy data by calling pd.DataFrame() with a dictionary of lists. Here we will create three columns with the names A, B, and C.

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [101, 102, 103, 104, 105],
                   'C': [42, 42, 42, 42, 42]})
Starting dataframe created using pd.DataFrame()

Renaming Columns in Pandas When Loading Data

An alternative method for creating the dataframe would be to load the data from an existing file, such as a CSV or XLSX file. When we load the data, we can change the names of the columns using the names argument. If the file already has a header row, we also need to pass header=0 so that the existing header is replaced rather than read in as a row of data.

df = pd.read_csv('data.csv',
                 names=['ColA', 'ColB', 'ColC'],
                 header=0)
Pandas dataframe after renaming the columns during the loading of a csv file.
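
To see what header=0 changes, the following minimal sketch reads the same CSV text twice from an in-memory buffer. The CSV contents and the use of io.StringIO as a stand-in for data.csv are assumptions made purely for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data.csv
csv_text = "A,B,C\n1,101,42\n2,102,42\n"

# header=0 marks the first line as a header row, and names replaces it
replaced = pd.read_csv(io.StringIO(csv_text),
                       names=['ColA', 'ColB', 'ColC'],
                       header=0)

# Without header=0, the original header line is read in as a row of data
kept = pd.read_csv(io.StringIO(csv_text),
                   names=['ColA', 'ColB', 'ColC'])

print(replaced.shape)  # (2, 3)
print(kept.shape)      # (3, 3)
```

In the second dataframe the first row holds the strings A, B, and C, which also forces every column to be read as text rather than numbers.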

Renaming All Columns Using .rename()

The first way of renaming columns within an existing pandas dataframe that we will look at is the .rename() method. Here we can pass a dictionary to the columns keyword argument. The dictionary provides a mapping from each old column name to the new one that we want.

We will also set the inplace argument to True so that we are making the changes to the dataframe, df, directly instead of making a copy of it.

df.rename(columns={'A': 'Z', 'B': 'Y', 'C': 'X'}, inplace=True)

An alternative version of this is to pass the dictionary as the first argument and specify the axis. However, this is less readable than using the columns argument, because it may not be clear what axis=1 is doing.

df.rename({'A': 'Z', 'B': 'Y', 'C': 'X'}, axis=1, inplace=True)

When we call upon df, we now see that our columns have been renamed from A, B, and C to Z, Y, and X respectively.

Pandas dataframe after using df.rename() to rename the columns.
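
For comparison, here is a minimal sketch of the same rename without inplace=True. The variable name renamed is an assumption for illustration; the point is that rename() then returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [101, 102, 103], 'C': [42, 42, 42]})

# Without inplace=True, rename() returns a new dataframe and df keeps its names
renamed = df.rename(columns={'A': 'Z', 'B': 'Y', 'C': 'X'})

print(list(df.columns))       # ['A', 'B', 'C']
print(list(renamed.columns))  # ['Z', 'Y', 'X']
```

Assigning the result to a new name like this can be handy when we want to keep the original dataframe around for comparison.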

Renaming a Specific Column Using .rename()

If we want to rename specific columns, we can use the rename method again. Instead of writing out the old column name as a string, we can use df.columns and select a column by providing its index position in the square brackets. We then map this to a new column name string.

df.rename(columns={df.columns[0]:'New Name'}, inplace=True)
Pandas dataframe with renamed columns after using df.rename().

We can also specify a mapping between an existing column name and the new one.

df.rename(columns={'X': 'XX'}, inplace=True)
Pandas dataframe with a single renamed column after using df.rename().
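
One thing to watch for when mapping by name is a typo in the old column name. The sketch below, which uses a small made-up dataframe and a deliberately misspelled key, shows that rename() ignores unknown keys by default and that passing errors='raise' surfaces the mistake instead.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [101, 102, 103]})

# By default, a key that matches no column is ignored without any warning
unchanged = df.rename(columns={'a': 'Z'})
print(list(unchanged.columns))  # ['A', 'B']

# errors='raise' turns the same mistake into a KeyError
try:
    df.rename(columns={'a': 'Z'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(raised)  # True
```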

Renaming Columns Using set_axis()

The next method is set_axis(), which is used to set the labels along an axis of a dataframe (columns: axis=1, rows: axis=0).

We can use this method to rename the columns by passing the list of names we want to replace the columns with and setting axis=1 or axis='columns'. Note that here the number of names needs to equal the total number of columns. set_axis() returns a new dataframe, so we assign the result back to df; the inplace argument accepted by older versions was removed in pandas 2.0.

df = df.set_axis(['Column A', 'Column B', 'Column C'],
                 axis='columns')
Pandas Dataframe after renaming the columns using .set_axis()
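
The length requirement can be checked with a short sketch. The small dataframe and the variable names relabeled and mismatch_error are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [101, 102], 'C': [42, 42]})

# set_axis() returns a new dataframe carrying the new column labels
relabeled = df.set_axis(['Column A', 'Column B', 'Column C'], axis='columns')
print(list(relabeled.columns))  # ['Column A', 'Column B', 'Column C']

# Supplying two names for three columns raises a ValueError
try:
    df.set_axis(['Column A', 'Column B'], axis='columns')
    mismatch_error = False
except ValueError:
    mismatch_error = True

print(mismatch_error)  # True
```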

Using .columns to Assign a New List of Names

We can rename the columns directly by assigning a list of new names to the df.columns attribute of the dataframe.

This method requires the new list of names to be the same length as the number of columns in the dataframe. Therefore, if we only want to rename one or two columns this is probably not the best approach.

df.columns = ['1', '2', '3']
Dataframe after using a list to replace df.columns
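
Where this approach can work well is when every column needs the same kind of change. As a rough sketch, the list can be built from the existing names so that its length always matches; the col_ prefix used here is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [101, 102], 'C': [42, 42]})

# Build the replacement list from the existing names so the lengths always match
df.columns = [f'col_{name}' for name in df.columns]

print(list(df.columns))  # ['col_A', 'col_B', 'col_C']
```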

Using columns.str.replace()

The final method we will look at is using str.replace(), which can be used to replace specific characters or entire column names.

In this example, we will rename the column named 1 to the letter Z.

df.columns = df.columns.str.replace('1', 'Z')
Dataframe after using df.columns and str.replace to rename columns.
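
Because str.replace() works on substrings, it changes every column name that contains the search text, not just an exact match. The sketch below uses made-up column names to show this; passing regex=False, so that the search text is treated literally, is a choice made here for clarity.

```python
import pandas as pd

df = pd.DataFrame({'1': [1, 2], '10': [101, 102], '21': [42, 42]})

# Every name containing the character '1' is affected, not only the column '1'
df.columns = df.columns.str.replace('1', 'Z', regex=False)

print(list(df.columns))  # ['Z', 'Z0', '2Z']
```

If only one exact column name should change, a dictionary passed to df.rename() is usually the safer option.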

Summary

There are multiple methods for renaming columns within a pandas dataframe, including pd.read_csv, set_axis, df.rename, and df.columns. This illustrates the flexibility available within the pandas Python library and makes it easy to ensure that columns within a dataframe are appropriately labeled.
