
7 Useful Pandas Display Options You Need to Know

Pandas is a powerful Python library commonly used within data science. It allows you to load and manipulate datasets from a variety of sources and is often one of the first libraries you come across in your data science journey.

When working with pandas, the default options will be suitable for the majority of people. However, there may be occasions when you want to change the format of what is displayed. Pandas has numerous user configurable options that allow you to customise how things are displayed.

The following sections illustrate a few use cases for changing these default settings. A full list of available options can be found in the pandas documentation.

Within this article we will be covering:

  • Controlling the Number of Rows to Display
  • Controlling the Number of Columns to Display
  • Suppressing Scientific Notation
  • Controlling the Floating Point Precision
  • Controlling the Decimal Format
  • Changing the Backend Plotting Library
  • Resetting the Display Options

1. Controlling the Number of Rows Displayed by Pandas Dataframe

When a dataframe is too long to show in full, pandas truncates the display to 10 rows by default: the first 5 rows and the last 5 rows. Sometimes you would like to see more than that.

This prevents pandas from slowing down your computer by displaying an overwhelming amount of data when dataframes are called.

To generate sample data for this example you can use the following code:

import numpy as np
import pandas as pd
arr_data = np.random.default_rng().uniform(0, 100, size=(100, 5))
pd.DataFrame(arr_data, columns=list('ABCDE'))
Basic dataframe generated by pandas using the default options. Image by the author.

There are two options that can be used to control how many rows are displayed.

The first is display.max_rows which controls the maximum number of rows that will be displayed before truncating the dataframe. If the number of rows in the dataframe exceeds this then the display will be truncated. By default this is set to 60.

If you want to display all of the rows when the dataframe is called, you need to change the display.max_rows option and set it to None. Be aware that if you have a very large dataframe this can slow down your computer.

pd.set_option('display.max_rows', None)

When you call the dataframe, you will now see every single row within it.

Pandas dataframe after changing the default option for max_rows. Image by the author.

The second is display.min_rows. When the dataframe contains more rows than max_rows allows, min_rows controls how many rows appear in the truncated view, so set it to the number of rows you want to see. You will also need to make sure that max_rows is greater than min_rows.

pd.set_option('display.min_rows', 20)

If you set min_rows to 20, viewing a truncated dataframe will now show 10 rows from the top and 10 rows from the bottom.
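As a rough check of how the two options interact, the sketch below counts the data rows in the text output of a 100-row dataframe under different settings. The helper `count_displayed_rows` is a hypothetical name introduced for illustration, and `pd.option_context` is used so the changes are temporary; the counts assume the text fits in the default display width.

```python
import numpy as np
import pandas as pd

# Sample data: 100 rows x 5 columns, seeded so the run is repeatable
df = pd.DataFrame(
    np.random.default_rng(0).uniform(0, 100, size=(100, 5)),
    columns=list("ABCDE"),
)


def count_displayed_rows(frame, max_rows, min_rows):
    """Hypothetical helper: count data rows in the text repr under given options."""
    # option_context restores the previous settings when the block exits
    with pd.option_context("display.max_rows", max_rows,
                           "display.min_rows", min_rows):
        text = repr(frame)
    # Data rows start with an integer index label; the header, the ".." row
    # and the "[100 rows x 5 columns]" footer do not
    return sum(
        1 for line in text.splitlines()
        if line.split() and line.split()[0].isdigit()
    )


default_rows = count_displayed_rows(df, 60, 10)   # pandas defaults
more_rows = count_displayed_rows(df, 60, 20)      # larger truncated view
all_rows = count_displayed_rows(df, None, 10)     # no truncation at all
```

Under these assumptions, the three calls should report 10, 20 and 100 rows respectively.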

2. Controlling the Number of Columns to Display in a Dataframe

When working with datasets that contain a large number of columns pandas will truncate the dataframe display to show 20 columns. To know if your dataframe columns have been truncated look for the three dots (ellipsis) as seen in the image below between columns 9 and 15.

If you want to generate the above data you can use the following code:

arr_data = np.random.default_rng().uniform(0, 100, size=(100,25))
df = pd.DataFrame(arr_data)
df

To see more columns on the display you can change the display.max_columns parameter like so:

pd.set_option('display.max_columns', 30)

When you do this, up to 30 columns will be displayed. However, a very wide dataframe can become hard to read, for example when it is shown as an image in a post like this one.

Pandas dataframe after changing max_columns display option. Image by the author.
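One way to confirm whether columns are being hidden is to look for the ellipsis placeholder in the text output. The sketch below assumes a small 5-row dataframe (so that row truncation does not interfere) and uses `pd.option_context` to apply each setting temporarily; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# 5 rows x 25 columns; few rows so that only column truncation can occur
wide_df = pd.DataFrame(np.random.default_rng(0).uniform(0, 100, size=(5, 25)))

with pd.option_context("display.max_columns", 20):
    truncated_view = repr(wide_df)

with pd.option_context("display.max_columns", 30):
    full_view = repr(wide_df)

# The "..." placeholder column only appears when columns are hidden
is_truncated = "..." in truncated_view
is_full = "..." not in full_view
```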

3. Suppressing Scientific Notation

Often when working with scientific data you will come across very large numbers. Once these numbers are in the millions pandas will reformat them to scientific notation, which can be helpful, but not always.

To generate a dataframe with very large values, you can use the following code.

arr_data = np.random.default_rng().uniform(0, 10000000, size=(10,3))
df = pd.DataFrame(arr_data)
df
Dataframe displaying scientific notation for very large numbers. Image by the author.

Sometimes you will want to display these numbers in their full form without the scientific notation. This can be done by changing the float_format display option and passing in a small lambda function. This will reformat the display to have values without the scientific notation and up to 3 decimal places.

pd.set_option('display.float_format', lambda x: f'{x:.3f}')
Pandas dataframe after converting the numbers from scientific notation. Image by the author.

If you want to make it nicer on the eye, you can add a comma separator between the thousands.

The code below may look the same as above, but if you look closely there is a comma immediately after the f'{x: part of the format string.

pd.set_option('display.float_format', lambda x: f'{x:,.3f}')
Pandas Dataframe containing large numbers with a thousands separator. Image by the author.
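To see the effect on a known value rather than random data, here is a minimal sketch using a two-row dataframe made up for illustration. It applies the same lambda temporarily through `pd.option_context` and captures the text output with `to_string()`.

```python
import pandas as pd

# Illustrative values: one large number and one small number
df = pd.DataFrame({"value": [1234567.891, 0.5]})

# Same formatter as above: thousands separator and 3 decimal places
with pd.option_context("display.float_format", lambda x: f"{x:,.3f}"):
    formatted = df.to_string()

print(formatted)
```

The large value should appear as 1,234,567.891 and the small one as 0.500, since the formatter is applied to every float in the display.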

4. Changing the Floating Point Precision within the Dataframe

There are occasions where the data you are working with may have too many digits after the decimal point. This can sometimes make it messy to look at. By default, pandas will display 6 digits after the decimal point.

Pandas dataframe showing the precision of floating point numbers to 6 decimal places. Image by the author.

To make it easier to read, you can reduce the number of digits displayed by setting the display.precision option to the number of decimal places you want to see.

pd.set_option('display.precision', 2)

When you view the original dataframe you will now see that the floating point precision of our numeric columns has been reduced to 2.

Pandas dataframe after changing the floating point precision. Image by the author.

Bear in mind that this setting only changes how the data is displayed. It does not change the underlying data values.
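A quick way to convince yourself of this is to compare what is displayed with what is stored. The sketch below uses a single illustrative value and clears any custom float_format inside the block, since the precision option is only used when no custom formatter is set.

```python
import pandas as pd

df = pd.DataFrame({"x": [3.14159265]})

# Apply the precision temporarily, with no custom float formatter active
with pd.option_context("display.precision", 2, "display.float_format", None):
    shown = df.to_string()

# The stored value is untouched by the display setting
stored = df.loc[0, "x"]
```

Here `shown` contains 3.14, while `stored` still holds the full value 3.14159265.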

5. Controlling the Float Format

There may be occasions when the numbers you are dealing with represent percentages or monetary values. If this is the case it can be handy to have them formatted with the correct unit.

To append a percentage sign to your columns you can call upon the display.float_format option and pass in the format you want to display. The option expects a function, so pass the .format method of a format string rather than an f-string:

pd.set_option('display.float_format', '{:,.3f}%'.format)
Pandas dataframe shows all values as percentages. Image by the author.

To start the floating point numbers with a dollar sign you can change the code like so:

pd.set_option('display.float_format', '${:,.2f}'.format)
Pandas dataframe after changing the display options to include the $ sign at the start. Image by the author.
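The float_format option takes a callable that receives a float and returns a string, and the bound `.format` method of a string is one such callable. The sketch below, with illustrative names and values, calls the formatters directly and then applies one to a small dataframe.

```python
import pandas as pd

# Bound str.format methods: each takes a number and returns a string
as_percent = "{:,.3f}%".format
as_dollars = "${:,.2f}".format

percent_example = as_percent(12.5)     # '12.500%'
dollar_example = as_dollars(1234.5)    # '$1,234.50'

# Apply the dollar formatter temporarily to a one-row dataframe
df = pd.DataFrame({"amount": [1234.5]})
with pd.option_context("display.float_format", as_dollars):
    shown = df.to_string()
```

Note that the formatter applies to every float column in the display, so it is best suited to dataframes where all numeric columns share the same unit.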

6. Changing the Default Pandas Plotting Library

When carrying out Exploratory Data Analysis you often have to generate quick plots of your data. You could build up a plot using matplotlib. However, you can do it with a few lines of code in pandas using the .plot() method.

Pandas provides us with a range of plotting libraries to use as the backend.

To change the default plotting library for your current instance you need to change the plotting.backend option for pandas.

pd.options.plotting.backend = "hvplot"

Once you have done that, you can begin creating your plots with the .plot() method:

df.plot(kind='scatter', x='1', y='2')
Powerful interactive plot after changing pandas’ backend plotting option to hvplot. Image by the author.

7. Resetting Pandas Display Options

In the event that you want to set the parameters for a specific option back to the default value, you can call upon the reset_option method and pass in the option you want to reset.

pd.reset_option('display.max_rows')

Alternatively, if you have changed multiple options you can change them all back to their default values by providing the word all as a parameter.

pd.reset_option('all')
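To check that a reset has taken effect, you can read the current value back with `pd.get_option`. A minimal sketch, using an arbitrary value of 200 for illustration:

```python
import pandas as pd

# Change the option, then read it back
pd.set_option("display.max_rows", 200)
changed = pd.get_option("display.max_rows")

# Reset it and confirm the default has returned
pd.reset_option("display.max_rows")
restored = pd.get_option("display.max_rows")
```

After the reset, `restored` should be back at the default of 60 mentioned earlier.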

Changing Multiple Options at Once

Instead of changing the previous settings one by one, you can leverage the power of a dictionary and then loop through each of the options and set them.

Doing this can save time, reduce the amount of code you write, and improve readability.

import pandas as pd

settings = {
    'max_columns': 30,
    'min_rows': 40,
    'max_rows': 30,
    'precision': 3
}

# Prefix each key with "display." and apply the setting
for option, value in settings.items():
    pd.set_option("display.{}".format(option), value)
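If you only need the settings for a single cell or block of code, a possible variation is to pass the same kind of dictionary to `pd.option_context`, which restores the previous values when the block ends. This is a sketch under that assumption, not part of the approach above; the dictionary contents are illustrative.

```python
import pandas as pd

settings = {
    "display.max_columns": 30,
    "display.precision": 3,
}

before = pd.get_option("display.precision")

# option_context expects alternating name, value arguments,
# so flatten the dictionary items into a single list
args = [item for pair in settings.items() for item in pair]

with pd.option_context(*args):
    inside = pd.get_option("display.precision")   # temporary value

after = pd.get_option("display.precision")        # previous value restored
```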

Summary

Pandas is a powerful library that can be used straight out of the box. However, the default options may not be suitable for your needs. This article has covered some of the popular options you may want to change to improve how you view your dataframes.
