7 Useful Pandas Display Options You Need to Know
Pandas is a powerful Python library commonly used within data science. It allows you to load and manipulate datasets from a variety of sources and is often one of the first libraries you come across in your data science journey.
When working with pandas, the default options will be suitable for the majority of people. However, there may be occasions when you want to change the format of what is displayed. Pandas has numerous user configurable options that allow you to customise how things are displayed.
The following sections illustrate a few use cases for changing these default settings. A full list of available options can be found on the "Options and settings" page of the pandas documentation, or printed with pd.describe_option().
Within this article we will be covering:
- Controlling the Number of Rows to Display
- Controlling the Number of Columns to Display
- Suppressing Scientific Notation
- Controlling the Floating Point Precision
- Controlling the Float Format
- Changing the Backend Plotting Library
- Resetting the Display Options
1. Controlling the Number of Rows Displayed by Pandas Dataframe
Sometimes when viewing a dataframe you would like to see more rows than pandas shows by default. When a dataframe is truncated, pandas displays only 10 rows: the first 5 rows and the last 5 rows.
This stops pandas from slowing down your computer by displaying an overwhelming amount of data whenever a dataframe is called.
To generate sample data for this example you can use the following code:
import numpy as np
import pandas as pd
arr_data = np.random.default_rng().uniform(0, 100, size=(100, 5))
df = pd.DataFrame(arr_data, columns=list('ABCDE'))
df
There are two options that can be used to control how many rows are displayed.
The first is display.max_rows, which controls the maximum number of rows that will be displayed before the dataframe is truncated. If the number of rows in the dataframe exceeds this, the display will be truncated. By default this is set to 60.
If you want to display all of the rows when the dataframe is called, you need to set the display.max_rows option to None. Be aware that if you have a very large dataframe this can slow down your computer.
pd.set_option('display.max_rows', None)
When you call your dataframe, you will now see every single row within it.
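To see the truncation rule in action without changing your global settings, here is a minimal sketch that uses pd.option_context, which applies an option only inside a with block. It assumes the default option values are in effect and uses a small one-column frame as a stand-in for the random data above.

```python
import pandas as pd

# One-column stand-in for the 100-row sample data
df = pd.DataFrame({"A": range(100)})

print(pd.get_option("display.max_rows"))  # 60 by default

# 100 rows exceeds max_rows, so only a truncated view is printed,
# followed by a "[rows x columns]" footer
truncated = repr(df)
print(truncated.splitlines()[-1])  # [100 rows x 1 columns]

# Temporarily remove the limit: every row is printed and no footer is added
with pd.option_context("display.max_rows", None):
    full = repr(df)
```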
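The second is display.min_rows, which controls how many rows are shown when the dataframe contains more rows than max_rows and is therefore truncated. Set it to the number of rows you want to see, and make sure that max_rows is at least as large as min_rows; otherwise pandas falls back to showing max_rows rows. Note that min_rows is ignored while max_rows is set to None, so reset max_rows first if you changed it above.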
pd.set_option('display.min_rows', 20)
If you set min_rows to 20, you will now see 10 rows from the top and 10 rows from the bottom of the dataframe when you view it.
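The interaction between the two options can be checked with a short sketch. The visible_rows helper is hypothetical (it is not part of pandas): it simply collects the index labels printed at the start of each row, which assumes a default, left-aligned integer index.

```python
import re

import pandas as pd

df = pd.DataFrame({"A": range(100)})


def visible_rows(frame):
    """Hypothetical helper: index labels printed at the start of each row of repr(frame)."""
    labels = []
    for line in repr(frame).splitlines():
        match = re.match(r"(\d+)\s", line)  # header, "..", and footer lines do not match
        if match:
            labels.append(int(match.group(1)))
    return labels


# Defaults (max_rows=60, min_rows=10): a 100-row frame shows 5 + 5 rows
default_rows = visible_rows(df)
print(default_rows)  # [0, 1, 2, 3, 4, 95, 96, 97, 98, 99]

# min_rows=20 with max_rows=30: still truncated, now 10 + 10 rows
with pd.option_context("display.max_rows", 30, "display.min_rows", 20):
    custom_rows = visible_rows(df)

# max_rows=None: min_rows is ignored and all 100 rows are printed
with pd.option_context("display.max_rows", None, "display.min_rows", 20):
    all_rows = visible_rows(df)
```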
2. Controlling the Number of Columns to Display in a Dataframe
When working with datasets that contain a large number of columns, pandas will truncate the dataframe display; in a Jupyter notebook the default limit is 20 columns. To know if your dataframe columns have been truncated, look for the three dots (ellipsis), as seen in the image below between columns 9 and 15.
If you want to generate the above data you can use the following code:
import numpy as np
import pandas as pd
arr_data = np.random.default_rng().uniform(0, 100, size=(100, 25))
df = pd.DataFrame(arr_data)
df
To see more columns on the display, you can change the display.max_columns option like so:
pd.set_option('display.max_columns', 30)
When you do this, up to 30 columns will be displayed. However, a wide display can become hard to read, for example when you want to include a screenshot of the dataframe in a post like this one.
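A quick way to confirm whether columns are being hidden is to look for the ellipsis in the printed output. This sketch assumes a small frame of zeros so that "..." cannot appear for any other reason; with max_columns set to 30, all 25 columns are printed, although pandas may wrap them over several lines to fit display.width.

```python
import pandas as pd

# 3 rows x 25 columns of zeros: only column truncation can produce "..."
df = pd.DataFrame([[0] * 25] * 3)

with pd.option_context("display.max_columns", 20):
    limited = repr(df)   # columns 0-9 and 15-24, with "..." in between

with pd.option_context("display.max_columns", 30):
    expanded = repr(df)  # all 25 columns, possibly wrapped over several lines

print("..." in limited, "..." in expanded)  # True False
```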
3. Suppressing Scientific Notation
Often when working with scientific data you will come across very large numbers. Once these numbers reach the millions, pandas will often reformat them in scientific notation, which can be helpful, but not always.
To generate a dataframe with very large values, you can use the following code.
import numpy as np
import pandas as pd
arr_data = np.random.default_rng().uniform(0, 10000000, size=(10, 3))
df = pd.DataFrame(arr_data)
df
Sometimes you will want to display these numbers in full, without scientific notation. This can be done by setting the display.float_format option to a small lambda function, which reformats each value without scientific notation and with exactly 3 decimal places.
pd.set_option('display.float_format', lambda x: f'{x:.3f}')
If you want to make it nicer on the eye, you can add a comma separator between the thousands.
The code below may look the same as above, but if you look closely, there is a comma right after the f'{x: part of the code.
pd.set_option('display.float_format', lambda x: f'{x:,.3f}')
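Because display.float_format takes a function, the two lambdas above can be compared side by side. This sketch uses pd.option_context so each format applies only inside its with block; the sample values are made up for illustration.

```python
import pandas as pd

# Illustrative values large enough that pandas may switch to scientific notation
df = pd.DataFrame({"value": [12345678.9, 2500000.25]})
print(df)  # default formatting, for comparison

with pd.option_context("display.float_format", lambda x: f"{x:.3f}"):
    plain = repr(df)    # fixed-point, exactly 3 decimal places

with pd.option_context("display.float_format", lambda x: f"{x:,.3f}"):
    grouped = repr(df)  # the same, plus a comma between thousands

print(plain)
print(grouped)
```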
4. Changing the Floating Point Precision within the Dataframe
There are occasions where the data you are working with has too many digits after the decimal point, which can make it messy to look at. By default, pandas displays 6 digits after the decimal point.
To make it easier to read, you can reduce the number of digits displayed by setting the display.precision option to the number of digits you want to show.
pd.set_option('display.precision', 2)
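When you view the original dataframe, you will now see that its numeric columns are displayed with 2 decimal places. If you set display.float_format earlier, reset it first, because a float_format function takes precedence over display.precision.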
Bear in mind that this setting only changes how the data is displayed. It does not change the underlying data values.
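To check that display.precision only affects the printed output, this sketch prints a frame with a precision of 2 and then reads the stored value back. It assumes display.float_format is unset, since a float_format function would take precedence; the column name and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ratio": [3.14159, 2.71828]})

with pd.option_context("display.precision", 2):
    shown = repr(df)  # values printed with 2 decimal places

print(shown)
print(df["ratio"].iloc[0])  # 3.14159: the stored value is unchanged
```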
5. Controlling the Float Format
There may be occasions when the numbers you are dealing with represent percentages or monetary values. If this is the case it can be handy to have them formatted with the correct unit.
To append a percentage sign to your values, set the display.float_format option to a function that formats each number, here a lambda using an f-string:
pd.set_option('display.float_format', lambda x: f'{x:,.3f}%')
To prefix the floating point numbers with a dollar sign, you can change the code like so:
pd.set_option('display.float_format', lambda x: f'${x:,.2f}')
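Since display.float_format must be a callable (a plain format string is rejected), the percentage and currency formats can be written as lambdas and tried temporarily with pd.option_context. The column names and values below are illustrative only.

```python
import pandas as pd

shares = pd.DataFrame({"share": [12.3456, 7.5]})
prices = pd.DataFrame({"price": [1234.5, 19.99]})

# Append a percent sign after 3 decimal places
with pd.option_context("display.float_format", lambda x: f"{x:,.3f}%"):
    pct = repr(shares)

# Prefix a dollar sign, with 2 decimal places and a thousands separator
with pd.option_context("display.float_format", lambda x: f"${x:,.2f}"):
    usd = repr(prices)

print(pct)
print(usd)
```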
6. Changing the Default Pandas Plotting Library
When carrying out Exploratory Data Analysis, you often have to generate quick plots of your data. You could build up a plot using matplotlib; however, with pandas you can do it in a few lines of code using the .plot() method.
By default pandas uses matplotlib for plotting, but it can hand plotting off to other libraries that provide a pandas backend, such as hvplot or plotly.
To change the default plotting library for your current session, change the plotting.backend option. The library you choose must be installed; hvplot, for example, is a separate package.
pd.options.plotting.backend = "hvplot"
Once you have done that, you can begin creating your plots with the .plot method:
df.columns = df.columns.astype(str)  # use string column labels so they match x='1' and y='2'
df.plot(kind='scatter', x='1', y='2')
7. Resetting Pandas Display Options
If you want to set a specific option back to its default value, you can call the reset_option function and pass in the option you want to reset.
pd.reset_option('display.max_rows')
Alternatively, if you have changed multiple options, you can change them all back to their default values by passing 'all' as the parameter.
pd.reset_option('all')
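A short sketch of the round trip: change an option, read it back with pd.get_option, then reset it to its default.

```python
import pandas as pd

pd.set_option("display.max_rows", 5)
changed = pd.get_option("display.max_rows")    # 5

pd.reset_option("display.max_rows")
restored = pd.get_option("display.max_rows")   # back to the default, 60

print(changed, restored)  # 5 60
```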
Changing Multiple Options at Once
Instead of changing the previous settings one by one, you can store them in a dictionary and loop through it, setting each option in turn.
This can save time, reduce the amount of code you write and improve readability.
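import pandas as pd
# Option names without the "display." prefix, which the loop adds
settings = {
    'max_columns': 30,
    'min_rows': 30,  # keep min_rows no larger than max_rows
    'max_rows': 40,
    'precision': 3
}
for option, value in settings.items():
    pd.set_option("display.{}".format(option), value)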
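As an alternative to the loop, pd.set_option also accepts several option/value pairs in a single call. This sketch flattens a dictionary of fully qualified option names into that form, checks the values with pd.get_option, and then resets them.

```python
import pandas as pd

settings = {
    "display.max_columns": 30,
    "display.min_rows": 30,
    "display.max_rows": 40,
    "display.precision": 3,
}

# Flatten {"a": 1, "b": 2} into ["a", 1, "b", 2], the shape set_option accepts
flat_args = [item for pair in settings.items() for item in pair]
pd.set_option(*flat_args)

applied = {option: pd.get_option(option) for option in settings}
print(applied == settings)  # True

# Put every option back to its default
for option in settings:
    pd.reset_option(option)
```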
Summary
Pandas is a powerful library that can be used straight out of the box; however, the default options may not suit your needs. This article has covered some of the popular options you may want to change to improve how you view your dataframes.