
Enhance Your Plotly Express Scatter Plot With Marginal Plots

Scatter plots are a commonly used data visualisation tool within data science. They allow us to plot two numerical variables as points on a two-dimensional graph. From these plots, we can understand whether there is a relationship between the two variables and how strong that relationship is.

In this short tutorial, we are going to use the excellent Plotly library to visualise a dataset, and we will see how to add marginal plots to the edges of the x- and y-axes to enhance our visualisation and understanding of the data.

I have previously covered creating scatter plots in Plotly and matplotlib in separate articles. Part of this tutorial is also covered in my Plotly Scatter Plots video.

The Plotly Library

Plotly is a web-based toolkit that is used to generate powerful and interactive data visualisations. It is efficient to work with, as plots can be generated with very few lines of code. It is a popular library that contains a wide range of charts, including statistical, financial, mapping and machine learning charts, and much more.

The Plotly library can be used in two main ways:

  • Plotly Graph Objects, a low-level interface for creating figures, traces and layouts
  • Plotly Express, a high-level wrapper around Plotly Graph Objects that lets users generate the same plots with much simpler syntax

Creating a Scatter Plot With Plotly Express

Loading Libraries & Data

The first step is to load in pandas, which will be used for loading our data, and plotly.express for visualising it.

import pandas as pd
import plotly.express as px

Once the libraries have been imported, we can load our data.

The dataset we will be using for this article comes from a Machine Learning competition for lithology prediction that was run by Xeek and FORCE (https://xeek.ai/challenges/force-well-logs/overview). The objective of the competition was to predict lithofacies from well log measurements, using a dataset consisting of 98 training wells, each with varying degrees of log completeness. To download the file, navigate to the Data section of the link above. The original data source can be downloaded at: https://github.com/bolgebrygg/Force-2020-Machine-Learning-competition

df = pd.read_csv('xeek_subset_example.csv')
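If you do not have the competition file to hand, the sketch below builds a small synthetic stand-in with the same column names used in this article (NPHI, RHOB and LITH) so that the plotting code can still be tried out. The helper name, lithology names and value ranges are illustrative assumptions, not values taken from the Xeek/FORCE dataset, and the depth curve is left out.

```python
import numpy as np
import pandas as pd


def make_synthetic_logs(n_per_lith: int = 100, seed: int = 42) -> pd.DataFrame:
    """Hypothetical stand-in for xeek_subset_example.csv.

    The lithology names and (NPHI, RHOB) centres are rough, assumed values,
    not statistics from the real dataset.
    """
    rng = np.random.default_rng(seed)
    # Assumed centres: (neutron porosity in v/v, bulk density in g/cc)
    centres = {
        "Sandstone": (0.20, 2.35),
        "Shale": (0.35, 2.50),
        "Limestone": (0.10, 2.60),
    }
    frames = []
    for lith, (nphi, rhob) in centres.items():
        frames.append(pd.DataFrame({
            "NPHI": rng.normal(nphi, 0.04, n_per_lith),
            "RHOB": rng.normal(rhob, 0.05, n_per_lith),
            "LITH": lith,  # scalar broadcast to every row
        }))
    return pd.concat(frames, ignore_index=True)


df = make_synthetic_logs()
print(df.shape)
```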

We can then call df in a notebook cell to view the first five and last five rows of the dataframe.

Dataframe containing well log data.

What we get back is the above dataframe. Our dataset contains two well log measurements (RHOB - Bulk Density and NPHI - Neutron Porosity), a depth curve and a geologically interpreted lithology (LITH).
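The notebook display of df shows the first and last five rows. The same view, plus a quick numerical summary and a count of samples per lithology, can be produced explicitly with standard pandas methods, as sketched below. The small inline dataframe is a made-up placeholder; with the real data you would use the df read from the CSV file.

```python
import pandas as pd

# Hypothetical placeholder rows; replace with the df read from the CSV file
df = pd.DataFrame({
    "NPHI": [0.21, 0.35, 0.12, 0.19, 0.33, 0.09],
    "RHOB": [2.34, 2.52, 2.61, 2.37, 2.49, 2.63],
    "LITH": ["Sandstone", "Shale", "Limestone", "Sandstone", "Shale", "Limestone"],
})

# Equivalent of the notebook view: first five and last five rows
# (for frames with fewer than ten rows the two views overlap)
preview = pd.concat([df.head(), df.tail()])

# Summary statistics for the numeric log curves
summary = df[["NPHI", "RHOB"]].describe()

# Number of samples in each interpreted lithology
lith_counts = df["LITH"].value_counts()
print(lith_counts)
```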

Creating the Scatter Plot

Creating scatter plots with Plotly Express is very simple: we specify the dataframe and the columns we want to plot.

px.scatter(data_frame=df, x='NPHI', y='RHOB', range_x=[0, 1], range_y=[3, 1], color='LITH')

This returns the following scatter plot. At the moment it looks a little messy, as many of the lithologies have overlapping values. This is because the interpreted lithology would have been derived from a number of different logging measurements and cuttings descriptions, not just the two curves shown here.

Simple plotly express scatter plot of well log data.

Individual LITH groups can be hidden by clicking on their names in the legend.

Simple plotly express scatter plot of well log data after filtering.

Adding Marginal Plots to a Plotly Express Scatter Plot

Marginal plots are mini plots that can be attached to the margins of the y and x axes. There are four different types of marginal plots available within Plotly Express.

Box Plots

A boxplot is a graphical and standardised way to display the distribution of data based on five key numbers: the “minimum”, the 1st quartile (25th percentile), the median (2nd quartile / 50th percentile), the 3rd quartile (75th percentile) and the “maximum”. The minimum and maximum values are defined as Q1 - 1.5 * IQR and Q3 + 1.5 * IQR respectively, where the interquartile range (IQR) is Q3 - Q1. Any points that fall outside of these limits are referred to as outliers.

Graphical depiction of a boxplot highlighting key components, including the median, quartiles, outliers, and Interquartile Range.
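To make the five-number summary concrete, the sketch below computes the quartiles, the IQR and the 1.5 * IQR limits with pandas, and flags points outside those limits as outliers. pandas uses linear interpolation between data points by default; Plotly and other libraries may compute quartiles with a slightly different method, so results can differ marginally on small samples. The values are made up, with one deliberate outlier.

```python
import pandas as pd


def boxplot_stats(values: pd.Series) -> dict:
    """Five-number summary with 1.5 * IQR limits, as described above."""
    q1 = values.quantile(0.25)
    median = values.quantile(0.50)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # the "minimum" limit
    upper = q3 + 1.5 * iqr  # the "maximum" limit
    outliers = values[(values < lower) | (values > upper)]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower": lower, "upper": upper, "outliers": outliers.tolist()}


# Made-up sample with a single obvious outlier (100)
sample = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
stats = boxplot_stats(sample)
# Expected values: q1=3.25, median=5.5, q3=7.75, iqr=4.5,
# lower=-3.5, upper=14.5, outliers=[100]
print(stats)
```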

Marginal boxplots can be added to a single axis like so:

px.scatter(data_frame=df, x='NPHI', y='RHOB', range_x=[0, 1], range_y=[3, 1], color='LITH',
           marginal_y='box')
Plotly Express Scatter Plot with a series of boxplots on the y-axis.

Alternatively, they can be added to both axes by specifying values for the marginal_y and marginal_x keyword arguments.

px.scatter(data_frame=df, x='NPHI', y='RHOB', range_x=[0, 1], range_y=[3, 1], color='LITH',
           marginal_y='box', marginal_x='box')
Plotly Express Scatter Plot with boxplots on the axes. 

Rug Plot

Rug plots show the distribution of data by drawing a small tick along the axis for each individual value. They can be added as follows:

px.scatter(data_frame=df, x='NPHI', y='RHOB', range_x=[0, 1], range_y=[3, 1], color='LITH',
           marginal_y='rug', marginal_x='rug')
Plotly Express Scatter Plot with rug plots on the axes.

Histograms

Histograms are an excellent data visualisation tool and appear similar to bar charts. However, histograms allow us to gain insights into the distribution of the values within a set of data and to display a large range of data in a concise plot. Within the petrophysics and geoscience domains, we can use histograms to identify outliers and to pick key interpretation parameters, for example the clean and shale end points used to calculate clay or shale volume from a gamma ray log.

To change the marginal plots to histograms, we do so as follows:

px.scatter(data_frame=df, x='NPHI', y='RHOB', range_x=[0, 1], range_y=[3, 1], color='LITH',
           marginal_y='histogram', marginal_x='histogram')
Plotly Express Scatter Plot with histogram marginal plots on the axes.
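As an illustration of picking interpretation parameters from a distribution, the sketch below takes the clean and shale gamma ray end points as low and high percentiles of a gamma ray curve, then computes the linear gamma ray index, (GR - GR_clean) / (GR_shale - GR_clean), which is often used directly as a linear estimate of shale volume. Using the 5th and 95th percentiles is an assumed convenience; in practice the end points are usually picked by inspecting the histogram. The example dataset in this article has no gamma ray curve, so a made-up series is used, and the function name is hypothetical.

```python
import numpy as np


def gamma_ray_index(gr: np.ndarray, low_pct: float = 5, high_pct: float = 95) -> np.ndarray:
    """Linear gamma ray index with percentile-based end points (assumed choice)."""
    gr_clean = np.percentile(gr, low_pct)   # clean (sand) end point
    gr_shale = np.percentile(gr, high_pct)  # shale end point
    igr = (gr - gr_clean) / (gr_shale - gr_clean)
    # Values beyond the end points are clipped to the 0-1 range
    return np.clip(igr, 0.0, 1.0)


# Made-up gamma ray values in API units, evenly spaced from 0 to 100
gr = np.arange(0, 101, dtype=float)
igr = gamma_ray_index(gr)
# Here the end points are 5 and 95 API, so GR = 50 maps to (50 - 5) / (95 - 5) = 0.5
print(igr[50])
```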

Violin Plot

Violin plots are similar to boxplots, but they also incorporate a kernel density estimate. In addition to illustrating the key statistical points that a boxplot shows, they give us insight into the shape of the data's distribution.

px.scatter(data_frame=df, x='NPHI', y='RHOB', range_x=[0, 1], range_y=[3, 1], color='LITH',
           marginal_y='violin', marginal_x='violin')
Plotly Express Scatter Plot with violin marginal plots on the axes.
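The smooth outline of a violin is a kernel density estimate (KDE). The sketch below computes a Gaussian KDE by hand with NumPy, using Silverman's rule of thumb for the bandwidth, to show what that curve represents. It illustrates the technique rather than reproducing Plotly's exact implementation, whose bandwidth and evaluation grid may differ. The bulk density sample is synthetic.

```python
import numpy as np


def gaussian_kde(samples: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Gaussian kernel density estimate evaluated at each point in grid."""
    n = samples.size
    sigma = samples.std(ddof=1)
    q75, q25 = np.percentile(samples, [75, 25])
    # Silverman's rule of thumb for the bandwidth
    h = 0.9 * min(sigma, (q75 - q25) / 1.34) * n ** (-1 / 5)
    # One Gaussian bump centred on each sample, averaged over all samples
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / h


# Synthetic bulk density values (g/cc), not taken from the dataset
rng = np.random.default_rng(1)
rhob = rng.normal(2.45, 0.08, 200)

grid = np.linspace(2.0, 2.9, 500)
density = gaussian_kde(rhob, grid)
```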

Mixing Marginal Plots

You don’t have to use the same type of plot on both axes; for example, you can use a histogram on the x-axis and a violin plot on the y-axis.

px.scatter(data_frame=df, x='NPHI', y='RHOB', range_x=[0, 1], range_y=[3, 1], color='LITH',
           marginal_y='violin', marginal_x='histogram')
Plotly Express Scatter Plot with mixed marginal plots on the axes. 

Summary

In this short tutorial, we have seen how to display a variety of marginal plots on a Plotly Express scatter plot using well log data. These plots can enhance our data visualisations and provide us with further information about the distribution of the data.
