Creating Histograms of Well Log Data Using Matplotlib in Python
Visualising the distribution of data with histograms
Introduction
Histograms are a commonly used tool in exploratory data analysis and data science. They are an excellent data visualisation tool and look similar to bar charts. However, histograms give us insight into the distribution of the values within a dataset and allow us to display a large range of data in a concise plot. Within the petrophysics and geoscience domains, we can use histograms to identify outliers and to pick key interpretation parameters, for example the end points used to calculate clay or shale volume from a gamma ray log.
To create a histogram:
- We first take a logging curve and determine the range of values it contains. For example, we may have a gamma ray log from one of our wells that ranges from 5 to 145 API.
- We then divide the entire range into bins or intervals. Using our example gamma ray log, we could have 10 API wide bins: 0 to 10, 10 to 20, 20 to 30, all the way up to 140 to 150.
- Once the bins have been created, we take the data, assign each value to the appropriate bin and count how many values fall into each one.
- What we end up with is a graph of frequency (the number of values in each bin) against interval.
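The three steps above can be sketched with NumPy's np.histogram, which does the binning and counting for us. The gamma ray values below are a small hypothetical sample (not taken from the Volve well) spanning 5 to 145 API, and the bin edges are 10 API wide from 0 to 150 API, matching the example.

```python
import numpy as np

# Hypothetical gamma ray readings in API units (illustrative only)
gr = np.array([5.0, 12.3, 18.9, 25.4, 47.1, 63.8, 72.0,
               88.5, 96.2, 101.7, 118.4, 133.0, 145.0])

# Step 1: determine the range of values in the curve
print(f"Range: {gr.min()} to {gr.max()} API")

# Step 2: divide 0-150 API into 10 API wide bins (16 edges give 15 bins)
edges = np.arange(0, 151, 10, dtype=float)

# Step 3: count how many values fall into each bin
counts, edges = np.histogram(gr, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>3.0f} - {right:>3.0f} API: {n}")
```

NumPy treats each bin as half-open, [0, 10), [10, 20) and so on, except the last bin, which also includes its right-hand edge, so the 145 API reading is counted in the 140 to 150 bin.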
In this short tutorial, we will see how we can quickly create a histogram in Python using matplotlib. We will also see how we can customise the plot to include additional information, such as percentile values and the mean.
The associated Python Notebook can be found here.
The accompanying video for this tutorial can be found on my new YouTube channel at:
Importing Libraries and Loading LAS Data
The first stage of any Python project or notebook is generally to import the required libraries. In this case, we are going to be using lasio to load our LAS file, pandas for storing our well log data, and matplotlib for visualising our data.
import pandas as pd
import matplotlib.pyplot as plt
import lasio
The data we are using for this short tutorial comes from the publicly released Equinor Volve dataset. Details of the dataset can be found here. This dataset was released by Equinor, formerly Statoil, as a way to promote research, development and learning. In this exercise, we will be using one of the wells from this dataset.
To read the data, we will use the lasio library, which we explored in the previous notebook and video.
las = lasio.read("Data/15-9-19_SR_COMP.LAS")
We then convert the las file to a pandas dataframe object.
df = las.df()
Using the .describe() method, we can explore the summary statistics of the data.
df.describe()
We can see that we have seven logging curves within this file.
- AC for acoustic compressional slowness
- CALI for borehole caliper
- DEN for bulk density
- GR for gamma ray
- NEU for neutron porosity
- RDEP for deep resistivity
- RMED for medium resistivity
Creating Histograms Using pandas
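We can create a quick histogram directly from the pandas dataframe using its built-in plot method. Behind the scenes, pandas uses matplotlib to draw the plot, which is why we still call plt.show().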
df['GR'].plot(kind='hist')
plt.show()
Creating Histograms Using matplotlib
We can also create the same histogram using matplotlib like so.
plt.hist(df['GR'])
plt.show()
This generates a very minimal plot. We can see that the values range from around 0 to 150 API, with a very small number of data points at around 250 API. Each bin is around 25 API wide, which is quite a large range.
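When no bins argument is given, matplotlib (like NumPy) uses 10 equal-width bins spanning the minimum to the maximum of the data, so the bin width is simply (max - min) / 10. A quick check with hypothetical values between 0 and 250 API reproduces the roughly 25 API wide bins seen in the plot.

```python
import numpy as np

# Hypothetical gamma ray values: mostly 0-150 API plus a single spike at 250 API
gr = np.array([0.0, 20.0, 45.0, 70.0, 95.0, 120.0, 150.0, 250.0])

# With no bins argument, np.histogram (and plt.hist) default to 10 equal-width bins
counts, edges = np.histogram(gr)

bin_width = edges[1] - edges[0]
print(f"Number of bins: {len(counts)}")
print(f"Bin width: {bin_width} API")  # (250 - 0) / 10 = 25 API
```

This is why a handful of outlying high readings widens every bin: the default bins are stretched to cover the full range of the data, not just where most of it sits.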
We can control the bin width by specifying the number of bins with the bins argument. In this example, we will set it to 30.
plt.hist(df['GR'], bins=30)
plt.show()
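As well as an integer, the bins argument also accepts a sequence of bin edges, which is useful when you want bins of a fixed width in API units rather than a fixed number of bins. The sketch below uses a hypothetical, randomly generated sample in place of df['GR'] and 5 API wide bins from 0 to 250 API; the Agg backend is set only so that the example runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical stand-in for df['GR'] (API units), clipped to 0-250 API
rng = np.random.default_rng(42)
gr = rng.normal(loc=72, scale=35, size=1000).clip(0, 250)

# Explicit bin edges: 5 API wide bins from 0 to 250 API
edges = np.arange(0, 255, 5)

# plt.hist returns the counts, the bin edges and the drawn bar patches
counts, bins, patches = plt.hist(gr, bins=edges, edgecolor='black')
plt.xlabel('Gamma Ray - API')
plt.ylabel('Frequency')
# In a notebook, call plt.show() here to display the figure
```

Fixing the bin width in API units keeps histograms from different wells directly comparable, whereas a fixed bin count gives a different width for every curve range.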
Let’s tidy the plot up a little by adding edge colours to the bins.
plt.hist(df['GR'], bins=30, edgecolor='black')
plt.show()
When we do this, we can see that the bar just below 100 API is in fact two separate bins.
To tidy the plot up further, we can change the bar colour and transparency (alpha), add x and y axis labels, and set the x-axis limits.
plt.hist(df['GR'], bins=30, color='red', alpha=0.5, edgecolor='black')
plt.xlabel('Gamma Ray - API', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.xlim(0,200)
plt.show()
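In addition to the bars, we can also add a kernel density estimate (KDE), which provides a smooth line illustrating the distribution of the data. We set density=True on the histogram so that the bars are normalised and share the same y-axis scale as the KDE curve. Note that the pandas kde plot requires SciPy to be installed.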
df['GR'].plot(kind='hist', bins=30, color='red', alpha=0.5, density=True, edgecolor='black')
df['GR'].plot(kind='kde', color='black')
plt.xlabel('Gamma Ray - API', fontsize=14)
plt.ylabel('Density', fontsize=14)
plt.xlim(0,200)
plt.show()
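Setting density=True rescales the bar heights so that the total area of the histogram equals 1, which puts the bars on the same scale as the KDE curve (itself a probability density). A minimal check with NumPy, using hypothetical values in place of the well data:

```python
import numpy as np

# Hypothetical gamma ray values (API) standing in for df['GR']
rng = np.random.default_rng(0)
gr = rng.normal(loc=72, scale=35, size=500)

# Raw counts versus density-normalised heights for the same 30 bins
counts, edges = np.histogram(gr, bins=30)
density, _ = np.histogram(gr, bins=30, density=True)

widths = np.diff(edges)
area = np.sum(density * widths)  # area under the normalised histogram
print(f"Total count: {counts.sum()}")
print(f"Area with density=True: {area:.6f}")

# Each density value is simply count / (total count * bin width)
assert np.allclose(density, counts / (counts.sum() * widths))
```

Without density=True, the bar heights would be raw counts in the hundreds while the KDE values stay well below 1, so the KDE line would appear flat along the x-axis.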
Adding Extra Information to the Plot
When calculating clay and shale volumes as part of a petrophysical workflow, we often use percentiles as our interpretation parameters. This reduces the influence of outliers or of a small number of data points that may represent, for example, a thin hot shale.
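To see why percentiles are more robust than the raw minimum and maximum, consider a hypothetical gamma ray sample with a few very high readings, such as those from a thin hot shale. The extreme values move the maximum a long way, while the 95th percentile barely changes. The values below are illustrative and do not come from the Volve well.

```python
import numpy as np

# Hypothetical gamma ray log (API): an even spread of values from 10 to 130 API
gr = np.linspace(10, 130, 200)

# Append three readings from a hypothetical thin hot shale
gr_hot = np.append(gr, [240.0, 245.0, 250.0])

results = {}
for name, values in [("without hot shale", gr), ("with hot shale", gr_hot)]:
    p5, p95 = np.percentile(values, [5, 95])
    results[name] = (values.min(), values.max(), p5, p95)
    print(f"{name}: min={values.min():.1f}, max={values.max():.1f}, "
          f"P05={p5:.1f}, P95={p95:.1f}")
```

The three hot shale readings push the maximum up by 120 API, while P95 shifts by less than 2 API, so an end point picked at P95 is largely unaffected by them.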
We can calculate the key statistics using the built-in pandas methods mean() and quantile().
mean = df['GR'].mean()
p5 = df['GR'].quantile(0.05)
p95 = df['GR'].quantile(0.95)
print(f'Mean: \t {mean}')
print(f'P05: \t {p5}')
print(f'P95: \t {p95}')
This returns the following output.
Mean: 71.98679770957146
P05: 12.74656
P95: 128.33267999999995
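As an illustration of how these values can be used, the sketch below plugs P05 and P95 into the linear gamma ray index, I_GR = (GR - GR_clean) / (GR_shale - GR_clean), a common first step when estimating shale volume. This tutorial does not say which shale volume equation is used, so the linear index here is an assumption, and gamma_ray_index is a hypothetical helper. The end points are the P05 and P95 values printed above; the input readings are hypothetical.

```python
import numpy as np

# End points taken from the P05 and P95 values printed above (API)
gr_clean = 12.74656   # P05, treated as the clean (shale-free) end point
gr_shale = 128.33268  # P95, treated as the shale end point

def gamma_ray_index(gr, gr_clean, gr_shale):
    """Linear gamma ray index clipped to 0-1 (hypothetical helper)."""
    igr = (np.asarray(gr, dtype=float) - gr_clean) / (gr_shale - gr_clean)
    return np.clip(igr, 0.0, 1.0)

# Hypothetical gamma ray readings (API)
readings = np.array([5.0, 12.74656, 70.0, 128.33268, 160.0])
igr = gamma_ray_index(readings, gr_clean, gr_shale)
print(igr)
```

Readings below P05 or above P95 are clipped to 0 and 1 respectively, which is exactly where picking percentile end points, rather than the absolute minimum and maximum, limits the effect of outliers.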
To get a better idea of where these points fall in relation to our data, we can add them to the plot as vertical lines using plt.axvline, passing in each calculated value, a colour and a label.
df['GR'].plot(kind='hist', bins=30, color='red', alpha=0.5, edgecolor='black')
plt.xlabel('Gamma Ray', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.xlim(0,200)
plt.axvline(mean, color='blue', label='mean')
plt.axvline(p5, color='green', label='5th Percentile')
plt.axvline(p95, color='purple', label='95th Percentile')
plt.legend()
plt.show()
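A small variation on the plot above is to include the value of each line in its legend label and to use dashed lines so the markers stand out from the bars. The sketch uses matplotlib's object-oriented interface and hypothetical, randomly generated stand-in data so that it runs on its own; in the notebook, you would use df['GR'] and the mean, p5 and p95 variables calculated earlier.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical stand-in for df['GR'] (API units)
rng = np.random.default_rng(1)
gr = rng.normal(loc=72, scale=35, size=1000)

mean = gr.mean()
p5, p95 = np.percentile(gr, [5, 95])

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(gr, bins=30, color='red', alpha=0.5, edgecolor='black')

# Dashed reference lines, with each value formatted into its legend label
ax.axvline(mean, color='blue', linestyle='--', label=f'Mean: {mean:.1f} API')
ax.axvline(p5, color='green', linestyle='--', label=f'P05: {p5:.1f} API')
ax.axvline(p95, color='purple', linestyle='--', label=f'P95: {p95:.1f} API')

ax.set_xlabel('Gamma Ray - API', fontsize=14)
ax.set_ylabel('Frequency', fontsize=14)
ax.set_xlim(0, 200)
legend = ax.legend()
# In a notebook, call plt.show() here to display the figure
```

Putting the values in the legend means the key interpretation parameters can be read straight off the figure when it is included in a report.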
Summary
In this short tutorial, we have covered the basics of how to display a well log curve as a histogram and how to customise it to produce a plot that is suitable for including in reports and publications.
Thanks for reading!
If you have found this article useful, please feel free to check out my other articles looking at various aspects of Python and well log data. You can also find my code used in this article and others at GitHub.
If you want to get in touch you can find me on LinkedIn or at my website.
Interested in learning more about Python and well log data or petrophysics? Follow me on Medium.