Creating Histograms of Well Log Data Using Matplotlib in Python

Visualising the distribution of data with histograms

png
Photo by Marcin Jozwiak on Unsplash

Introduction

Histograms are a commonly used tool in exploratory data analysis and data science. They are an excellent data visualisation tool and look similar to bar charts. However, rather than comparing separate categories, histograms show how the values within a set of data are distributed, which lets us summarise a large range of data in a concise plot. Within the petrophysics and geoscience domains, we can use histograms to identify outliers and to pick key interpretation parameters, for example, the clay volume or shale volume end points from a gamma ray log.

To create a histogram:

  • We first take a logging curve and determine the range of values that it contains. For example, we may have a gamma ray log from one of our wells that ranges from 5 to 145 API.
  • We then divide the entire range into a number of bins, or intervals. Using our example gamma ray log, we could have bins of 0 to 10, 10 to 20, 20 to 30, all the way up to 140 to 150 API.
  • Once the bins have been created, we take the data and assign each value to the appropriate bin.
  • What we end up with is a plot of frequency (the number of values in each bin) against interval.
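The four steps above can be sketched with NumPy's histogram function, which does the binning and counting without drawing anything (matplotlib itself depends on NumPy). This is a minimal sketch: the gamma ray values are made up for illustration and are not from the Volve well.

```python
import numpy as np

# Hypothetical gamma ray values in API units (not from the Volve dataset)
gr = np.array([5, 12, 18, 25, 47, 52, 58, 95, 101, 145])

# Step 2: divide the 0 to 150 API range into 15 bins, each 10 API wide
bin_edges = np.arange(0, 160, 10)  # 0, 10, 20, ..., 150

# Step 3: count how many values fall into each bin
# (every bin is half-open [left, right) except the last, which includes 150)
counts, edges = np.histogram(gr, bins=bin_edges)

# Step 4: the interval vs frequency table that a histogram plots
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>3} to {right:>3} API: {n}")
```

Plotting a histogram is then just a matter of drawing one bar per row of this table.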

In this short tutorial, we will see how to quickly create a histogram in Python using matplotlib. We will also see how to customise the plot to include additional information, such as percentile values and the mean.

The associated Python Notebook can be found here.

The accompanying video for this tutorial can be found on my new YouTube channel at:


Importing Libraries and Loading LAS Data

The first stage of any Python project or notebook is generally to import the required libraries. In this case, we are going to use lasio to load our LAS file, pandas to store our well log data, and matplotlib to visualise our data.

import pandas as pd              # stores the well log curves as a DataFrame
import matplotlib.pyplot as plt  # plotting
import lasio                     # reads LAS files

The data we are using for this short tutorial comes from the publicly released Equinor Volve dataset; details of the dataset can be found here. The dataset was released by Equinor, formerly Statoil, to promote research, development and learning. In this exercise, we will be using one of the wells from this dataset.

To read the data, we will use the lasio library, which we explored in the previous notebook and video.

las = lasio.read("Data/15-9-19_SR_COMP.LAS")

We then convert the LAS file to a pandas DataFrame. lasio uses the first curve in the file (normally depth) as the DataFrame index.

df = las.df()

Using the .describe() method we can explore the summary statistics of the data.

df.describe()

We can see that we have seven logging curves within this file.

  • AC for acoustic compressional slowness
  • CALI for borehole caliper
  • DEN for bulk density
  • GR for gamma ray
  • NEU for neutron porosity
  • RDEP for deep resistivity
  • RMED for medium resistivity

Creating Histograms Using pandas

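pandas has plotting built in (it uses matplotlib behind the scenes), so we can create a quick histogram directly from a DataFrame column without calling matplotlib's plotting functions ourselves.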

df['GR'].plot(kind='hist')
plt.show()

Creating Histograms Using matplotlib

We can also create the same histogram using matplotlib like so.

plt.hist(df['GR'])
plt.show()

This generates a very minimal plot. We can see that the values range from around 0 to 150 API, with a very small number of values at around 250 API. By default, matplotlib splits the data into 10 bins, so each bin is around 25 API wide, which is quite coarse.

We can control this by passing a number to the bins argument. In this example, we will set it to 30.

plt.hist(df['GR'], bins=30)
plt.show()
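Passing an integer to bins gives equal-width bins spanning the minimum to the maximum of the data, so the bin width changes from well to well. plt.hist also accepts a sequence of bin edges, which fixes the width in API units. A minimal sketch, using synthetic values in place of df['GR'] (the synthetic data and the 10 API width are assumptions for illustration); plt.hist returns the counts and bin edges, which lets us check the widths:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical GR values (API) between 0 and 250, standing in for df['GR']
rng = np.random.default_rng(0)
gr = np.clip(rng.normal(75, 35, 2000), 0, 250)

# Integer bins: the data range (min to max) is split into 30 equal-width bins
counts_30, edges_30, _ = plt.hist(gr, bins=30)
width_30 = edges_30[1] - edges_30[0]
plt.close()

# Sequence of edges: every bin is exactly 10 API wide, whatever the data range
fixed_edges = np.arange(0, 260, 10)  # 0, 10, ..., 250
counts_10, edges_10, _ = plt.hist(gr, bins=fixed_edges, edgecolor='black')
plt.close()

print(f"30 bins -> width {width_30:.2f} API")
print(f"Fixed edges -> {len(counts_10)} bins of 10 API")
```

Fixed edges are handy when comparing histograms from several wells, because the bars then line up.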

Let’s tidy the plot up a little by adding edge colours to the bins.

plt.hist(df['GR'], bins=30, edgecolor='black')
plt.show()

When we do this, we can see that what appeared to be a single bin just below 100 API is in fact two separate bins.

To tidy the plot up further, we can change the bar colour and transparency (alpha), add x- and y-axis labels, and set the x-axis limits.

plt.hist(df['GR'], bins=30, color='red', alpha=0.5, edgecolor='black')
plt.xlabel('Gamma Ray - API', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.xlim(0,200)

plt.show()

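In addition to the bars, we can add a kernel density estimate (KDE), which draws a smooth line illustrating the distribution of the data. Because the KDE is a probability density, we set density=True on the histogram so that the bars are normalised to the same scale (a total area of 1). Note that the pandas kde plot requires the SciPy library to be installed.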

df['GR'].plot(kind='hist', bins=30, color='red', alpha=0.5, density=True, edgecolor='black')
df['GR'].plot(kind='kde', color='black')
plt.xlabel('Gamma Ray - API', fontsize=14)
plt.ylabel('Density', fontsize=14)
plt.xlim(0,200)
plt.show()
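To see what the KDE line represents, the sketch below builds a Gaussian KDE by hand with NumPy on synthetic data (an assumption for illustration, not the Volve curve). It places a small normal curve on every sample and averages them, using Scott's rule for the bandwidth, which to our understanding is the default of scipy.stats.gaussian_kde that pandas relies on. It also checks that both the density=True histogram and the KDE have an area of about 1, which is why they can share a y-axis.

```python
import numpy as np

# Hypothetical, synthetic "gamma ray" sample (API) with a clean and a shaly mode
rng = np.random.default_rng(42)
gr = np.concatenate([rng.normal(30, 8, 300), rng.normal(100, 15, 700)])

# density=True rescales the bar heights so that the total bar area is 1
counts, edges = np.histogram(gr, bins=30, density=True)
hist_area = np.sum(counts * np.diff(edges))

# Gaussian KDE: one normal "kernel" per sample, averaged over all samples.
# Bandwidth from Scott's rule: sample standard deviation * n ** (-1/5)
n = gr.size
bandwidth = gr.std(ddof=1) * n ** (-1 / 5)

grid = np.linspace(gr.min() - 5 * bandwidth, gr.max() + 5 * bandwidth, 2000)
z = (grid[:, None] - gr[None, :]) / bandwidth
kde = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# Approximate the area under the KDE curve with a simple Riemann sum
kde_area = np.sum(kde) * (grid[1] - grid[0])
print(f"Histogram area: {hist_area:.3f}, KDE area: {kde_area:.3f}")
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows individual bars more closely.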

Adding Extra Information to the Plot

When calculating clay and shale volumes as part of a petrophysical workflow, we often use percentiles, rather than the absolute minimum and maximum, as our interpretation parameters. This reduces the influence of outliers or of a small number of data points that may represent, for example, a thin hot shale.

We can calculate the key statistics using the built-in pandas methods mean() and quantile().

mean = df['GR'].mean()
p5 = df['GR'].quantile(0.05)
p95 = df['GR'].quantile(0.95)

print(f'Mean: \t {mean}')
print(f'P05: \t {p5}')
print(f'P95: \t {p95}')

This returns the following output.

Mean: 	 71.98679770957146
P05: 	 12.74656
P95: 	 128.33267999999995
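The effect of using percentiles instead of the minimum and maximum can be seen on a small made-up series with one spike standing in for a thin hot shale. The sketch then uses P05 and P95 as clean and shale end points in the simple linear gamma ray index, IGR = (GR - GR_clean) / (GR_shale - GR_clean). This is an illustrative assumption: whether IGR is used directly as shale volume or passed through a non-linear correction depends on the workflow, and this tutorial does not specify one.

```python
import pandas as pd

# Hypothetical GR readings (API): a trend from 10 to 100 API,
# plus one spike at 250 API standing in for a thin hot shale
gr = pd.Series(list(range(10, 101)) + [250], dtype=float)

# The absolute maximum jumps to the spike; the 95th percentile barely moves
gr_min, gr_max = gr.min(), gr.max()
p5, p95 = gr.quantile(0.05), gr.quantile(0.95)  # linear interpolation by default
print(f"Min/Max: {gr_min:.1f} / {gr_max:.1f} API")
print(f"P05/P95: {p5:.2f} / {p95:.2f} API")

# Linear gamma ray index with the percentiles as end points,
# clipped so that values beyond the end points stay between 0 and 1
igr = ((gr - p5) / (p95 - p5)).clip(lower=0, upper=1)
print(igr.describe())
```

Had we used the maximum (250 API) as the shale end point, almost every sample would have received an unrealistically low index.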

To get a better idea of where these points fall in relation to our data, we can add them to the plot using axvline, passing in the calculated variables, a colour and a label.

df['GR'].plot(kind='hist', bins=30, color='red', alpha=0.5, edgecolor='black')
plt.xlabel('Gamma Ray - API', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.xlim(0,200)

plt.axvline(mean, color='blue', label='mean')
plt.axvline(p5, color='green', label='5th Percentile')
plt.axvline(p95, color='purple', label='95th Percentile')

plt.legend()
plt.show()
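For reports and publications, it can be convenient to build the same plot with matplotlib's object-oriented interface (plt.subplots returns a Figure and an Axes) and write it straight to an image file with savefig. The sketch below uses synthetic data instead of df['GR']; the file name, figure size, dashed percentile lines and dpi value are illustrative choices, not settings from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical GR curve standing in for df['GR'] loaded from the LAS file
rng = np.random.default_rng(1)
gr = pd.Series(np.clip(rng.normal(72, 35, 3000), 0, 250), name='GR')

mean, p5, p95 = gr.mean(), gr.quantile(0.05), gr.quantile(0.95)

# Explicit Figure/Axes objects make it easier to control size and export
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(gr, bins=30, color='red', alpha=0.5, edgecolor='black')
ax.axvline(mean, color='blue', label='Mean')
ax.axvline(p5, color='green', linestyle='--', label='5th Percentile')
ax.axvline(p95, color='purple', linestyle='--', label='95th Percentile')
ax.set_xlabel('Gamma Ray - API', fontsize=14)
ax.set_ylabel('Frequency', fontsize=14)
ax.set_xlim(0, 200)
ax.legend()

# Hypothetical output file name; dpi=300 is a common choice for print
fig.savefig('gr_histogram.png', dpi=300, bbox_inches='tight')
plt.close(fig)
```

Working with the Axes object directly also makes it straightforward to place several wells' histograms side by side with plt.subplots(nrows, ncols).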

Summary

In this short tutorial, we have covered the basics of how to display a well log curve as a histogram and how to customise it to produce a plot that is suitable for including in reports and publications.

Thanks for reading!

If you have found this article useful, please feel free to check out my other articles looking at various aspects of Python and well log data. You can also find my code used in this article and others at GitHub.

If you want to get in touch you can find me on LinkedIn or at my website.

Interested in learning more about Python and well log data or petrophysics? Follow me on Medium.
