
ChatGPT Advanced Data Analytics For Custom Matplotlib Well Log Plots

ChatGPT’s Code Interpreter, now renamed to Advanced Data Analytics, has been out for some time now. It was launched on July 6th 2023, and is a plugin developed by OpenAI that allows users to upload data and perform analysis on it. This can range from cleaning the data to creating visualisations and summarising it.

Rather than relying on you to write Python code to analyse your data, you can leverage ChatGPT by telling it what to do in plain English. From that, it will carry out the analysis for you.

As many of my regular readers will know, I am a big fan of matplotlib. Even though the library appears to be clunky and time-consuming to use, it can be used to create stunning visualisations with a little bit of effort.

After playing around with this new tool, I thought it was about time to see how ChatGPT and the Advanced Data Analytics plugin could be used to create custom plots for working with well log data.

Before proceeding, and given the rising number of legal cases against OpenAI, a word of caution:

Always be cautious of the data you upload to ChatGPT as that data and your input could be used to train future models. If in doubt, avoid uploading any data and always follow your company’s policies.

Enabling Advanced Data Analytics in ChatGPT

To use the Advanced Data Analytics plugin within ChatGPT, you first have to enable it.

This can be done by going to Settings and then selecting Beta Features. Here, you will see the option to turn on Advanced Data Analysis, which will enable the plugin.

Enabling Advanced Data Analysis plugin in ChatGPT. Image by the author.

The plugin will now be available when you start a new chat.

Uploading and Converting Data to a Pandas Dataframe

To start, we need to upload our file. For this example, I am using a well log data set from the NLOG database (details at the end of the article). This data set contains a series of well log measurements obtained from an oil and gas exploration well.

First, we click the plus icon on the chat input box and then select the file containing our data.

The chat input for the Advanced Data Analytics plugin includes a plus (+) icon for uploading data. Image by the author.

Next, we need to provide a prompt for ChatGPT. In this case, I am going to tell it that it is a petrophysicist (a niche role within geoscience) who understands well log measurements. This can help tailor the responses that ChatGPT returns.

Initial prompt to ChatGPT with the well log data set. Image by the author.

After submitting the above input, ChatGPT will load the file and look at the contents.

Below is the response we get back from ChatGPT’s Advanced Data Analytics plugin, which provides information about each of the measurements in our dataset.

In this case, it uses pandas to read the CSV file into a dataframe and then displays the first five rows using the common df.head() method.

Initial analysis and summary of a well log dataset analysed by ChatGPT. Image by the author.

Most of the comments about the curves contained within the file are correct. However, there are a few inconsistencies, such as describing DT as Delta-Time. This is only partially correct; the curve represents Acoustic Compressional Slowness, a measure derived from the compressional arrival of a sound wave that is emitted by a sonic tool and travels through the formation.

In my experience with ChatGPT, I find that it can be challenging to use for niche topics such as petrophysics, and it can come back with incorrect information. This is something everyone should be aware of when working with large language models (LLMs).

Always double-check the output and make sure it makes sense.

Clicking on the Show Work drop down box, we can examine the code that was used to load the data. We can see that it has done some basic Python coding to read our CSV file into a pandas dataframe.

import pandas as pd

# Load the data from the provided CSV file
well_data = pd.read_csv("/mnt/data/L0509WellData.csv")

# Display the first few rows of the data
well_data.head()

ChatGPT has also identified placeholder values of -999 in some of the curves. These values represent missing data. However, as we will see, this can cause some confusion and issues with the responses we get.
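As an aside, pandas can treat the placeholder as missing at load time rather than fixing it afterwards. The sketch below is a minimal illustration: it assumes the null value is exactly -999 (some files use other values, such as -999.25) and uses a small in-memory CSV, built from a few values in the table shown later, in place of the real file.

```python
import io

import pandas as pd

# Small in-memory stand-in for the real CSV file (illustrative values only)
csv_text = """DEPTH,GR,DT,RHOB
4609.8008,89.06,-999,2.4837
4609.9008,90.72,-999,2.4893
4610.1008,89.57,66.22,2.5155
"""

# na_values tells pandas to read -999 as NaN, so DT stays a numeric (float64) column
well_data = pd.read_csv(io.StringIO(csv_text), na_values=[-999])

print(well_data.dtypes)
print(well_data.isna().sum())
```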

Data Exploration With Code Interpreter

When working with any dataset, the most time-consuming part is understanding what data you have, carrying out data quality checks and cleaning up the data.

This is where I personally see ChatGPT’s Advanced Data Analytics plugin being most helpful to petrophysicists and data scientists. However, as always, you must check the results and work carried out by these AI systems, as they may have inadvertently made mistakes.

During the data cleaning step, ChatGPT had trouble converting the -999 values to NaNs. In the process, it also converted the column data type to string.

As a result, I had to be specific in my request to make sure it kept the columns as numeric.

More specific prompt to get ChatGPT Advanced Data Analytics plugin to convert -999 values to NaNs. Image by the author.

It finally came back with the following response:

ChatGPT response to replacing -999 values with NaNs. Image by the author.

And the following code:

# Replace -999 values with NaN
well_data.replace(-999, float("nan"), inplace=True)

# Display the first few rows of the updated data
well_data.head()

This generated the following output:

       DEPTH         GR         DT      RHOB      DRHO      NPHI
0  4609.8008  89.059479       <NA>  2.483700  0.018494  0.129119
1  4609.9008  90.721848       <NA>  2.489308  0.013656  0.108034
2  4610.0008  90.709061       <NA>  2.501088  0.011289  0.085650
3  4610.1008  89.568954  66.223099  2.515450  0.008615  0.070332
4  4610.2008  88.169571  66.705551  2.530982  0.005628  0.065343

It is interesting, and a little odd, that the response converts the string "nan" to a float. I would have used np.nan in the dataframe’s replace function to substitute NaNs for the -999 values. This seems to have worked for now, but it will cause issues further on.
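For reference, here is a minimal sketch of the approach I would have taken. It assumes, as happened here, that a curve may already have been stored as strings; pd.to_numeric with errors="coerce" turns every column back into numbers (any unparseable text becomes NaN) before the placeholder is replaced with np.nan. The small frame is illustrative, not the real data.

```python
import numpy as np
import pandas as pd

# Illustrative frame where DT has ended up as strings (object dtype)
well_data = pd.DataFrame({
    "DEPTH": [4609.8008, 4609.9008, 4610.0008],
    "GR": [89.06, -999.0, 90.71],
    "DT": ["-999", "-999", "66.22"],
})

# Force every column back to a numeric dtype; text that cannot be parsed becomes NaN
for col in well_data.columns:
    well_data[col] = pd.to_numeric(well_data[col], errors="coerce")

# Replace the -999 placeholder with a proper missing-value marker
well_data = well_data.replace(-999, np.nan)

print(well_data.dtypes)
print(well_data.head())
```

Converting to numeric first means the replace matches -999 regardless of how the column was originally read.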

Generating a Descriptive Summary

Next, let’s see if we can get a descriptive summary table of each of the measurements in the dataset using the following simple prompt.

ChatGPT prompt to generate a simple descriptive summary of a well log dataset. Image by the author.

ChatGPT returns the following table in its response.

ChatGPT summary table providing information and statistics of well log data. Image by the author.

At first glance, it may appear to be the same output as the df.describe() method; however, ChatGPT has also added units to each of the measurements. This is handy if we want to take this content and place it in a report, although it would have been nicer to have the units in the row headers on the left rather than repeated in every cell.

Additionally, it has converted the porosity units from decimal to percentage. This could be misleading when reporting or passing the information on to colleagues. I would have preferred it to leave the data in the original units.

Finally, it has failed to create statistics for the DT curve due to nulls being present; this ties back to the earlier section where ChatGPT failed to convert the -999 values to nulls. However, if we were to use the df.describe() method, then as long as the data is numeric, we should still see the statistics for that curve.

To an unaware data analyst, this could potentially slip through without being noticed.
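The points above about units and the missing DT statistics can be sketched in pandas. The example below uses a small made-up frame and assumed units (check them against your file header). It transposes df.describe() so the units sit in a single column next to the row headers, keeps NPHI as a decimal, and shows that describe() still summarises a numeric curve containing NaNs but silently drops a curve stored as text.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the well data (values are made up, not from the real file)
well_data = pd.DataFrame({
    "GR": [89.06, 90.72, 88.17],
    "DT": [np.nan, 66.22, 66.71],   # numeric curve with a missing value
    "RHOB": [2.484, 2.489, 2.531],
    "NPHI": [0.129, 0.108, 0.065],  # kept as a decimal fraction, not a percentage
})

# Assumed units for each curve
units = {"GR": "API", "DT": "us/ft", "RHOB": "g/cc", "NPHI": "v/v"}

# describe() skips NaNs, so DT still gets statistics (its count is just lower)
summary = well_data.describe().T
# One units column beside the row headers instead of a unit in every cell
summary.insert(0, "units", summary.index.map(units))
print(summary.round(3))

# If DT had been left as text, describe() would silently leave it out
as_text = well_data.astype({"DT": str})
print(as_text.describe().columns.tolist())
```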

Creating Well Log Plots With Matplotlib and ChatGPT Advanced Data Analytics Plugin

When I first started writing articles on Medium, I focused on how to create basic well log plots with matplotlib and how to work with well log data using Python. That process took a lot of time as I was relatively new to Python and struggled to get the coding right.

Essentially, a well log plot consists of subplots, which are often referred to as tracks. Within each of these tracks, different scientific measurements are plotted, which can be used together to develop an interpretation of the subsurface. Each measurement is plotted against depth on the y-axis.
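In matplotlib terms, each track is simply an Axes in a single row of subplots that all share the depth (y) axis. A minimal sketch, using synthetic curves and a hypothetical helper called make_tracks:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def make_tracks(n_tracks, figsize=(10, 10)):
    """Hypothetical helper: one row of tracks that all share the depth axis."""
    fig, axes = plt.subplots(nrows=1, ncols=n_tracks, figsize=figsize,
                             sharey=True, squeeze=False)
    axes = axes[0]  # squeeze=False always returns a 2D array; keep the single row
    axes[0].set_ylabel("Depth (m)")
    axes[0].invert_yaxis()  # the y-axis is shared, so depth increases downwards on every track
    return fig, axes


# Synthetic depth and curves, for illustration only
depth = np.arange(4600.0, 4800.0, 0.1)
gr = 75 + 25 * np.sin(depth / 5)
rhob = 2.45 + 0.1 * np.cos(depth / 7)

fig, axes = make_tracks(2)
axes[0].plot(gr, depth, color="green")
axes[0].set_xlabel("GR")
axes[1].plot(rhob, depth, color="red")
axes[1].set_xlabel("RHOB")
```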

As a quick test of ChatGPT’s Advanced Data Analytics plugin, I wanted to see if it can help recreate a well log plot from my earlier article with a few simple prompts.

Example well log plot generated using matplotlib. Image by the author.

The first step is to get it to create a basic well log plot, where each measurement is displayed in its own subplot.

Initial prompt for generating the well log plot. Image by the author.

After writing and running the code for its first attempt, ChatGPT again came up against the issue of NaN values in the dataset. As a result, it had to recreate the plot.

Basic well log plot generated using ChatGPT’s Advanced Data Analytics plugin. Image by the author.

Once the plot has been generated, it comes back with a relatively good first-pass plot of the well measurements.

Interestingly, it has also made a few adjustments to the plot, including:

  • Inverting the y-axis so that depth increases downwards, from shallow to deep
  • Applying standard scales to some of the measurements, e.g. GR is scaled from 0 to 150

The code ChatGPT generated is as follows:

import matplotlib.pyplot as plt

# Setting up the figure and axes again
fig, ax = plt.subplots(nrows=1, ncols=6, figsize=(15, 10), sharey=True)

# Plotting DEPTH on the first track
ax[0].plot(well_data['DEPTH'], well_data['DEPTH'], color='black')
ax[0].set_title('DEPTH')
ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards
ax[0].grid()

# Plotting GR on the second track
ax[1].plot(well_data['GR'], well_data['DEPTH'], color='green')
ax[1].set_title('GR (Gamma Ray)')
ax[1].set_xlim(0, 150) # Typical scale for GR
ax[1].grid()

# Plotting DT on the third track (excluding NaN values)
ax[2].plot(well_data['DT'].dropna(), well_data['DEPTH'].loc[well_data['DT'].dropna().index], color='blue')
ax[2].set_title('DT (Delta Time)')
ax[2].grid()

# Plotting RHOB on the fourth track
ax[3].plot(well_data['RHOB'], well_data['DEPTH'], color='red')
ax[3].set_title('RHOB (Bulk Density)')
ax[3].set_xlim(1.5, 3.0) # Typical scale for RHOB
ax[3].grid()

# Plotting DRHO on the fifth track
ax[4].plot(well_data['DRHO'], well_data['DEPTH'], color='cyan')
ax[4].set_title('DRHO')
ax[4].grid()

# Plotting NPHI on the sixth track
ax[5].plot(well_data['NPHI'], well_data['DEPTH'], color='magenta')
ax[5].set_title('NPHI (Neutron Porosity)')
ax[5].set_xlim(-0.15, 0.45) # Typical scale for NPHI
ax[5].grid()

plt.tight_layout()
plt.show()

This code is very similar to the code in my earlier article, but rather than using subplot2grid, it creates all of the axes at once with plt.subplots and then plots onto each axis individually.
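For comparison, a subplot2grid version builds the same layout one grid cell at a time. This is a minimal sketch of an equivalent five-track grid, not the exact code from my earlier article; the track names are only used as titles and no data is plotted.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt

track_names = ["GR", "DT", "RHOB", "DRHO", "NPHI"]

fig = plt.figure(figsize=(15, 10))
axes = []
for i, name in enumerate(track_names):
    # Each track occupies cell (0, i) of a 1 x 5 grid; later tracks share the first track's y-axis
    share = axes[0] if axes else None
    ax = plt.subplot2grid((1, len(track_names)), (0, i), fig=fig, sharey=share)
    ax.set_title(name)
    axes.append(ax)

axes[0].invert_yaxis()  # inverting the shared axis flips every track
```

With plt.subplots the sharing is set up in a single call, which is one reason ChatGPT’s version is shorter.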

We can now take this plot and modify it by making several changes at once.

When working with ChatGPT, I have often found that splitting prompts into individual instructions can help get the required output. However, to see how ChatGPT’s Code Interpreter responds to multiple commands at once, we can use a single prompt to make the following changes:

  • Remove the subplot containing the Depth curve
  • Add gridlines to each subplot
  • Add spines to the right and top sides of each subplot
  • Add standard colours for each curve
  • Make sure that all curves have defined x-axis scale ranges

Multiple requests in ChatGPT prompt to improve a well log plot. Image by the author.

After further processing, ChatGPT came back with a plot containing the changes I requested.

Basic well log plot after suggested changes. Image by the author.

It also refactored the code: rather than making individual calls to each axis, it now loops through each measurement and applies common settings, such as the spines and grid colour, to every subplot.

# Setting up the figure and axes with one less subplot for depth
fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(15, 10), sharey=True)

# Define a dictionary for colors and scales for each log
colors = {
    "GR": "green",
    "DT": "magenta",
    "RHOB": "red",
    "DRHO": "grey",
    "NPHI": "blue"
}
scales = {
    "GR": (0, 150),
    "DT": (140, 40),
    "RHOB": (1.95, 2.95),
    "DRHO": (-0.2, 0.2),
    "NPHI": (0.45, -0.15)
}

# Plotting each log with the specified colors and scales
logs = ['GR', 'DT', 'RHOB', 'DRHO', 'NPHI']
for i, log in enumerate(logs):
    ax[i].plot(well_data[log], well_data['DEPTH'], color=colors[log])
    ax[i].set_xlim(scales[log])
    ax[i].set_title(log)
    ax[i].grid(color="lightgrey")
    # Adding top and right spines
    ax[i].spines['top'].set_visible(True)
    ax[i].spines['right'].set_visible(True)

ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards

plt.tight_layout()
plt.show()

Adding a Variable Fill Between the Y-axis and a Line

On log plots, it is common to apply shading and additional fills between curves and the edges of the subplots. This helps the petrophysicist understand the data through visual cues.

One such method is to apply variable shading to the gamma-ray (GR) curve to help identify shale and non-shale intervals. For this, we commonly use the Yellow-Orange-Brown (YlOrBr) colourmap, which shows clean reservoir intervals as brighter colours and more shaley intervals as darker colours.
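Underneath, the variable shading is a simple lookup: each GR value is normalised onto the 0 to 1 range of the colourmap, so low (clean) readings pick up the pale yellow end and high (shaley) readings the dark brown end. A minimal sketch, assuming a 0 to 150 API display scale and a few illustrative GR values:

```python
import matplotlib
import numpy as np
from matplotlib.colors import Normalize

# Assumed GR display scale of 0 to 150 API
norm = Normalize(vmin=0, vmax=150)
cmap = matplotlib.colormaps["YlOrBr"]

gr_values = np.array([20.0, 75.0, 140.0])  # illustrative clean, intermediate and shaley readings
rgba = cmap(norm(gr_values))  # one RGBA colour per value

# Summing the RGB channels gives a crude brightness measure: it falls as GR increases
brightness = rgba[:, :3].sum(axis=1)
print(brightness)
```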

I have previously attempted this with ChatGPT and Code Interpreter, but it has always been unsuccessful. Perhaps this is down to how I wrote the prompts.

Let’s see if ChatGPT can help create a YlOrBr colour map shading for the GR curve by being more specific in the prompt.

Prompt for ChatGPT to create a variable shading for the GR — Gamma Ray curve. Image by the author.

We get back the following plot:

ChatGPT Data Analysis Plugin generating a variable shading for the GR curve. Image by the author.

On the first attempt, it managed to add the fill to the subplot. However, the fill did not run from the left-hand side of the subplot to the line.

We can see if ChatGPT will correct this:

Prompt to apply a fill only from the left side of the subplot to the curve. Image by the author.

And it did, which is very surprising, as previous attempts had failed.

Log plot after ChatGPT corrects the plot for the variable fill. Image by the author.

We can take a closer look at the code to see what it did to generate the plot.

# Setting up the figure and axes again
fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(15, 10), sharey=True)

# Plotting each log with the specified colors and scales, excluding NaN values for DT
for i, log in enumerate(logs):
    valid_indices = well_data[log].dropna().index
    ax[i].plot(well_data[log].loc[valid_indices], well_data['DEPTH'].loc[valid_indices], color=colors[log])
    ax[i].set_xlim(scales[log])
    ax[i].set_title(log)
    ax[i].grid(color="lightgrey")
    # Adding top and right spines
    ax[i].spines['top'].set_visible(True)
    ax[i].spines['right'].set_visible(True)

    # Adding the fill for the GR subplot
    if log == 'GR':
        # Add color based on the value of the GR line
        for depth, value in zip(well_data['DEPTH'].loc[valid_indices], well_data[log].loc[valid_indices]):
            ax[i].fill_betweenx([depth-0.05, depth+0.05],
                                0, value,
                                facecolor=plt.cm.YlOrBr(value/150),
                                edgecolor='none',
                                zorder=0)  # Place the fill behind the line

ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards

plt.tight_layout()
plt.show()

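The approach taken by ChatGPT differs from my earlier attempt. My earlier code, shown below, looped through a sorted series of colour-index values and, for each one, filled under the curve wherever the curve exceeded that value. ChatGPT, by contrast, works depth by depth, drawing a thin coloured band from the left edge of the track to the curve at each sample.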

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

left_col_value = 0
right_col_value = 150

#assign the column to a variable for easier reading
curve = well_data['GR']

#calculate the span of values
span = abs(left_col_value - right_col_value)

#assign a color map
cmap = plt.get_cmap('YlOrBr')

#create array of values to divide up the area under curve
color_index = np.arange(left_col_value, right_col_value, span / 100)

#setup the plot
well_data.plot(x='GR', y='DEPTH', c='black', lw=0.5, legend=False, figsize=(6,15))
plt.ylim(4800, 4600)
plt.xlim(left_col_value, right_col_value)
plt.title('Plot With a Variable Fill to Y-Axis')

#loop through each value in the color_index
for index in sorted(color_index):
    index_value = (index - left_col_value)/span
    color = cmap(index_value)  #obtain colour for color index value
    plt.fill_betweenx(well_data['DEPTH'], 0, curve, where=curve >= index, color=color)

plt.show()

The above code generates the following GR plot with the shading.

Colour fill using a GR curve. Image by the author.
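One detail worth noting in ChatGPT’s depth-by-depth version is the hard-coded band of ±0.05 m around each sample, which only suits data sampled every 0.1 m, as this file appears to be. The sketch below uses the same technique but derives the band from the median depth spacing instead; the helper name variable_fill is hypothetical and the data are synthetic.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize


def variable_fill(ax, depth, values, vmin=0, vmax=150, cmap_name="YlOrBr"):
    """Hypothetical helper: shade from the left edge (vmin) to each value, one band per sample."""
    depth = np.asarray(depth, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)  # missing samples are skipped rather than filled
    depth, values = depth[valid], values[valid]

    # Half the median sample spacing, so neighbouring bands meet without gaps or overlaps
    half_step = np.median(np.diff(depth)) / 2
    cmap = matplotlib.colormaps[cmap_name]
    norm = Normalize(vmin=vmin, vmax=vmax)

    for d, v in zip(depth, values):
        ax.fill_betweenx([d - half_step, d + half_step], vmin, v,
                         facecolor=cmap(norm(v)), edgecolor="none", zorder=0)
    return half_step


depth = np.arange(4600.0, 4610.0, 0.5)  # synthetic depths sampled every 0.5 m
gr = 80 + 40 * np.sin(depth)
gr[3] = np.nan  # one missing sample

fig, ax = plt.subplots(figsize=(4, 8))
ax.plot(gr, depth, color="black", lw=0.5)
half_step = variable_fill(ax, depth, gr)
ax.set_xlim(0, 150)
ax.invert_yaxis()
```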

Moving Two Lines onto the Same Subplot

On log plots, it is common to plot the Bulk Density (RHOB) and Neutron Porosity (NPHI) on the same track. As these two curves have different scales, we need to place one of them on a secondary x-axis.
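The basic mechanism in matplotlib is Axes.twiny(), which creates a second set of axes that shares the y (depth) axis but has its own x-axis along the top. A minimal sketch with synthetic RHOB and NPHI curves, using the 1.95 to 2.95 and 0.45 to -0.15 scales that appear in the plots in this article:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic curves for illustration only
depth = np.arange(4600.0, 4700.0, 0.1)
rhob = 2.45 + 0.15 * np.sin(depth / 4)
nphi = 0.15 - 0.1 * np.sin(depth / 4)

fig, ax = plt.subplots(figsize=(4, 10))
ax.plot(rhob, depth, color="red")
ax.set_xlim(1.95, 2.95)
ax.set_xlabel("RHOB", color="red")

# twiny() adds a second x-axis (drawn along the top) that shares the y-axis with ax
ax2 = ax.twiny()
ax2.plot(nphi, depth, color="blue")
ax2.set_xlim(0.45, -0.15)  # reversed scale, as used for NPHI in this article
ax2.set_xlabel("NPHI", color="blue")

ax.invert_yaxis()  # the shared depth axis is flipped for both curves
```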

This can be tricky to get right in matplotlib, so we will see how ChatGPT manages it with the following prompt.

Prompt requesting that NPHI is placed on the same subplot as RHOB and on a secondary axis. Image by the author.

The returned plot is not terrible. ChatGPT has managed to get the NPHI curve onto the same subplot as RHOB and has placed it on a secondary x-axis. However, the labels for the subplot overlap each other, and it is not clear which scale belongs to which curve.

Additionally, we now have two sets of grid lines on the subplot, which can cause confusion.

Returned log plot from ChatGPT after moving NPHI onto the same subplot as RHOB. Image by the author.

We can modify that with a simple prompt to make sure the labels don’t overlap and remove the gridlines from one of the curves.

Prompt to adjust the labelling and gridlines for a subplot.

ChatGPT has done what it was asked, but it has also added colour to the different labels.

I was not expecting it to do this; however, it does make it easy to link each label with the correct line.

ChatGPT Data Analytics Plugin generates a well log plot after adjusting labelling for NPHI and RHOB curves. Image by the author.

Taking ChatGPT’s change to the label colours as inspiration, we can change the rest of the labels to match the colours of their lines.

This should help us if we have multiple lines on the same subplot, which is very common with log plots.
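One way to do this generically is to read the colour back from the plotted line, so the label and tick colours can never drift out of sync with the curve. A sketch using a hypothetical helper, colour_axis_to_line, and a synthetic GR curve:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np


def colour_axis_to_line(ax, line, label):
    """Hypothetical helper: colour the x-axis label and ticks to match a plotted line."""
    colour = line.get_color()
    ax.set_xlabel(label, color=colour)
    ax.tick_params(axis="x", colors=colour)  # colours both the tick marks and tick labels
    return colour


depth = np.linspace(4600, 4700, 200)  # synthetic depth values
fig, ax = plt.subplots(figsize=(3, 8))
(gr_line,) = ax.plot(60 + 30 * np.sin(depth / 3), depth, color="green")
colour_axis_to_line(ax, gr_line, "GR")
```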

Prompt for adjusting all labels and tick labels to match the colour of their respective curve. Image by the author.

ChatGPT has now responded with our final plot, and it is now easy for us to see the NPHI and RHOB scales and what line they belong to.

Final well log plot generated by ChatGPT and the Data Analytics plugin. Image by the author.

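The following code is what ChatGPT came up with to generate the above plot.

# Setting up the figure and axes again
fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(15, 10), sharey=True)

# NPHI now sits on the RHOB track, so only four logs get their own subplot
# (assumption: logs must be redefined here, otherwise ax[4] does not exist)
logs = ['GR', 'DT', 'RHOB', 'DRHO']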

# Plotting each log with the specified colors and scales, excluding NaN values
for i, log in enumerate(logs):
    valid_indices = well_data[log].dropna().index
    ax[i].plot(well_data[log].loc[valid_indices], well_data['DEPTH'].loc[valid_indices], color=colors[log])
    ax[i].set_xlim(scales[log])
    # Remove subplot title
    ax[i].set_title("")
    # Add axis label representing the name of the log measurement
    ax[i].set_xlabel(log, color=colors[log])
    # Color tick marks and tick labels in the same color as the line
    ax[i].tick_params(axis='x', colors=colors[log])
    # Adding top and right spines
    ax[i].spines['top'].set_visible(True)
    ax[i].spines['right'].set_visible(True)

    # Adding the fill for the GR subplot
    if log == 'GR':
        # Add color based on the value of the GR line
        for depth, value in zip(well_data['DEPTH'].loc[valid_indices], well_data[log].loc[valid_indices]):
            ax[i].fill_betweenx([depth-0.05, depth+0.05],
                                0, value,
                                facecolor=plt.cm.YlOrBr(value/150),
                                edgecolor='none',
                                zorder=0)  # Place the fill behind the line
    ax[i].grid(color="lightgrey")

    # Adding NPHI to the RHOB subplot with a secondary x-axis
    if log == 'RHOB':
        ax2 = ax[i].twiny()  # Create a secondary x-axis for NPHI
        valid_indices_nphi = well_data['NPHI'].dropna().index
        ax2.plot(well_data['NPHI'].loc[valid_indices_nphi], well_data['DEPTH'].loc[valid_indices_nphi], color=colors['NPHI'])
        ax2.set_xlim(scales['NPHI'])
        ax2.set_xlabel('NPHI', color=colors['NPHI'])
        ax2.tick_params(axis='x', colors=colors['NPHI'])
        # Remove gridlines for NPHI and display the ones for RHOB
        ax2.grid(False)
        ax[i].grid(color="lightgrey")

ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards

plt.tight_layout()
plt.show()

The plot and code look reasonable, although there are a few more modifications I would make. However, at this point I felt it was best to continue modifying the plot in a Jupyter Notebook.

This was due to issues I had run into previously: if I went back and changed an earlier prompt, ChatGPT would wipe out everything after that prompt and recreate it.

This was especially problematic if I had left the ChatGPT window open for several hours or even days and the session had timed out.

Summary

Overall, I have found ChatGPT’s Data Analytics plugin (previously Code Interpreter) to be a useful tool for generating well log plots for petrophysics and geoscience. However, I do have several reservations and issues with using it.

I found it very difficult to replicate results in a new chat instance. When I had previously attempted the process described above, I ended up with completely different results, and ChatGPT struggled to generate the plot I wanted. This occurred even when using exactly the same prompts.

Sometimes the results generated by the Data Analytics plugin were questionable and even erroneous. As with any LLM, it is always wise to review the output and make sure it makes sense programmatically and technically.

If you make any errors in the prompts, it is not easy to go back and change them. If you do try to change one of your prompts, it can result in the deletion of any chat after that prompt. Therefore, I would recommend copying the code over to Jupyter Notebook as you go along so that you do not lose any information.

Finally, my biggest reservation about using ChatGPT and the Data Analytics plugin is how easy it can be to upload proprietary data (in this example, I have used public data, which is open to use). The data, prompts and output can all be used to train future models without you knowing it. The issues surrounding copyright and Intellectual Property are increasing daily, and extreme caution is advised when working with this tool and proprietary data.

It would be great to hear your comments, thoughts and concerns about using ChatGPT and the Data Analytics plugin.

Data Used in this Example

The data used within this tutorial was downloaded from NLOG.nl, which is a website that contains well logging data for the entire Dutch sector of the North Sea. The data is free to download and use. Full details of the data licence can be found here, but a summary of the usage is provided here from the Intellectual Property Rights section:

NLOG.NL does not claim any rights (except domain names, trademark rights, patents and other intellectual property rights) in respect of information provided on or through this website. Users are permitted to copy, to download and to disclose in any way, to distribute or to simplify the information provided on this website without the prior written permission of NLOG.NL or the lawful consent of the entitled party. Users are also permitted to copy, duplicate, process or edit the information and/or layout, provided NLOG.NL is quoted as the source.
