
Creating a Multi-Well Integrated Well Log and Formation Tops Dataframe in Python

When working with well log measurements and subsurface data, we are often dealing with different file formats and sample rates. For instance, well log measurements are typically stored and transferred within .las or .dlis files and sampled at regular increments, commonly every 0.1 m or 0.5 ft. Geological formation tops, on the other hand, are single discrete depth points. This means the formation data has to be expanded to match the sample rate of the well log measurements before the two can be combined.

In my previous tutorial, we saw how to merge well log data and formation data for a single well. Within this tutorial, we are going to see how we can do this for multiple wells.

Importing Libraries

The first step in the process is to import the libraries we will be working with.

For this tutorial we will be using lasio to load the .las files, os to read files from a directory, and pandas to work with dataframes and to read the formation data stored within csv files.

import lasio as las
import os
import pandas as pd

Importing Well Log LAS Files Using LASIO

Next, we will begin importing the data.

The data used within this tutorial was downloaded from NLOG.nl, which is a website that contains well logging data for the entire Dutch sector of the North Sea. The data is free to download and use. Full details of the data licence can be found here.

From the NLOG.nl website, we will be using data from three wells: L07-01, L07-05 and L07-04.

When loading a single file, we can easily pass the file location into the lasio.read() function. However, as we are working with multiple .las files, we need to read each one separately and append the resulting dataframes to a list.

The dataframes stored within that list are then joined together using pd.concat().

The following code will read all files ending in .las within a directory named Data/Notebook 36.

directory = "Data/Notebook 36"

# Initialise empty list for dataframes
df_list = []

for file in os.listdir(directory):
    if file.endswith('.las'):
        f = os.path.join(directory, file)

        # Convert LAS file to a DF
        las_file = las.read(f)
        df = las_file.df()
        
        # Create a column for the Well Name
        well_name = las_file.well.WELL.value
        df['WELL'] = well_name
        
        # Make sure depth is a column rather than an index
        df = df.reset_index()
        df = df.sort_values(['WELL', 'DEPT']).reset_index(drop=True)
        df_list.append(df)

# Create a single dataframe with all wells
well_df = pd.concat(df_list)

Once the file name has been obtained, it is combined with the directory path to give the full file path (line 8). The las file is then read (line 11) and converted to a pandas dataframe (line 12).

In order to distinguish which well the data came from, we can add a new column called WELL to the dataframe. Its value is set to the well name (lines 15–16), which is taken from the well header section of the las file.

When loading files with LASIO and converting them to dataframes, the index of the dataframe will be set to the depth curve. We can change this so that we have a simple integer index and depth as an actual column within the dataframe. This is achieved by using the .reset_index() function within pandas (line 19).

Next, we need to sort the dataframe so that it goes from the shallowest depth measurement to the deepest (line 20). Doing this can put the index out of order, so we reset the index again; this time we do not want the old index kept as a column, so we set the drop parameter to True (line 20).

Once the dataframe has been sorted, we can then append it to our dataframe list: df_list (line 21).

This process repeats until all .las files within the specified directory have been read.

Finally, the dataframes stored within the list are joined together using pd.concat() (line 24).

When we call upon the well_df dataframe, we get back the following view of the first 5 and last 5 rows.

Dataframe contents after loading in multiple .las files with lasio and joining them together into a single dataframe.
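
Before going any further, it is worth a quick sanity check that every well has loaded as expected. The snippet below is a minimal check, assuming the well_df dataframe created above; it simply lists the wells and the number of samples and depth range for each.

# Quick sanity check: which wells were loaded, how many samples each has,
# and the depth range they cover
print(well_df['WELL'].unique())
print(well_df.groupby('WELL')['DEPT'].agg(['count', 'min', 'max']))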

Loading Formation Tops Data from CSV

Formation top data is often stored within tabular form, most commonly within .csv files. These files will contain the name of the geological formation, and the associated top and bottom depth.

Example of formation data stored within a csv file.
Example of formation data stored within a csv file.

The csv files for this example already have a column called Well, which contains the well name. Doing this upfront prior to loading them into Python is helpful, but not essential. If you don’t do this though, you may have to extract the well name from the file name, which can be more time-consuming.
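
If your csv files do not already contain a well column, one option is to derive it from the file name. The snippet below is only a sketch and assumes a hypothetical naming convention where each file starts with the well identifier followed by an underscore (e.g. L07-01_formations.csv); adjust the parsing to suit your own files.

from pathlib import Path

f = "Data/Notebook 36/L07-01_formations.csv"  # hypothetical file name

df = pd.read_csv(f)

# Derive the well name from the file name, assuming it starts with the
# well identifier followed by an underscore
df['Well'] = Path(f).stem.split('_')[0]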

In the code example below, we again create a blank list (line 2) to store the dataframes in.

Next, we loop over all files ending with .csv within the specified directory, read them using pd.read_csv(), and then append them to the list called df_formation_list.

Once all files have been read, we can then call upon pd.concat() to join the dataframes together.

# Initialise empty list for dataframes
df_formation_list = []

for file in os.listdir(directory):
    if file.endswith('.csv'):
        f = os.path.join(directory, file)
        df = pd.read_csv(f)

        df_formation_list.append(df)

# Combine all formation data into a single dataframe
formations_df = pd.concat(df_formation_list)

When we call upon the formations_df dataframe we get back the following view:

Formation data stored within a pandas dataframe for multiple wells.
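
It is also worth confirming that the column names are consistent across all of the csv files, since the next step relies on the Well, Top and Stratigraphical Unit columns being present. A quick check, assuming the formations_df dataframe created above, could look like this:

# Check the column names and how many tops were loaded for each well
print(formations_df.columns.tolist())
print(formations_df.groupby('Well')['Top'].count())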

Creating a Dictionary of Formation Data

Now that we have the formations within a simple pandas dataframe, we need to convert it to a nested dictionary.

This makes the process of combining the two datasets much easier and allows us to create a continuous column with the formation name at each depth level.

We can do this by using dictionary comprehension.

formations_dict = {k: f.groupby('Top')['Stratigraphical Unit'].apply(list).to_dict()
     for k, f in formations_df.groupby('Well')}

Once we have run the above code, we can call upon formations_dict and we get back the following result.

Nested dictionary of formation names and depths.

From it, we can see that the main key is the well name, and within each well we have a sub-dictionary with the depth as the key and the formation name as the value.

You may wonder why we are using depth as the key rather than the formation name. Doing it this way allows us to check if the depth we are currently at (in the function we will cover in the next section) falls between two of the keys. If it does, we can simply get the formation name.
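
To illustrate the idea before we write the full function, the short sketch below looks up the formation for a single depth by finding the nearest top that is shallower than it. The well name and depth used here are just examples.

# Illustrative lookup for a single depth value
tops = formations_dict['L07-01']      # depth -> [formation name]
current_depth = 1000.0                # example depth only

tops_above = [d for d in tops if d <= current_depth]
if tops_above:
    nearest_top = max(tops_above)
    print(nearest_top, tops[nearest_top][0])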

If we want to view the tops for a specific well, we can call upon specific wells within the call to the dictionary like so:

formations_dict['L07-01']

Which will return the formation data for that specific well:

Formation depths and names for a specific well.

Merging Formation Data with Well Log Data

Now that the processing and setup have been completed, we can move on to integrating the formation tops dictionary with the well log dataframe.

For this, we will use the following function.

def add_formation_name_to_df(depth, well_name):

    formations_depth = formations_dict[well_name].keys()

    # Need to catch if we are at the last formation
    try:
        at_last_formation = False
        below = min([i for i in formations_depth if depth < i])
    except ValueError:
        at_last_formation = True

    # Need to catch if we are above the first listed formation
    try:
        above_first_formation = False
        above = max([i for i in formations_depth if depth > i])
    except ValueError:
        above_first_formation = True

    if above_first_formation:
        formation = ''

    # Check if the current depth matches an existing formation depth
    nearest_depth = min(formations_depth, key=lambda x: abs(x - depth))
    if depth == nearest_depth:
        formation = formations_dict[well_name][nearest_depth][0]

    # Otherwise use the nearest formation top above the current depth,
    # but only when such a top exists
    elif not above_first_formation:
        if not at_last_formation:
            if depth >= above and depth < below:
                formation = formations_dict[well_name][above][0]
        else:
            formation = formations_dict[well_name][above][0]

    return formation

The function first gets the depths (the keys from formations_dict) for the formations of the well (well_name) that is passed in.

We then need to catch a few edge cases.

First, we need to see if we are at the last formation within the formations dictionary. If we are, we set the at_last_formation flag to True; otherwise, we create a new variable called below, which holds the closest formation depth below the current depth (depth).

Next, we need to see if we are above the first formation within the dictionary (lines 12–17). Here we are checking whether there are any formation tops above the current depth; there will not be if the current depth is shallower than the first listed top, which can happen when tops are only available from a certain depth rather than from the surface. If the current depth is shallower than the first formation depth, we set the formation name to a blank string (lines 19–20). Otherwise, we get the depth of the formation top immediately above the current depth.

Finally, we need to work out where the current depth value sits within the formation depths, otherwise the correct formation name will not be set. If the current depth exactly matches one of the depths in the formation dictionary, we simply use the formation name stored at that depth. Otherwise, provided we are not above the first formation (so that above is defined), we assign the formation whose top is immediately above the current depth; and if we are below the last formation top, we carry that last formation name down to the end of the log.

Once the function has been written, we can call upon it using the apply method within pandas. This applies the function to each row within the dataframe.

well_df['FORMATION'] = well_df.apply(lambda x: add_formation_name_to_df(x['DEPT'], x['WELL']), axis=1)

When we call upon well_df we get back the following view of the dataframe:

Combined dataframe results containing both well log data and formation data.
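
One point to be aware of: if a las file is loaded for a well that has no matching formation tops, formations_dict[well_name] will raise a KeyError when the function is applied. A minimal guard, sketched below with a hypothetical wrapper function, returns an empty string for such wells.

def add_formation_name_to_df_safe(depth, well_name):
    # Return an empty string for wells with no formation tops loaded
    if well_name not in formations_dict:
        return ''
    return add_formation_name_to_df(depth, well_name)

This wrapper can then be used in the apply call in place of the original function.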

Checking the Final Result

When doing anything like this, it is essential to check the results closely. For example, we can check the results within one of the wells, between the depths where we expect a formation change.

We can do that as follows.

well_df.loc[(well_df['WELL'] == 'L07-01') & (well_df['DEPT'] >= 929) & (well_df['DEPT'] <= 935)]

Close-up of formation data within pandas dataframe and original data within the csv file.

In the original formation tops csv file above, we can see that the transition between the Brussels Marl Member and the Ieper Member occurs at 930 ft. This occurs at the same point within the combined dataframe.

This helps us gain confidence that the process has worked.

To be sure, it is always wise to check multiple wells and intervals in this way, or by generating a well log plot.
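
One way to extend this check across all wells at once is to summarise the depth interval that each formation covers in the combined dataframe and compare it against the original tops. The snippet below is a minimal sketch using the columns created above.

# Depth interval covered by each formation in every well
formation_ranges = (well_df.groupby(['WELL', 'FORMATION'])['DEPT']
                    .agg(['min', 'max'])
                    .reset_index()
                    .sort_values(['WELL', 'min']))
print(formation_ranges)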

Summary

Integrating well log data and formation data for multiple wells can be a challenge within Python. Within this short article, we have seen how to load multiple las files and formation data files and combine them into a single dataframe.

This will allow us to integrate formation data and well log data into machine learning models or well log displays.

Be sure to check out my previous article if you are looking to deal with a single well.


