Getting Started With Python as a Geoscientist? Here Are 5 Ways You Can Improve Your Code!

Over the years, I have seen and worked on numerous Python scripts within the geoscience and petrophysics domains. In that time, I have seen (and also written) a variety of coding styles, from well-organised, well-documented code to everything in a single Python file with little to no organisation. The latter can be difficult to maintain, debug and understand when the code is revisited several months later. The purpose of writing the code will often dictate the style used.

If we are creating a script that may be used once or twice or are working under strict time constraints and pressures, then we may not be able to make things as pretty and organised as we would like. However, if we are creating code that we will use multiple times or are deploying it to other users, and if we have the time, we may want to structure the code or app in a way that will make it suitable for expansion later on. This can save time and headaches when you revisit your code and also avoids the dilemma of forgetting what the code does or was intended to do.

As a geoscientist, coding may not be part of your natural background. However, you may have seen and heard colleagues and friends discussing apps they have created, which has inspired you to give it a go.

In this article, I will share five tips I have learned over the years that have improved my geoscience Python applications. I do this hoping they will also help those venturing into the world of Python and machine learning for the first time.

These tips are equally applicable to anyone who is not a geoscientist and is just starting out with Python.

Setting Up an Application Folder Structure

When creating Python apps at the beginning of your journey, it may be convenient and easy to keep all of your code within a single folder. However, as your project begins to grow in size and complexity, it may become difficult to maintain and navigate through your code base.

This can easily be sorted by creating an effective folder structure. Even if your project is going to be small, it is useful to separate data files and any output or temporary files into their own folders.

For example, the following structure maintains a set of folders where specific functions for data processing, visualisation and analysis are separated out. This allows specific functions to be stored within their own file. These functions can then be called as and when needed from the main.py file, which contains the main code you are executing.

Additionally, any data files can be placed within a data subdirectory.

# Example of a simple project structure
your_project/
├── data_processing/
│   └── data_cleaner.py
├── visualisation/
│   └── plotter.py
├── analysis/
│   └── statistical_model.py
├── data/
│   └── raw_data.csv
└── main.py
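
If you would rather not create these folders by hand, the layout can be scaffolded with a few lines of Python. The following is only a minimal sketch using the standard pathlib module; the function name create_project_structure is a hypothetical choice, and the file names are simply those from the example tree above.

```python
from pathlib import Path
import tempfile

# Hypothetical layout, copied from the example tree above.
PROJECT_FILES = [
    "data_processing/data_cleaner.py",
    "visualisation/plotter.py",
    "analysis/statistical_model.py",
    "data/raw_data.csv",
    "main.py",
]


def create_project_structure(root: Path) -> list[Path]:
    """Create empty placeholder files (and their parent folders) under root."""
    created = []
    for relative_path in PROJECT_FILES:
        file_path = root / relative_path
        # Make the folder first, then an empty file inside it.
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch(exist_ok=True)
        created.append(file_path)
    return created


if __name__ == "__main__":
    # Build the layout in a temporary folder so nothing real is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_project_structure(Path(tmp) / "your_project"):
            print(path.relative_to(tmp))
```

With this layout, main.py could then pull in a function with a line such as from data_processing.data_cleaner import clean_data, where clean_data is a hypothetical function name.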

If you are developing a library or a platform-specific application (e.g. Dash), then it is worth checking out a tool like Cookiecutter. This allows you to set up the project structure very quickly from several pre-defined templates.

cookiecutter.readthedocs.io

Finally, if you are going to be reusing code in multiple applications, consider building a Python library to store those functions and classes.

Using Functions and Classes to Create Reusable Code

When working in Python (or any programming language), it is good practice to write clean and efficient code, especially as your applications increase in size. This makes code easier to reuse and maintain, and improves readability (plus many other benefits).

This can be achieved through functions and classes, which help organise your code and split out functionality whilst reducing repeated code.

Functions for Modular Code

Functions allow us to encapsulate code that runs only when called. We can pass parameters into the function, and it can return a result.

To create a function in Python, we can use the def keyword, followed by the name of the function, and finally, the parameters/arguments that it will take.

For example, the function below calculates a density porosity from a bulk density well log measurement. It takes in three parameters (rho_matrix, rho_bulk and rho_fluid), carries out the calculation and returns a value for porosity.

def calculate_density_porosity(rho_matrix, rho_bulk, rho_fluid):

    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)
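
To see the function in use, here is a short worked example. The input values are illustrative assumptions (a quartz matrix of 2.65 g/cc, a bulk density reading of 2.30 g/cc and fresh water at 1.00 g/cc), not measured data.

```python
def calculate_density_porosity(rho_matrix, rho_bulk, rho_fluid):
    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)


# Illustrative values in g/cc (assumed for this example only).
porosity = calculate_density_porosity(rho_matrix=2.65, rho_bulk=2.30, rho_fluid=1.00)

# (2.65 - 2.30) / (2.65 - 1.00) = 0.35 / 1.65, which is roughly 0.212
print(f"Density porosity: {porosity:.3f}")
```

Passing the arguments by keyword, as done here, makes it harder to mix up the three densities by accident.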

Classes for Complex Data Structures

When your application becomes larger or needs to handle complex data structures, then a class can be a useful thing to consider.

A class in Python acts as a blueprint for creating objects and defines how those objects behave. This is done by providing initial values for state (member variables) and implementations of behaviour (member functions or methods).
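
As a rough illustration, the sketch below groups a bulk density log and its parameters into one object. The class name DensityLog, its attributes and the default densities are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class DensityLog:
    """Hypothetical container for a bulk density log and its parameters."""

    depths: list[float]          # depth of each sample
    bulk_densities: list[float]  # bulk density reading at each depth
    rho_matrix: float = 2.65     # assumed matrix density
    rho_fluid: float = 1.0       # assumed fluid density

    def density_porosity(self) -> list[float]:
        """Return the density porosity for every sample in the log."""
        return [
            (self.rho_matrix - rho_bulk) / (self.rho_matrix - self.rho_fluid)
            for rho_bulk in self.bulk_densities
        ]


# The state (depths, densities) and the behaviour (the calculation) live together.
log = DensityLog(depths=[1000.0, 1000.5], bulk_densities=[2.65, 2.30])
porosities = log.density_porosity()
print(porosities)
```

Here the member variables hold the state of the log, and the density_porosity method provides the behaviour, so the matrix and fluid densities do not need to be passed around separately.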

Best Practices for Reusable Code

Here are a few best practices when it comes to creating reusable code in your application.

  1. Single Responsibility Principle: Each function or class should have a single responsibility or purpose. This makes your code easier to test and maintain.
  2. Descriptive Naming: Functions and classes should have descriptive names that clearly indicate what they do or what they represent. Check out the PEP 8 guidelines for naming conventions.
  3. Documentation: Use docstrings to document the purpose and usage of your functions and classes, and if possible, include references to the source of any calculations or methodologies. This is especially important in geoscience, where functions can be used to perform complex calculations. Documentation can include block comments, inline comments and documentation strings. Check out the PEP guidelines for further details.
  4. Modular Design: You should design your code so that each piece (function or class) can be tested and used independently. This may result in functions that have one or two lines, but this is better than having the logic hidden within a larger piece of code.

Adding Documentation to Your Code

Image representing code documentation. Generated using DALL-E 3 by the author.

When writing scripts, functions or classes, it is important that they are well documented. Having comments and documentation strings (docstrings) within your code can go a long way to prevent headaches, especially when you revisit your application several months later. It also helps others understand your code if they are looking through it.

There are three simple ways we can improve the readability of our code through documentation:

  • Comments are used to help explain the ideas behind sections of code, break down complex logic or provide context about the code. The idea behind adding comments is not to explain how the code works, but rather to explain the why behind the code and logic.
  • Docstrings are used to describe what a function or class does and what parameters the function or class requires. This is very helpful for anyone who uses your code.
  • Type hints allow the user to better understand your code by stating the data type of each parameter, and of the return value, for a given function or class.

The example below expands the function we saw previously by adding type hints and a docstring, which explains what the function does, what each of the expected parameters is, and what data type it is expected to be.

def calculate_density_porosity(rho_matrix: float, rho_bulk: float, rho_fluid: float) -> float:
    """
    Calculates porosity based on bulk density measurements.

    Parameters:
      rho_matrix (float): The rock matrix density.
      rho_bulk (float): The bulk density value.
      rho_fluid (float): The fluid density.
    
    Returns:
      float: The calculated density porosity.
    """
    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)

To find out more about improving documentation in your Python code, check out the following Medium article: https://towardsdatascience.com/5-essential-tips-to-improve-the-readability-of-your-python-code-a1d5e62a4bf0

Version Control

As a geoscientist, I am sure you are familiar with storing multiple versions of essays and dissertations during your academic studies and creating file names along the lines of dissertation_version1.docx, dissertation_final.docx or dissertation_final_final.docx. Doing this can be very messy and often ends up with you losing track of what really is the final version when you come back to look at the files at a later date.

Implementing a system like Git or an online system like GitHub is a great way to keep track of different versions without having multiple files with crazy names.

Also, when working on your project, you may want to keep track of any changes you have made. That way, when you find a function that has previously worked but has stopped working due to changes, you can potentially roll back the code to when it was working previously.

Implementing version control allows you to:

  • Keep track of changes over time
  • Collaborate with others effectively
  • Experiment without fear of breaking existing code
  • And more

Check out this great guide to understand and explore the concepts of Version Control:

guides.beanstalkapp.com

Create Tests as You Go

When developing your code or application, you need to make sure that it works as expected, especially if you modify the function in any way.

One way to ensure that changes to a function, or to any other functions it depends on, do not break its behaviour is to write tests around that code.

There are a number of Python testing frameworks, such as pytest and unittest, that are easy to use and can even be automated as part of a continuous integration workflow.

For example, the code below uses unittest and wraps around the porosity equation we saw earlier.

import unittest
from your_project.geoscience_calculations import calculate_density_porosity

class TestDensityPorosityCalculation(unittest.TestCase):
    def test_calculate_density_porosity(self):
        # 0.65 / 1.65 is roughly 0.3939, so compare to two decimal places
        self.assertAlmostEqual(calculate_density_porosity(2.65, 2.0, 1.0), 0.39, places=2)

if __name__ == '__main__':
    unittest.main()

This test checks if the calculate_density_porosity function correctly calculates the density porosity given specific values for rho_matrix, rho_bulk, and rho_fluid.

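The assertAlmostEqual method checks that the returned value closely matches the expected value. It is used instead of an exact comparison because floating-point arithmetic rarely produces exactly equal results. By default, the two values must agree to seven decimal places; the places argument relaxes or tightens that precision.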
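
For comparison, the same kind of check can be written in the plain-function style that pytest collects (functions whose names start with test_). This is only a sketch: the function is redefined inline so the example is self-contained, the test names are hypothetical, and math.isclose from the standard library handles the tolerance so the file also runs without pytest installed.

```python
import math


def calculate_density_porosity(rho_matrix: float, rho_bulk: float, rho_fluid: float) -> float:
    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)


def test_typical_value():
    # 0.65 / 1.65 is roughly 0.3939, so allow a small absolute tolerance.
    result = calculate_density_porosity(2.65, 2.0, 1.0)
    assert math.isclose(result, 0.394, abs_tol=1e-3)


def test_bulk_equal_to_matrix_gives_zero_porosity():
    # A bulk density equal to the matrix density means no pore space.
    assert calculate_density_porosity(2.65, 2.65, 1.0) == 0.0


def test_bulk_equal_to_fluid_gives_full_porosity():
    # A bulk density equal to the fluid density means all pore space.
    assert calculate_density_porosity(2.65, 1.0, 1.0) == 1.0
```

The two extra tests cover the end members of the equation, which are easy to reason about by hand and quickly reveal swapped arguments.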

Summary

As a geoscientist, you may not be familiar with Python, or you may have dabbled with it during your career or studies and want to get better at building apps.

These five tips should help you get off to a great start, whether you are in the early stages of learning how to code with Python or looking to expand your current skills and knowledge.

Ensuring that your code is properly documented, has tests written around it and is contained within an organised structure will save you time and headaches later on when your Python application increases in size and complexity. By using version control, you will be able to keep track of all your changes and easily collaborate with friends and colleagues on larger projects.
