Investigation API - Predictors#

Introduction#

This notebook is a guide to creating a Python predictor that can be called from Blueback Investigator.

Setup#

The Investigator API is defined in the cegalprizm.investigator package.

The InvestigatorConnection class establishes a connection between the notebook and a running instance of Blueback Investigator.

[1]:
from cegalprizm.investigator import InvestigatorConnection

Establish a connection to Investigator

[ ]:
inv_conn = InvestigatorConnection()

Create a new investigation from an .invpy file

[ ]:
inv = inv_conn.investigation_from_file("Wells.invpy")

Investigation dataframe#

The predictor will be trained using the samples from the investigation.

[ ]:
df = inv.as_dataframe()
df.head()

Filter the dataframe to keep the clean entries, dropping the samples from the B8 dataset

[ ]:
df = df[df["Dataset"] != "B8"]
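If the excluded rows are needed later, for example to inspect what the predictor produces for them, the same boolean mask can yield both parts. This is a minimal sketch on a toy dataframe. The column values and the names `toy_df`, `clean_df` and `excluded_df` are illustrative and not part of the Investigator API.

```python
import pandas as pd

# Toy stand-in for inv.as_dataframe(); the values are made up for illustration
toy_df = pd.DataFrame({
    "Dataset": ["A1", "B8", "A2", "B8"],
    "Porosity": [0.21, 0.18, 0.25, 0.30],
})

# Boolean mask: True for every row that does not belong to dataset "B8"
keep = toy_df["Dataset"] != "B8"

clean_df = toy_df[keep]       # rows that would be used for training
excluded_df = toy_df[~keep]   # rows left out of training

print(len(clean_df), len(excluded_df))
```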

Train the predictor#

Import the necessary Python packages

[1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

Define the features and target to be used for training the predictor

[10]:
features = ["Porosity", "Perm"]
target = "Gamma"

Split the dataframe into training and test dataframes using the features and target.

[11]:
X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=.3)
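The split is random, so the score obtained afterwards can change from run to run. As a hedged sketch, `train_test_split` accepts a `random_state` argument that fixes the shuffle. The toy arrays and the seed value 42 below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and one target value per sample
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

# The same random_state gives the same split on every call
split_a = train_test_split(X, y, test_size=0.3, random_state=42)
split_b = train_test_split(X, y, test_size=0.3, random_state=42)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
print(X_train_a.shape, X_test_a.shape)
```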

Create a scaler which is used to normalise the inputs before training the regression model

[12]:
scaler = MinMaxScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
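The scaler is fitted on the training data only, so the test data is scaled with the training minimum and maximum and can fall outside the range 0 to 1. A small sketch with made-up numbers, where `demo_scaler` is a name used only for this illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training data: column 0 spans 0..10, column 1 spans 10..30
train = np.array([[0.0, 10.0], [10.0, 30.0]])
# The second test row lies outside the training range in both columns
test = np.array([[5.0, 20.0], [20.0, 50.0]])

# Fit on the training data only, then reuse the fitted scaler for the test data
demo_scaler = MinMaxScaler().fit(train)
scaled_test = demo_scaler.transform(test)

print(scaled_test)
```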

Train a linear regression model on the scaled training data

[2]:
reg = LinearRegression().fit(X_train, y_train)

Check the R² score on the test data

[15]:
reg.score(X_test, y_test)
[15]:
0.3359948307266003
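For a regression model, `score` returns the coefficient of determination R², not a classification accuracy. A value of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean of the test targets. A minimal sketch of the calculation, where `r_squared` is a helper written for this illustration:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

# Predicting every value exactly
perfect = r_squared(y_true, y_true)
# Always predicting the mean of y_true (2.5)
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])

print(perfect, mean_only)
```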

Prepare the regression model for use in Investigator#

It is necessary to provide a description of the inputs and the output the regression model expects.

Investigator uses this information to ensure that the correct dimensions are passed to the predictor in the required units.

The predictor must return the results of the regression based on the description of the output provided.

[16]:
from cegalprizm.investigator import ContinuousPropertyTuple
input_dimensions = [ContinuousPropertyTuple(x, inv.display_units[x]) for x in features]
output_dimension = ContinuousPropertyTuple(target, inv.display_units[target])
print(input_dimensions)
print(output_dimension)
[ContinuousPropertyTuple(name='Porosity', unit_symbol='m3/m3'), ContinuousPropertyTuple(name='Perm', unit_symbol='mD')]
ContinuousPropertyTuple(name='Gamma', unit_symbol='gAPI')

Wrap the regression model in a function that can be pickled and can process the data provided by Investigator.

[17]:
from cegalprizm.investigator import InvestigatorPyFunction2D

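def predict(v):
    # A sample with any missing input cannot be predicted, so return NaN for it
    if np.isnan(v).any():
        return np.nan
    # reg.predict expects a 2D array, so wrap the single sample in a list
    return reg.predict([v])

@InvestigatorPyFunction2D
def predictor(inputs):
    # Apply the same scaling that was used when training the model
    scaled_inputs = scaler.transform(inputs)
    return [predict(v) for v in scaled_inputs]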
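Before registering the function, it can help to run the same logic locally on a small array that contains a missing value. The sketch below does not use Investigator at all. `local_predictor`, `demo_reg` and `demo_scaler` are stand-ins invented for this check. It assumes the inputs arrive as one row per sample, and it returns plain floats by taking the first element of each prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Toy training data following y = 1 + 2*x0 + 3*x1 (made up for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

demo_scaler = MinMaxScaler().fit(X)
demo_reg = LinearRegression().fit(demo_scaler.transform(X), y)

def local_predictor(inputs):
    # MinMaxScaler keeps NaN values as NaN when transforming
    scaled = demo_scaler.transform(inputs)
    results = []
    for row in scaled:
        if np.isnan(row).any():
            # Missing input: no prediction is possible for this sample
            results.append(np.nan)
        else:
            results.append(float(demo_reg.predict([row])[0]))
    return results

outputs = local_predictor(np.array([[0.5, 0.5], [np.nan, 0.5]]))
print(outputs)
```

A check like this shows that a row with a missing input yields NaN while complete rows yield a number, which is the behaviour the wrapped function relies on.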

Create the predictor in the Investigation#

Create a new Python predictor in the investigation. Investigator will automatically run this predictor on the data in the investigation.

[ ]:
inv.create_predictor("predict_example", predictor, input_dimensions, output_dimension)