Investigation API - Classifiers#

Introduction#

This notebook shows how to create a Python classifier and use it in Blueback Investigator.

Setup#

The Investigator API is defined in the cegalprizm.investigator package.

The InvestigatorConnection class establishes a connection between the notebook and a running instance of Blueback Investigator.

[3]:
from cegalprizm.investigator import InvestigatorConnection

Establish a connection to Investigator

[4]:
inv_conn = InvestigatorConnection()

Create a new investigation from an .invpy file

[6]:
inv = inv_conn.investigation_from_file("Wells.invpy")

Investigation dataframe#

The classifier will be trained using the samples from the Investigation.

[7]:
df = inv.as_dataframe()
df.head()
[7]:
         Perm      Gamma  Porosity  Data sets  Zones (hierarchy)     Facies
0  416.231333  79.866280  0.123958   Wells/B2          Undefined  Undefined
1  215.495886  66.020500  0.123935   Wells/B2          Undefined       Clay
2  217.868134  55.825428  0.120401   Wells/B2          Undefined       Clay
3  151.011123  66.199051  0.118198   Wells/B2          Undefined       Clay
4  192.420003  61.975246  0.114056   Wells/B2          Undefined       Clay

Leave out the rows from data set B8, then split the remaining dataframe into labelled entries and entries whose Facies is Undefined

[7]:
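# The preview above names the column "Data sets" and prefixes its values with "Wells/";
# "Wells/B8" assumes that B8 follows the same naming as "Wells/B2".
no_B8_df = df[df["Data sets"] != "Wells/B8"]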
df = no_B8_df[no_B8_df["Facies"] != "Undefined"]
undefined_df = no_B8_df[no_B8_df["Facies"] == "Undefined"]
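
The split above relies on pandas boolean masks. The following minimal sketch shows the same pattern on a small, made-up dataframe that borrows the column names from the preview above; the values and the names `sample`, `kept`, `labelled` and `unlabelled` are illustrative assumptions, not data from the investigation.

```python
import pandas as pd

# Synthetic stand-in for inv.as_dataframe(); the values are invented for illustration.
sample = pd.DataFrame({
    "Gamma": [79.9, 66.0, 55.8, 70.1],
    "Porosity": [0.124, 0.124, 0.120, 0.118],
    "Data sets": ["Wells/B2", "Wells/B2", "Wells/B8", "Wells/B2"],
    "Facies": ["Undefined", "Clay", "Sand", "Sand"],
})

# Keep every row that does not belong to the excluded data set.
kept = sample[sample["Data sets"] != "Wells/B8"]

# One boolean mask splits the kept rows into two disjoint parts.
is_labelled = kept["Facies"] != "Undefined"
labelled = kept[is_labelled]
unlabelled = kept[~is_labelled]

print(len(labelled), len(unlabelled))
```

Reusing one mask and its negation makes sure that every kept row lands in exactly one of the two dataframes.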

Train the classifier#

Import the necessary Python packages

[2]:
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split

Define the features and target to be used for training the classifier

[9]:
features = ["Porosity","Gamma"]
target = "Facies"

Split the dataframe into training and test dataframes using the features and target.

[10]:
X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=.3)
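
The split above is random, so the accuracy reported later changes from run to run. As a hedged variation, `train_test_split` also accepts `random_state` for a repeatable split and `stratify` to keep the class proportions the same in both parts. The sketch below uses synthetic data; the `demo` dataframe and the `Xd_*`/`yd_*` names are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data with an imbalanced target (70 Clay, 30 Sand).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Porosity": rng.uniform(0.05, 0.30, size=100),
    "Gamma": rng.uniform(20.0, 120.0, size=100),
    "Facies": ["Clay"] * 70 + ["Sand"] * 30,
})

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    demo[["Porosity", "Gamma"]],
    demo["Facies"],
    test_size=0.3,
    random_state=42,          # fixed seed: the same split on every run
    stratify=demo["Facies"],  # preserve the Clay/Sand ratio in both parts
)

print(Xd_train.shape, Xd_test.shape)
print(yd_test.value_counts().to_dict())
```

Stratification is mainly useful when some facies are rare, because a purely random split can leave them out of the test set altogether.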

Train an SVM classifier on the training data

[11]:
clf = svm.SVC(gamma=0.01, C=10.,
              cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=3, kernel='rbf',
              max_iter=-1, probability=True, random_state=None, shrinking=True,
              tol=0.001, verbose=True)
clf.fit(X_train, y_train)
[LibSVM]
[11]:
SVC(C=10.0, gamma=0.01, probability=True, verbose=True)

Check the classifier's accuracy on the test set:

[12]:
clf.score(X_test, y_test)
[12]:
0.6716242661448141
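
A single accuracy figure does not show which classes are confused with each other. One possible way to look closer is to use `confusion_matrix` and `classification_report` from `sklearn.metrics`. The sketch below runs on synthetic two-class data, not on the investigation; all names in it (`X_demo`, `demo_clf`, `cm`, and so on) are assumptions made for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Two synthetic clusters in (Porosity, Gamma) space; the values are invented.
rng = np.random.default_rng(1)
X_clay = rng.normal(loc=[0.10, 90.0], scale=[0.02, 8.0], size=(60, 2))
X_sand = rng.normal(loc=[0.25, 40.0], scale=[0.02, 8.0], size=(40, 2))
X_demo = np.vstack([X_clay, X_sand])
y_demo = np.array(["Clay"] * 60 + ["Sand"] * 40)

Xa, Xb, ya, yb = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)
demo_clf = svm.SVC(gamma=0.01, C=10.0).fit(Xa, ya)
pred = demo_clf.predict(Xb)

# Rows are the true classes, columns the predicted classes, in the order of `labels`.
labels = ["Clay", "Sand"]
cm = confusion_matrix(yb, pred, labels=labels)
print(cm)

# Precision, recall and F1 score for each class.
print(classification_report(yb, pred, labels=labels, zero_division=0))
```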

Prepare the classification model for use in Investigator#

Investigator needs a description of the inputs the classification model expects and of the output it produces.

Investigator uses this information to ensure that the correct dimensions are passed to the classifier in the required units.

The classifier must return its results in the form given by the output description.

[13]:
from cegalprizm.investigator import ContinuousPropertyTuple, DiscretePropertyTuple
lookup = inv.discrete_dimension_tags[target]
input_dimensions = [ContinuousPropertyTuple(x, inv.display_units[x]) for x in features]
output_dimension = DiscretePropertyTuple(target, inv.discrete_dimension_tags[target])
print(input_dimensions)
print(output_dimension)
[ContinuousPropertyTuple(name='Porosity', unit_symbol='m3/m3'), ContinuousPropertyTuple(name='Gamma', unit_symbol='gAPI')]
DiscretePropertyTuple(name='Facies', tags=['Undefined', 'Clay', 'Sand', 'Silt', 'Fine silt', 'T1', 'Ness', 'N2', 'N1'])
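
The wrapper function in the next step returns positions in the tag list instead of the tag strings themselves. As a minimal sketch of that mapping, using only the tag list printed above (the names `tag_to_index`, `predicted_labels` and `codes` are hypothetical):

```python
# The tags as printed for the output dimension above.
tags = ['Undefined', 'Clay', 'Sand', 'Silt', 'Fine silt', 'T1', 'Ness', 'N2', 'N1']

# Build the label-to-position lookup once; this avoids repeated list.index calls.
tag_to_index = {tag: i for i, tag in enumerate(tags)}

# Hypothetical labels, as a classifier might return them.
predicted_labels = ['Clay', 'Sand', 'Undefined', 'N1']
codes = [tag_to_index[label] for label in predicted_labels]

print(codes)
```

A label missing from `tags` raises a `KeyError` here, which surfaces a mismatch between the training labels and the output description early.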

Wrap the classification model in a function that can be pickled and can process the data provided by Investigator.

[14]:
from cegalprizm.investigator import InvestigatorPyFunction2D

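def predict(v):
    # A row with a missing feature value cannot be classified
    if np.isnan(v).any():
        return 'Undefined'
    # clf.predict returns a one-element array; return the label itself
    return clf.predict([v])[0]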

@InvestigatorPyFunction2D
def classifier(inputs):
    return [lookup.index(predict(v)) for v in inputs]
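
Calling `clf.predict` once per row can be slow on large inputs. A possible batched variant is sketched below, assuming (as the loop above suggests) that the inputs arrive as a sequence of rows with one value per feature. It uses a small stand-in model and omits the `InvestigatorPyFunction2D` decorator so that it runs without Investigator; `demo_clf`, `classify_rows` and the training values are hypothetical.

```python
import numpy as np
from sklearn import svm

tags = ["Undefined", "Clay", "Sand"]

# Tiny stand-in model trained on invented (Porosity, Gamma) values.
X_fit = np.array([[0.10, 90.0], [0.12, 85.0], [0.25, 40.0], [0.27, 35.0]])
y_fit = np.array(["Clay", "Clay", "Sand", "Sand"])
demo_clf = svm.SVC(gamma=0.01, C=10.0).fit(X_fit, y_fit)

def classify_rows(rows):
    """Return one tag index per row; rows containing NaN map to 'Undefined'."""
    rows = np.asarray(rows, dtype=float)
    # Start with every row marked as 'Undefined'.
    codes = np.full(len(rows), tags.index("Undefined"), dtype=int)
    # Rows without any NaN can be classified in a single predict call.
    valid = ~np.isnan(rows).any(axis=1)
    if valid.any():
        labels = demo_clf.predict(rows[valid])
        codes[valid] = [tags.index(label) for label in labels]
    return codes.tolist()

result = classify_rows([[0.11, 88.0], [np.nan, 50.0], [0.26, 38.0]])
print(result)
```

The result has the same length and order as the input, so each index still corresponds to the row it was computed from.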

Create the classifier in the Investigation#

Create a new Python classifier in the investigation. Investigator will automatically run this classifier on the data in the investigation.

[ ]:
inv.create_classifier("classify_example", classifier, input_dimensions, output_dimension)