Investigation API - Classifiers#
Introduction#
This notebook shows how to create a Python classifier and use it in Blueback Investigator.
Setup#
The Investigator API is defined in the cegalprizm.investigator package.
The InvestigatorConnection class establishes a connection between the notebook and a running instance of Blueback Investigator.
[3]:
from cegalprizm.investigator import InvestigatorConnection
Establish a connection to Investigator
[4]:
inv_conn = InvestigatorConnection()
Create a new investigation from an .invpy file
[6]:
inv = inv_conn.investigation_from_file("Wells.invpy")
Investigation dataframe#
The classifier will be trained using the samples from the Investigation.
[7]:
df = inv.as_dataframe()
df.head()
[7]:
| | Perm | Gamma | Porosity | Data sets | Zones (hierarchy) | Facies |
|---|---|---|---|---|---|---|
| 0 | 416.231333 | 79.866280 | 0.123958 | Wells/B2 | Undefined | Undefined |
| 1 | 215.495886 | 66.020500 | 0.123935 | Wells/B2 | Undefined | Clay |
| 2 | 217.868134 | 55.825428 | 0.120401 | Wells/B2 | Undefined | Clay |
| 3 | 151.011123 | 66.199051 | 0.118198 | Wells/B2 | Undefined | Clay |
| 4 | 192.420003 | 61.975246 | 0.114056 | Wells/B2 | Undefined | Clay |
Exclude the B8 data set, then split the remaining samples into entries with a defined facies and entries where the facies is undefined
[7]:
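# Column name and value format follow the df.head() output above ("Data sets", "Wells/<name>")
no_B8_df = df[df["Data sets"] != "Wells/B8"]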
df = no_B8_df[no_B8_df["Facies"] != "Undefined"]
undefined_df = no_B8_df[no_B8_df["Facies"] == "Undefined"]
Train the classifier#
Import the necessary Python packages
[2]:
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
Define the features and target to be used for training the classifier
[9]:
features = ["Porosity","Gamma"]
target = "Facies"
Split the dataframe into training and test dataframes using the features and target.
[10]:
X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=.3)
Train an SVM classifier on the dataframe
[11]:
clf = svm.SVC(gamma=0.01, C=10.,
              cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=3, kernel='rbf',
              max_iter=-1, probability=True, random_state=None, shrinking=True,
              tol=0.001, verbose=True)
clf.fit(X_train, y_train)
[LibSVM]
[11]:
SVC(C=10.0, gamma=0.01, probability=True, verbose=True)
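The `gamma` argument sets the width of the RBF kernel, which scikit-learn documents as exp(-gamma * ||x - x'||^2). Because the kernel depends on raw Euclidean distance, features on very different scales contribute unequally to it. The sketch below evaluates that formula in plain Python for hypothetical (Porosity, Gamma) samples of roughly the magnitudes shown in the dataframe; the sample values are assumptions chosen for illustration, not data from the investigation.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel value: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

# Hypothetical (Porosity, Gamma) samples, for illustration only.
base = (0.10, 60.0)
porosity_shifted = (0.20, 60.0)   # porosity doubled, gamma ray unchanged
gamma_ray_shifted = (0.10, 80.0)  # porosity unchanged, gamma ray +20 gAPI

k_porosity = rbf_kernel(base, porosity_shifted, gamma=0.01)
k_gamma_ray = rbf_kernel(base, gamma_ray_shifted, gamma=0.01)

print(f"{k_porosity:.4f}")   # close to 1: the kernel barely sees the porosity change
print(f"{k_gamma_ray:.4f}")  # close to 0: the gamma-ray change dominates
```

With unscaled inputs like these, the kernel is driven almost entirely by the gamma-ray values, so standardizing the features before training may be worth testing. Any such scaling would also have to be applied inside the function that is later passed to Investigator.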
Check the accuracy of the classifier on the test set:
[12]:
clf.score(X_test, y_test)
[12]:
0.6716242661448141
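For a classifier, `score` returns the mean accuracy over all test samples, which can hide large differences between facies classes. The sketch below computes the overall accuracy together with a per-class recall in plain Python. The label lists are hypothetical and only illustrate the calculation, and `per_class_recall` is a helper introduced here, not part of scikit-learn or the Investigator API.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical true and predicted labels, for illustration only.
y_true = ["Clay", "Clay", "Clay", "Sand", "Sand", "Silt"]
y_pred = ["Clay", "Clay", "Sand", "Sand", "Sand", "Clay"]

# Overall accuracy: fraction of samples whose prediction matches the truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(accuracy)
print(recall)
```

In this notebook the same helper could be called as `per_class_recall(list(y_test), list(clf.predict(X_test)))` to see which facies the model struggles with.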
Prepare the classification model for use in Investigator#
Investigator needs a description of the inputs and the output that the classification model expects.
It uses this description to pass the correct dimensions to the classifier in the required units.
The classifier must return its results in the form described by the output dimension; in this example, as indices into the tag list of the output dimension.
[13]:
from cegalprizm.investigator import ContinuousPropertyTuple, DiscretePropertyTuple
lookup = inv.discrete_dimension_tags[target]
input_dimensions = [ContinuousPropertyTuple(x, inv.display_units[x]) for x in features]
output_dimension = DiscretePropertyTuple(target, inv.discrete_dimension_tags[target])
print(input_dimensions)
print(output_dimension)
[ContinuousPropertyTuple(name='Porosity', unit_symbol='m3/m3'), ContinuousPropertyTuple(name='Gamma', unit_symbol='gAPI')]
DiscretePropertyTuple(name='Facies', tags=['Undefined', 'Clay', 'Sand', 'Silt', 'Fine silt', 'T1', 'Ness', 'N2', 'N1'])
Wrap the classification model in a function that can be pickled and can process the data provided by Investigator.
[14]:
from cegalprizm.investigator import InvestigatorPyFunction2D
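
def predict(v):
    # Samples with a missing input value cannot be classified
    if np.isnan(v).any():
        return 'Undefined'
    # clf.predict returns an array; take the single predicted label
    return clf.predict([v])[0]

@InvestigatorPyFunction2D
def classifier(inputs):
    # Convert each predicted label to its index in the output dimension's tag list
    return [lookup.index(predict(v)) for v in inputs]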
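The label-to-index mapping can be checked locally before the classifier is registered. The sketch below repeats the same logic without the decorator and replaces the trained model with a hypothetical threshold rule (`stub_predict_label`), so it runs without Investigator or scikit-learn. It assumes, as the wrapper above does, that the inputs arrive as a sequence of per-sample values in the order of `input_dimensions`.

```python
import math

# Tag list copied from the DiscretePropertyTuple printed above.
lookup = ['Undefined', 'Clay', 'Sand', 'Silt', 'Fine silt', 'T1', 'Ness', 'N2', 'N1']

def stub_predict_label(sample):
    """Hypothetical stand-in for the trained model: a fixed porosity threshold."""
    porosity, gamma_ray = sample
    return 'Sand' if porosity > 0.15 else 'Clay'

def predict(sample):
    # Samples with a missing input value cannot be classified.
    if any(math.isnan(value) for value in sample):
        return 'Undefined'
    return stub_predict_label(sample)

def classify(inputs):
    # Same mapping as the decorated function, without the decorator.
    return [lookup.index(predict(sample)) for sample in inputs]

# Hypothetical (Porosity, Gamma) samples; the last one has a missing value.
samples = [(0.20, 55.0), (0.10, 80.0), (float('nan'), 60.0)]
indices = classify(samples)
print(indices)  # [2, 1, 0]
```

Each returned integer is a position in the tag list, so 0 corresponds to 'Undefined', 1 to 'Clay' and 2 to 'Sand'.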
Create the classifier in the Investigation#
Create a new Python classifier in the investigation. Investigator will automatically run this classifier on the data in the investigation.
[ ]:
inv.create_classifier("classify_example", classifier, input_dimensions, output_dimension)