Investigation API - Sample Data

Introduction

This notebook guides the user through some of the API methods available for accessing sample data from an investigation in Blueback Investigator. See the other user guides for more detailed information on specific parts of the API.

Setup

The Investigator API is defined in the cegalprizm.investigator package.

The InvestigatorConnection class establishes a connection between the notebook and a running instance of Blueback Investigator.

from cegalprizm.investigator import InvestigatorConnection

Establish a connection to Investigator

inv_conn = InvestigatorConnection()

Create a new investigation from an .invpy file

inv = inv_conn.investigation_from_file("Wells.invpy")

API usage

The investigation sample data can be retrieved as a pandas dataframe.

The .as_dataframe() method returns a pandas dataframe containing the investigation sample data. If no parameters are specified, the dataframe contains all samples.

Note: The dataframe can be quite large, depending on the size of the investigation.

df = inv.as_dataframe()
print(df.head())
         Perm      Gamma  Porosity Data sets Zones (hierarchy)     Facies
0  416.231333  79.866280  0.123958  Wells/B2         Undefined  Undefined
1  215.495886  66.020500  0.123935  Wells/B2         Undefined       Clay
2  217.868134  55.825428  0.120401  Wells/B2         Undefined       Clay
3  151.011123  66.199051  0.118198  Wells/B2         Undefined       Clay
4  192.420003  61.975246  0.114056  Wells/B2         Undefined       Clay
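
Because the full dataframe can be large, it may be worth checking its memory footprint with standard pandas tools. The sketch below is not part of the Investigator API: it builds a small stand-in dataframe from the five rows printed above (in practice df would come from inv.as_dataframe()), separates the numeric (continuous) columns from the string-valued (discrete) ones, and converts the discrete columns to the pandas category dtype, which typically reduces memory when labels such as "Clay" repeat many times.

```python
import pandas as pd

# Stand-in for inv.as_dataframe(): the five rows printed above.
df = pd.DataFrame({
    "Perm": [416.231333, 215.495886, 217.868134, 151.011123, 192.420003],
    "Gamma": [79.866280, 66.020500, 55.825428, 66.199051, 61.975246],
    "Porosity": [0.123958, 0.123935, 0.120401, 0.118198, 0.114056],
    "Data sets": ["Wells/B2"] * 5,
    "Zones (hierarchy)": ["Undefined"] * 5,
    "Facies": ["Undefined", "Clay", "Clay", "Clay", "Clay"],
})

# Continuous columns are numeric; discrete columns hold string labels (non-numeric dtype).
continuous = df.select_dtypes(include="number").columns.tolist()
discrete = df.select_dtypes(exclude="number").columns.tolist()

# Memory footprint in bytes, including the string contents (deep=True).
before = df.memory_usage(deep=True).sum()

# Store the repeated string labels as categories, which usually shrinks large frames.
compact = df.astype({col: "category" for col in discrete})
after = compact.memory_usage(deep=True).sum()

print(continuous)  # ['Perm', 'Gamma', 'Porosity']
print(discrete)    # ['Data sets', 'Zones (hierarchy)', 'Facies']
print(before, after)
```

On a five-row frame the saving is negligible; the category dtype pays off when the same labels repeat across thousands of samples.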

Various parameters can be used to limit the amount of data in the dataframe.

The columns in the dataframe can be limited by passing the names of the continuous and/or discrete columns to include in the continuous_columns and discrete_columns parameters.

inv.as_dataframe(continuous_columns=["Perm", "Porosity"], discrete_columns=["Facies"]).head()
         Perm  Porosity     Facies
0  416.231333  0.123958  Undefined
1  215.495886  0.123935       Clay
2  217.868134  0.120401       Clay
3  151.011123  0.118198       Clay
4  192.420003  0.114056       Clay
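
The same column subset can also be taken client-side with ordinary pandas indexing once a full dataframe has been fetched. The sketch below uses a small stand-in dataframe built from the rows shown earlier. Passing the parameters to as_dataframe() is likely the cheaper option for large investigations, since the unwanted columns need not be transferred at all; that is an assumption about the implementation, not something this guide states.

```python
import pandas as pd

# Stand-in for the full inv.as_dataframe() result (first three rows shown earlier).
full = pd.DataFrame({
    "Perm": [416.231333, 215.495886, 217.868134],
    "Gamma": [79.866280, 66.020500, 55.825428],
    "Porosity": [0.123958, 0.123935, 0.120401],
    "Data sets": ["Wells/B2"] * 3,
    "Zones (hierarchy)": ["Undefined"] * 3,
    "Facies": ["Undefined", "Clay", "Clay"],
})

continuous_columns = ["Perm", "Porosity"]
discrete_columns = ["Facies"]

# Client-side selection: continuous columns first, then discrete ones,
# matching the column order of the as_dataframe() output above.
subset = full[continuous_columns + discrete_columns]
print(subset.columns.tolist())  # ['Perm', 'Porosity', 'Facies']
```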

By default, continuous data is returned in the display units specified in the investigation. Different units can be requested with the continuous_units parameter, for example by passing a dictionary that maps column names to unit names.

A dictionary of the supported units can be obtained by calling available_units.

inv.as_dataframe(continuous_units={"Gamma": "gAPI", "Porosity": "m3/m3", "Perm": "D"}).head()
       Perm      Gamma  Porosity Data sets Zones (hierarchy)     Facies
0  0.416231  79.866280  0.123958  Wells/B2         Undefined  Undefined
1  0.215496  66.020500  0.123935  Wells/B2         Undefined       Clay
2  0.217868  55.825428  0.120401  Wells/B2         Undefined       Clay
3  0.151011  66.199051  0.118198  Wells/B2         Undefined       Clay
4  0.192420  61.975246  0.114056  Wells/B2         Undefined       Clay
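
Passing continuous_units="invariant" instead returns the continuous data in the investigation's invariant units. In the output below, the Perm values are consistent with m² (about 4.1e-13 m² for the first sample), while Gamma and Porosity are unchanged.
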
inv.as_dataframe(continuous_units="invariant").head()
           Perm      Gamma  Porosity Data sets Zones (hierarchy)     Facies
0  4.107884e-13  79.866280  0.123958  Wells/B2         Undefined  Undefined
1  2.126779e-13  66.020500  0.123935  Wells/B2         Undefined       Clay
2  2.150191e-13  55.825428  0.120401  Wells/B2         Undefined       Clay
3  1.490364e-13  66.199051  0.118198  Wells/B2         Undefined       Clay
4  1.899038e-13  61.975246  0.114056  Wells/B2         Undefined       Clay
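
The outputs above are consistent with Perm being displayed in mD: dividing by 1000 gives the values in D, and multiplying by the standard conversion factor 1 mD ≈ 9.869233e-16 m² reproduces the "invariant" values. The sketch below checks that arithmetic with pandas on the first five Perm values. The mD display unit is inferred from the numbers rather than stated by the API, and this is not the conversion code Investigator uses internally.

```python
import pandas as pd

# Perm values from the first rows above, in the investigation's display unit
# (inferred to be mD from the outputs; an assumption, not an API guarantee).
perm_md = pd.Series([416.231333, 215.495886, 217.868134, 151.011123, 192.420003])

MD_PER_D = 1000.0         # 1 D = 1000 mD
M2_PER_MD = 9.869233e-16  # 1 mD is approximately 9.869233e-16 m^2

perm_d = perm_md / MD_PER_D    # compare with continuous_units={"Perm": "D"}
perm_m2 = perm_md * M2_PER_MD  # compare with continuous_units="invariant"

print(perm_d.round(6).tolist())  # [0.416231, 0.215496, 0.217868, 0.151011, 0.19242]
print(perm_m2.tolist())
```

In practice, requesting the units through continuous_units is preferable to converting by hand, since it avoids hard-coding conversion factors.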

When using dataframes for machine learning, it is often useful to express discrete data as numeric values rather than strings. This is controlled by the discrete_data_as parameter.

inv.as_dataframe(discrete_data_as="value").describe()
              Perm        Gamma     Porosity    Data sets  Zones (hierarchy)       Facies
count  8514.000000  8514.000000  8514.000000  8514.000000        8514.000000  8514.000000
mean    141.396486    85.035714     0.168533     1.455720           0.863636     1.596547
std     120.970665    13.087800     0.049188     1.176126           2.266894     0.822931
min       3.478512    39.594429     0.071062     0.000000           0.000000     0.000000
25%      73.698095    77.603285     0.119500     0.000000           0.000000     1.000000
50%     110.966671    84.163486     0.176083     1.000000           0.000000     1.000000
75%     171.878377    90.620266     0.213134     3.000000           0.000000     2.000000
max    2800.810507   139.570190     0.328864     3.000000           8.000000     4.000000
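
discrete_data_as="value" presumably returns Investigator's own numeric value for each discrete category. If a dataframe has already been fetched with string labels, pandas offers two common client-side encodings, sketched below on a small stand-in dataframe: integer category codes and one-hot columns. This is a swapped-in pandas technique, not the Investigator encoding; in particular, the pandas codes are assigned in sorted label order and are not guaranteed to match the values returned by discrete_data_as="value".

```python
import pandas as pd

# Stand-in for inv.as_dataframe() with discrete data returned as strings (the default).
df = pd.DataFrame({
    "Perm": [416.231333, 215.495886, 217.868134],
    "Facies": ["Undefined", "Clay", "Clay"],
})

# Option 1: integer codes assigned by pandas (categories sorted: Clay=0, Undefined=1).
# These codes are NOT guaranteed to match Investigator's discrete values.
codes = df["Facies"].astype("category").cat.codes

# Option 2: one-hot encoding, often preferred when the categories have no natural order.
one_hot = pd.get_dummies(df, columns=["Facies"], dtype=int)

print(codes.tolist())            # [1, 0, 0]
print(one_hot.columns.tolist())  # ['Perm', 'Facies_Clay', 'Facies_Undefined']
```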

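A dataframe containing only the samples from a single dataset can be returned using the dataset_name parameter. Only the continuous columns appear in the summary below, because describe() summarizes numeric columns by default and the discrete columns are returned as strings.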

inv.as_dataframe(dataset_name="B2").describe()
              Perm        Gamma     Porosity
count  2317.000000  2317.000000  2317.000000
mean    135.761381    81.563077     0.182965
std     108.943311    11.467934     0.048075
min       3.478512    46.375515     0.090809
25%      71.862079    75.034622     0.122807
50%     107.176873    80.604904     0.210643
75%     164.449917    86.415085     0.220046
max    1471.192789   135.338440     0.275775
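
In the default dataframe, this dataset appears in the "Data sets" column with the label "Wells/B2", while dataset_name takes "B2". If a full dataframe is already in memory, a comparable subset can be taken with a boolean mask on that column, as in the sketch below. The third row and its "Wells/X" label are hypothetical, added only so the filter has something to exclude. dataset_name is still likely preferable for large investigations, since presumably only the requested samples need to be returned.

```python
import pandas as pd

# Stand-in for inv.as_dataframe(); the third row ("Wells/X") is hypothetical.
df = pd.DataFrame({
    "Perm": [416.231333, 215.495886, 100.0],
    "Gamma": [79.866280, 66.020500, 80.0],
    "Data sets": ["Wells/B2", "Wells/B2", "Wells/X"],
})

# Boolean mask on the "Data sets" column; the "Wells/B2" label format is
# taken from the default output shown earlier.
b2 = df[df["Data sets"] == "Wells/B2"]

# describe() summarizes numeric columns only, so "Data sets" is dropped here.
summary = b2.describe()
print(summary.loc["count"].tolist())  # [2.0, 2.0]
```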