Investigation API - Sample Data
Introduction
This notebook guides the user through some of the API methods available for accessing sample data from a Blueback Investigation. See the other user guides for more detailed information on specific parts of the API.
Setup
The Investigator API is defined in the cegalprizm.investigator package.
The InvestigatorConnection class establishes a connection between the notebook and a running instance of Blueback Investigator.
```python
from cegalprizm.investigator import InvestigatorConnection
```
Establish a connection to Investigator:
```python
inv_conn = InvestigatorConnection()
```
Create a new investigation from an .invpy file:
```python
inv = inv_conn.investigation_from_file("Wells.invpy")
```
API usage
The investigation sample data can be retrieved as a pandas DataFrame.
The .as_dataframe() method returns a pandas DataFrame containing the investigation sample data. If no parameters are specified, the dataframe contains all the samples.
Note: The dataframe can be quite large, depending on the size of the investigation.
```python
df = inv.as_dataframe()
print(df.head())
```
| | Perm | Gamma | Porosity | Data sets | Zones (hierarchy) | Facies |
|---|---|---|---|---|---|---|
| 0 | 416.231333 | 79.866280 | 0.123958 | Wells/B2 | Undefined | Undefined |
| 1 | 215.495886 | 66.020500 | 0.123935 | Wells/B2 | Undefined | Clay |
| 2 | 217.868134 | 55.825428 | 0.120401 | Wells/B2 | Undefined | Clay |
| 3 | 151.011123 | 66.199051 | 0.118198 | Wells/B2 | Undefined | Clay |
| 4 | 192.420003 | 61.975246 | 0.114056 | Wells/B2 | Undefined | Clay |
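Because the full dataframe can be large, it can help to check its memory footprint before working with it. The sketch below uses only standard pandas calls. The small frame is a hypothetical stand-in for the result of inv.as_dataframe(), built from the rows printed above. The helper name memory_bytes is introduced here for illustration and is not part of the Investigator API.

```python
import pandas as pd

# Hypothetical stand-in for inv.as_dataframe(), using the rows printed above.
df = pd.DataFrame(
    {
        "Perm": [416.231333, 215.495886, 217.868134, 151.011123, 192.420003],
        "Gamma": [79.866280, 66.020500, 55.825428, 66.199051, 61.975246],
        "Porosity": [0.123958, 0.123935, 0.120401, 0.118198, 0.114056],
        "Data sets": ["Wells/B2"] * 5,
        "Zones (hierarchy)": ["Undefined"] * 5,
        "Facies": ["Undefined", "Clay", "Clay", "Clay", "Clay"],
    }
)


def memory_bytes(frame: pd.DataFrame) -> int:
    """Hypothetical helper: total memory of a dataframe in bytes.

    deep=True also counts the memory used by the string values
    in text columns, not just the column containers.
    """
    return int(frame.memory_usage(deep=True).sum())


print(f"all columns: {memory_bytes(df)} bytes")
print(f"Perm and Porosity only: {memory_bytes(df[['Perm', 'Porosity']])} bytes")
```

Comparing the two figures gives a rough idea of how much smaller a dataframe restricted to a few columns is.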
Various parameters can be used to limit the amount of data in the dataframe.
The columns in the dataframe can be limited by passing lists of column names to the continuous_columns and/or discrete_columns parameters.
```python
inv.as_dataframe(continuous_columns=["Perm", "Porosity"], discrete_columns=["Facies"]).head()
```
| | Perm | Porosity | Facies |
|---|---|---|---|
| 0 | 416.231333 | 0.123958 | Undefined |
| 1 | 215.495886 | 0.123935 | Clay |
| 2 | 217.868134 | 0.120401 | Clay |
| 3 | 151.011123 | 0.118198 | Clay |
| 4 | 192.420003 | 0.114056 | Clay |
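For comparison, a similar layout can be produced with plain pandas column selection on a dataframe that already holds every column. This is a minimal sketch using a hypothetical stand-in frame built from the first rows shown earlier. Selecting afterwards still requires retrieving all columns first, whereas the continuous_columns and discrete_columns parameters produce the smaller dataframe directly.

```python
import pandas as pd

# Hypothetical stand-in for inv.as_dataframe() called with no parameters.
full = pd.DataFrame(
    {
        "Perm": [416.231333, 215.495886, 217.868134],
        "Gamma": [79.866280, 66.020500, 55.825428],
        "Porosity": [0.123958, 0.123935, 0.120401],
        "Data sets": ["Wells/B2"] * 3,
        "Zones (hierarchy)": ["Undefined"] * 3,
        "Facies": ["Undefined", "Clay", "Clay"],
    }
)

# Continuous columns first, then discrete ones, matching the output above.
continuous = ["Perm", "Porosity"]
discrete = ["Facies"]
subset = full[continuous + discrete]
print(subset)
```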
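By default, continuous data is returned in the display units specified in the investigation. To request different units, pass a dictionary that maps column names to unit names in the continuous_units parameter. In the example below, Perm is requested in darcies (D), so its values are 1000 times smaller than the display values (millidarcies) shown earlier.
A dictionary of the supported units can be obtained by calling available_units.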
```python
inv.as_dataframe(continuous_units={"Gamma": "gAPI", "Porosity": "m3/m3", "Perm": "D"}).head()
```
| | Perm | Gamma | Porosity | Data sets | Zones (hierarchy) | Facies |
|---|---|---|---|---|---|---|
| 0 | 0.416231 | 79.866280 | 0.123958 | Wells/B2 | Undefined | Undefined |
| 1 | 0.215496 | 66.020500 | 0.123935 | Wells/B2 | Undefined | Clay |
| 2 | 0.217868 | 55.825428 | 0.120401 | Wells/B2 | Undefined | Clay |
| 3 | 0.151011 | 66.199051 | 0.118198 | Wells/B2 | Undefined | Clay |
| 4 | 0.192420 | 61.975246 | 0.114056 | Wells/B2 | Undefined | Clay |
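Passing continuous_units="invariant" returns the continuous data in invariant units instead of display units. In this investigation, permeability is then returned in m² (for example, 416.231333 mD becomes 4.107884e-13 m²), while Gamma and Porosity are unchanged.
```python
inv.as_dataframe(continuous_units="invariant").head()
```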
| | Perm | Gamma | Porosity | Data sets | Zones (hierarchy) | Facies |
|---|---|---|---|---|---|---|
| 0 | 4.107884e-13 | 79.866280 | 0.123958 | Wells/B2 | Undefined | Undefined |
| 1 | 2.126779e-13 | 66.020500 | 0.123935 | Wells/B2 | Undefined | Clay |
| 2 | 2.150191e-13 | 55.825428 | 0.120401 | Wells/B2 | Undefined | Clay |
| 3 | 1.490364e-13 | 66.199051 | 0.118198 | Wells/B2 | Undefined | Clay |
| 4 | 1.899038e-13 | 61.975246 | 0.114056 | Wells/B2 | Undefined | Clay |
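The three permeability outputs above are related by ordinary unit conversions, which can be checked by hand. This sketch uses the Perm values from the first five rows:

- Dividing the display values by 1000 reproduces the darcy (D) column.
- Multiplying the darcy values by 9.869233e-13 (one darcy in m²) reproduces the invariant column.

This agreement suggests, but does not prove, that the invariant unit for permeability is m². The constant names are introduced here for illustration only.

```python
# Display-unit permeability values (mD) from the first rows shown above.
perm_md = [416.231333, 215.495886, 217.868134, 151.011123, 192.420003]

MD_TO_D = 1e-3          # 1 millidarcy = 0.001 darcy
D_TO_M2 = 9.869233e-13  # 1 darcy is approximately 9.869233e-13 m²

perm_d = [p * MD_TO_D for p in perm_md]
perm_m2 = [d * D_TO_M2 for d in perm_d]

for md, d, m2 in zip(perm_md, perm_d, perm_m2):
    print(f"{md:12.6f} mD = {d:.6f} D = {m2:.6e} m2")
```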
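When using dataframes for machine learning, it is often useful to express discrete data as numeric values rather than strings. This can be controlled using the discrete_data_as parameter. With discrete_data_as="value", the Data sets, Zones (hierarchy) and Facies columns are returned as numbers and therefore appear in the summary statistics.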
```python
inv.as_dataframe(discrete_data_as="value").describe()
```
| | Perm | Gamma | Porosity | Data sets | Zones (hierarchy) | Facies |
|---|---|---|---|---|---|---|
| count | 8514.000000 | 8514.000000 | 8514.000000 | 8514.000000 | 8514.000000 | 8514.000000 |
| mean | 141.396486 | 85.035714 | 0.168533 | 1.455720 | 0.863636 | 1.596547 |
| std | 120.970665 | 13.087800 | 0.049188 | 1.176126 | 2.266894 | 0.822931 |
| min | 3.478512 | 39.594429 | 0.071062 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 73.698095 | 77.603285 | 0.119500 | 0.000000 | 0.000000 | 1.000000 |
| 50% | 110.966671 | 84.163486 | 0.176083 | 1.000000 | 0.000000 | 1.000000 |
| 75% | 171.878377 | 90.620266 | 0.213134 | 3.000000 | 0.000000 | 2.000000 |
| max | 2800.810507 | 139.570190 | 0.328864 | 3.000000 | 8.000000 | 4.000000 |
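The investigation assigns the numeric codes returned with discrete_data_as="value", and this guide does not show that mapping. As a rough illustration of the general idea only, pandas can encode string labels the same way with a categorical dtype. Note that pandas numbers the categories in sorted order, which need not match Investigator's own codes. The Facies labels below are taken from the first rows shown earlier.

```python
import pandas as pd

facies = pd.Series(["Undefined", "Clay", "Clay", "Clay", "Clay"], name="Facies")

# Convert to a categorical; categories are sorted, so "Clay" gets code 0
# and "Undefined" gets code 1.
as_category = facies.astype("category")
codes = as_category.cat.codes

# Keep the code -> label mapping so numeric results can be translated back.
lookup = dict(enumerate(as_category.cat.categories))

print(codes.tolist())  # [1, 0, 0, 0, 0]
print(lookup)          # {0: 'Clay', 1: 'Undefined'}
```

Keeping a lookup like this alongside a trained model makes it possible to turn predicted codes back into facies names.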
A dataframe containing only the samples from a single dataset can be returned using the dataset_name parameter.
```python
inv.as_dataframe(dataset_name="B2").describe()
```
| | Perm | Gamma | Porosity |
|---|---|---|---|
| count | 2317.000000 | 2317.000000 | 2317.000000 |
| mean | 135.761381 | 81.563077 | 0.182965 |
| std | 108.943311 | 11.467934 | 0.048075 |
| min | 3.478512 | 46.375515 | 0.090809 |
| 25% | 71.862079 | 75.034622 | 0.122807 |
| 50% | 107.176873 | 80.604904 | 0.210643 |
| 75% | 164.449917 | 86.415085 | 0.220046 |
| max | 1471.192789 | 135.338440 | 0.275775 |
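If the full dataframe has already been retrieved, a similar per-dataset subset can be taken in pandas by filtering on the Data sets column. This is a sketch, assuming that column holds dataset paths such as "Wells/B2" as in the outputs above. The dataset "Wells/OTHER" and its values are hypothetical, added only so that the filter has something to exclude. By default, describe() summarizes only the numeric columns of a mixed dataframe.

```python
import pandas as pd

# Hypothetical stand-in for inv.as_dataframe(): three B2 rows from the output
# above plus two made-up rows from an invented dataset "Wells/OTHER".
df = pd.DataFrame(
    {
        "Perm": [416.231333, 215.495886, 217.868134, 100.0, 200.0],
        "Porosity": [0.123958, 0.123935, 0.120401, 0.2, 0.25],
        "Data sets": ["Wells/B2", "Wells/B2", "Wells/B2", "Wells/OTHER", "Wells/OTHER"],
    }
)

# Boolean mask: keep only the rows whose dataset path is Wells/B2.
b2 = df[df["Data sets"] == "Wells/B2"]

# describe() skips the text column "Data sets" and summarizes Perm and Porosity.
print(b2.describe())
```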