The current situation has proven challenging for the oil and gas (O&G) industry, especially with low prices and an environmentalist and globalist agenda that wants to return the world to the pre-industrial era. Under such unfavorable conditions, small to medium oil and gas companies, in particular, need to find ways to keep their businesses running. To achieve this, they must maximize their resources and make the most of every penny of their budgets.
Introduction: Around the world, billions of barrels of oil, valued at trillions of dollars, remain trapped underground in thousands of mature fields. The question is: how can this value be extracted efficiently and at the lowest cost? This is where Data Analytics techniques and methods come in handy, supplying services and tools that can get the most out of the most valuable asset of any company or operator: its data. The goal is to use Analytics to extract critical, actionable insight from the available siloed data (no matter the volume or data type) that has not been fully exploited, all in a very short time.
Following the previous use case, in this post I will introduce a cost-effective Cloud-Based End-to-End Analytics Solution designed and implemented to address the issue outlined in the previous paragraph. As I will describe next, it combines advanced Data Preparation, Machine Learning, and Advanced Visualization techniques.
Solution Architecture: The solution pipeline is illustrated in the figure below. It was designed and implemented on the Google Cloud Platform (GCP). First, an input CSV file is uploaded to Google Cloud Storage (GCS), then refined in Google Cloud Dataprep (GCD), published back to GCS, and connected with Google Colab. Machine Learning (ML) techniques such as PCA and binary classification (well classes: OPEN/CLOSED) are performed using the R (or Python) language. A Google Cloud Function was also deployed to trigger GCD whenever a fresh CSV input file is uploaded to GCS. The previous post contains comprehensive details about the various GCP tools and services.
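The Cloud Function step can be sketched in Python as below. This is a minimal, hypothetical illustration, not the post's actual deployment: the Dataprep endpoint path, the recipe id (`12345`), and the access token are placeholder assumptions, and the gen1 background-function signature (`event`, `context`) is used.

```python
import json
import urllib.request

# Assumed Dataprep REST endpoint for running a recipe's job group (verify
# against your Dataprep edition before relying on it).
DATAPREP_API_URL = "https://api.clouddataprep.com/v4/jobGroups"

def build_job_request(recipe_id: int) -> dict:
    """Build the JSON payload asking Dataprep to run the given recipe."""
    return {"wrangledDataset": {"id": recipe_id}}

def trigger_dataprep(event, context):
    """Cloud Function entry point, fired on a GCS object-finalize event."""
    name = event.get("name", "")
    if not name.endswith(".csv"):
        return  # ignore non-CSV uploads
    payload = json.dumps(build_job_request(12345)).encode("utf-8")  # 12345: placeholder recipe id
    req = urllib.request.Request(
        DATAPREP_API_URL,
        data=payload,
        headers={
            "Authorization": "Bearer <ACCESS_TOKEN>",  # placeholder credential
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(req)  # fire the job (no retries in this sketch)
```

In practice the token would come from Secret Manager or an environment variable rather than being hard-coded.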
ML results are published back to GCS and accessed in GCD, where they are blended with the other relevant data. Finally, the refined, combined, up-to-date file (containing the predicted probabilities) is published to Google BigQuery and connected to Looker Studio to carry out Descriptive Analytics and Advanced Visualization, serving the results as fully interactive visualizations and dashboards that enable users to extract actionable insights with a few clicks.
The input data comes from an official, publicly available, and not fully exploited CGC Company dataset that comprises well data and oil, gas, and water production, among other variables, for around 3,000 wells from mature oil fields in the San Jorge Gulf Basin, Chubut Province, Argentina. This basin is characterized by high water-cut oil production. The challenge is to identify the wells currently awaiting repairs, workovers, or similar interventions and to predict their likelihood of success if they are intervened and reopened for production. This insight can be valuable for guiding decision-making, allocating resources, and maximizing the operator's budget.
As pointed out in other posts, data preparation is the key to any Analytics project. Indeed, GCD empowers the user with a thorough comprehension and supervision of the data preparation process. It also provides features that simplify the automation of these typically intricate and time-consuming tasks. The image shown below provides an overview of the typical GCD data lifecycle.
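GCD itself is a visual tool, so its recipes are not code. As a rough, hedged equivalent of the kind of cleaning steps built there (deduplication, casing normalization, missing-value imputation), here is a self-contained pandas sketch on a toy CSV; the column names are illustrative placeholders, not the actual dataset schema:

```python
import io
import pandas as pd

# Toy raw input standing in for the uploaded CSV
raw = io.StringIO(
    "well,area,prod_oil_m3,status\n"
    "CW-1,cerro wenceslao,12.5,OPEN\n"
    "CW-2,CERRO WENCESLAO,,closed\n"
    "CW-2,CERRO WENCESLAO,,closed\n"  # exact duplicate row
)
df = pd.read_csv(raw)

clean = (
    df.drop_duplicates()  # remove exact duplicate records
      .assign(
          area=lambda d: d["area"].str.upper().str.strip(),      # normalize casing
          status=lambda d: d["status"].str.upper(),               # consistent labels
          prod_oil_m3=lambda d: d["prod_oil_m3"].fillna(0.0),     # impute missing production
      )
)
```

Each `assign` step corresponds to what would be one transformation step in a GCD recipe.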
Unleashing Machine Learning: Principal Components Analysis (PCA) was performed on the 29 original variables (both numeric and categorical) to identify the most relevant variables for a binary classification process. Wells currently in production were labeled with the class "OPEN" (well's classes: OPEN/CLOSED). PCA was carried out in Google Colab using the R language. Results can be published in GCS or BigQuery.
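The post ran PCA in R; as a minimal NumPy sketch of the same idea, the snippet below applies plain PCA (standardize, then eigendecompose the correlation matrix) to synthetic numeric variables standing in for the well data. Handling the categorical variables would additionally require one-hot encoding or a mixed-data method such as FAMD, which is not shown here.

```python
import numpy as np

def pca(X: np.ndarray):
    """PCA via eigendecomposition of the correlation matrix.

    Returns (explained_variance_ratio, loadings), components sorted by
    decreasing explained variance.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    C = np.cov(Z, rowvar=False)                # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Synthetic stand-ins for three numeric well variables
rng = np.random.default_rng(0)
depth = rng.normal(2000, 300, 200)              # total depth, m
cum_oil = 0.05 * depth + rng.normal(0, 5, 200)  # strongly correlated with depth
water_cut = rng.normal(90, 5, 200)              # independent, high water cut
X = np.column_stack([depth, cum_oil, water_cut])

ratios, loadings = pca(X)
# The depth/cum-oil correlation dominates the first component, so those two
# variables get the largest absolute loadings (the longest "arrows").
```

The `loadings` columns are what the variable-arrow plot in the next section visualizes.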
Analysis of PCA results - The two plots below aid in selecting the most impactful variables. Domain-expert knowledge should guide the selection:
The first plot assists in identifying the most impactful categorical variables, such as "Yacimiento (reservoir)," "Area," "Tipo_Pozo (well type)," "Form_Prod (producing formation)," "Sist_Extract (well extraction system)," etc.: the variables depicted farthest from the ellipse's focus.
The second plot is designed for selecting quantitative (numerical) variables. The variables with the longest or reddest arrows, such as "PT_m" (total depth in meters) and "Prod_Acum_Pet_m3" (cumulative oil production in m3), are the ones with the highest impact or contribution.
In the end, I was left with 17 variables to use for the binary classification task, carried out in R in Google Colab. I considered three different algorithms: RANGER, RANDOMFOREST, and XGBOOST. After tuning parameters, evaluating metrics, and comparing their results, I found XGBOOST to be the most effective algorithm for calculating the predicted probabilities of opening the wells currently "CLOSED." The results were published to Google Cloud Storage (GCS), combined in GCD with the other relevant data, and then published to BigQuery (to improve dashboard and visualization response times). Completing the dataflow, the data is now readily available in Google Looker Studio to create responsive, easy-to-digest interactive dashboards and visualizations, as well as a Recommendation Engine.
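As a hedged Python sketch of this classification step (the post's actual model was the R xgboost package), the snippet below uses scikit-learn's GradientBoostingClassifier as a comparable stand-in, trained on synthetic placeholder features, and produces the key output: a predicted probability of being "OPEN" for each held-out well.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600
# Synthetic stand-ins for three of the selected well features
cum_oil = rng.gamma(2.0, 5000.0, n)   # cumulative oil production, m3
water_cut = rng.uniform(50, 100, n)   # water cut, %
depth = rng.normal(2000, 300, n)      # total depth, m
X = np.column_stack([cum_oil, water_cut, depth])

# Synthetic label: wells with high cumulative oil and lower water cut
# are more likely to be OPEN (1 = OPEN, 0 = CLOSED)
score = 0.0002 * cum_oil - 0.05 * water_cut + rng.normal(0, 1, n)
y = (score > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Predicted probability of class OPEN for each held-out well; in the real
# pipeline these values are written back to GCS and blended in GCD
proba_open = model.predict_proba(X_test)[:, 1]
```

In the real workflow the hyperparameters would be tuned and the three candidate algorithms compared on held-out metrics, as described above.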
Serving the Results - Advanced Visualization and Recommendation Engine: To better understand the results obtained so far, the first panel of the solution offers an easy-to-digest, fully interactive dashboard to explore the data. It includes maps that display the well-opening probabilities and cumulative fluid productions, a table that showcases relevant well features, word clouds, and filters that allow the user to visually and quantitatively slice and dice features and probabilities with a few clicks.
The visualization below shows wells "likely" to be successfully opened in the CERRO WENCESLAO Area. The selection process could also involve setting maximum values for Cumulative Water Production (CWP), the Total Depth of the wells (TD), etc., if required. The Top-15 wells are also presented in the table and can be downloaded in CSV, Google Sheets, or Excel format. By further analyzing the probability and fluid maps, the current list of wells could be narrowed down.
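The dashboard filters above amount to a simple query. As a hedged pandas sketch (column names and values are illustrative placeholders, not the real schema), the Top-15 selection could look like:

```python
import pandas as pd

# Toy table standing in for the blended BigQuery dataset
wells = pd.DataFrame({
    "well":      [f"CW-{1000 + i}" for i in range(30)],
    "area":      ["CERRO WENCESLAO"] * 20 + ["OTHER"] * 10,
    "prob_open": [0.30 + 0.02 * i for i in range(30)],
    "cwp_m3":    [50_000 + 3_000 * i for i in range(30)],  # cumulative water production
    "td_m":      [1_500 + 20 * i for i in range(30)],      # total depth
})

top15 = (
    wells[(wells["area"] == "CERRO WENCESLAO")
          & (wells["prob_open"] >= 0.5)    # "likely" probability threshold
          & (wells["cwp_m3"] <= 120_000)   # max cumulative water production
          & (wells["td_m"] <= 2_000)]      # max total depth
    .sort_values("prob_open", ascending=False)
    .head(15)                              # keep the Top-15 candidates
)
```

In Looker Studio the same logic is expressed interactively with filter controls rather than code.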
The solution's second panel serves the Recommendation Engine. It includes maps displaying the well-opening probabilities and cumulative fluid productions, a table with relevant well features, a word cloud, and filters to slice and dice the data. It also includes a combo chart to visually and quantitatively drill down from Area and producing formation to individual wells and compare fluid productions and probabilities. The latter allows the extraction of actionable insights to, for example, pin down well candidates for Water Shut-Off and conformance or other enhancement interventions.
A practical example: the image below considers the same Top-15 "likely" wells in the CERRO WENCESLAO Area. As the combo chart shows, wells CW-1035 and CW-2070 would be suitable candidates for Water Shut-Off or other production enhancement procedures if they were eventually reopened. Once again, the visualizations can assist in further filtering out wells from the list. Insights like these are crucial for allocating resources, optimizing the operator's investment budget, and increasing profit.
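The combo chart's drill-down can be sketched as an aggregation plus a well-level filter. This is a hedged illustration with made-up numbers (only the two well names come from the example above; all figures and column names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "area":      ["CERRO WENCESLAO"] * 3 + ["OTHER AREA"] * 3,
    "well":      ["CW-1035", "CW-2070", "CW-3001", "OA-10", "OA-11", "OA-12"],
    "prob_open": [0.82, 0.78, 0.40, 0.55, 0.35, 0.20],
    "cwp_m3":    [250_000, 310_000, 90_000, 120_000, 80_000, 60_000],
})

# Area-level view: what the combo chart's bars and line summarize
by_area = df.groupby("area").agg(
    mean_prob=("prob_open", "mean"),
    total_cwp=("cwp_m3", "sum"),
)

# Well-level drill-down: likely-to-open wells with heavy water production
# are flagged as Water Shut-Off candidates once reopened
wso_candidates = df[(df["prob_open"] >= 0.7) & (df["cwp_m3"] >= 200_000)]["well"].tolist()
```

With these placeholder numbers, `wso_candidates` picks out CW-1035 and CW-2070, mirroring the reading of the combo chart described above.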
Wrapping up, as a contribution to the efficient and cost-effective extraction of remaining reserves from mature oil and gas fields:
A cost-effective, end-to-end, cloud-based analytics solution was successfully designed and deployed using several Google Cloud Platform tools and services.
The data came from an official, publicly available, and not fully exploited CGC Company dataset of around 3,000 wells from mature oil fields in the San Jorge Gulf Basin, Chubut Province, Argentina.
By applying advanced data preparation techniques and Machine Learning methods such as PCA and binary classification, the predicted probabilities of currently "CLOSED" wells being successfully opened to production were estimated.
Finally, the results were presented in Google Looker Studio as intuitive, easy-to-understand dashboards and visualizations that enable reservoir engineers and analysts to extract essential and actionable insights with just a few clicks.
The gained knowledge and insights can be utilized immediately to support decision-making, allocate resources, optimize the operator's investment budget, and increase profits.
I hope the presented example of an Analytics Solution, which utilizes multiple Google Cloud Platform services and tools, will assist reservoir engineers and analysts in gaining deeper insights and extracting more knowledge from underutilized datasets in the oil and gas industry. Want to learn more or carry out a one-on-one user test?
Don't miss out on the posts that are coming up! Stay tuned, and be sure to catch them all.