Angel Aponte

3 Events catapulted me into Data Analytics and made me fall in love with it

Updated: May 24, 2023

When and how did I discover the world of Data Analytics? Here is my story... After careers in academic research and in geomodeling for the Oil and Gas industry, I shifted my attention to cutting-edge technologies such as Artificial Intelligence/Machine Learning (AI/ML), Advanced Visualization, Virtual Reality, and Cloud Computing. All the while, my passion and goal have remained the same: to get the most value out of data by solving complex, diverse, real-world problems that can have the biggest impact.

Why work on real-world projects?

  • Experience in solving real-world data problems

  • Improvement in critical thinking

  • Industry knowledge

  • Valuable addition to your resume/career

So, I would like to share a few stories from this data journey with you. It hasn't been a straight, easy walk, but a tortuous and sometimes painful path. To facilitate the exposition, I have divided it into four main milestones:

  1. The origins

  2. Traffic jam modeling and simulation

  3. Geomodeling

  4. Data Analytics: challenges and opportunities

Please relax and follow along for a while; I hope you will enjoy these stories as much as I have enjoyed writing them.

The origins

In my last year of college, having earned my degree in Physics and armed with perhaps my most potent weapon, namely SIMILAR PROBLEMS, IDENTICAL SOLUTIONS, I was particularly interested in electromagnetic theory and Maxwell's equations: in how to solve these partial differential equations and obtain closed-form solutions, in order to understand electromagnetic wave propagation and several other problems related to its applications in real-life situations.

Indeed, understanding and solving this set of partial differential equations shed light upon nature's secrets; it was absolutely fascinating, and I was determined to keep digging into the matter. Certainly, the results of the lengthy and difficult calculations had to be checked against experimental measurements (DATA). However, for me at that point in time, dealing with measurements/data was the job of other people.

Unfortunately, the problems in Mechanics, Electromagnetism, Fluid Dynamics, etc., that have closed-form solutions represent only a small fraction of the real-world problems. Nature always likes to hide her secrets and enjoys challenging the human intellect to its limits...

Traffic jam modeling and simulation

So, to broaden the scope and increase the number of real-life issues that could be addressed, I realized it was imperative to incorporate additional methods and techniques (and more measurements/DATA!). This is where statistical tools come in handy, along with mathematical formalism, as in Quantum Mechanics, Statistical Mechanics, and Quantum Statistical Mechanics. So I was drawn into Programming, Data Visualization, and Numerical Analysis, among other complex and interesting topics.

Again, nature never stops raising the stakes and challenging the human intellect; there are many complex problems, like the modeling of traffic jams, that are best approached, most of the time, from the perspective of the measurements/DATA, stochastic modeling, and computational BRUTE FORCE; pretty far from the mathematical elegance of Hilbert spaces and exact solutions of partial differential equations. However, it's well worth the try.
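To give a flavor of what "stochastic modeling plus brute force" means in practice, here is a minimal sketch of a traffic cellular automaton in Python, in the spirit of the Nagel-Schreckenberg model (an assumption on my part; this post does not prescribe a specific model, and all parameters below are illustrative):

```python
# Minimal sketch of a stochastic traffic cellular automaton
# (Nagel-Schreckenberg style); parameters are illustrative only.
import random

ROAD_LENGTH = 100   # number of road cells (circular road)
N_CARS = 30         # number of vehicles
V_MAX = 5           # maximum speed in cells per time step
P_SLOW = 0.3        # probability of random braking (the stochastic ingredient)

positions = sorted(random.sample(range(ROAD_LENGTH), N_CARS))
speeds = [0] * N_CARS

def step(positions, speeds):
    """Advance all cars by one time step."""
    n = len(positions)
    new_speeds = []
    for i in range(n):
        # free cells between car i and the car ahead (periodic road)
        gap = (positions[(i + 1) % n] - positions[i] - 1) % ROAD_LENGTH
        v = min(speeds[i] + 1, V_MAX)   # accelerate
        v = min(v, gap)                 # brake to avoid collision
        if v > 0 and random.random() < P_SLOW:
            v -= 1                      # random slowdown
        new_speeds.append(v)
    new_positions = [(p + v) % ROAD_LENGTH for p, v in zip(positions, new_speeds)]
    order = sorted(range(n), key=lambda i: new_positions[i])
    return [new_positions[i] for i in order], [new_speeds[i] for i in order]

for _ in range(200):
    positions, speeds = step(positions, speeds)

print("Mean speed after 200 steps:", sum(speeds) / N_CARS)
```

Even a toy model like this reproduces the spontaneous "phantom" jams that make traffic such a stubbornly data-driven problem.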

Therefore, I couldn't help getting involved: I dove deeper into this fascinating subject, worked on a few projects, and published a handful of papers (see, for example, Inappropriate Use of the Shoulder in Highways - Impact over the Increase of Gas Consumption by A. Aponte et al.). My formula, SIMILAR PROBLEMS, IDENTICAL SOLUTIONS, proved very effective in achieving these goals.

Geomodeling

While enjoying cellular automata and traffic jam modeling, I was hired by an Oil and Gas company. Taking advantage of my experience in mathematical and stochastic modeling, I started working on Geomodeling, or Geologic Modeling. The figure below illustrates a typical geomodeling workflow. But what exactly is a geomodel? In its simplest terms, it's a spatial representation of rock porosity, permeability, and hydrocarbon saturation in a reservoir.


Ultimately, geomodels are consistent 3D representations of a WIDE RANGE of DATA and knowledge relevant to understanding hydrocarbon systems.
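To make the idea concrete, a geomodel can be pictured, in its simplest form, as a set of co-registered 3D property grids. Below is a hypothetical sketch in Python/NumPy; the grid dimensions, cell size, and property ranges are made up for illustration and do not come from any real reservoir:

```python
# Hypothetical sketch of a geomodel as co-registered 3D property grids.
# All dimensions and value ranges are illustrative.
import numpy as np

nx, ny, nz = 100, 80, 20                      # grid cells in x, y, z
rng = np.random.default_rng(42)

porosity = rng.uniform(0.05, 0.30, size=(nx, ny, nz))          # fraction
permeability = 10 ** rng.normal(1.0, 0.8, size=(nx, ny, nz))   # mD, log-normal
water_saturation = rng.uniform(0.2, 0.8, size=(nx, ny, nz))    # fraction
hc_saturation = 1.0 - water_saturation                          # hydrocarbon saturation

# A quick volumetric estimate: pore volume occupied by hydrocarbons
cell_volume = 50.0 * 50.0 * 2.0   # m^3 per cell (assumed 50 m x 50 m x 2 m cells)
hc_pore_volume = (porosity * hc_saturation).sum() * cell_volume
print(f"Hydrocarbon pore volume: {hc_pore_volume:,.0f} m^3")
```

In a real workflow, of course, the property values are conditioned to well logs, seismic data, and geostatistical models rather than drawn at random.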

Geomodeling is commonly used to manage natural resources, identify natural hazards, and quantify geological processes, with main applications to Oil and Gas fields, groundwater aquifers, and ore deposits.

In the Oil and Gas industry, REALISTIC geologic models are required as input to reservoir simulator programs, which PREDICT the behavior of the rocks under various hydrocarbon recovery scenarios. A reservoir CAN ONLY BE DEVELOPED and PRODUCED ONCE; therefore, making a mistake by selecting a site with poor conditions for development is tragic and WASTEFUL.

As mentioned, the geomodeling process involves a wide range of data and specialized software, each presenting its own difficulties in model construction and its own standalone user experience. So now I was in a situation where the most complex and time-consuming step wasn't necessarily the modeling process itself, but the gathering, cleaning, and conditioning of the (typically incomplete) relevant inputs. The center of gravity had shifted toward the data. Building a REALISTIC geomodel under such circumstances is pretty challenging, and realistic geomodels are a must for reliable PREDICTIONS: making mistakes is wasteful. This was pivotal in my data journey, a point of no return, and I was determined to go forward and tackle this and future data challenges.

How were these and other data challenges addressed? First, with Excel formulas and obscure macros; then with cryptic scripts in C or Matlab (and, more recently, in R or Python). A messy-file agony! Limited, non-scalable, time-consuming solutions at best, which quickly became cumbersome and impractical once the volume of data, namely the number of wells in the project, grew beyond a few tens. Time to get out of that box and explore new data knowledge domains!

Data Analytics: challenges and opportunities

Exploring new data knowledge domains sounded very exciting. Great! But where to start? I had no idea that to answer this question I would have to embark on the most extraordinary quest I have ever dreamed about. Indeed, Data Analytics is a very comprehensive field that comprises many other equally comprehensive topics, such as Data Visualization, Programming, Statistics, Machine Learning, Artificial Intelligence, and Probability Theory; these are roughly grouped into Descriptive Analytics (about the past), Predictive Analytics (about the future), and Prescriptive Analytics (advice based on predictions), each of them a vast knowledge domain by itself. And the whole field is always evolving and fast-changing (amid the COVID-19 pandemic, even faster). It was (and continues to be) absolutely overwhelming!

So, I had to commit all my experience and the full power of my secret weapon (SIMILAR PROBLEMS, IDENTICAL SOLUTIONS!) to address the challenges ahead. After a while (several months) of reading many articles and papers, watching videos, attending webinars, taking online courses, etc., I realized that the best strategy was to keep it as simple as possible and focus on only a handful of methods and techniques. In time, I built a basic but solid background that let me unlock a few relevant use cases. And, if required, I go back, read a little more, watch more videos and webinars, and move forward again; back and forth, back and forth... In summary: iterate as many times as necessary.

The next task was to pin down a few relevant real-life use cases and gather the necessary data to carry them out. However, I realized that to effectively apply any Machine Learning/Artificial Intelligence method or technique and obtain practical, usable results, it was imperative to first clean, transform, reshape, refine, and blend the raw data to make it ready for analysis.

So, forget Excel macros and cryptic scripts! The time had come to move on to the NEXT LEVEL and start using a powerful cloud-based, scalable, interactive data tool like Trifacta Data Wrangling Tools and Software (part of the Google Cloud Platform's services); a great fit for tackling data challenges and getting the job done quickly and at scale. To learn more about Trifacta, see the ebook by Ulrika Jägare, "From messy file agony to automated analytics glory," for a detailed introduction. Things were set: it was time to unlock some real-world use cases. Let's get this done!

After a careful search, I identified some use cases in the Oil and Gas (O&G), Healthcare, Public Security, and Fitness industries. I'll end this post with an example of applying Market Basket Analysis and Association Rules Mining techniques to address well-productivity issues in mature oil fields. In future posts, I'll present other real-life examples. Please stay tuned and don't miss them!

Market Basket Analysis is one of several techniques used by RETAILERS to uncover associations (or rules) between items. It can be carried out using the apriori algorithm alone; using apriori together with the HITS algorithm to perform Weighted Association Rules Mining; or using a machine learning tool such as Neural Designer. It works by looking for combinations of items that frequently occur together in transactions. Simply put, it allows retailers to identify relationships, or rules, between the products customers buy.
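As a hedged illustration (the original analysis may well have used a different tool, such as the R arules package), here is a minimal Market Basket Analysis sketch in Python using the mlxtend library's apriori and association_rules functions; the transactions below are made up:

```python
# Minimal Market Basket Analysis sketch with the apriori algorithm.
# The transactions are made-up examples; mlxtend is one of several
# libraries that implement apriori and association-rule mining.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode the transactions into a boolean item matrix
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets and association rules ranked by Lift
itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]]
      .sort_values("lift", ascending=False))
```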

During the life of a mature oil/gas field, specific actions/interventions, or WELL-EVENTS, must be applied to some of its wells at particular times to maintain or increase the wells' production and the field's overall productivity. Now, if one imagines the RETAILER as the mature field, the CUSTOMER as the well, and the ITEMS the customer has BOUGHT as the well-events, it is not hard to conclude that the same techniques and algorithms used by retailers can be applied to uncover relationships or rules between well-events. This analogy (remember the potent weapon?) was also applied to other use cases to be discussed in future posts.

Indeed, once the associations (rules) between interventions/well-events have been uncovered, it's possible to go a step further and, after a similarity analysis, prescribe some of them, for example, to newer wells in the same field that have had no interventions/well-events yet, or even deliver recommendations to wells located in other fields.

At this point, the effort and a more in-depth understanding of data preparation techniques paid off: the key step, the secret ingredient, in the successful implementation of the workflow described above (and of others to be presented in future blog posts) was indeed to clean, reshape, and blend the available well-events raw data into an input file with a format suited to the Market Basket Analysis algorithms. This made it possible to quickly extract new, actionable insight from the well-events data; valuable knowledge that can be used directly and/or integrated smoothly into other traditional O&G workflows.
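To sketch what that preparation step might look like: assuming the raw data arrives as one row per (well, intervention) record, reshaping it into one "transaction" per well, ready for an apriori-style algorithm, could be as simple as the following (the well names are hypothetical; the event labels are those used later in this post):

```python
# Hypothetical sketch: reshape raw well-event records (one row per intervention)
# into one "transaction" per well, the format expected by apriori-style algorithms.
# Well names are illustrative, not from the original data set.
import pandas as pd

raw = pd.DataFrame({
    "well": ["W-01", "W-01", "W-02", "W-02", "W-02", "W-03"],
    "event": ["Punzar_Ensayo", "Ensayo_Estimular",
              "Punzar_Ensayo", "Punzar_Ensayo_Fracturar", "Ensayo_Estimular",
              "Punzar_Ensayo"],
})

# Basic cleaning: normalize labels and drop duplicate records
raw["event"] = raw["event"].str.strip()
raw = raw.drop_duplicates()

# One transaction (list of well-events) per well
transactions = raw.groupby("well")["event"].apply(list).tolist()
print(transactions)
# [['Punzar_Ensayo', 'Ensayo_Estimular'],
#  ['Punzar_Ensayo', 'Punzar_Ensayo_Fracturar', 'Ensayo_Estimular'],
#  ['Punzar_Ensayo']]
```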

The table in Figure 4 above depicts the analysis results, ranked by the Lift metric. Lift summarizes the strength of association between the well-events on the left-hand (lhs: Precedence) and right-hand (rhs: Consequence) sides of the rule: the larger the Lift, the stronger the link between the combinations of interventions appearing on the two sides of the " ==> " symbol.
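For reference, the Lift of a rule A ==> B is commonly defined as the support of the combined itemset divided by the product of the supports of its two sides:

Lift(A ==> B) = Support(A ∪ B) / (Support(A) × Support(B))

A value well above 1 means the two sides occur together more often than they would if they were independent.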

A practical interpretation of these results is as follows. Referring, for example, to the rule labeled [2] in the table: if a well has been intervened on (to increase its productivity) with the combination of events {"Punzar_Ensayo, Punzar_Ensayo_Fracturar"}, then the next recommended action or intervention would be {"Ensayo_Estimular"}; and so on. This is an example of how Data Analytics methods can be applied to other relevant problems to add significant value. Production and productivity engineers now have at their disposal additional tools to distill knowledge from well-events data, support the decision-making process, and optimize budgets.

To explicitly take into account the fractional production increment associated with each intervention or well-event and include it in the calculations (Weighted Association Rules Mining), it is necessary first to quantify the weights corresponding to each "transaction"/intervention, or to estimate these weights using, for example, the HITS algorithm. But that is a story to be told in a future post.

Summary

Wrapping up, this was a fast-track introduction to my data journey: from its origins, a time when I was fascinated by, and focused only on, mathematical formalisms and problems with elegant closed-form solutions; through the challenging and fast-growing learning experience of tackling the very complex problem of simulating traffic jams; to the ongoing, tireless pursuit of an in-depth understanding of Data Analytics methods and techniques, to get the most value out of data and solve real-world problems that can have the biggest impact. I hope you've enjoyed reading these paragraphs as much as I enjoyed writing each one of them.

In future posts, I'll present and discuss more interesting and relevant real-life use cases. Please stay tuned and don't miss them. And kindly leave your comments and share. Thank you!



