The System of Interests…


Representation of Diverse People By: Geralt (Pixabay License), 2023

The Questions of Interest…

The main focus of this blog is understanding the benefits of systems engineering practices that support the “data” in data science.

As a practicing systems engineer, I constantly ask myself many questions when characterizing a developing system. Some of the main questions are (in no particular order):

  • Q1. What is the System of Interest (SOI) that I care about?
  • Q2. At what abstraction level am I viewing the system?
  • Q3. What context am I looking at it in?
  • Q4. Who are the main stakeholders?

As a current data science student, my school projects consist mainly of machine learning models. Even still, I tend to ask myself the same questions during the model development process.

Questions 1 and 4 are the two that I have struggled most to align with my current data science studies.

Q1. What is the System of Interest (SOI) that I care about?

In production, the machine learning models that I develop could be considered one component of the larger system. But at the level and context I am focused on, there are more SOIs. At a minimum, the data should be considered one as well!


Dr. Christian Kastner’s Fifth lecture of the Carnegie Mellon course “17-445/645 Software Engineering for AI-Enabled Systems”, Summer 2020

Data scientists are taught to treat data with the care and nurture she deserves. I know a lot of systems engineers who are not used to data being a new member of the family. They are not used to having to look after the new baby named “Data”.

For the remainder of this blog our primary SOI focus will be Data (from the perspective of Supervised Learning).

By @cool_ass_mom_of_4, Memes.com, 2021

Q4. Who are the main stakeholders?
A stakeholder is anyone who has an interest in the system of interest. So far in my projects, the main stakeholders have been me, as the model developer/data engineer, and the “imaginary” client and subject matter expert (SME).

As a systems engineer, I see that larger systems have a lot of interested stakeholders. Again, many primary stakeholders are more interested in the holistic system throughout the development lifecycle than in smaller components like the machine learning (ML) models. As ML models become more common and are incorporated into new systems development, more parties will become concerned with the data throughout the lifecycle of the system as well.

US DOT, Systems Engineering V

The Questions at Hand…

Currently in the data science lifecycle, data is acquired, explored, cleaned, transformed and used for model training and evaluation. Systems engineers would benefit from applying some of the SE principles in more depth to the “Data”, not just to the larger system or the specific models. The pictures above are examples of systems engineering and technical management principles. Highlighted below are a few aspects of the SE and Technical Management Process that I see as a great benefit to incorporate more intentionally into data management. These are not all-inclusive or even novel, but they are aspects I’ve recognized from my involvement in both.

Lifecycle Management of Data – Data needs to be handled like a system going through its own process of concept, development, integration and maintenance, not just as a support to the model.

Requirements – Data requirements should capture the constraints and needs placed on the data by more stakeholders than just the SME, data scientist and client.
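One way to picture this is to write each data requirement down with the stakeholder who owns it, then check the data against all of them. The sketch below is only an illustration: the requirement IDs, the stakeholder names, the `age` and `name` fields, and the `check_requirements` helper are all hypothetical and not taken from any particular project or library.

```python
def check_requirements(rows, requirements):
    """Evaluate every requirement against every row and report pass/fail."""
    results = []
    for req in requirements:
        # A requirement passes only if its check holds for all rows.
        passed = all(req["check"](row) for row in rows)
        results.append(
            {"id": req["id"], "stakeholder": req["stakeholder"], "passed": passed}
        )
    return results


# Hypothetical requirements, each traced to the stakeholder who raised it.
requirements = [
    {"id": "R1", "stakeholder": "SME", "check": lambda r: 0 <= r["age"] <= 120},
    {"id": "R2", "stakeholder": "privacy officer", "check": lambda r: "name" not in r},
]

# Hypothetical sample data; the second row carries a field it should not.
rows = [{"age": 34}, {"age": 51, "name": "A. Person"}]

results = check_requirements(rows, requirements)
for result in results:
    print(result)
```

Keeping the stakeholder next to each check makes it easier to see whose need is affected when the data changes.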

Risk Management – The risks that come with decisions and constraints on data should be understood. Some types of risk stem from a lack of data, imbalanced data, or even a lack of domain knowledge. These issues impact the model’s capability and the larger system.
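As a small illustration of one such risk, imbalanced data in a supervised learning setting can be flagged early with a simple count of the labels. This is a minimal sketch, not a full risk assessment: the `imbalance_report` helper, the example labels, and the `max_ratio` threshold of 10 are assumptions chosen for illustration only.

```python
from collections import Counter


def imbalance_report(labels, max_ratio=10.0):
    """Summarize class counts and flag a possible imbalance risk.

    max_ratio is an assumed threshold: the largest class may be at most
    this many times bigger than the smallest before the risk is flagged.
    """
    counts = Counter(labels)
    if not counts:
        # No labels at all: lack of data is itself a risk.
        return {"counts": {}, "ratio": None, "at_risk": True}
    ratio = max(counts.values()) / min(counts.values())
    return {"counts": dict(counts), "ratio": ratio, "at_risk": ratio > max_ratio}


# Hypothetical label column: 95 "ok" records and 5 "fault" records.
labels = ["ok"] * 95 + ["fault"] * 5
report = imbalance_report(labels)
print(report)
```

A flagged result does not say the data is unusable; it is a prompt to record the risk and decide how to handle it.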

Configuration Management – As data is explored and modified for different tasks and models, it is important to understand the configurations and versions and the purpose each one serves. Data manipulation is a major part of machine learning, so keeping track of the main configurations is a definite need.
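A lightweight way to start tracking data configurations is to record a content hash and a stated purpose every time the data is changed. The sketch below assumes small, in-memory, JSON-serializable rows; the `fingerprint` and `register_version` helpers and the example records are hypothetical, and a real project would more likely rely on a dedicated data versioning tool.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a short content hash for a list of JSON-serializable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def register_version(registry, name, rows, purpose):
    """Append a record describing this configuration of the data."""
    entry = {
        "name": name,
        "hash": fingerprint(rows),
        "rows": len(rows),
        "purpose": purpose,
    }
    registry.append(entry)
    return entry


registry = []

# Hypothetical raw data with one missing value.
raw = [{"id": 1, "temp": 21.5}, {"id": 2, "temp": None}]
# One cleaning decision: drop rows where temp is missing.
cleaned = [row for row in raw if row["temp"] is not None]

v1 = register_version(registry, "raw", raw, "initial acquisition")
v2 = register_version(registry, "cleaned", cleaned, "training; missing temp dropped")

for entry in registry:
    print(entry)
```

Because the hash is derived from the content, two models trained on what is supposed to be the same data can be checked against the same registry entry.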

Decision Analysis – Decisions about the data, and the rationale behind them, should be analyzed and understood throughout the system, model and data lifecycles.

The Wrap Up…

The SOI’s processes and stakeholders are important elements in handling data the way it needs to be handled. Stakeholders should include people with different backgrounds, experience and knowledge. They should also be collaborative and able to stay involved throughout the data lifecycle, not just when their SME role is needed.

In efforts that involve data science and machine learning, data is just as important as the trained machine learning model, if not more so. Recording the rigor applied to the data throughout its lifecycle is important. The more complex and data-intensive the developed systems are, the more complex and faster the process and management around the data should be.

“Data that is loved tends to survive.” — Kurt Bollacker

As the author continues to understand data science and practice systems engineering, she definitely sees the need for AI/ML models supporting systems engineering and data management processes. This will be a topic for another time. 😉

