As much as Aspirent loves helping our clients use data and analytics to solve really tough business challenges, it’s important that we pull our eyes away from our code, take a step back, and connect the dots on where current trends are headed. The data science world moves fast and is an extremely diverse playing field, so capturing and tracking the biggest trends is an ongoing exercise, but we never stop trying. As we surveyed the data science landscape over the last several months and looked ahead to the second half of the year, our team coalesced around five trends that already have significant momentum or are gaining steam rapidly.
Importance of an MLOps Strategy
Most large companies today are not only testing the waters of predictive analytics but have also invested significantly in teams with the expertise to develop predictive models in-house. As organizations create more and more of these advanced analytics models to predict retail buying behaviors or maintenance needs for manufacturing equipment, it’s essential they have a strategy in place for how and when those models are maintained, refreshed, and possibly retired. They also need to establish a data pipeline that enables those processes and allows them to happen quickly and efficiently. This is what MLOps is all about.
Without a coherent MLOps strategy in place, there is significant risk of one or more undesired consequences. First, models will sit in development and never get pushed to production, wasting the development effort entirely. Second, a model that does reach production will at some point become stale and outdated, reducing its effectiveness. Finally, and maybe most importantly, if the models are not delivering on their potential value, the business will inevitably decrease long-term investment in ML.
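To make the maintain, refresh, or retire decision concrete, here is a minimal sketch of a periodic model review. Everything in it is an assumption for illustration: the `ModelRecord` fields, the `review_action` function, and the age and error thresholds are hypothetical, and a real MLOps strategy would set its own rules and pull these numbers from its own monitoring pipeline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelRecord:
    """Hypothetical registry entry for a deployed model."""
    name: str
    trained_on: date
    baseline_mae: float  # error measured when the model was deployed
    recent_mae: float    # error measured on the most recently scored data


def review_action(model: ModelRecord, today: date,
                  max_age_days: int = 180,
                  max_error_increase: float = 0.20) -> str:
    """Return 'retire', 'refresh', or 'keep' using illustrative thresholds."""
    age_days = (today - model.trained_on).days
    # Relative growth in error since deployment (0.25 means 25% worse).
    error_increase = (model.recent_mae - model.baseline_mae) / model.baseline_mae

    if error_increase > 2 * max_error_increase:
        return "retire"   # degraded well past the tolerance
    if error_increase > max_error_increase or age_days > max_age_days:
        return "refresh"  # stale or drifting: retrain on current data
    return "keep"


# Example with made-up numbers: error has grown 25% since deployment.
forecast = ModelRecord("demand_forecast", date(2021, 1, 1), 10.0, 12.5)
print(review_action(forecast, today=date(2021, 3, 1)))
```

Running a check like this on a schedule is one simple way to keep a production model from quietly going stale.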
Increasing Confidence in Predictions
Trust-building has been, and will continue to be, a much-talked-about issue for the foreseeable future. In the data science arena, building trust is contingent upon end-users’ understanding of, and confidence in, predictive models and their outputs. Building models that are interpretable, particularly in the business intelligence context, helps to overcome institutional skepticism and inertia that can impede adoption.
While it’s tempting for most data scientists to apply black box models to a problem, since they often require less up-front data discovery and statistical work, non-technical consumers of the predictive outputs still want to know the drivers. We’ve seen this dichotomy play out in the modeling choices that are employed on the IT side of an organization vs. the business side. IT applications of ML tend towards less descriptive modeling techniques and prioritize the prediction itself over the insights gleaned from the inputs. On the business side, users tend to value the insights that come from more traditional statistical methodologies.
For example, in a recent client engagement, competing models were developed in an effort to predict tasking time. One was a well-curated statistical model and the other a gradient boosted machine (GBM). We worked closely with the client on the model development journey for both models, providing detailed insights on the differences in training data, the impact of each input on the prediction, and the accuracy of the results. In spite of the GBM producing better results, the client preferred to move the statistical model to production, as it was easier to explain to the end-consumer.
Data scientists need to keep this dynamic in mind when making modeling choices, working closely with end-users from the beginning to adopt a modeling approach that provides the clarity required to engender confidence. Whether by layering in business logic, employing a rules-based framework alongside complex (and sometimes black box) AI models, or simply choosing a more traditional statistical modeling route, data scientists need to take business users along on the data journey. By delivering results that are intelligible and actionable to this broader audience, data scientists will be better equipped to build trust and advance adoption.
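One reason a statistical model can be easier to explain is that its prediction breaks down into a contribution from each input. The sketch below illustrates that idea for a simple linear model. It is not the client's model: the function name, the feature names, and the coefficients are all made up for illustration.

```python
def explain_linear_prediction(intercept, coefficients, inputs):
    """Split a linear model's prediction into per-input contributions.

    coefficients and inputs are dicts keyed by (hypothetical) feature name.
    """
    contributions = {
        name: coefficients[name] * inputs[name] for name in coefficients
    }
    prediction = intercept + sum(contributions.values())
    return prediction, contributions


# Made-up coefficients for an illustrative "tasking time" model (in hours).
coefficients = {"task_complexity": 2.0, "crew_size": -0.5}
inputs = {"task_complexity": 3, "crew_size": 4}

prediction, contributions = explain_linear_prediction(5.0, coefficients, inputs)
print(f"Predicted tasking time: {prediction} hours")
for name, value in contributions.items():
    print(f"  {name}: {value:+.1f} hours")
```

A business user can read that output directly: each driver shows how much it added to or subtracted from the estimate. A GBM's prediction has no equally direct breakdown without extra explanation tooling.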
The Quest for the Renaissance Data Professional
While there will always be value in people who have deep expertise in singular areas, we’ve seen a growing number of requests for practitioners who have skills that span multiple domains. For example, clients may ask for a data scientist who’s able to develop models, but also has experience bringing models from development through to deployment and realization. Or they need a data scientist who can take on part of a developer’s role to gain access to unstructured data through a custom API. Perhaps they want to layer visualization on top of a predictive model, and they need a data scientist to wear both of these hats. These Renaissance practitioners (as we call them) don’t necessarily have deep knowledge of any one area, but they’re often the ones who can “grease the skids” and unlock a team’s potential to move projects from one stage to the next.
Unsurprisingly, Renaissance practitioners who fit your specific list of needs at the moment you need them are incredibly rare. Finding the right person with the desired level of experience and a specific suite of skills at the right time can end up being too many stars to align. As a result, it becomes tremendously important, and a competitive advantage in the marketplace, for an analytics organization to be able to quickly and effectively parse which skills can be built from within and which skills truly need to be hired. As the landscape of analytics tools and techniques continues to grow, your Renaissance hire could increasingly be somebody who meets half of your technical requirements, with the other half filled by “eagerness to learn.” Organizations that are willing to invest in this eagerness via skill-building and professional development may just be able to realign the stars to their advantage.
Ethical Data Science
As data science gains greater influence in driving decision-making, organizations are becoming more aware of the need to consider the societal impact and ethical aspects of predictive models. In response, they are building out teams and policies to help them understand and manage the implications. A recent and notable example is Twitter stopping the use of AI to crop pictures because the cropping algorithm was found to have biases correlated to skin tone. There are more esoteric examples, too—like a power company using predictive analytics to guide the trimming of trees along power lines. How can the company ensure the data model they’re using isn’t biased to choose certain neighborhoods over others?
The societal impact of an algorithm that recommends neglecting some neighborhoods is not a trivial concern, so it’s vital to be exhaustive in our scrutiny of how these algorithms are built and of the potential fallout that can occur. Many larger companies have hired social scientists to work alongside data scientists to evaluate the human impact of predictions, embed bias testing, and measure the overall effectiveness of the algorithms in use. Like MLOps, the monitoring of algorithms’ ethical implications requires constant attention, from design and development through deployment and beyond, including evaluation of the model’s impact. Why? Because bias can creep into a predictive model at any point, from the training data to implementation. Responsible organizations must be vigilant at all stages and ready to adapt to the changing realities of data models in action.
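As one illustration of what an embedded bias test might look like, the sketch below compares how often a model selects locations for service across neighborhood groups. The function names, the group labels, and the records are hypothetical. The 0.8 cutoff borrows the "four-fifths" rule of thumb from US employment-selection guidance and is only an assumed starting point here, not a standard for this kind of model.

```python
from collections import defaultdict


def selection_rates(records):
    """records: iterable of (group, selected) pairs; returns rate per group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nothing was selected anywhere, so there is no disparity
    return min(rates.values()) / highest


# Made-up model output: was each location recommended for tree trimming?
records = (
    [("neighborhood_a", True)] * 8 + [("neighborhood_a", False)] * 2
    + [("neighborhood_b", True)] * 4 + [("neighborhood_b", False)] * 6
)

rates = selection_rates(records)
ratio = disparity_ratio(rates)
print(rates, ratio)
if ratio < 0.8:  # assumed threshold; choose one appropriate to the use case
    print("Flag for review: selection rates differ noticeably across groups")
```

A low ratio does not prove the model is biased, since real differences in need can drive it. It is a signal that a human review is warranted, and the check can be repeated at every stage from training data through deployment.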
Embedded Advanced Analytics
Business intelligence dashboards, apps, and other consumed data products have reached a level of maturity where ML solutions are being embedded in a way that hides the model’s complexity from end-users. For example, a straightforward analytics dashboard that summarizes subcontractor performance could report on cost, work activities, safety record, and a host of other metrics that you might want to track.
Advancing that same dashboard beyond descriptive reporting and layering in advanced analytics allows the user to perform contractor optimization, decide which contractors should get which jobs, or predict how many safety events will occur given a change in circumstances. To the end user, the complexity of those calculations is hidden from view, embedded into the dashboard that they’re already using. As trust in predictive models increases, embedded advanced analytics will continue to become more common.
We would love to hear how these trends are playing out for you as a leader, creator, or end-user of these data science advancements. Feel free to reach out to me on LinkedIn and join the discussion.