Reading Causal Graphs

August 15, 2018 Peter Pham

As modern distributed applications grow more complex, problems become harder to diagnose. Traditional monitoring and the more recent observability trend both focus on increasing the quantity of available data. Although this makes it possible to answer most potential queries, it also increases the number of possible queries exponentially. DevOps teams still need a way to see which queries matter for the problem at hand.

NMLStream’s Causal Graph uses AI to surface the queries that matter. The causal graph models the relationships between possible queries and continually updates them as metric data streams in. When a problem occurs, the causal graph traverses the most relevant queries based on recent data and leads the user to the root cause. This traversal mimics the reasoning process DevOps engineers follow in daily operations: interpreting one query’s output determines which query makes the most sense to run next. Let’s see this in action on real data:

When there is a problem, the causal graph initially places it in the middle of the screen, with all potentially relevant queries in orbit around it. Interpreting all of these queries is a daunting task, and it is exactly the challenge DevOps faces today.

As the causal graph receives more data, it emphasizes the most important queries and discards irrelevant ones. Important queries grow larger, and the connections between them appear. The cloud of potential queries in orbit thins as many of them are eliminated. The thick orange path shows the most relevant queries for the current problem.
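The post does not describe how the causal graph decides which queries to discard or which path to highlight. The following is a hypothetical Python sketch of those two visible behaviors, assuming the graph is stored as weighted edges between queries where a higher weight means a query is more relevant as the next step: pruning weak edges (the cloud clearing) and greedily following the strongest edge from the problem node (the highlighted path). The query names, weights, threshold, and the `prune` and `most_causal_path` functions are illustrative assumptions, not NMLStream’s implementation or API.

```python
from typing import Dict, List

# Hypothetical representation: graph[src][dst] = assumed relevance of query dst
# as the next step after query src, refreshed as metric data streams in.
Graph = Dict[str, Dict[str, float]]


def prune(graph: Graph, threshold: float) -> Graph:
    """Drop edges whose relevance falls below the threshold."""
    return {
        src: {dst: w for dst, w in edges.items() if w >= threshold}
        for src, edges in graph.items()
    }


def most_causal_path(graph: Graph, problem: str) -> List[str]:
    """Greedily follow the strongest outgoing edge from the problem node,
    never revisiting a query, until no unvisited neighbor remains."""
    path = [problem]
    visited = {problem}
    current = problem
    while True:
        candidates = {
            dst: w for dst, w in graph.get(current, {}).items() if dst not in visited
        }
        if not candidates:
            return path
        current = max(candidates, key=lambda dst: candidates[dst])
        path.append(current)
        visited.add(current)


# Illustrative data only: a web-tier symptom leading into the compute tier.
graph: Graph = {
    "web.latency_p99": {"web.error_rate": 0.9, "db.connections": 0.2, "cdn.hit_ratio": 0.05},
    "web.error_rate": {"compute.cpu_util": 0.8, "web.latency_p99": 0.3},
    "compute.cpu_util": {"compute.gc_pause": 0.7},
}

pruned = prune(graph, threshold=0.25)
print(most_causal_path(pruned, "web.latency_p99"))
# ['web.latency_p99', 'web.error_rate', 'compute.cpu_util', 'compute.gc_pause']
```

A greedy walk is the simplest way to trace a single highlighted path; a real system could instead score whole paths or re-rank edges as new data arrives, which this sketch does not attempt.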

Within minutes, the most causal path extends through the web tier of a distributed application to the compute tier. The blue sectors represent boundaries in the system. In this case they are boundaries between services; however, the causal graph can organize queries along any dimension users care about, and it can also suggest the most relevant dimensions for the queries that are currently significant.

At this point, the cloud of possible queries has almost entirely dissipated. The causal graph has guided the user through the most causal queries in the system. By showing what matters, NMLStream’s AI-powered causal graph eliminates the need for humans to spend valuable downtime considering, executing, and interpreting queries.

