RUNBOOK

Giving users a shared place to troubleshoot issues together in order to decrease resolution time.

Situation

Customers told us that they struggle with the following pain points when it comes to troubleshooting:

  • The Platform Engineering team spends too much time investigating and helping App Development teams solve application problems. In other words, the Platform Engineering team wants the App Development teams to be more self-serve.

  • When customers file a support request for a critical issue, they do not know which data is relevant to send to support, which adds to their frustration.

  • Customers are frustrated that the recommended steps for resolving issues are not immediately available or easy to find.

  • It can take too long to understand the impact of an issue.

Business Opportunities:

  • As we were merging two platforms, how might we provide value in observability for users without becoming or competing with observability tools?

  • How might we consolidate the different products used for observability in Tanzu Application Service (our open-source, scalable platform-as-a-service (PaaS) cloud application that allows users to manage Kubernetes-backed container services) into a single tile and provide observability for multi-foundation?

Cross-team Overlapping Goals

As the design manager overseeing the design work for both teams, I noticed that their goals around managing alerts overlapped, so I brought the teams together to collaborate and to clarify each team's roles and responsibilities.

Task

Create a troubleshooting Runbook that reduces overall mean time to resolve (MTTR) by cutting the time needed to find relevant data in metrics, logs, KB articles, and documentation, and to analyze that data for efficient consumption and incident resolution.

Actions

Collaborate with multiple teams whose goals overlapped to determine how those goals connected to the hypotheses behind the user opportunity.

Map out an initial user flow and assign each team ownership of a different part of the experience.

Help define requirements based on customer feedback and current alert experiences from different parts of the platform.

Allocate designers and help scope a feasible timeline.

Ensure design work gets delivered on time.

Review design work to ensure design quality and consistency with the rest of the product.

Prioritize and help plan research.

Advocate for users by supporting prioritization of user experience improvements based on research findings.

Manage two designers in the project.

Results

Alert Context

Customers can see what triggered the issue (alert) and when, and understand the overall context.

Business Impact

Customers can understand business impact to decide on resolution priority, and to reduce noise for teams.

Customer Feedback

"This will help us reduce the troubleshooting time so that we can come to the solution faster. Definitely a good feature."

Support Feedback

"If I get woken at night for an event call, the first few hours are wasted trying to get data, so you do emotional management instead of troubleshooting. This will help us help them faster."

Customer Feedback

"This will be very useful. It would save us time to troubleshoot (smile)"

Research Learnings

Observation: The trends are helpful to see, because you can see whether the time is creeping up for flapping alerts. Users want to query the data, dig in further, and adjust the time window.

Recommendation: Allow users to interact with the charts.

Action: Provided a way for users to select a different time range for a chart, which can also update the other charts in the Runbook. Also added the ability to enlarge each chart.

Screenshot of a Runbook chart showing the time adjustment and expand features.
Screenshot of time adjustment menu where the user can select a time range to display in the chart.
