Escalation engineer rotation

Escalation engineers (EEs) rotate onto our Customer Engineering & Support team for a fixed period to help solve the most complex and technical customer issues.

Contact

Tag @escalation-engineers on Slack for help.

Discuss escalation engineering rotation in #escalation-engineering.

Rotations

Rotation    Engineers
to @keegancsmith, @BolajiOlajide
to @keegancsmith, @valerybugakov
to @tbliu98, Thorsten Ball
to @jhchabran
to William Bezuidenhout
to Camden Cheek, Julie Tibshirani, Manuel Ucles
to Beatrix Woo, Jacob Pleiness, Manuel Ucles
to ?
to David Veszelovszki, Milan Freml, Idan Varsano
to Gary Lee [AMER], Taras Yemets [EMEA]
to Marek Zaluski, Dave Try
to Noah Santschi-Cooney, Petri-Johan Last
to Cesar Jimenez [PST]

Benefits

  • EEs provide extra engineering capacity dedicated to solving important customer issues to speed up resolution.
  • EEs get exposure to people on other teams, broader business priorities, and a wider surface of our product and codebase.
  • EEs share their knowledge of the product and our codebase with all members of Technical Success (CE, IE, TA, SE) so we are continuously upleveling our ability to support all customers.
  • EEs directly experience customer pain and gratitude, and they can bring that valuable perspective back to their team to uplevel engineering.

How it works

This is a new program, so all of these points are flexible. Here is the initial sketch:

  • ~2 software engineers are on the escalation engineer rotation at any given time.
    • It’s preferred for us to have broad timezone coverage (and not have all current EEs in/near the same timezone).
  • Rotations are 4 weeks (negotiable depending on business/team/individual circumstances).
  • Becoming an EE is voluntary for any given engineer at any given time. However, support rotations are imperative to our customers and business, so performing one (or a current/future equivalent) will become a periodic expectation for all engineers and necessary for promotions (e.g., from IC3 to IC4) in the future.
    • We won’t let the number of opportunities to perform a support rotation limit the throughput of promotions.
    • We’ll formalize this expectation in the future as we learn about what works best.
  • If you choose to do a support rotation, we’ll make sure your eng team is supported and has their scope of work reduced proportionally.

How to participate

If you have questions or want to participate, please post in #escalation-engineering, or add yourself to the EE rotation sheet and update the handbook page.

What does an Escalation Engineer do?

The idea behind the Escalation Engineer role is to spearhead efforts on support issues that are critical for the company and get them resolved, regardless of the ongoing context and current organization. In essence, an Escalation Engineer acts as a joker card that gets pulled when the current organization and processes are not sufficient to provide a solution quickly enough and the risks are big enough to justify their involvement.

This means that ultimately, the Escalation Engineer is responsible for solving the issue, and it’s on them to mobilize others to ensure enough resources are allocated for it to succeed. They effectively become the DRI for the support request and do whatever is necessary to get it done. For example, they may have to explain to other teams whose involvement is needed why this is an important matter, giving them enough context to reprioritize their backlog.

As an escalation engineer, you’re effectively becoming the DRI (and tech lead) of a support request when you accept it. This means you:

  • Own the support request.
    • It is now your issue. You’re not helping someone fix their issue; you’re helping them by taking it off their hands.
    • The person who reported it may still work with you, but you’re helping them by taking the lead.
    • Transition support requests to the next rotation in partnership with SE.
  • Become the single point of contact for the entire duration of your involvement.
    • You provide regular updates.
    • You check on whoever is involved, making sure they have whatever they need.
    • Remind everyone involved that this is an escalation engineering issue and explain what is at stake.
  • Seek help if needed, reaching out to whoever can help.
    • You give them the full context, help them understand why it matters, and get their buy-in to help.
    • You escalate as soon as possible if you’re cornered and cannot get the help you need. If something is in the way, find a solution. Do not stop moving.
  • Lead the implementation effort.
    • If you’re stuck, get some help! If you find someone who knows this topic very well, make it easy for them to help you.
    • Even if you’re not familiar with the topic, you can still lead and take away that burden from others, allowing them to focus on solving the technical detail while you keep everyone synchronized.
    • If you find yourself unable to lead, you are responsible for finding a person to replace you.
    • You keep the support request alive: you raise the alarm if it’s stagnating, or ask to close the support request if the context changes.

Escalation Engineers will work on low-effort “paper cuts” projects, typically scoped to 2-3 weeks maximum. These projects will primarily consist of product gaps identified and prioritized by GTM with input from Engineering & PM. Ad hoc customer issues remain the P0 for the EE, with these projects as a P1.

Tips for new escalation engineers

  • A good starting point to get context on existing customer issues is the GH board. Start here and reach out to SE for latest info.
  • Ask around. Everyone is thankful that you are taking ownership of this role and will be more than happy to assist. You do not have to come up with the answer or understand the context entirely on your own.
  • Asking what happens if the request is not addressed is a good way of getting to the importance of an ask. It will help you to delegate the task if you need to, because you’ll be able to convey that context to the new owners.
  • When you delegate or ask for help (since you’re wearing the esc-eng hat), link back to the original ask. This will help others understand that what you’re asking for is important.
  • Post some updates in the thread where the original ask came from (e.g., “I got in touch with X”, “we think we may have a solution”, etc.).
  • Keep track of issues that you’re working on using this running doc. This ensures that everyone has some context on what is being worked on and helps with transitions from one rotation to another.
  • Your own team is your primary circle, so don’t hesitate to solicit them for help / feedback, but don’t limit yourself to just your team. Anyone can help you.
  • If a request is coming from execs, it does not mean you can’t say “no”. In fact, saying yes to everything just because of their position is often one of the biggest mistakes that can be made with them. They expect to be told “no” if something is not possible, just as you would tell anyone else.
  • Don’t overthink it. Even if you find yourself in a position where you can’t implement the solution for a request because it’s out of your reach technically, you can still be the one coordinating others to solve it. And that’s very, very helpful: it means you’re owning the problem, but getting help for the solution.

Rotation Handover

  • Each rotation is responsible for ensuring a smooth handover.
  • Include SE in the handover sync given they own the support side of the customer issue.

FAQ

How are EEs different from Support Engineers?

Support Engineers (SEs) are the first and primary responders to reactive customer issues, and that is unchanged. Previously, when an SE needed additional guidance on a customer issue, they requested help from the respective Eng team. Now they’ll escalate to an EE, who will engage on the issue. If the EE is unable to resolve the issue, they will follow standard procedures for requesting help from the appropriate Eng team. Our goal is to engage Engineering on 10% or less of issues.

When will engineering teams get pulled in to respond to support issues?

If EEs are unable to resolve an issue, then they will use standard procedures to ask for help from engineering.

We want engineering teams to own their stuff running in production. Doesn’t the EE role violate that principle?

Engineering teams should be improving observability and alerts for their product area and owning how it runs in production on our Cloud managed instances. But automatic monitoring and alerting doesn’t (and won’t ever) cover all customer issues, and that’s where our Customer Support team comes in.

Examples:

  • When the customer is “doing the wrong thing” despite the docs and product working correctly (and our front-line support isn’t able to diagnose/explain the problem/solution).
  • When the problem overlaps multiple eng teams’ ownership areas, the EE can split up the issues and file them on the relevant teams, and then ensure follow-through on the overall solution.
  • When the source of the problem is unclear, the EE can diagnose and repro the problem before handing it off to the correct team they’ve identified.
  • When the problem is a quick fix (to code or docs), the EE can prevent an eng team from needing to context switch.
  • The EEs can help build better automation and debugging tools for Customer Support.

What’s the interaction of EE with Implementation Engineers (IEs)?

During initial implementation:

  • The Implementation Engineer leads efforts to get the customer to GA (+ x days of buffer for stability).
  • If the IE experiences issues and can’t resolve them, the EE is there to help.

Post GA:

  • Assigned SEs are the first response to customer-reported issues.
  • If SEs experience issues and can’t resolve them, the EE is there to help.

Will escalation engineers fix bugs?

Yes, sometimes (time permitting and when it makes sense). Sometimes it will make sense for the appropriate eng team to fix the bug instead. If the EE fixes a bug, the owning eng team should review the PR. We can define clearer boundaries here if it’s needed.

Who do EEs report to?

This is a temporary rotation so EEs continue to report to their existing engineering manager.

As an escalation engineer, what happens to the things I worked on when I’m no longer on rotation?

The standard approach to ownership applies here: you own the solution, and if you no longer want to or can’t, you’re expected to find a new owner.

What are the availability expectations for EEs?

Escalation engineers are expected to be available during their normal working hours. They are entitled to normal time off (PTO, sick days, etc) and should not be expected to be available during those times. However, we try to pick EEs so that they are not away for long periods during their rotation, and also in a way that they cover normal AMER and EMEA hours as a pair.