Monday, July 11 • 14:20 - 14:40
How to Improve Your Service by Roasting It

In many companies, including Microsoft, SRE is not yet an integrated part of the operational landscape. Instead it is being actively adapted into mature companies. Our team has been working to develop new and interesting ways to introduce SRE and its tenets to an organization with many different operational approaches ranging from IT Ops to DevOps.

The process of introducing SRE has proven to be quite complex and socially delicate: you can't go into a team and just tell them they are doing things wrong. You need to find the right way to show a developer all the warts on their baby and motivate them to work with you on addressing them. Furthermore, you have to deal with their earnest desire to treat you as "just another ops team" that is only there to take the pager from them.

One of the tools we've used to enable the right conversations is what we call a Service Roast. Named after the famous Friars Club roasts, its goal is to establish a safe environment in which to dig into and expose the warts, wrinkles, design flaws, shortcomings, and problems that everyone knows a service has but doesn't want to talk about. We can't help you if you won't tell us where it hurts.

To run Service Roasts, we've developed a process, ground rules, a new role of impartial moderator, and some useful structure for hosting this kind of meeting. Thus far we've gained great insight into some of our services and, more importantly, sparked some very interesting (and lively) conversations.

To be sure, this is a high-risk activity and shouldn't be undertaken without careful consideration of the teams participating. We'll present what we've learned about holding these roasts, the guidance teams need for successful participation, and (importantly) why we don't use this approach everywhere.

Jake Welch

Principal Software Engineer, Microsoft
Jake Welch is a Site Reliability Engineer on the Microsoft Azure team in NYC. He has worked on large-scale services for a decade, in both dev and operational roles. At Microsoft, he primarily works on infrastructure services with a focus on Storage and Security.

Monday July 11, 2016 14:20 - 14:40 IST
Pembroke Room