DevOps, SRE, DX: Getting clear on Terminology and Teams

  • Start Date: 2020-07-24
  • Author: Rufus Pollock and Irio Musskopf
  • Status: Review
  • Related: 0002-dx-2020

Summary

  • Deprecate DevOps as a term
  • Service Reliability Engineering (SRE) is the preferred term, especially for the crew/team (led by Irio at present)
  • SRE covers 2 areas:
    • DX (Developer Experience): creating and improving the systems and processes for efficiently creating and running reliable solutions
    • Toil: day to day reactive support and maintenance
  • Support is a project. Product Owner is Irakli at present.
  • "You build it, you run it".

Basic example

"X project has been launched and is entering hosting / support. Going forward SRE will own it."

"We need to deploy a new app. I will do this myself and only contact SRE if I run into problems."

"How can we improve DX in the area of continuous deployment?"

Motivation

Clarity and precision: DevOps is imprecise. It has stopped meaning "developers doing operations" and has become a mix of various things. SRE is more precise.

DX is also a useful new term.

Approach

Service Reliability Engineering (SRE) will be our term for the activity and crew that:

  • Ensure our systems run reliably, especially our hosted solutions for clients.
  • Make the Developer Experience (DX) great for our developers building and maintaining these solutions.

We will deprecate the term DevOps other than as a label for a type of task (deployment, maintenance etc) done by developers.

Deploying and maintaining applications is the responsibility of developers, not of a separate team: you build it, you run it.

At the same time, we need a "crew" who:

  • Are high-level experts (and so the escalation point for issues)
  • Design the DX (the systems and processes) that enables our developers to "run it" (and, to an extent, "build it")
  • Are responsible for running the SaaS-like solutions (or solutions that have entered pure hosting mode)

SRE work includes 2 areas: DX and “Toil”

  • DX: Developer Experience = the experience of developers in carrying out their work, specifically in creating and managing DMSes and data-driven applications. It can be thought of as "Product" work for our internal systems and processes around development, deployment and management of services.
  • Toil: day-to-day support. It contrasts with DX "Product" work in that it is interrupt-driven and reactive.
    • 9-5 or 24/7 cover
    • Immediate response

The aim, taken from the Google SRE book, is that SRE team members spend less than 50% of their time on toil.
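
As a rough illustration of how that aim could be checked, the sketch below computes the share of logged hours spent on toil and compares it with the 50% threshold. The TimeEntry type, the "toil" and "dx" labels and the sample hours are all hypothetical. They are assumptions for the example and not part of any existing time-tracking setup.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the "less than 50%" aim stated above.
TOIL_CAP = 0.5


@dataclass
class TimeEntry:
    hours: float
    kind: str  # hypothetical labels: "toil" or "dx"


def toil_share(entries):
    """Return the fraction of logged hours spent on toil (0.0 if nothing is logged)."""
    total = sum(e.hours for e in entries)
    if total == 0:
        return 0.0
    toil = sum(e.hours for e in entries if e.kind == "toil")
    return toil / total


def within_toil_aim(entries, cap=TOIL_CAP):
    """True when the toil share is strictly below the cap."""
    return toil_share(entries) < cap


# Made-up sample week for one SRE crew member.
week = [TimeEntry(6, "toil"), TimeEntry(30, "dx"), TimeEntry(4, "toil")]
print(f"toil share: {toil_share(week):.0%}, within aim: {within_toil_aim(week)}")
```

The comparison is strict, so a week split exactly half and half between toil and DX would not meet the aim.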

Commentary

DevOps is something people (developers) do. It is not an area, team or role within the org. Literally it is Developers doing Operations.

SRE = Service Reliability Engineering. This is the term we will adopt going forward for a specific “team” and area of responsibility.

Support

We also have the concept of "support" (see http://playbook.datopian.com/support/). Support could usefully be divided into:

  • SRE: a system is down, not working. This may evolve into technical support if the issue is traced to a bug in the underlying solution.
  • Technical support: there is a bug in the solution (CKAN) that needs debugging and fixing.

Drawbacks

  • We are changing terms. SRE is not as common (?)

Alternatives

  • Sticking with DevOps and clarifying it

Adoption strategy

  • Create [email protected] and [email protected] email addresses (former is catchall anyone can write to, latter is for team)
  • Rename devops project to sre
  • Send email to all-team
  • Talk through on all hands (?)

Unresolved questions

None
