DevOps Guide

DevOps in Action



The Development Pipeline

Fundamentally, DevOps is about managing the flow of engineering work, from
the earliest phases of software design to implementation and
maintenance—this is known as the development pipeline. Like the sales
pipeline, the development pipeline acts to ensure a consistent flow of work,
minimizing inefficiencies and preventing bottlenecks, which cause spikes of
high stress interspersed with periods of boredom. No one wants to work like
that.

If talking about the “flow of work” evokes images of a factory
floor, you're not far off: “DevOps is like an assembly line for
software development,” says Gaurav Murghai. In fact, DevOps uses
principles taken directly from lean manufacturing—combined with
practices from agile development—to efficiently assemble software the
same way that car manufacturers assemble vehicles. We'll cover how this works
in the next few sections.



The Three Ways

A good basis for understanding DevOps is “The Three Ways”
outlined in The Phoenix Project, a 2013 novel co-written by
three expert DevOps practitioners. These principles draw extensively from
lean manufacturing and agile development practices. While additional models
for understanding DevOps exist, such as CALMS (Culture, Automation, Lean,
Measurement, and Sharing), The
Three Ways remains one of the most influential. Since the essence of the
CALMS model is captured within The Three Ways, this explanation will focus on
the latter.

Imagine a factory floor: On one side, there are raw materials and on the
other, finished products. Between these two sides, the materials move from
station to station, gradually transforming into finished products.

It may be strange to think of software creation as part of an assembly
line—we're dealing with virtual products, not physical ones—but
like factory workers, engineering teams receive “orders” and have
to deliver “finished goods.” These are things like:

  • A new feature that the product team wants to
    implement
  • A bug users have complained about that needs
    to be fixed
  • An integration with another tool or partner
    company's product

To manage these “orders” efficiently, from initial request to
project completion, engineering teams can use some of the following
strategies:

1. The First Way: Systems Thinking

One of the core principles of lean manufacturing is that a build-up of
orders and excess inventory—in development terms, a backlog of
work—slows down productivity. The same is true for virtual products.
Letting work build up decreases worker productivity. Workers tend to
prioritize what's most urgent, which means that crucially important but
non-urgent work is neglected, causing problems to compound. This is what's
known as “technical debt.”

To take control of their backlogs, engineering managers have to understand
how work flows through their organization. One of the easiest ways to do this
is through visualization. In The Phoenix Project, the team creates a
“Kanban board” out of index cards that
organizes all of their ongoing projects—from request to completion.
Today, most companies use virtual dashboards to manage requests, prioritize
tasks, and track their projects.

Visualizing work this way not only removes confusion—everyone knows
what their top priorities are—but it can also help managers identify
bottlenecks, minimize them, and ensure work flows smoothly from planning to
completion.
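
As a rough illustration of how a virtual board can surface bottlenecks, here is a minimal Python sketch. The column names, work-in-progress (WIP) limits, and cards are all hypothetical; real tools track far more than this, and the limits a team chooses are a judgment call.

```python
from collections import Counter

# Hypothetical column names and work-in-progress (WIP) limits; real boards vary.
WIP_LIMITS = {"Requested": 10, "In Progress": 3, "In Review": 2, "Done": None}


def find_bottlenecks(cards, wip_limits=WIP_LIMITS):
    """Return the columns whose card count exceeds their WIP limit."""
    counts = Counter(column for _, column in cards)
    return [
        column
        for column, limit in wip_limits.items()
        if limit is not None and counts[column] > limit
    ]


# Illustrative cards: (title, current column).
cards = [
    ("New checkout feature", "In Progress"),
    ("Fix login bug", "In Review"),
    ("Partner integration", "In Review"),
    ("Update billing page", "In Review"),
    ("Upgrade database", "Requested"),
]

bottlenecks = find_bottlenecks(cards)
print(bottlenecks)
```

In this made-up example, three cards sit in a column limited to two, so the review step is flagged as the place where work is piling up.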

2. The Second Way: Amplify Feedback Loops

Having a system in place to ensure the smooth flow of work is just the
start. What happens when things inevitably go wrong? The second way is all
about detecting problems, resolving them quickly, and learning from them. The
purpose of these processes is to create a feedback loop that reinforces
quality from the earliest phases of software creation.

Here's how it works: When a problem is detected—whether through
automated error reporting or manual flagging—the top priority is to
resolve it. If this can't be done relatively quickly by a single person, the
entire team stops whatever they're doing, “swarming” the issue
until it's fixed.

At first glance, this might seem horribly inefficient. Stop everything for
one little problem? But containing problems while they're small and
manageable stops them from spiraling out of control. Think back to the
factory production line. One part of the system affects every other part. If
a piece of manufacturing equipment stops working and needs to be fixed,
allowing other parts of the production line to continue only increases the
backlog of unfinished work, causing future bottlenecks.

Swarming problems as they happen allows teams to learn from them and put
better systems in place. While this may temporarily slow down production, in
the long term, it continually increases work speed and quality in a positive
feedback loop.

3. The Third Way: Create a Culture of Continual Experimentation and
Learning

Culture plays an important role in creating an environment of ongoing
learning and improvement. In order to be able to amplify feedback loops,
engineers need to feel comfortable flagging issues and interrupting their
coworkers when a problem requires all hands on deck.

One way teams create a culture of experimentation and learning is by
applying agile development principles. Agile is ideal for DevOps because of
its focus on short-cycle timelines and consistent feedback. “In DevOps,
you work in small batch sizes,” says Greg Jacoby, Bright Development
Owner and Lead Developer. “You're never doing a massive crazy update,
you're focusing on producing value for the end user.” This style of
work results in better code quality, because frequent deploys allow
developers to get more immediate feedback from users and improve their code
accordingly. In order to execute agile effectively, teams use continuous
integration and continuous delivery (CI/CD).

Continuous Integration, Continuous Delivery (CI/CD)

Since multiple programmers work together across different operating
systems (and versions of those operating systems) to build software, they
need to automate the process of integrating and validating code changes to
support these multiple environments—this is what CI/CD does. It's a
DevOps best practice which creates consistent, automated processes to build,
package, and test new code. (It also helps prevent programmers from uttering
the dreaded phrase: “But it works on my machine!”)
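
To make the idea of a consistent, automated build-package-test process concrete, here is a minimal Python sketch of a pipeline runner. It is not how any particular CI/CD product works; the stage functions are hypothetical stand-ins for real build, packaging, and test commands.

```python
def run_pipeline(stages):
    """Run each (name, step) pair in order and stop at the first failing stage.

    Each step is a zero-argument callable that returns True on success.
    Returns a list of (name, passed) results for the stages that ran.
    """
    results = []
    for name, step in stages:
        try:
            passed = bool(step())
        except Exception:
            # Treat an unexpected error as a failed stage.
            passed = False
        results.append((name, passed))
        if not passed:
            break  # later stages never run on top of a broken one
    return results


# Hypothetical stand-ins for real build, package, and test commands.
def build():
    return True


def package():
    return True


def run_tests():
    return 2 + 2 == 4


stages = [("build", build), ("package", package), ("test", run_tests)]
results = run_pipeline(stages)
print(results)
```

Because every change passes through the same stages in the same order, a failure shows up in the pipeline rather than on one developer's machine.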

In order to cultivate a culture of experimentation and learning, it's
useful to introduce problems into the system, on purpose. Netflix has even
created a tool—the aptly named “Chaos Monkey”—which
“is responsible for randomly terminating instances in production to
ensure that engineers implement their services to be resilient to instance
failures.” (If you want a little Chaos Monkey of your own, it's open
source and available on Netflix's GitHub.) These deliberate failures not only
serve as drills that prepare engineering teams for system outages, but also act
as a form of resilience engineering, ensuring safeguards are put into place to
protect software from catastrophic failure.
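
The idea of terminating instances at random can be sketched in a few lines of Python. This is a toy simulation only and does not reflect Chaos Monkey's actual implementation or API; the `ServicePool` class, the instance names, and the "at least one healthy instance" rule are all assumptions made for illustration.

```python
import random


class ServicePool:
    """A toy pool of service instances; the names are hypothetical."""

    def __init__(self, instance_ids):
        self.healthy = set(instance_ids)

    def terminate(self, instance_id):
        self.healthy.discard(instance_id)

    def handle_request(self):
        """Succeed as long as at least one healthy instance remains."""
        return len(self.healthy) > 0


def run_chaos_experiment(pool, rng):
    """Terminate one random healthy instance, then check the service still responds."""
    victim = rng.choice(sorted(pool.healthy))
    pool.terminate(victim)
    return victim, pool.handle_request()


rng = random.Random(0)  # seeded so the experiment is repeatable
pool = ServicePool(["web-1", "web-2", "web-3"])
victim, survived = run_chaos_experiment(pool, rng)
print(victim, survived)
```

A service running on a single instance would fail this experiment, which is exactly the kind of weakness such drills are meant to expose before a real outage does.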



DevOps and Security

With DevOps' intense focus on increasing cycle speed, it might seem like
DevOps practices are at odds with security. In fact, nothing could be further
from the truth. Security is a critical component of DevOps because the
philosophy places such a high value on user experience. It's impossible to
create a positive user experience if customers are afraid to trust you with
their data. Some companies even refer to their DevOps philosophies as
“DevSecOps.”

Teams practicing DevOps think about security considerations from the
earliest phases of product design. It's not something that's tacked on as an
afterthought—as Red Hat writes in their blog, “DevSecOps is about
built-in security, not security that functions as a perimeter around apps and
data.” Security checks are put into place at every phase of the
development cycle. These checks ultimately save time, because they catch
security vulnerabilities early in the process and don't leave teams
scrambling before a scheduled deployment.
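
One example of an early, automated security check is scanning code for hardcoded credentials before it is merged. The Python sketch below is a simplified illustration; the patterns and the sample input are assumptions, and real scanners rely on much larger rule sets.

```python
import re

# Illustrative patterns only; real scanners use far larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]"),
    re.compile(r"-----BEGIN (RSA )?PRIVATE KEY-----"),
]


def find_secrets(source):
    """Return (line_number, line) pairs that look like hardcoded secrets."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            findings.append((number, line.strip()))
    return findings


# Made-up file contents for demonstration.
sample = 'timeout = 30\napi_key = "abc123"\nname = "demo"\n'
findings = find_secrets(sample)
print(findings)
```

Run as part of every commit or build, a check like this flags the problem while the change is still small, rather than during a last-minute review before deployment.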



DevOps Team Structures

We've covered some of the DevOps strategies used by teams to ensure smooth
collaboration between development and operations, along with other
stakeholders, such as security. But what should this team
configuration look like? According to Matthew Skelton and Manuel Pais,
who quite literally wrote the book on DevOps teams (their book is called
Team Topologies), there isn't one right answer.
How you configure your team depends on several factors, including the size of
your engineering department, your product offering, and your organizational
maturity.

Here are the most successful DevOps team “types” according to
the authors:

1. Dev and Ops Collaboration

In this model, Dev and Ops teams collaborate smoothly while maintaining
their individual specialties. The two teams share a clearly defined common
objective and engineers are comfortable seeking out members of the other team
to share ideas and ask for advice. Achieving and maintaining this kind of
harmony requires strong technical leadership and may necessitate a cultural
change in the company.

Best for: Organizations with multiple product streams and/or
development sub-teams
Example: Parts Unlimited, the fictional company featured in the
DevOps-inspired novel The Phoenix Project

2. Fully Shared Ops Responsibilities

This is the “operate what you build” or full-cycle model. In
this team structure, development and operations are merged into a single team
with a shared mission. The operations team ceases to exist as a distinct
entity because developers also take care of all operations
responsibilities.

Best for: Organizations with a single primary product offering,
usually web-based
Examples: Netflix and Facebook

3. DevOps Team with an Expiry Date

This team type functions as a transitional model to type #1 or #2. It's a
temporary solution used to create the culture shift needed to merge or foster
collaboration between distinct development and operations teams. The
temporary DevOps team eases the transition, acting as an advocate for DevOps
practices, with the goal of making itself obsolete once DevOps processes
become ingrained—ideally within 12-18 months.

Best for: Teams at the beginning of their DevOps journeys

4. SRE Team

This final model might not technically be a DevOps model, since product
development remains separate from operations. Rather, this structure, which
originated at Google, introduces a new team, the Site Reliability
Engineering (SRE) team, made up of developers with ops
expertise. After writing and testing their code, development hands it off to
SRE, not operations, to put it into production. Crucially, SRE can reject
code if it doesn't meet their requirements, ensuring that only high-quality
code is deployed.

Best for: Mature organizations with advanced engineering
teams
Example: Google