DevOps Guide

DevOps in Action

The Development Pipeline

Fundamentally, DevOps is about managing the flow of engineering work, from the earliest phases of software design through implementation and maintenance. This flow is known as the development pipeline. Like the sales pipeline, the development pipeline exists to keep work moving consistently, minimizing inefficiencies and preventing bottlenecks, which cause spikes of high stress interspersed with periods of boredom. No one wants to work like that.

If talking about the “flow of work” evokes images of a factory floor, you're not far off: “DevOps is like an assembly line for software development,” says Gaurav Murghai. In fact, DevOps uses principles taken directly from lean manufacturing—combined with practices from agile development—to efficiently assemble software the same way that car manufacturers assemble vehicles. We'll cover how this works in the next few sections.

The Three Ways

A good basis for understanding DevOps is “The Three Ways” outlined in The Phoenix Project, a 2013 novel co-written by three expert DevOps practitioners. These principles draw extensively from lean manufacturing and agile development practices. While additional models for understanding DevOps exist, such as CALMS (Culture, Automation, Lean, Measurement, Sharing), The Three Ways remains one of the most influential. Since the essence of the CALMS model is captured within The Three Ways, this explanation will focus on the latter.

Imagine a factory floor: on one side there are raw materials, and on the other, finished products. Between these two sides, the materials move from station to station, gradually transforming into finished products.

It may be strange to think of software creation as part of an assembly line—we're dealing with virtual products, not physical ones—but like factory workers, engineering teams receive “orders” and have to deliver “finished goods.” These are things like:

  • A new feature that the product team wants to implement
  • A bug users have complained about that needs to be fixed
  • An integration with another tool or partner company's product

To manage these “orders” efficiently, from initial request to project completion, engineering teams can apply the following three principles:

1. The First Way: Systems Thinking

One of the core principles of lean manufacturing is that a build-up of orders and excess inventory (in development terms, a backlog of work) slows down productivity. The same is true for virtual products. When work piles up, workers tend to prioritize whatever is most urgent, so crucially important but non-urgent work is neglected and problems compound. The accumulated cost of that deferred work is what's known as “technical debt.”

To take control of their backlogs, engineering managers have to understand how work flows through their organization. One of the easiest ways to do this is through visualization. In The Phoenix Project, the team creates a “Kanban board” out of index cards that organizes all of their ongoing projects, from request to completion. Today, most companies use virtual dashboards to manage requests, prioritize tasks, and track their projects.

Visualizing work this way not only removes confusion—everyone knows what their top priorities are—but it can also help managers identify bottlenecks, minimize them, and ensure work flows smoothly from planning to completion.
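To make the idea of spotting bottlenecks on a board concrete, here is a minimal Python sketch. It is not how any particular dashboard tool works: the stage names, the sample cards, and the notion of a per-stage work-in-progress limit (`wip_limit`) are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical stage names for the board, listed in the order work flows.
STAGES = ["requested", "in_progress", "in_review", "done"]


def cards_per_stage(cards):
    """Count how many cards currently sit in each stage, in board order."""
    counts = Counter(stage for _, stage in cards)
    return {stage: counts.get(stage, 0) for stage in STAGES}


def find_bottlenecks(cards, wip_limit):
    """Return the unfinished stages holding more cards than the assumed limit."""
    counts = cards_per_stage(cards)
    return [s for s in STAGES if s != "done" and counts[s] > wip_limit]


# Sample cards: (title, current stage). Titles are made up for the example.
board = [
    ("New export feature", "in_progress"),
    ("Login bug", "in_review"),
    ("Partner integration", "in_review"),
    ("Search bug", "in_review"),
    ("Billing fix", "done"),
]

print(cards_per_stage(board))
print(find_bottlenecks(board, wip_limit=2))
```

With this sample data, three cards are waiting in review while only one is in progress, so the review stage is the one flagged. A manager reading the board would draw the same conclusion by eye; the code only shows that the signal is a simple count per stage.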

2. The Second Way: Amplify Feedback Loops

Having a system in place to ensure the smooth flow of work is just the start. What happens when things inevitably go wrong? The Second Way is all about detecting problems, resolving them quickly, and learning from them. The purpose of these processes is to create a feedback loop that reinforces quality from the earliest phases of software creation.

Here's how it works: When a problem is detected—whether through automated error reporting or manual flagging—the top priority is to resolve it. If this can't be done relatively quickly by a single person, the entire team stops whatever they're doing, “swarming” the issue until it's fixed.

At first glance, this might seem horribly inefficient. Stop everything for one little problem? But containing problems while they're small and manageable stops them from spiraling out of control. Think back to the factory production line. One part of the system affects every other part. If a piece of manufacturing equipment stops working and needs to be fixed, allowing other parts of the production line to continue only increases the backlog of unfinished work, causing future bottlenecks.

Swarming problems as they happen allows teams to learn from them and put better systems in place. While this may temporarily slow down production, in the long term, it continually increases work speed and quality in a positive feedback loop.

3. The Third Way: Create a Culture of Continual Experimentation and Learning

Culture plays an important role in creating an environment of ongoing learning and improvement. In order to be able to amplify feedback loops, engineers need to feel comfortable flagging issues and interrupting their coworkers when a problem requires all hands on deck.

One way teams create a culture of experimentation and learning is by applying agile development principles. Agile is ideal for DevOps because of its focus on short-cycle timelines and consistent feedback. “In DevOps, you work in small batch sizes,” says Greg Jacoby, Bright Development Owner and Lead Developer. “You're never doing a massive crazy update, you're focusing on producing value for the end user.” This style of work results in better code quality, because frequent deploys allow developers to get more immediate feedback from users and improve their code accordingly. To execute agile effectively, teams use continuous integration and continuous delivery (CI/CD).

Continuous Integration, Continuous Delivery (CI/CD)

Since multiple programmers work together across different operating systems (and versions of those operating systems) to build software, they need to automate the process of integrating and validating code changes to support these multiple environments. This is what CI/CD does. It's a DevOps best practice that creates consistent, automated processes to build, package, and test new code. (It also helps prevent programmers from uttering the dreaded phrase: “But it works on my machine!”)
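The build, package, and test sequence can be sketched as a tiny pipeline runner. This is a toy model under stated assumptions, not the behavior of any real CI/CD product: the step functions are placeholders that only return True or False, and the final `publish` stage is a hypothetical addition included to show that nothing runs after a failure.

```python
def run_pipeline(stages):
    """Run (name, step) pairs in order and stop at the first failing step.

    Each step is a callable returning True on success. The return value is
    a list of (name, passed) results for the stages that actually ran.
    """
    results = []
    for name, step in stages:
        passed = bool(step())
        results.append((name, passed))
        if not passed:
            break  # later stages never run once a stage fails
    return results


# Placeholder steps. A real pipeline would call a compiler, a packager,
# and a test runner here instead of returning fixed values.
def build():
    return True


def package():
    return True


def test():
    return False  # simulate a failing test suite


def publish():  # hypothetical final stage, not named in the text above
    return True


pipeline = [("build", build), ("package", package), ("test", test), ("publish", publish)]
print(run_pipeline(pipeline))
```

Because every change goes through the same ordered steps in the same environment, a failure shows up at a named stage rather than as a surprise on someone else's machine.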

Beyond automation, another way to cultivate a culture of experimentation and learning is to introduce problems into the system on purpose. Netflix has even created a tool, the aptly named “Chaos Monkey,” which “is responsible for randomly terminating instances in production to ensure that engineers implement their services to be resilient to instance failures.” (If you want a little Chaos Monkey of your own, it's open source and available on Netflix's GitHub.) These deliberate failures not only function as drills to prepare engineering teams for system failures, but they also serve as a form of resilience engineering, ensuring safeguards are put into place to protect software from catastrophic failure.
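The idea behind deliberately terminating instances can be illustrated with a small in-memory simulation. This is not Chaos Monkey's code or API. The fleet dictionary, the instance names, and the `min_running` health threshold are all assumptions made up for the sketch.

```python
import random


def terminate_random_instance(instances, rng):
    """Pick one running instance at random and mark it terminated.

    Returns the name of the terminated instance, or None if nothing is running.
    """
    running = [name for name, state in instances.items() if state == "running"]
    if not running:
        return None
    victim = rng.choice(running)
    instances[victim] = "terminated"
    return victim


def service_is_healthy(instances, min_running):
    """Treat the service as healthy while enough instances are still running."""
    running = sum(1 for state in instances.values() if state == "running")
    return running >= min_running


# Hypothetical fleet of three instances; the seed only makes runs repeatable.
fleet = {"web-1": "running", "web-2": "running", "web-3": "running"}
rng = random.Random(42)

victim = terminate_random_instance(fleet, rng)
print(victim, service_is_healthy(fleet, min_running=2))
```

The drill passes if the service stays healthy after losing any one instance. If it does not, the team has found a weakness under controlled conditions rather than during a real outage.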

DevOps and Security

With DevOps' intense focus on increasing cycle speed, it might seem like DevOps practices are at odds with security. In fact, nothing could be further from the truth. Security is a critical component of DevOps because the philosophy places such a high value on user experience. It's impossible to create a positive user experience if customers are afraid to trust you with their data. Some companies even refer to their DevOps philosophies as “DevSecOps.”

Teams practicing DevOps think about security considerations from the earliest phases of product design. It's not something that's tacked on as an afterthought—as Red Hat writes in their blog, “DevSecOps is about built-in security, not security that functions as a perimeter around apps and data.” Security checks are put into place at every phase of the development cycle. These checks ultimately save time, because they catch security vulnerabilities early in the process and don't leave teams scrambling before a scheduled deployment.
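As one illustration of a security check that can run early in the development cycle, the sketch below scans source text for hardcoded credentials. It is a deliberately small example under stated assumptions: the two patterns and the rule names are illustrative only, and real scanners rely on far larger rule sets.

```python
import re

# Illustrative patterns only; the rule names are made up for this sketch.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hardcoded_password": re.compile(
        r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def scan_for_secrets(source):
    """Return (line_number, rule_name) for each line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


sample = 'db_host = "localhost"\npassword = "hunter2"\n'
print(scan_for_secrets(sample))  # prints [(2, 'hardcoded_password')]
```

A check like this could run on every commit, so a leaked credential is caught minutes after it is written instead of during a pre-deployment scramble.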

DevOps Team Structures

We've covered some of the DevOps strategies used by teams to ensure smooth collaboration between development and operations, along with other stakeholders, such as security. But what should this team configuration look like? According to Matthew Skelton and Manuel Pais, who quite literally wrote the book on DevOps teams (their book is called Team Topologies), there isn't one right answer. How you configure your team depends on several factors, including the size of your engineering department, your product offering, and your organizational maturity.

Here are the most successful DevOps team “types” according to the authors:

1. Dev and Ops Collaboration

In this model, Dev and Ops teams collaborate smoothly while maintaining their individual specialties. The two teams share a clearly defined common objective and engineers are comfortable seeking out members of the other team to share ideas and ask for advice. Achieving and maintaining this kind of harmony requires strong technical leadership and may necessitate a cultural change in the company.

Best for: Organizations with multiple product streams and/or development sub-teams
Example: Parts Unlimited, the fictional company featured in the DevOps-inspired novel The Phoenix Project

2. Fully Shared Ops Responsibilities

This is the “operate what you build” or full-cycle model. In this team structure, development and operations are merged into a single team with a shared mission. The operations team ceases to exist as a distinct entity because developers also take care of all operations responsibilities.

Best for: Organizations with a single primary product offering, usually web-based
Examples: Netflix and Facebook

3. DevOps Team with an Expiry Date

This team type functions as a transitional model on the way to type #1 or #2. It's a temporary solution used to create the culture shift needed to merge distinct development and operations teams or to foster collaboration between them. The temporary DevOps team eases the transition, acting as an advocate for DevOps practices, with the goal of making itself obsolete once DevOps processes become ingrained, ideally within 12 to 18 months.

Best for: Teams at the beginning of their DevOps journeys

4. SRE Team

This final model might not technically be a DevOps model, since product development remains separate from operations. Rather, this structure, which originated at Google, introduces a new team—the Site Reliability Engineering (SRE) team—made up of developers with ops expertise. After writing and testing their code, development hands it off to SRE, not operations, to put it into production. Crucially, SRE can reject code if it doesn't meet their requirements, ensuring that only high-quality code is deployed.

Best for: Mature organizations with advanced engineering teams
Example: Google