Focus on Configuration Management Debt First

It has been a few years since Managing Software Debt was published, and I regularly think about how it could have been better. I think the content was useful to put in a book, and it aligns with current movements such as DevOps, Microservices, Continuous Delivery, Lean, and Agile. On the other hand, I may have had a tendency to focus on Technical Debt at the beginning of the book that was not warranted in terms of its importance for managing software debt. It has been my experience that managing technical debt, the activities that a team or team members choose not to do well now and that will impede future development if left undone, is less important than the other four types of software debt. In fact, here are the priorities that I have typically used with companies to help them manage software debt more effectively:

  1. Configuration Management Debt: Integration and release management become more risky, complex, and error-prone.
  2. Platform Experience Debt: The availability of people to work on software changes is becoming limited or cost-prohibitive.
  3. Design Debt: The cost of adding features is increasing toward the point where it is more than the cost of writing from scratch.
  4. Quality Debt: There is a diminishing ability to verify the functional and technical quality of software.

Along the way, I might work with teams on processes and techniques that help reduce technical debt, but that was to get them ready to take advantage of the efficiencies that result from managing the other four. I wonder if the popularity of the topic “technical debt” and my background as a developer led me to focus chapters 2 through 4 on managing technical debt. Whatever the reasoning, I think it is good to discuss the reasons for demoting it in priority with regard to managing software debt on your team and in your organization.

Why Configuration Management Debt First?

The value of software is only potential value until it is put into users’ hands. An organization’s processes can put up many roadblocks to getting software there:

  • Proliferation of long-lived branches
  • Overburdened release engineering and operations teams
  • Poor integration processes across architecture components and scaled team delivery
  • High coupling with centrally managed architecture elements/components
  • Too many variations/versions of the software supported in production
  • Code changes that feel too risky and take too long to validate before releasing into production
  • Poor documentation practices
  • Too many hand-offs between teams in order to release software to users

As the tagline of chapter 6, “Configuration Management Debt,” says:

If releases are like giving birth, then you must be doing something wrong.
— Robert Benefield

In organizations with effective configuration management practices, it is common to see deployment pipelines with fewer hand-offs between teams, architectures that tend to be more malleable, and efficient validation processes. What is interesting about this list is that its items align with managing three other types of software debt more effectively:

  • Fewer hand-offs: Platform Experience Debt
  • Malleable architectures: Design Debt
  • Efficient validation processes: Quality Debt

What I have found is that focusing on Configuration Management Debt makes it simpler to identify the aspects of the integration and release management process that must be tackled to get working software into the hands of users sooner. Doing so also reduces bottlenecks in organizational processes and practices, which leads to further optimizations in the future.

Alignment with Today’s Industry Conversations

It is interesting to note that this focus aligns well with the tenets of the DevOps movement’s luminaries. Moving toward Feature Team configurations, where a single team has all of the capabilities needed to design, implement, validate, integrate, configure, deploy, and maintain the software, reduces hand-offs and increases quality because the team that builds the software maintains it as well. It’s amazing how quality goes up when a team member on rotation has to respond to production issues in the middle of the night.

Not only does this align with DevOps, but malleable architectures also tend to align with the Microservices movement. Given Feature Teams, we can take advantage of Conway’s Law, rather than be victims of it, to align team delivery with providing a specific capability in the architecture. Since these capabilities are more specific, the implementations tend to be smaller and therefore easier to change later. There are still operational issues to overcome in order to make this an efficient approach. Platforms such as Cloud Foundry, which provide application runtimes and access to services with low operational overhead, will make Microservices-based architectures even more approachable and efficient to attain.

The Agile movement has already encouraged significant changes in how we validate software, and Continuous Delivery has continued to push validation efficiencies forward. As more teams become aware of, and effective with, test frameworks, tools, platforms, and SaaS offerings, we will continue to see more efficient validation processes in our delivery pipelines.

There is a great book, Lean Enterprise: How High Performance Organizations Innovate at Scale by Jez Humble, Joanne Molesky, and Barry O’Reilly, that I recommend if you want to explore the topics above further. Let me know in the comments section if you have any additional or different thoughts around focusing on configuration management debt first. I’m always interested in learning from others tackling difficult problems and the approaches that they see working.

The Imminent Acceleration of the Twelve-Factor Apps Approach

Software that takes advantage of what the cloud has to offer must take on a new shape. Applications deployed to the cloud need to show resilience when VMs or server instances go down or misbehave. They also tend to need to scale automatically based on patterns of usage, expanding temporarily and contracting again to keep costs reasonable. Learning faster than your competition is of utmost importance to many who deploy software into a cloud, so deploying updates on a continuous basis becomes critical. There is an approach to developing and deploying applications that fit these needs: The Twelve-Factor App.

The Twelve Factors of this approach are:

I. Codebase – One codebase tracked in revision control, many deploys
II. Dependencies – Explicitly declare and isolate dependencies
III. Config – Store config in the environment
IV. Backing Services – Treat backing services as attached resources
V. Build, release, run – Strictly separate build and run stages
VI. Processes – Execute the app as one or more stateless processes
VII. Port binding – Export services via port binding
VIII. Concurrency – Scale out via the process model
IX. Disposability – Maximize robustness with fast startup and graceful shutdown
X. Dev/prod parity – Keep development, staging, and production as similar as possible
XI. Logs – Treat logs as event streams
XII. Admin processes – Run admin/management tasks as one-off processes

These factors tend to drive highly cohesive applications and services that also exhibit low coupling with their dependencies. Each application or service should live in a single repository that can be built and deployed on its own. Rather than branching or forking to create multiple deployable versions of the application, we should externalize configuration so that maintenance costs do not grow exponentially to support all versions of the application. These applications should also have understandable boundaries and implement a single concept or responsibility to increase their disposability. Creating stateless applications and services enables horizontal scaling and redundancy across multiple nodes in a cloud. Deploying multiple instances of an application or service results in multiple ports that can be load-balanced and registered for use by others.
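As a small illustration of two of these factors, here is a minimal sketch in JavaScript (assuming Node.js; the names are illustrative) of Config (III) and Port binding (VII): the app reads its port from the environment and exports its service by binding to it.

```javascript
var http = require('http');

// Factor III: configuration comes from the environment, with a dev default
var port = process.env.PORT || 8080;

// Factor VII: the app is self-contained and exports its service via port binding
http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('ok\n');
}).listen(port);
```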

Some of these factors are no-brainers for many experienced developers, and some are not as easy to implement due to access restrictions on configuration management and flexible operating platforms. For those of us fortunate enough to use “real” cloud infrastructure, today’s cloud vendors are starting to provide services that enable Twelve-Factor Apps. The Cloud Foundry Foundation officially launched this week as part of the Linux Foundation Collaborative Projects. Cloud Foundry is the most robust and mature open source Platform-as-a-Service (PaaS) offering in the market, and with it, applying The Twelve-Factor App approach to applications and services has become much easier. Buildpacks enable polyglot software development across applications and services while still deploying into a single PaaS. Software can be deployed to the PaaS with a single command: `cf push <appname>`. Zero-scheduled-downtime deployments can be performed using a Blue/Green Deployment approach, described well by Martin Fowler, which keeps the existing version of the software (Blue) up while deploying and testing the new version (Green), which can run alongside or replace the old version once validated as ready. Binding to dependent services, such as DB clusters and message queues, is also simplified: create a named service deployment with `cf create-service mongodb cluster my-mongo-cluster` and then bind it to your application with `cf bind-service my-app-1 my-mongo-cluster`.

It is incredible how Cloud Foundry can make creating and continuously delivering a Twelve-Factor App orders of magnitude easier than doing so on your own infrastructure or on an Infrastructure-as-a-Service (IaaS) platform. When you need to optimize the deployment environment, you can always take a single application or service and adjust it to deploy into your own data center or an IaaS platform, and because your applications and services take the Twelve-Factor approach, you don’t have to do it for all of them.
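To make the Blue/Green flow concrete, here is a minimal sketch using cf CLI commands; the app names, domain, and hostnames are hypothetical.

```
# Blue is live, serving the production route (myapp.example.com)
cf push myapp-green -n myapp-staging              # push Green under a temporary hostname
# ... validate Green at myapp-staging.example.com ...
cf map-route myapp-green example.com -n myapp     # Green now also serves production traffic
cf unmap-route myapp-blue example.com -n myapp    # retire Blue from the production route
cf delete myapp-blue -f                           # remove Blue once Green is verified
```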

I hope this article provides awareness of The Twelve-Factor App approach and how a PaaS, such as Cloud Foundry, can enable effective use of it. I recommend clicking the links provided in this article to read more details about the approach and how to take advantage of what Cloud Foundry has to offer. In the next few years, Twelve-Factor Apps will become the norm in software development shops, and PaaS will be a common deployment platform. Take the time to read about and experiment with Cloud Foundry to get a leg up on this imminent acceleration of PaaS and The Twelve-Factor App approach.

Managing Configuration Management Debt

Introduction

In my book “Managing Software Debt: Building for Inevitable Change”, chapter 6 discusses Configuration Management Debt and approaches to help keep this type of debt to a minimum. Since I originally wrote this chapter, a lot has gone on in the software development and operations communities. There have been a couple of huge uprisings: Continuous Delivery and DevOps. Of course, the ideas expressed as part of these movements are not necessarily new; rather, they stand on the proverbial shoulders of giants.

Managing Configuration Management Debt

What I wrote in this chapter is only a small portion of the ideas that are being feverishly adopted across our industry. It is exciting to see just how fast truly Lean approaches such as DevOps are motivating teams, enhancing customer experiences, and driving business results. Applications of DevOps and Continuous Delivery are quickly maturing and seem to be an even stronger force than the Manifesto for Agile Software Development.

I would like to review the three areas that organizations can work on to improve their configuration management performance, and put some perspective on how they hold up five or so years later:

  • Transfer some Configuration Management responsibilities to the teams
  • Increase automated feedback
  • Track issues in a collaborative manner

Transfer Responsibilities to Teams

Release Engineering teams tend to get overburdened as time goes by. They tend to be centralized and to support more teams than they have people. The DevOps movement has put a spotlight on this problem and provided cultural, process, and technical approaches for reducing the risk of overburdening Release Engineering teams. Taking a DevOps approach implies moving some configuration management responsibilities to the teams themselves.

Increase Automated Feedback

Adding these responsibilities increases the platform complexity that teams must handle, so organizations must invest in automation and in the developer experience of the automation tools. The other aspect of transferring responsibilities to teams from the book was automating install and rollback procedures. Going beyond Continuous Integration toward the fully automated build, test, deployment, and monitoring approaches contained in Continuous Delivery encompasses this idea. There has been a tremendous amount of innovation that makes Continuous Delivery more approachable, from the growth of Cloud Computing adoption, to Chef and Puppet for automating infrastructure, to the promise of Platform-as-a-Service (PaaS).
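At its simplest, automated install and rollback can be sketched as a script like the following; the paths, service name, and health endpoint are hypothetical, and a symlink switch stands in for whatever mechanism your platform provides.

```
#!/usr/bin/env bash
set -euo pipefail

# Unpack the new release into its own timestamped directory
NEW="/opt/myapp/releases/$(date +%Y%m%d%H%M%S)"
mkdir -p "$NEW"
tar -xzf myapp.tar.gz -C "$NEW"

# Remember the currently deployed release, then switch the symlink
PREVIOUS="$(readlink /opt/myapp/current || true)"
ln -sfn "$NEW" /opt/myapp/current

# Restart and verify; on failure, roll back to the previous release
if ! service myapp restart || ! curl -fs http://localhost:8080/healthcheck > /dev/null; then
  if [ -n "$PREVIOUS" ]; then
    ln -sfn "$PREVIOUS" /opt/myapp/current
    service myapp restart
  fi
  exit 1
fi
```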

Track Issues Collaboratively

Automated build pipelines and teams with more responsibility for software operations lead us to ask who tracks issues. Again, the DevOps movement has pushed our organizations to think about the responsibility for production deployments. The more responsibility teams can take, given appropriate automation and a good developer experience with the tools, the more responsive they will be to customer issues and to the operational changes needed around their software. It is my contention that the more responsibility teams have for production operations, the better production will operate.

Conclusion

In review, I probably wrote enough on Configuration Management Debt. In my consulting and product development activities since the book, I have realized that focusing on Configuration Management Debt first makes the steps to address the other four types of software debt (technical, quality, design, and platform experience) easier to see. The build pipeline(s) and the spread of responsibilities can tell teams a lot about where they can find optimizations.

An Experience with Microservices Approach

James Lewis and Martin Fowler published an article on Microservices in March 2014. The tendencies of a microservices-based architecture were well laid out by these highly regarded authors. In this article I would like to share some first-hand learnings from implementing software using the Microservices architecture approach.

The Why

To start with, let’s describe why we approached the software in this manner. While our team was forming into a cohesive unit, we were using a company’s existing legacy platform tools to build a new product in an adjacent market. These platform tools were fairly progressive, yet they were still under heavy development and showed the warts of a monolithic architecture approach accumulated over the previous 5-10 years. The platform had tight coupling and circular dependencies, and teams could not work in isolation on cross-cutting aspects of the platform such as the UI controls and client-side data stores. Also, there were performance issues in the client and service APIs that were starting to become visible to larger customers with more data to manage. Since we were creating a new product, we soon found that using the same platform tools and APIs was going to slow us down and that we would potentially inhibit other teams working on resolving these issues.

In my previous engagements, the architecture patterns that supported long-term needs were those that allowed for changeability. Changeability tended to go hand in hand with a *nix-like approach of components that do a single thing (the Single Responsibility Principle) and that have low coupling with adjacent and/or dependent components (check out the SOLID principles for more detailed information that every developer should learn). I had success with approaches that supported these two main ideas on many software teams, and as a consultant I witnessed many more architectures that I would also deem successful, even over time. The visibility into the infrastructure and service design at Netflix also influenced just how far we should go to develop software that would evolve naturally with changes in the business. Thus we embarked on a journey to implement our software in a manner that would allow for flexible deployment of business capabilities in microservices.

The Domain

We were developing software for an adjacent market that we had co-defined with customers through years of consulting experience in the domain and through running experiments for problem/solution fit using a Lean Startup approach. The business capability had become fairly coherent from a high-level domain model perspective. We knew the parts and how they would fit together to create our first MVP (Minimum Viable Product). The business capability was still large enough that it involved multiple components, each with its own logic and user interactions at the client and API. To avoid over-complicating our development, we decided to create stateless RESTful services based on Dropwizard, an authorization and external API consumer layer, and a client-side UI based on AngularJS. We used MongoDB as the main persistent storage, due to the nature of the data we were supporting, and PostgreSQL for user permission management.

Even though we had learned enough to focus on delivering the MVP to customers, there was still learning to be had with the customers who were closely co-creating the software with us in a beta capacity in real-world situations. This meant that we needed to absorb change in all aspects of the product, whether client-side or in our services. Not only that, we had to deploy those changes quickly to learn whether they provided an actual solution to our customers’ needs. We had an effective Continuous Delivery (CD) pipeline that allowed all services and the client-side UI to be built, tested from multiple perspectives, and deployed into staging and production environments. This included a separate pipeline for the Chef cookbooks that were used to bootstrap instances on Amazon EC2 from scratch. All of this infrastructure allowed us to deploy changes at any commit to master on any source code repository that was being watched by our CD pipelines.

The Product Owner had a button they could push at any time to deploy what was in staging into production without any scheduled downtime. This was enabled by our rolling deployment approach, which involved taking vertical slices of our environment out of rotation, deploying to each slice, running smoke-test verifications on it, putting it back into rotation, and then continuing to the next slice. None of this necessitated a microservices approach, although it was not much more difficult than other approaches I have first-hand knowledge of, and it provided nice isolation of capabilities within the product.

And Finally, the Learnings…

There were many learnings that we came away with. Some were specific to the context of our company; others, the ones I will share here, were more general in nature. Of course, these are in retrospect, with some (OK, maybe a lot of) opinion baked in, and I hope they are useful to others, whether they are followed directly or just spark conversations.

Aggregate Logging

Effective logging is essential for finding resolutions to issues in any software. When you have services and clients running across many instances, the need to aggregate logs in order to resolve issues becomes even more important. On top of that, if you have production access policies, such as those found in FDA, HIPAA, and PCI regulations to name a few, then development teams are restricted from direct access to the running instances, personally identifiable data, and network traffic. Therefore logs must not only capture appropriate logging levels but also follow consistent logging patterns. Teams should discuss and agree on logging patterns that include “backstops” for exceptions or unexpected issues that aren’t captured in the implementation code. Pulling these logs into a central service such as Logstash or Splunk then makes them searchable across all instances.
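For illustration, here is a minimal sketch of such a pattern in JavaScript (assuming Node.js; the field names are illustrative, not prescriptive). Every entry is a structured event carrying the service name, instance, level, and a correlation id, so an aggregator such as Logstash or Splunk can follow one request’s path across instances, and a process-level backstop captures anything the implementation code misses.

```javascript
var os = require('os');

// Emit one structured log event per line so aggregators can parse them.
function log(level, correlationId, message, details) {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    service: 'item-service',      // hypothetical service name
    instance: os.hostname(),      // identifies which instance emitted the event
    level: level,
    correlationId: correlationId, // ties together the events for one request
    message: message,
    details: details || {}
  }));
}

// Backstop: capture exceptions that implementation code did not.
process.on('uncaughtException', function (err) {
  log('ERROR', 'none', 'uncaught exception', { stack: err.stack });
  process.exit(1);
});
```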

Focus on the Bounded Context of Services

Using techniques from Domain-Driven Design (DDD), with special attention paid to domain models, ubiquitous language, and bounded contexts, will help in defining where capabilities should be separated into their own services. The time put in by the whole team to define and understand the language and bounded context of each capability in the domain enabled client-side code to easily separate access to each service without coupling calls across multiple services. We could have one pair of developers working on one view and service and another pair working on a separate view and service, typically without affecting each other’s work.

Lookup Configurations from Deployment Environment

When deploying into multiple environments, such as development, staging, and production, it is important to allow per-environment configuration. These environment configurations could include service endpoints, database access, logging, access tokens, and more. There are many techniques for making configurations available for lookup by running processes; some examples are shell environment variables, a URL returning an XML or JSON response, and coordination services such as ZooKeeper and etcd. This allows operational configuration of services, and access to environment-specific authorization tokens, to be handled outside the code.
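For example, here is a minimal sketch of looking up configuration from shell environment variables in JavaScript (assuming Node.js; the variable names and defaults are hypothetical):

```javascript
// Read per-environment settings at startup; fall back to development
// defaults only where a default is safe.
var config = {
  itemServiceUrl: process.env.ITEM_SERVICE_URL || 'http://localhost:8080',
  mongoUrl:       process.env.MONGO_URL        || 'mongodb://localhost/app-dev',
  logLevel:       process.env.LOG_LEVEL        || 'INFO',
  authToken:      process.env.AUTH_TOKEN       // no safe default; must be provided
};

module.exports = config;
```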

Cohabitate Highly Cohesive Code

For some reason, putting code based on multiple languages into the same source code repository feels a bit dirty; at least it did for me. At the same time, there are many different aspects of a service within its bounded context that may necessitate multiple tools. For example, we may want to provide shell scripts to deploy our code alongside the service’s business-capability-focused code. Other aspects that should be considered part of the service’s source code repository are Chef or Ansible instance configuration code, PaaS (Platform as a Service) configuration files, build scripts, instance launch automation scripts, and, probably the most controversial suggestion I will make, the client-side code that specifically interacts with the service’s endpoints. Since serving client-side code from the service itself may be controversial, here is an example:

Given the following service API endpoints:

```
GET  /api/items
PUT  /api/items
POST /api/items/{id}
```

We might serve a JavaScript API client that has a function for interacting with each endpoint:

```javascript
{
  getItems: function() { … },
  addItem: function(item) { … },
  updateItem: function(item) { … }
}
```

This allows the client-side code that interacts with the service to be tested and updated at the same time as the service itself. If the client-side code is put into a separate source code repository and is built and deployed separately, situations arise where client and service code changes are not independent in the deployment process. This leads to code that must be deployed in a specific order, out of alignment with the Continuous Delivery pipeline.
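Putting this together, a single service repository that cohabitates these aspects might look something like the following (the layout and names are hypothetical):

```
item-service/
├── src/            # business-capability code for the service
├── client/         # JavaScript client served from the service's endpoints
├── cookbooks/      # Chef instance configuration code
├── scripts/        # build, deploy, and instance launch automation
└── manifest.yml    # PaaS configuration
```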

Monitoring and Alerting are Essential

The flexibility of horizontally scaled stateless services and federated data comes at a cost: it is at times difficult to know the path that a request takes through the application and which instances are involved. Find ways to identify specific service instances in logs, on clients, and in monitoring tools such as New Relic. This will reduce the time to resolution in many cases and keep you sane when tracking down the causes of issues.

On top of monitoring, finding ways to alert when issues need attention, using tools such as PagerDuty, will help the team track down issues quickly after their introduction into the environment. Virtual machines get rebooted, instances fail, networks get blocked, syntax errors in configurations cause outages, and any number of other issues can cause problems in an environment. I recommend becoming familiar with the Fallacies of Distributed Computing to help think through the ways your software can fail in any distributed system. Even browser or mobile clients connecting to services form a distributed system and can fall victim to these fallacies.

Isolate Deployment Slices for Verification

To keep your services highly available, it is important to have a deployment process that allows changes to be introduced without scheduled downtime for maintenance. This matters most when teams are deploying frequently, perhaps multiple times per day, into an environment. Finding mechanisms to continue serving consumers of the service while deploying new versions makes deploying software a business decision rather than a technical hurdle to leap over. Our process was something like the following:

  1. Have at least two isolated slices deployed in an environment
  2. Take one isolated slice out of rotation and direct all consumers to the other slice(s)
  3. Deploy to the out-of-rotation isolated slice
  4. Run smoke tests that are configured to test the out-of-rotation isolated slice
  5. If the smoke tests pass, bring the isolated slice back into rotation
  6. Rinse and repeat with the other slice(s)
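In shell form, the loop might be sketched like this; the lb-remove, lb-add, deploy, and smoke-test commands are hypothetical stand-ins for your load balancer API and pipeline steps.

```
for slice in slice-a slice-b; do
  lb-remove "$slice"                      # take the slice out of rotation
  deploy --target "$slice"                # deploy the new version to it
  smoke-test --target "$slice" || exit 1  # verify before restoring traffic
  lb-add "$slice"                         # put the slice back into rotation
done
```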

This overly simplified, high-level process overview has many complications that need to be resolved based on the configuration of the environment being deployed to. PaaS and other approaches that automate service deployment, networking, and configuration management can go a long way toward making this process less impactful to a team’s feature delivery by taking care of those complications through tools and APIs.

Conclusion

The Microservices architecture approach can provide effective boundaries between capabilities in a system and tremendous flexibility that is not easily attained through more monolithic approaches. The approach does have a cost in terms of more complicated environment setup and deployment. These costs can be overcome through the learnings above and by applying techniques shared online by the many folks using this approach. If your team is creating new business capabilities, I highly recommend taking a deeper look into how a microservices approach could provide the flexibility and scalability needed in modern software solutions.