An Experience with the Microservices Approach

James Lewis and Martin Fowler published an article on microservices in March 2014. These highly regarded authors laid out the common characteristics of a microservices-based architecture well. In this article I would like to share some first-hand lessons we learned while implementing software using the microservices architecture approach.

The Why

To start with, let's describe why we approached the software in this manner. When our team was forming into a cohesive unit, we were using a company's existing legacy platform tools on a new product in an adjacent market. These platform tools were fairly progressive, yet they were still under heavy development and showed the warts of a monolithic architecture approach accumulated over the past 5-10 years. The platform had tight coupling and circular dependencies, and teams could not work in isolation on cross-cutting aspects of the platform such as the UI controls and client-side data stores. There were also performance issues in the client and service APIs that were becoming visible with larger customers who had more data to manage. Since we were creating a new product, we soon found that using the same platform tools and APIs was going to slow us down and could potentially inhibit other teams working on resolving these issues.

In my previous engagements, the architecture patterns that supported long-term needs were those that allowed for changeability. Changeability tended to go hand in hand with a *nix-like approach of components that do a single thing (the Single Responsibility Principle) and have low coupling with adjacent and/or dependent components (see the SOLID principles, which every developer should learn, for more detail). I had success with approaches that supported these two main ideas on many software teams and, as a consultant, witnessed many more architectures that I would also deem successful, even over time. The visibility into the infrastructure and service design at Netflix also influenced just how far we thought we should go to develop software that would evolve naturally with changes in the business. Thus we embarked on a journey to implement our software in a manner that would allow for flexible deployment of business capabilities as microservices.

The Domain

We were developing software for an adjacent market that we had co-defined with customers through years of experience consulting in the domain and running experiments for problem/solution fit using a Lean Startup approach. The business capability had become fairly coherent from a high-level domain model perspective. We knew the parts and how they would fit together in order to create our first MVP (Minimum Viable Product). The business capability was still large enough that it involved multiple responsible components, each with its own logic and user interactions at the client and API. To avoid over-complicating our development, we decided to create RESTful stateless services based on Dropwizard, an authorization and external API consumer layer, and a client-side UI based on AngularJS. We used MongoDB as our main persistent store due to the nature of the data we were supporting, and PostgreSQL for user permission management.

Even though we had learned enough to focus on delivering the MVP to customers, there was still more to learn from the customers with whom we were closely co-creating the software, in a beta capacity, in real-world situations. This meant that we needed to absorb change in all aspects of the product, whether client-side or in our services. Not only that, we had to deploy those changes quickly to learn whether they provided an actual solution to our customers’ needs. We had an effective Continuous Delivery (CD) pipeline that allowed all services and the client-side UI to be built, tested from multiple perspectives, and deployed into staging and production environments. This also included a separate pipeline for the Chef cookbooks that were used to bootstrap instances on Amazon EC2 from scratch. All of this infrastructure allowed us to deploy changes on any commit to master in any source code repository that was being watched by our CD pipelines.

The Product Owner had a button they could push at any time to deploy what was in staging into production without any scheduled downtime. This was enabled by our rolling deployment approach, which involved taking a vertical slice of our environment out of rotation, deploying to it, running smoke test verifications on that slice, putting it back into rotation, and then continuing to the next slice. None of this necessitated a microservices approach, although it was not much more difficult than other approaches I’ve had first-hand knowledge of, and it provided nice isolation of capabilities within the product.

And Finally, the Learnings…

There were many learnings that we came away with. Some were specific to the context of our company and others, the ones I will share here, were more general in nature. Of course, these come in retrospect and with some (OK, maybe a lot of) opinion baked into them. I hope they are useful to others, whether they are followed directly or just spark conversations.

Aggregate Logging

Effective logging is essential for resolving issues in any software. When you have services and clients running across many instances, aggregating logs to resolve issues becomes even more important. On top of that, if you are subject to production access policies, such as those found in FDA, HIPAA, and PCI regulations, just to name a few, then development teams are restricted from direct access to the running instances, personally identifiable data, and network traffic. Logs must therefore not only use logging levels but also follow consistent patterns. Teams should discuss and agree on their logging patterns, which should also include “backstops” for exceptions or unexpected issues that aren’t captured in the implementation code. Pull these logs into a central service such as Logstash or Splunk.
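To make the idea of an agreed logging pattern with a backstop a little more concrete, here is a minimal sketch in JavaScript. It is not the logger we used; the field names (ts, level, service, instance, msg) and the createLogger helper are assumptions made up for illustration. The point is only that every service emits the same machine-parseable shape, and that unexpected exceptions still end up in the log.

```javascript
// Sketch only: field names (ts, level, service, instance, msg) are illustrative.
const LEVELS = ["debug", "info", "warn", "error"];

// Build a logger that writes one JSON object per line, so an aggregator
// can parse every service's output the same way.
function createLogger(service, instance, write = console.log) {
  const emit = (level, msg, extra = {}) => {
    if (!LEVELS.includes(level)) throw new Error("unknown level: " + level);
    const entry = { ts: new Date().toISOString(), level, service, instance, msg, ...extra };
    write(JSON.stringify(entry));
    return entry;
  };
  return {
    log: emit,
    // Backstop: wrap a synchronous handler so anything it throws is logged
    // before being rethrown, even if the handler has no error handling.
    backstop(fn) {
      return (...args) => {
        try {
          return fn(...args);
        } catch (err) {
          emit("error", "unhandled exception", { error: String((err && err.message) || err) });
          throw err;
        }
      };
    },
  };
}
```

A service would create one logger at startup, for example createLogger("items", "i-123"), and wrap its request handlers with backstop. Asynchronous handlers would need an equivalent wrapper around their promises.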

Focus on the Bounded Context of Services

Using techniques from Domain-Driven Design (DDD) with special attention paid to domain models, ubiquitous language, and bounded context will help in defining where capabilities are to be separated into their own services. The time put in by the whole team in defining and understanding the language and bounded context of each capability in the domain enabled client-side code to easily separate access to each service without coupling calls across multiple services. We could have a pair of developers working on one view and service and another pair working on a separate view and service typically without affecting each other’s work.

Lookup Configurations from Deployment Environment

When deploying into multiple environments, such as development, staging, and production, it is important to allow per-environment configuration. These environment configurations could include service endpoints, database access, logging, access tokens, and more. There are many techniques for making configuration available for lookup by running processes. Some examples are shell environment variables, a URL that returns an XML or JSON response, and coordination services such as ZooKeeper and etcd. Looking up configuration this way supports operational configuration of services and lets access policies control who can see each environment's authorization tokens.
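As one illustration of the simplest of these techniques, shell environment variables, here is a small JavaScript sketch. The variable names (ITEMS_SERVICE_URL, MONGO_URL, LOG_LEVEL) and the loadConfig helper are hypothetical, not names from our system. The sketch assumes that a missing required setting should stop the process at startup rather than surface later as a runtime failure.

```javascript
// Sketch only: variable names such as ITEMS_SERVICE_URL are hypothetical.
// Read required settings from the process environment and fail fast,
// so a misconfigured instance never starts serving traffic.
function loadConfig(env = process.env) {
  const required = (name) => {
    const value = env[name];
    if (value === undefined || value === "") {
      throw new Error("missing required setting: " + name);
    }
    return value;
  };
  return {
    itemsServiceUrl: required("ITEMS_SERVICE_URL"),
    mongoUrl: required("MONGO_URL"),
    // Optional setting with a default that suits local development.
    logLevel: env.LOG_LEVEL || "info",
  };
}
```

Passing the environment in as a parameter keeps the lookup easy to test, and the same function could be pointed at values fetched from a URL or a coordination service instead.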

Cohabitate Highly Cohesive Code

For some reason, putting code written in multiple languages into the same source code repository feels a bit dirty; at least it did for me. At the same time, there are many different aspects of a service within its bounded context that may necessitate the use of multiple tools. For example, we may want to provide shell scripts to deploy our code alongside the service’s business-capability-focused code. Some other things that should be considered part of the service’s source code repository are Chef or Ansible instance configuration code, PaaS (Platform as a Service) configuration files, build scripts, and instance launch automation scripts. Probably the most controversial suggestion I will make is to also serve, from the service itself, the client-side code that specifically interacts with the service’s endpoints. Since that may be controversial, here is an example:

Given the following service API endpoints:

GET /api/items
POST /api/items
PUT /api/items/{id}

We might serve a JavaScript API client that has functions for interacting with each endpoint:

{
  getItems: function() { … },
  addItem: function(item) { … },
  updateItem: function(item) { … }
}

This allows the client-side code that interacts with the service to be tested and updated at the same time as the service itself. If the client-side code is put into a separate source code repository and is built and deployed separately, then changes to the client and the service can no longer be deployed independently of each other. This leads to situations where code must be deployed in a specific order, which is not in alignment with the Continuous Delivery pipeline.
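To show why keeping the client next to the service makes it easy to test the two together, here is one possible shape for such a client. This is a sketch, not our actual code: the createItemsClient name, the injected request(method, path, body) function, and the item.id field are all assumptions, and it assumes the conventional mapping of POST for adding an item and PUT for updating one.

```javascript
// Sketch only: `request` is a hypothetical transport function with the
// shape request(method, path, body); in a browser it could wrap fetch or
// AngularJS's $http.
function createItemsClient(request) {
  return {
    getItems: function () {
      return request("GET", "/api/items");
    },
    addItem: function (item) {
      return request("POST", "/api/items", item);
    },
    // Assumes each item carries its identifier in an `id` field.
    updateItem: function (item) {
      return request("PUT", "/api/items/" + encodeURIComponent(item.id), item);
    },
  };
}
```

Because the transport is injected, the service's own test suite can exercise this client against a running instance or a stub in the same build, so a change to an endpoint and the matching change to the client ship in one commit.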

Monitoring and Alerting are Essential

The flexibility of horizontal scaling of stateless services and federated data comes at a cost. It is at times difficult to know the path that is being taken through the application and which instances are involved. Find ways to identify specific service instances in logs, on clients, and in monitoring tools such as New Relic. This will reduce the Time to Resolution in many cases and keep you sane when tracking down the causes of issues.
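One lightweight way to identify the instance involved is to attach an instance identifier to every response. The sketch below is an assumption-laden illustration: the X-Served-By header name, the host:pid identifier format, and the reliance on a HOSTNAME environment variable are all choices made for the example, not something prescribed by any particular tool.

```javascript
// Sketch only: the X-Served-By header name and the id format are illustrative.
// Derive an identifier for this running instance from its host name and pid.
function instanceId(env = process.env, pid = process.pid) {
  const host = env.HOSTNAME || "unknown-host";
  return host + ":" + pid;
}

// Return a copy of the response headers with the serving instance attached,
// so clients and monitoring tools can report which instance handled a request.
function tagResponseHeaders(headers, id) {
  return { ...headers, "X-Served-By": id };
}
```

The same identifier can be written into each log entry, which lets a client-side error report be matched to the log lines of the instance that served it.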

On top of monitoring, finding ways to alert when issues need to be looked into, using tools such as PagerDuty, will help the team track down issues quickly after their introduction into the environment. Virtual machines get rebooted, instances fail, networks get blocked, syntax errors in configurations cause outages, and any number of other issues can cause problems in an environment. I recommend becoming familiar with the Fallacies of Distributed Computing to help with thinking about the ways that your software can fail in any distributed system. Even browser or mobile clients connecting to services form a distributed system and can fall victim to these fallacies.

Isolate Deployment Slices for Verification

To keep your services highly available, it is important to have a deployment process that allows changes to be introduced without scheduling downtime for maintenance. This matters most when teams are deploying frequently, maybe multiple times per day, into an environment. Finding mechanisms to continue serving consumers of the service while deploying new versions makes deploying software a business decision rather than a technical hurdle to leap over. Our process was something like the following:

1. Have at least two isolated slices deployed in an environment
2. Take one isolated slice out of rotation and direct all consumers to the other slice(s)
3. Deploy to the out-of-rotation isolated slice
4. Run smoke tests that are configured to test the out-of-rotation isolated slice
5. If the smoke tests pass, bring the isolated slice back into rotation
6. Rinse and repeat with the other slice(s)

This overly simplified, high-level overview of the process hides many complications that need to be resolved based on the configuration of the environment being deployed to. PaaS and other approaches that automate service deployment, networking, and configuration management can go a long way toward making this process less disruptive to a team’s feature delivery by taking care of the complications through tools and APIs.
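The steps above can be sketched as a small loop. This is a simplified illustration rather than our deployment tooling: the rollingDeploy function and the four hooks on ops (removeFromRotation, deploy, smokeTest, addToRotation) are hypothetical names, and what each hook actually does depends entirely on the load balancer and infrastructure in use.

```javascript
// Sketch only: `ops` holds hypothetical hooks supplied by the environment:
// removeFromRotation, deploy, smokeTest, and addToRotation.
async function rollingDeploy(slices, version, ops) {
  if (slices.length < 2) {
    throw new Error("need at least two slices to deploy without downtime");
  }
  const deployed = [];
  for (const slice of slices) {
    await ops.removeFromRotation(slice); // consumers now use the other slice(s)
    await ops.deploy(slice, version);
    const healthy = await ops.smokeTest(slice);
    if (!healthy) {
      // Leave the failed slice out of rotation and stop; the remaining
      // slices keep serving the previous version.
      throw new Error("smoke tests failed on slice " + slice);
    }
    await ops.addToRotation(slice);
    deployed.push(slice);
  }
  return deployed;
}
```

Stopping at the first failed smoke test is a design choice in this sketch: consumers are never routed to an unverified slice, and someone can inspect the failed slice while the others continue serving.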

Conclusion

The microservices architecture approach can provide effective boundaries between capabilities in a system and provide tremendous flexibility not easily attained through other, more monolithic approaches. There is a cost to the microservices architecture approach in terms of a more complicated environment setup and deployment. This cost can be managed by applying the learnings above along with techniques shared online by the many folks using this approach. If your team is creating new business capabilities, I highly recommend taking a deeper look into how a microservices approach could help provide the flexibility and scalability that modern software solutions need.