Testing Microservices the sane way

ekidd · on Jan 1, 2018

> The Vagrant repo itself was called something along the lines of “full-stack in a box”, and the idea, as you might imagine, was that a simple `vagrant up` should enable any engineer in the company (even frontend and mobile developers) to be able to spin up the stack in its entirety on their laptops.

This is honestly not that hard to get working, if you have Docker and good tools. For simple cases, "docker-compose" will work well enough. For complex cases, my employer open-sourced a tool for exactly this: http://cage.faraday.io/ This extends docker-compose with the idea of multiple pods, staging/production environments, etc.

For testing, we have plenty of unit tests, and we've been experimenting with pact https://github.com/pact-foundation/pact-js, which turns microservice consumer mocks into provider contracts.

But at the end of the day, it appears to be absolutely necessary to have end-to-end tests that run in a staging environment whenever a service is updated. Otherwise, something inevitably falls through the cracks. It's a nice idea to think that you can develop each microservice in isolation, but somebody ultimately has to watch site-wide quality.

bpicolo · on Jan 1, 2018

It's easy at low scale. It breaks down hard with dozens of services +.

evfanknitram · on Jan 1, 2018

bpicolo · on Jan 1, 2018

A variety different factors. More moving parts == more stuff that can break. Knowledge about a specific service becomes less widespread, so issues take longer to resolve. Also, just hardware scaling becomes a problem. It also really sucks to have to have 50 people rebuild dev environments because you bumped a system dep in a container that your app now depends on. At some point, just building and bringing up all those containers becomes really slow, and you're going to have to build more and more systems to aid in that.

There's also all the stuff that's tricky to automate - say you're using AWS Lambdas, step functions, SQS queues. Translating that across different developer envs is a challenging problem.

I'm very much on the monolith-first approach. There are cases for microservices for specific business functions when you have few engineers, but in general it's far more of a scaling layer for engineer count. Single-app workflows are typically much more productive compared to microservice workflows up until a certain point.

falcolas · on Jan 1, 2018

Perhaps it's just me, but if you can't run your stack locally for any reason but space or metadata, you're too tightly coupled to your platform. This is doubly true in this age of Docker. Of course, Amazon and Google and Azure love tightly coupled applications, but it's not good for creating portable code. And portable code is something that's good for your pocketbook as well as your code quality.

I'll admit, I'm not a fan of testing in production, especially when you're doing that testing against paying customers. I've just seen it go poorly too often, with the common result of lost customers. When you're B2C, it's bad since you're now having to acquire more customers not to grow, but to remain even; a death-knell for VC-backed startups. When you're B2B, the loss of customers signals a coming winter for the business. I've been through a couple of layoffs due entirely to lost customers.

Remember that your product is most likely a fungible asset - your service easily replaceable by a competitor - and if you piss off your customer base by allowing out blatant bugs, you will lose customers. The sheen of newness has worn off tech companies, and customers are not going to be as forgiving of their time being wasted as they once were.

dastbe · on Jan 1, 2018

A pretty pithy response would be that any time you're deploying new software to your production environment, you're testing against real customers. While you should have comprehensive pre-production testing, there's no good substitute for making sure the thing going into Prod is working by running tests when it is in Prod.

Looking at her list of production tests, her list generally falls into 3 categories:

* instrumentation of code for observability * staged rollout of new software to observe issues on a subset of traffic/hosts against the rest of Prod * simulating failures in Prod to test against possible random failure scenarios

Of those, the first should have zero impact on your customers, the second should be making your customer experience better by reducing the blast radius of a bad deployment, and the third should be making your service more resilient to failures over time for your customers. Would you rather find out that you can't survive an availability zone outage when you can easily return that AZ back to healthy in a minute, or when you have to wait for recovery for an hour? Honestly I think its disrespectful to your customers to gamble on what might happen in the future rather than asserting that your service can handle the wide array of known failures that can happen in this world.

falcolas · on Jan 1, 2018

If there's any question of how something is going to respond to production load or production data, there's a missing step somewhere in the release process.

WRT the three points you bring up:

1) Instrumentation is a part of any good deploy, regardless of where, when, or why. Instrumentation is not a replacement for testing.

2) Staged rollouts are a good thing, but they don't prove the new object being rolled out is production ready. They can cordon off major failures, yes, but see my comment above about missing steps when it comes to major failures.

3) Not all companies can afford the Netflix model of having multiple fully redundant data centers at all times. I'd even hazard a guess that most companies can't really afford to triple (or more) their infrastructure costs in addition to the higher development and maintenance costs. Ultimately that's one thing that testing is good at: ensuring your disaster recovery plan works without honking off customers when you discover gaps in that plan.

None of these three strategies will replace pre-customer testing. At best full adherence to the policies (in the absence of end-to-end testing) will only limit the amount of damage major bugs can do. The question to me is: why are we OK with just limiting the damage of otherwise findable bugs?

stephen · on Jan 1, 2018

I'm going to have to re-read the post a few times, but my thoughts on this topic are that cross-system automated tests are futile because, by definition of being "not your system", you can't easily and deterministically control the input state (of the combined your system+not-your-system), so any tests you build on top of this will be like building on quick sand:

http://www.draconianoverlord.com/2017/08/23/futility-of-cros...

My preferred solution, that I haven't had a chance to actually flush out at scale, so disclaimer/YMMV, is for all services to ship with stubs:

http://www.draconianoverlord.com/2013/04/13/services-should-...

Where the stub is an in-memory version of the service that its authors maintain (not the client), so you can achieve the proverbial "deploy all systems on your local machine", but since they're stubs, they're extremely quick to boot/reset/etc., and also with allowances (again since they are stubs) to let you set the per-test input data of any stub your system talks to.

I believe this would work best with homogenous/noun/REST-based services, e.g. all entities in your corporation have a strict/unified CRUD API, so then "integration" tests (e.g. the proposed stub/no-actual-wire call tests) can define their input data in terms of entities and be fairly oblivious about which stubs/systems those entities actually live in.

chrisweekly · on Jan 3, 2018

Good thoughts, Stephen!

I was recently reading about Hypermedia HAL APIs[1] (TLDR: add metadata to API responses, helps w/ discoverability etc) which could conceivably play a role in solving this kind of problem.

[1] https://sonalake.com/latest/hypermedia-apis/

chatman · on Jan 1, 2018

Discarding the "full stack in a box" idea, in general, just because of past experience with a poorly implemented vagrant setup seems naive. A good "full stack in a box" implementation (docker-compose, swarm, kubernetes etc.) can be a useful tool in testing/developing microservices.

justincormack · on Jan 1, 2018

Well there is obviously a point at which it will no longer work (way before Google scale). But the point is to stop testing services in a coupled way, the whole point of microservices is to make decoupling real, not to build a distributed monolith. Testing in a decoupled way helps this enormously.

_ZeD_ · on Jan 1, 2018

Yeah, excepts it doesn't. In those microservices setup more than not the change in the data to accomplish a client request is 3 or more services of "distance". Still, it needs to be accomplished. If you don't coordinate all the services involved (all with different developers team, maybe from different contractors) AND if you don't do an end-to-end test, how can you be sure to do the requested change?

lmm · on Jan 1, 2018

> In those microservices setup more than not the change in the data to accomplish a client request is 3 or more services of "distance". Still, it needs to be accomplished. If you don't coordinate all the services involved (all with different developers team, maybe from different contractors) AND if you don't do an end-to-end test, how can you be sure to do the requested change?

You're doing it wrong. If these services are really so deeply entangled that you can't change and test them one at a time, they shouldn't be independent services. Merge them, or otherwise rethink your service boundaries.

ramchip · on Jan 1, 2018

You can do these tests against a UAT environment. It doesn't have to run on your own box.

In my apps I use fakes in dev and test mode, which makes development very fast and easy. A few tests run against the actual UAT environment, but these are skipped unless a command line flag is passed.

Mostly it's inspired from the article "Mocks and explicit contracts": http://blog.plataformatec.com.br/2015/10/mocks-and-explicit-...

shcallaway · on Jan 1, 2018

My company is in the process of building out an integration test suite for our microservices platform. A few pain points:

1. The tests themselves are housed in a separate repository, so you can't update the tests alongside your service. This means every change to a service has to be backwards compatible. Hello, multi-step rollouts.

2. Our environments must be highly configurable, so that every permutation of versioned services can be integration-tested. This is forcing us to adopt an unnecessarily complex container orchestrator.

3. Service owners are not excited about setting up and contributing to yet another project. We end up with a lot of out-of-date integration tests. Lots of noise, if you ask me.

I think a combination of contract testing (e.g. using Swagger, Pact) with monitoring, canary deployments, and automatic rollbacks would be easier to maintain and just as effective at catching bugs.

bpicolo · on Jan 1, 2018

Once you dive into SoA, you see how much sense single-source-repository starts to make. Developing features between <n> different repositories is tedious, and the single-repo gives ample benefits. Having easy-to-work with interface layers like protobuf in a shared repo makes tons of sense, for example.

It's interesting to see how things like Golang may well have evolved out of this problem. Calculating a golang app's dependencies is as trivial as a grep or two, so it's easy to know what tests to run when libraries change.

shcallaway · on Jan 1, 2018

At the very least, this experience has made me question the conventional wisdom that microservices = good and monolith = bad.

cle · on Jan 1, 2018

That's conventional wisdom? Outside of loud naive bloggers preaching that microservices solve the world's problems, most engineers I've chatted to IRL have been pretty skeptical of the overall utility of microservice architectures because of the high complexity they add.

pbreit · on Jan 1, 2018

Conventional wisdom has always been and remains the opposite.

msangi · on Jan 1, 2018

Wouldn't you need backwards compatibility anyway to avoid the need of upgrading all the services at the same time?

cpitman · on Jan 1, 2018

I usually advocate for storing (and therefore versioning) tests with the code under test, but this one complaint may have changed my mind. Backwards compatibility is incredibly important for microservice architectures. The more tests are black box, the more this would work.

maxxxxx · on Jan 1, 2018

the discussion about the difficulty of testing microservices makes me wonder how the programmers in 20 years will look at the currently modern systems that are being built now. Let's say the system has been in use for 10 years, everybody has moved on and now you have to make extensions or bug fixes to a legacy cloud/microservice/multi-language/multi-server/distributed/queuing/ system. At first look this makes updating an old COBOL system where you have one big codebase in one place look easy in comparison.

nine_k · on Jan 2, 2018

Won't the fact you need to only replace some parts with well-defined interfaces help?

In my company, we had a legacy PHP system that seemed to me excessively layered and cut too thinly. In reality, it allowed to completely replace that system with Python and Java piecemeal, without ever stopping it.

maxxxxx · on Jan 2, 2018

Now imagine in 10 years someone comes in and has to figure out how all these services written with outdated libraries fit together and how to fix bugs.

nine_k · on Jan 2, 2018

HTTP does not get outdated for quite some time already. The standard connection points with extremely late binding is what makes microservices somehow easier to update, as opposed to rebuilding a monolith.

Fitting together is an aspect that becomes easier when you decompose your app properly microservices or not. When a component has clear responsibilities, it's easy(-ier) to understand, modify, and replace. When a component is reasonably small and isolated, it's easy(-ier) to analyze and understand, even if it's written in an ancient language using dust-covered unmaintained libraries.

DenisM · on Jan 1, 2018

I call it the circle of life. Old code devoid of support, and with it old systems and old companies even, wither and die to make way for new code, new systems, and new companies. Sometimes the new thing is the same as the old thing, sometimes it’s really new.

jimbokun · on Jan 1, 2018

A lot of interesting content in this article, but think it could benefit from some editing. Maybe an article half the size, or two or three separate articles breaking out the various topics covered.

coredog64 · on Jan 1, 2018

She's currently writing a book on the subject. I believe these blog posts are teasers for the final product.

jimbokun · on Jan 2, 2018

That would make sense! Seems like an early book draft.

friendly_chap · on Jan 1, 2018

> Ultimately, every individual team is the expert given the specific context and needs.

Most companies don't have specific needs. They think they do, but 90%+ of the companies out there are either building run of the mill CRUD apps or something barely more technically difficult.

kasey_junk · on Jan 1, 2018

I didn't down vote you, but I can see why the comment is downvoted. Its condescending and doesn't offer any actionable advice compared to the article it is commenting on.

Even teams that are building "run of the mill CRUD apps" need to be expert on the specific context and needs of their run of the mill CRUD app. If it truly is generic then it doesn't need to be built (and thus doesn't need to be tested or discussed on HN).

Also anecdotally, in my career, while I've seen lots of CRUD apps they haven't been the focus, the challenge, or the money maker for any of the businesses I've worked with/for. So I'll buy that 90% of companies don't need this advice because 90% (or more) of companies don't do software development at all. But for the ones that do, saying that 90% of them don't have specific needs rings hallow.

tbrownaw · on Jan 1, 2018

Do you have any numbers to back up that 90%?

Do you have anything to back up the link you're claiming between not being technically difficult and not having specific needs? Interfacing with several dozen external services probably doesn't fit what you'd call "technically difficult", but making sure they don't break would count as a specific need.

What is your definition of "technically difficult", anyway? Is it something that links in the Tensorflow library? Something with separate transactional vs reporting databases that have to stay mostly in sync? Something that has to stay easy to update when regulations change? Something like I mentioned above, with multiple-dozen external integrations and contractual or legal penalties if they break? Is it things like the auto-level feature that hobby quadcopters have?

pbreit · on Jan 1, 2018

I would say 95% of projects are basic CRUD.

friendly_chap · on Jan 1, 2018

Does the downvoter care to explain?

I have spent a good chunk of my life explaining companies why they don't need any special hand rolled platform - leaving hundreds of thousands of dollars on the table.

I guess parroting buzzwords is always more popular in this industry.

Abekkus · on Jan 1, 2018

Managers in an organization often have more incentive to make the problems they work on look difficult, and therefore important, rather than fixing their problems quickly, cheaply, and reliably, which can make them look less valuable to leadership.

jimbokun · on Jan 1, 2018

If they are professional and honest, the individual team is still the expert on the requirements, including on whether or not custom software development is needed at all.

zmmmmm · on Jan 1, 2018

Seems a little bit inflated - yes, docker-compose et al, are a problem if you have dozens of services all using different databases. But then, that's a level of complexity and scale that most applications / organisations won't reach. And if you do reach that, then you probably can afford to invest in something more sophisticated to solve the problem (service mocks, etc).

For situations where the true overall complexity is manageable, I think that making sure that the environment can be entirely replicated and bootstrapped easily is actually very good discipline for keeping complexity under control.

akud · on Jan 1, 2018

No mention of mocking out service dependencies, which seems like the obvious way to go. Does anyone do this?

Clubber · on Jan 1, 2018

Absolutely. We have several front end layers, then nested service layers and then nested repo layers, all get mocked depending on the level of object we are testing. We then have separate integration tests that test the repo against an actual database. We design the integration tests so they are largely data independent (we load the data we test rather than just mock data).

sytse · on Jan 1, 2018

Is the conclusion to monitor incremental rollouts of microsevice updates?

pbreit · on Jan 1, 2018

Article makes me want to scream “just fire up rails/Django and build the dang app already!!!”

coldtea · on Jan 1, 2018

So the article makes you want to address totally different stacks for totally different needs and assume they only used multiple microservices because they are idiots?

hendry · on Jan 1, 2018

tl;dr though I thought I'd mention that Postman's monitors is saving the day for me. You can go further and script white box tests in its little embedded JS language.

shcallaway · on Jan 1, 2018

Postman can be handy for quick and dirty HTTP tests. Unfortunately, I find the Postman GUI unintuitive. The alternative -- working with minified JSON -- is equally painful.

Edit: English

matthewtovbin · on Jan 1, 2018

TL;DR