Why Reactive Principles Produce Resilient Cloud Systems: A Conversation with Ben Christensen
The Reactive Principles, a new set of guidelines and techniques published by the Reactive Foundation, incorporate the ideas and patterns of both Reactive Programming and Reactive Systems into a set of practical software design principles. They distill the experience of leading experts from the broader distributed systems community into a collection of principles and patterns for building Cloud Native, Edge Native, and Internet of Things (IoT) applications.
In this post, I catch up with Ben Christensen, software engineer, creator of RxJava and Hystrix, and contributor to The Reactive Principles. We discuss his introduction to Reactive programming, why resilience is a key driver for building distributed systems, and the practical challenges Reactive principles solve in everyday enterprise environments.
How were you introduced to Reactive programming and Reactive systems?
BC: While working on the Netflix API, I was trying to make it possible for teams to define their own endpoints optimized for different use cases and client devices. That meant exposing an internal API that could be composed, and because nearly every internal API represented a remote microservice, it meant composing asynchronous network calls.
While exploring various (bad) options, I ended up in debates with Jafar Husain, who introduced me to Rx (Reactive Extensions), which he had worked with at Microsoft. Several months later, I admitted his solution was by far the best and wanted to adopt it. However, we were using Java, and Rx didn't yet exist for Java. RxJava was created to compose multiple network requests in a web service gateway without tiers of blocking calls or awkward use of non-composable Java Futures.
This is how I was first introduced to Reactive programming.
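To make the composition problem described above concrete, here is a minimal sketch in Java. It uses the JDK's `CompletableFuture` (Java 9+ for `List.of`) as a dependency-free stand-in for RxJava; it is not RxJava's API, and the JDK only gained the composable `CompletableFuture` in Java 8, after RxJava was created. The gateway class and its three service methods are hypothetical and return canned data instead of making real network calls. One call depends on another's result (`thenCompose`), and an independent call runs alongside them (`thenCombine`), so no thread sits blocked waiting on intermediate results.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical gateway composing three simulated "remote" calls.
class HomePageGateway {
    private final ExecutorService pool;

    HomePageGateway(ExecutorService pool) {
        this.pool = pool;
    }

    // Hypothetical service: looks up the user's preferred genre.
    CompletableFuture<String> fetchGenre(String userId) {
        return CompletableFuture.supplyAsync(
                () -> userId.equals("u1") ? "sci-fi" : "drama", pool);
    }

    // Hypothetical service: needs the genre, so it depends on fetchGenre.
    CompletableFuture<List<String>> fetchTitles(String genre) {
        return CompletableFuture.supplyAsync(
                () -> List.of(genre + "-1", genre + "-2"), pool);
    }

    // Hypothetical service: independent of the other two calls (canned value).
    CompletableFuture<Integer> fetchUnreadCount(String userId) {
        return CompletableFuture.supplyAsync(() -> 3, pool);
    }

    // Chains the dependent calls and combines them with the independent one;
    // no thread blocks while waiting for intermediate results.
    CompletableFuture<String> homePage(String userId) {
        CompletableFuture<List<String>> titles =
                fetchGenre(userId).thenCompose(this::fetchTitles);
        CompletableFuture<Integer> unread = fetchUnreadCount(userId);
        return titles.thenCombine(unread,
                (t, n) -> "titles=" + t + " unread=" + n);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            String page = new HomePageGateway(pool).homePage("u1").join();
            System.out.println(page); // titles=[sci-fi-1, sci-fi-2] unread=3
        } finally {
            pool.shutdown();
        }
    }
}
```

The single `join()` in `main` is the only blocking point, and it sits at the very edge of the program.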
What makes Reactive principles so important for building distributed systems?
BC: Applied correctly and in the right circumstances, they bring performance and efficiency benefits in networked and multithreaded environments, but resilience has been the key driver for me.
While I was working on fault tolerance at Netflix and creating the Hystrix library to add bulkheads and circuit breakers, the Reactive principles became central to my work and to the systems I worked on. For example, latency is one of the hardest problems to solve in a distributed system if it isn't accounted for in the system design: queues and buffers back up everywhere, and the system struggles to recover when latency spikes unexpectedly. But if bulkheads are used, queue limits imposed, latency assumed, and asynchrony embraced, latency becomes just a normal part of operations. The system handles it, shedding load if needed, and recovers quickly, without on-call human intervention.
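As a rough illustration of a bulkhead with a hard limit, here is a hand-rolled semaphore bulkhead in Java (9+ for `failedFuture`). It is not Hystrix's API; the class and method names are hypothetical. At most `maxConcurrent` calls may be in flight against one dependency at a time, and any extra call is shed immediately and answered from a fallback instead of joining an unbounded queue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical semaphore bulkhead: caps concurrent calls to one dependency
// and sheds the excess instead of queuing it.
class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> CompletableFuture<T> call(Supplier<CompletableFuture<T>> work, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            // No permit available: shed load without touching the dependency.
            return CompletableFuture.completedFuture(fallback.get());
        }
        CompletableFuture<T> result;
        try {
            result = work.get();
        } catch (RuntimeException e) {
            permits.release(); // the supplier threw before returning a future
            return CompletableFuture.failedFuture(e);
        }
        // Release the permit when the async call finishes, successfully or not.
        return result.whenComplete((value, error) -> permits.release());
    }

    public static void main(String[] args) {
        SemaphoreBulkhead bulkhead = new SemaphoreBulkhead(2);
        // Simulates a stalled dependency that has not answered yet.
        CompletableFuture<String> stalled = new CompletableFuture<>();
        CompletableFuture<String> a = bulkhead.call(() -> stalled, () -> "fallback");
        CompletableFuture<String> b = bulkhead.call(() -> stalled, () -> "fallback");
        CompletableFuture<String> c = bulkhead.call(() -> stalled, () -> "fallback");
        System.out.println(c.join()); // fallback (a and b hold both permits)
        stalled.complete("ok");       // dependency recovers; a and b release permits
        System.out.println(a.join() + " " + b.join()); // ok ok
        CompletableFuture<String> d =
                bulkhead.call(() -> CompletableFuture.completedFuture("fresh"), () -> "fallback");
        System.out.println(d.join()); // fresh
    }
}
```

Rejecting rather than queuing is the design choice that matters in this sketch: while the dependency is stalled, callers get a fast degraded answer, and the permits come back on their own once the in-flight calls finish.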
Asynchronous systems, whether they just make network calls or also deal with multiple threads, can be very painful to operate and scale, especially when handling errors. But Reactive programming, and more generally the Reactive principles, make all of this normal: handling latency without resource bloat, dealing with timeouts and load shedding, propagating errors across threads, queues, and networks, and composing all of these together, which any distributed system of moderate complexity needs to do.
What practical challenges do the Reactive principles solve in everyday enterprise environments?
BC: For me, the benefit of Reactive principles (particularly Reactive programming) is that a single paradigm works with asynchronous networking, asynchronous threading, and the overall uncertainty of a distributed system. The same approach handles success, failure, latency, and non-determinism uniformly. It can even mix synchronous and asynchronous I/O calls, if necessary.
The approach does have a steeper learning curve than classic imperative approaches, but once it's understood, in my opinion the complexity ceiling for the code and system design needed to solve the challenges of a distributed system is far lower.
All systems break, but distributed systems break in varying degrees nearly all the time. Reactive principles and Reactive programming build those assumptions into the architecture and code. This lowers the burden of scaling not only the system itself but, perhaps more importantly, the operational effort and costs.
In your opinion, what makes Reactive principles such a good fit for cloud native applications?
BC: Reactive principles embrace the inherent challenges that come with distributed systems, which are a foundational characteristic of the cloud. All interactions in the cloud should be assumed to be asynchronous, latency-prone, and failure-prone, and Reactive principles and programming approaches make it normal to work with those traits.
How do you recommend software architects and developers get started using these principles?
BC: The hardest part is getting over the initial shift in thinking and relearning basics that, for most of us, are entrenched in an imperative approach. For example, how does one wait on an asynchronous network call, make a conditional decision, and make sure to handle errors? We all know what this looks like imperatively, but “reactive programming,” its close relative “functional programming,” and talk of lambdas, monads, actors, and flatMap are all confusing and come across as far more complicated than they really are.
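The question above (wait on an asynchronous call, branch on its result, handle errors) can be sketched in Java with the JDK's `CompletableFuture` (Java 9+). The `fetchBalance` and `placeOrder` calls are hypothetical and return canned futures instead of doing real I/O; a blocking equivalent, using equally hypothetical `*Blocking` methods, is shown in a comment for comparison. Here `thenCompose` plays the role of `flatMap`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class ConditionalCheckout {
    // Hypothetical remote call: fails for an empty account id.
    static CompletableFuture<Integer> fetchBalance(String account) {
        if (account.isEmpty()) {
            return CompletableFuture.failedFuture(new IllegalArgumentException("no account"));
        }
        return CompletableFuture.completedFuture(account.equals("rich") ? 500 : 20);
    }

    // Hypothetical remote call issued only on one branch.
    static CompletableFuture<String> placeOrder(String account) {
        return CompletableFuture.completedFuture("order placed for " + account);
    }

    // Imperative equivalent with hypothetical blocking calls, for comparison:
    //   try {
    //       int balance = fetchBalanceBlocking(account);
    //       return balance >= 100 ? placeOrderBlocking(account) : "insufficient funds";
    //   } catch (Exception e) {
    //       return "checkout failed: " + e.getMessage();
    //   }
    static CompletableFuture<String> checkout(String account) {
        return fetchBalance(account)
                // Branch on the result: one path makes another async call,
                // the other wraps a plain value.
                .thenCompose(balance -> balance >= 100
                        ? placeOrder(account)
                        : CompletableFuture.completedFuture("insufficient funds"))
                // A failure in any earlier stage skips ahead to this handler.
                .exceptionally(error -> "checkout failed: " + unwrap(error).getMessage());
    }

    // Dependent stages wrap failures in CompletionException; report the cause.
    static Throwable unwrap(Throwable error) {
        return (error instanceof CompletionException && error.getCause() != null)
                ? error.getCause() : error;
    }

    public static void main(String[] args) {
        System.out.println(checkout("rich").join()); // order placed for rich
        System.out.println(checkout("poor").join()); // insufficient funds
        System.out.println(checkout("").join());     // checkout failed: no account
    }
}
```

Only the `join()` calls in `main` block; everything else just registers what should happen next.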
In most cases, I suggest starting in an existing codebase by migrating synchronous network calls to asynchronous request/response network calls and getting accustomed to the paradigm shift this causes. You can then start adding things like bulkheading, fallbacks, composition, and message passing.
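A minimal sketch of that first migration step in Java (9+ for `orTimeout`), assuming the existing code has a blocking client call. `fetchProfileBlocking` is a hypothetical stand-in that sleeps instead of doing real I/O, and its `latencyMillis` parameter exists only for the simulation. The call keeps its behavior but gets an asynchronous request/response signature, runs on a small dedicated pool (a thread-pool bulkhead), and gains a timeout and a fallback. The 200 ms timeout and pool size of 4 are illustrative values, not recommendations.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ProfileClient {
    // Dedicated, bounded pool: a thread-pool bulkhead for this one dependency.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Hypothetical legacy blocking call; the sleep simulates network latency.
    static String fetchProfileBlocking(String userId, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return "profile:" + userId;
    }

    // Same call behind an async request/response signature, with a timeout
    // and a fallback. orTimeout does not interrupt the worker thread.
    CompletableFuture<String> fetchProfile(String userId, long latencyMillis) {
        return CompletableFuture
                .supplyAsync(() -> fetchProfileBlocking(userId, latencyMillis), pool)
                .orTimeout(200, TimeUnit.MILLISECONDS)
                // Any failure, including the timeout, degrades to a default profile.
                .exceptionally(error -> "profile:default");
    }

    void shutdown() {
        pool.shutdownNow(); // interrupts any still-sleeping workers
    }

    public static void main(String[] args) {
        ProfileClient client = new ProfileClient();
        try {
            System.out.println(client.fetchProfile("u1", 0).join());    // profile:u1
            System.out.println(client.fetchProfile("u2", 1000).join()); // profile:default
        } finally {
            client.shutdown();
        }
    }
}
```

One caveat of this sketch: `orTimeout` completes the future but does not interrupt the worker thread, so a timed-out blocking call still occupies a pool thread until it returns. Keeping that pool small and separate is what stops one slow dependency from exhausting the caller's threads.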
This incremental approach lets you refactor existing code, learn a little at a time, and get resilience benefits as you go. Eventually, Reactive programming (Rx, Akka, Erlang, and many others) may become enticing, because once you've got more asynchrony and are trying to compose it, it's clearer why it helps.
Trying to adopt a Reactive programming model when your problem involves only a single network call means climbing the initial learning curve for little benefit, which can be a letdown. This is why I suggest first adding the asynchrony, feeling some of the pain that eventually comes, and then embracing the rest of the toolkit that makes Reactive principles easier to apply.
Mark your calendar for Reactive Summit, a one-day virtual conference on Tuesday, November 10th, hosted by the Reactive Foundation and focused on Reactive principles, patterns, and projects. I hope you can make it!