Decouple Space: The Reactive Principles, Explained
Create flexibility by embracing the network
One of the keys to building resilient, elastic, scalable cloud applications is decoupling space, or location transparency. From the Reactive Manifesto:
Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation and location transparency. This boundary also provides the means to delegate failures as messages. Employing explicit message-passing enables load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary. Location transparent messaging as a means of communication makes it possible for the management of failure to work with the same constructs and semantics across a cluster or within a single host.
In this blog post by guest author Evan Chan (@evanfchan), Data Architect and Engineer, we review the seventh of the Reactive Principles: Decouple Space, and why the location transparency provided by decoupling space is so important to building Reactive systems.
In this explanatory series, we look at The Reactive Principles in detail and go deeper into the meaning of things. You can refer to the original document for full context as well as these excerpts and additional discussion.
Traditional Application Architecture
In a traditional application, the methods that each application process talks to another process is vastly different from the way that different parts of an application talk to each other within the same process.
Usually, an application process runs on multiple threads, and functions and methods call one another, traditionally in synchronous method calls. To coordinate state across threads, various mechanisms including thread-safe queues, maps, and other data structures are used to share state.
Over time, some of these architectures have been replaced by more asynchronous architectures where tools such as Futures/Promises and asynchronous callbacks are employed. However, asynchronous code within a process typically still relies on the same mechanisms for coordination–namely, thread-safe data structures shared amongst different parts of the application.
Typically, resilience and backpressure concerns are addressed by patching these same mechanisms–for example, by checking and controlling the size of data structures, using a LinkedHashMap to evict oldest entries for caching, etc.
Sharing of data and coordination takes on a vastly different shape across processes. An external database or message broker such as Kafka might be employed between services–many queues have been implemented in MySQL!
Sometimes an external coordination service such as etcd or Zookeeper is used. There are a myriad number of protocols for communicating directly between services as well–most commonly HTTP and its variants, but there are others (low-level protocols like TCP, UDP, and high-level ones like gRPC, RSocket, Aeron, etc).
What we can notice is that there are two major differences from the single-process scenario: the handling of state, and communications. These intra-process communication patterns and tools have very different characteristics from the within-process RPC calls:
The vastly different mechanisms of inter-process communications–having to deal with the network and its failure scenarios, for example–is often a barrier to teams scaling out their services and pipelines beyond just a single node.
Oftentimes, scaling mechanisms such as sharding are chosen such that multiple copies of the same service do not need to coordinate with one another. The pain of migrating to and running more distributed systems, queues, etc. and their cost often discourages teams from reaping the benefits of scaling out.
Actors and Message Passing
Now let’s look at a different paradigm commonly used in reactive applications and architectures: actor systems. An actor represents a single unit of execution defined by the following characteristics:
An actor encapsulates its own state and does not share state with other actors
An actor ONLY communicates with the outside world by sending and receiving messages (ie, an actor is not a class and you cannot invoke methods inside of an actor)
These messages should be immutable and not reference state in any actor
Messages are sent from one actor to another in one direction. This forms the basis of different communication patterns. Request-response consists of actor A sending a message to actor B, and the actor B responding back to actor A.
Instead of spinning up multiple threads and having them update shared data structures, or doing the equivalent in a distributed fashion (through databases and file/object systems), in an actor system you create multiple actors and have them communicate to distribute work and scale out.
What do we notice about the properties of actors above? Since all state is encapsulated within each actor and not shared, and messages between actors are immutable, it doesn't really matter what the boundaries between the actors are (if they run locally next to each other, or are separated with the network). In other words, actors and immutable message passing allows one to build systems that decouple space.
In the diagram above, Actors A and B are both in node 1, and Actor C is in node 2. However, the way that Actor A talks to Actor B, and Actor B talks to Actor C, is exactly the same.
How is this possible? In both cases, the actor runtime takes care of abstracting away the message passing mechanism. Actors just need to know the actor handle (its virtual address) it is talking to and the immutable message is passed in the appropriate way by the runtime (locally or serialized over the network).
Due to the uniform approach to message passing, the code would be exactly the same if all three actors were in the same node, or all three in different nodes, or any configuration really of nodes and processes.
In fact, you can imagine a world where there are no nodes or containers, just lightweight actor processes which are managed by a runtime–the serverless world is moving in this direction, and other trends such as WebAssembly is also portending a shift this way. See Akka Serverless for a sneak peak at this new world!
Decoupling Space Allows Architectural Flexibility
The actor and immutable message passing paradigm is so different from the traditional way of building applications. It is very similar to many of the mental shifts required when building distributed systems - not sharing state, communication patterns, etc. The benefit of actor systems, though, is that building systems this way gives one tremendous architectural flexibility.
The unit of encapsulation becomes not classes, threads, processes, and containers, but just actors. Actor systems can easily be brought up on a single node, on the developer’s laptop, and tested this way, and with almost no changes in code, be elastically scaled out to many nodes. This flexibility is crucial in today’s world of on-demand scalability.
When I architected the query engine of the FiloDB distributed time-series database, I designed the queries to flow through an Akka actor system. From the top of the stack, parsing and distributing queries, to intermediate aggregation nodes, to the leaf nodes on each shard which fetch raw data, each step of the way was powered by an actor.
The result of this was tremendous flexibility with the query architecture. For simple testing and deployment, we could fit all the actors on a single node. If intermediate aggregation tasks took too many resources, they could be reallocated to be on different nodes. A tree-style query architecture could be migrated to with little effort as well.
Since there is no difference between local and remote calls, there is no lock-in as to which parts of the architecture are local vs remote, but these decisions can be made and optimized at run-time!
It is not just actors that allow for decoupling space. Reactive streaming technologies such as Akka Streams and RSocket are key enablers of this paradigm of decoupling space and boundaries, for stream processing and other spaces.
Resilience and State Recovery
Of course, flexibility and scalability is not useful without resilience. Actor systems come with built-in supervision hierarchies, forming an important foundation for resilience by automatically handling failure and restarting actors. Due to the flexible topology of actors, one can bring up actors on a different, healthier node/container/etc and the system should be able to carry forwards.
How about state though? To move actors around and recover, they must recover their state. While we can use traditional means of state recovery–by snapshotting, writing state to some database, for example, and recovering such state—actors and Reactive systems that pass messages enable a new way to think about state.
One observes that each actor’s state is built up by passing in messages, in the order they arrive. If we now store these messages/events in their causal order in a durable “event log” we have the full history of all state-changing events to that actor. This gives us much more flexible means to provide for stateful resilience.
For example, we can provide redundancy by having multiple replica actors that build up the same state by receiving the same messages, spread out to sustain the loss of one or more replicas. Message-based replay can be combined with snapshots for efficient state persistence and recovery. Many of these patterns are encapsulated in a pattern called event sourcing.
Decoupling Space and Data Processing
We can put all of the above principles together for a flexible, resilient, low-latency, and scalable data processing pipeline:
Incremental chunks of data (partial dataframes, if you will, or individual messages) flow between actor nodes
Actor nodes can split up work amongst multiple actors, which can easily scale out to multiple nodes if needed
For resilience, messages can be persisted to an append-only log and replayed to recover state
The above can be combined with snapshotted state such as models, features, etc.
Supervision of failures and restarts for resiliency
Low overhead design for easy local testing
Serverless, Kubernetes, and the Future
The advent of serverless is showing us that rethinking the application model is the key to breaking existing constraints around scalability, resilience and productivity. In the same way, Kubernetes showed in the deployment world that redefining abstractions around containers, pods, and clusters can simplify and streamline operations and remove barriers.
With traditional boundaries changing rapidly, the reactive message-passing application model is one that developers can embrace to not only bring resilience, scalability, and flexibility to their applications, but future proof their applications as Kubernetes, Serverless and other newer trends shift boundaries. Message passing shatters boundaries!