Handle Dynamics: The Reactive Principles, Explained
Continuously adapt to varying demand and resources
Humanity as a whole has always had to contend with changes in the availability of valuable resources like water, food, and shelter. Why shouldn't the software we engineer be able to adapt to the real-time needs of its users? In this blog post by guest author Simon Baslé (@simonbasle), Reactive Software Engineer at VMware, we'll explore how the eighth of the Reactive Principles, Handle Dynamics, represents a natural extension of two earlier principles: Decouple Time and Decouple Space.
In this explanatory series, we look at The Reactive Principles in detail and go deeper into the meaning of things. You can refer to the original document for full context; this series provides excerpts from it along with additional discussion.
Applications need to stay responsive under workloads that can vary drastically and continuously adapt to the situation—ensuring that supply always meets demand, while not over-allocating resources. This means being elastic and reacting to changes in the input rate by increasing or decreasing the resources allocated to service these inputs—allowing the throughput to scale up or down automatically to meet varying demands.
Applications nowadays integrate into environments that are very dynamic in nature. Not only is there a network linking the various components of a system, but in the Cloud the network's topology can and will change continuously! With Service Discovery, components can come and go. They can also become temporarily unresponsive. Both kinds of decoupling, in time and in space, help with these aspects.
Handling dynamics is a way of saying "embrace change at all levels". Beyond the application's environment, this principle also covers its workload: in other words, an application should be elastic.
Being elastic means that the application needs to adapt to a growing and shrinking workload. It must stay responsive by allocating more resources when the workload grows. Conversely, when the workload shrinks, it must not waste precious resources but should instead free up any over-allocated ones.
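To make elasticity concrete, here is a minimal sketch, assuming each instance can serve a roughly fixed request rate (a simplification; real per-instance capacity varies). It derives a target instance count from the current input rate, so the count grows and shrinks with the workload, clamped to a floor and a ceiling. The function name, the per-instance capacity and the bounds are hypothetical illustration values, not taken from any particular platform.

```python
import math


def desired_instances(request_rate: float, capacity_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Return how many instances are needed to serve `request_rate`.

    The count grows with the workload and shrinks when it drops, but is
    clamped to [min_instances, max_instances] so we never scale to zero
    or beyond what the budget allows.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(request_rate / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Assuming each instance handles about 100 requests per second:
print(desired_instances(950, 100))   # 10: scale out for a spike
print(desired_instances(120, 100))   # 2: scale back in afterwards
print(desired_instances(0, 100))     # 1: never below the floor
print(desired_instances(5000, 100))  # 20: capped at the ceiling
```

The floor keeps the service reachable even when idle, while the ceiling reflects the fact that resources are never truly unlimited.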
Sure, an ops team could be in charge of monitoring the application and spinning up new instances. Or the team could plan ahead and schedule these scale-up and scale-down operations, provisioning for events known ahead of time (looking at you, Black Friday).
But there is one problem: the future is notoriously hard to predict :)
Reacting to events is obviously the only way to go. But watching metrics and reacting to thresholds is a job for which a machine is far better suited than any human. So why not embrace that aspect and let the application deal with it?
By gathering relevant metrics, an application or platform can better equip itself to make educated guesses about when and how to scale. Of course, we're talking about horizontal scaling here: we grow the capacity of the system by adding more servers to the topology, on which new application instances can run. This is far easier to automate than vertical scaling, i.e. adding capacity to an existing server (although I imagine that would be quite a sight in the datacenter: all these little robot arms adding and removing RAM sticks and hard disk drives and whatnot).
Autonomy as a Prerequisite to Automatic Horizontal Scaling
The Decouple Time and Decouple Space principles greatly help here, because without them the components of a system cannot scale horizontally in the first place. With decoupling, the system can transparently dispatch the workload to newly introduced instances or shards.
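The post does not prescribe how the workload gets dispatched to new instances or shards. One common technique, swapped in here for illustration, is consistent hashing: keys are mapped onto a hash ring, so adding an instance only moves a fraction of the keys instead of reshuffling all of them. The `HashRing` class below is a minimal, hypothetical sketch, not a production implementation (no replication, no weighting).

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys to instances or shards.

    Each instance is placed at several virtual points on the ring so the
    load spreads evenly. A key belongs to the first point after its hash,
    so adding an instance only moves the keys that now land on its points.
    """

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # hash position -> instance name

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used only for an even spread of positions, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, instance: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{instance}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = instance

    def remove(self, instance: str) -> None:
        self._points = [p for p in self._points if self._owners[p] != instance]
        self._owners = {p: o for p, o in self._owners.items() if o != instance}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("no instances on the ring")
        # First point strictly after the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing()
for name in ("instance-a", "instance-b", "instance-c"):
    ring.add(name)

keys = [f"user-{n}" for n in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("instance-d")  # scale out: a fourth instance joins
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With a naive `hash(key) % n` scheme, going from three to four instances would remap most keys; here only the keys that land on the new instance's points move, and they all move to it.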
By feeding relevant metrics to scaling algorithms, full automation can be achieved. Note that these metrics are reported by the components themselves, so they should be carefully chosen and crafted with an eye for scalability.
On most cloud platforms, the platform itself can gather generic, platform-level metrics like CPU usage, RAM usage, disk space, request rate, error rate, etc. These are a good start, but business-oriented metrics or finer-grained information are better. For instance, if we know the application regularly runs a CPU-intensive task during off-peak hours, there is no point in scaling out just because the raw CPU usage metric hits 100%.
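Here is a sketch of that CPU example, assuming a hypothetical nightly batch job that saturates the CPU between 02:00 and 04:00, and a hypothetical business metric (`pending_orders`) that better reflects actual demand. The thresholds (50 pending orders per instance, 90% CPU) are arbitrary illustration values.

```python
from datetime import time

# Hypothetical nightly batch window during which CPU is expected to be saturated.
BATCH_WINDOW = (time(2, 0), time(4, 0))


def in_batch_window(now: time, window=BATCH_WINDOW) -> bool:
    start, end = window
    return start <= now < end


def should_scale_out(cpu_percent: float, pending_orders: int, now: time,
                     instances: int, max_pending_per_instance: int = 50) -> bool:
    """Decide on scale-out, giving priority to a business-level metric.

    `pending_orders` stands in for a hypothetical business backlog metric.
    Raw CPU usage is only trusted outside the known batch window.
    """
    if pending_orders > max_pending_per_instance * instances:
        return True   # real demand is piling up, whatever the CPU says
    if in_batch_window(now):
        return False  # 100% CPU is expected here and is not a demand spike
    return cpu_percent >= 90.0


print(should_scale_out(100.0, 10, time(3, 0), instances=2))   # False
print(should_scale_out(100.0, 10, time(14, 0), instances=2))  # True
print(should_scale_out(40.0, 500, time(3, 0), instances=2))   # True
```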
The algorithms in charge of auto-scaling should also be tailored to the system.
They can be reactive, in the more common sense: if metric X reaches threshold Y, spin up a new instance. Or they can be predictive, keeping historical data and automatically analyzing it. With machine learning, one can imagine an algorithm that finds hidden precursors to a workload spike in the various metrics reported by a component, and learns to scale preemptively accordingly...
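Both flavours of algorithm can be sketched in a few lines. The reactive `ThresholdScaler` below adds hysteresis (separate up and down thresholds) and a cooldown, two common refinements of the "metric X reaches threshold Y" rule that keep it from flapping. The `TrendPredictor` is a deliberately naive stand-in for the predictive approach: it linearly extrapolates recent values rather than learning anything, which is far simpler than the machine-learning approach imagined above. All class names, thresholds and window sizes are hypothetical.

```python
from collections import deque
from typing import Optional


class ThresholdScaler:
    """Reactive rule: scale out at or above `up`, scale in at or below `down`.

    The gap between the two thresholds (hysteresis) and a cooldown measured
    in evaluation ticks keep the scaler from flapping back and forth.
    """

    def __init__(self, up: float, down: float, cooldown_ticks: int = 3):
        self.up, self.down, self.cooldown = up, down, cooldown_ticks
        self._ticks_since_action = cooldown_ticks

    def decide(self, metric: float) -> int:
        """Return +1 (add an instance), -1 (remove one) or 0 (do nothing)."""
        self._ticks_since_action += 1
        if self._ticks_since_action <= self.cooldown:
            return 0
        if metric >= self.up:
            self._ticks_since_action = 0
            return 1
        if metric <= self.down:
            self._ticks_since_action = 0
            return -1
        return 0


class TrendPredictor:
    """Naive predictive rule: extrapolate the recent linear trend.

    Keeps a short window of observations and forecasts the metric `horizon`
    ticks ahead, so capacity can be added before the threshold is reached.
    """

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)

    def observe(self, value: float) -> None:
        self.history.append(value)

    def forecast(self, horizon: int) -> Optional[float]:
        if len(self.history) < 2:
            return None
        slope = (self.history[-1] - self.history[0]) / (len(self.history) - 1)
        return self.history[-1] + slope * horizon


scaler = ThresholdScaler(up=80, down=20, cooldown_ticks=2)
print([scaler.decide(m) for m in (85, 90, 95, 50, 10)])  # [1, 0, 0, 0, -1]

predictor = TrendPredictor()
for load in (40, 50, 60, 70):
    predictor.observe(load)
print(predictor.forecast(horizon=2))  # 90.0: above 80, so scale out early
```

Note how the reactive scaler ignores the readings of 90 and 95 while it cools down, whereas the predictor flags the coming spike before the load actually crosses the threshold.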
Lastly, higher autonomy in the design of the components also means that each component can make decisions regarding its workload without any need to coordinate with its siblings.
What If Resources Are Fixed?
Where resources are fixed, the scope of processed inputs needs to be adjusted instead, signaling this degradation to the outside. This can be done by discarding less relevant parts of the input data, for example: discarding older or more far-reaching sensor data in IoT applications, or shrinking the horizon or reducing the quality of forecasts in autonomous vehicles.
The component autonomy described above becomes key when an application comes close to its resource ceiling.
In order to stay responsive, the application must make a different tradeoff: if it can't scale anymore, then it must fall back to a degraded mode of execution and of course signal that to the outside world.
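The IoT example from the excerpt can be sketched as follows, assuming hypothetical sensor readings that carry a timestamp and a latitude/longitude. The component drops readings older than a maximum age or farther than a maximum distance from a point of interest, and returns a `degraded` flag it can surface to the outside world. The `Reading` type, the `reduce_scope` function and the thresholds are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    timestamp: float  # seconds since the epoch
    lat: float
    lon: float
    value: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def reduce_scope(readings, now, origin, max_age_s, max_distance_km):
    """Keep only recent, nearby readings and report whether any were dropped.

    The returned `degraded` flag is what the component would signal to the
    outside world (for example in its responses or metrics).
    """
    kept = [
        r for r in readings
        if now - r.timestamp <= max_age_s
        and haversine_km(origin[0], origin[1], r.lat, r.lon) <= max_distance_km
    ]
    return kept, len(kept) < len(readings)


readings = [
    Reading("s1", timestamp=990.0, lat=0.0, lon=0.0, value=1.0),  # fresh, local
    Reading("s2", timestamp=900.0, lat=0.0, lon=0.0, value=2.0),  # too old
    Reading("s3", timestamp=995.0, lat=0.0, lon=1.0, value=3.0),  # ~111 km away
]
kept, degraded = reduce_scope(readings, now=1000.0, origin=(0.0, 0.0),
                              max_age_s=30, max_distance_km=50)
print([r.sensor_id for r in kept], degraded)  # ['s1'] True
```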
Because they are autonomous, application components can each adopt a different strategy to cope with that situation, for example:
Use a simple sampling strategy to discard part of the incoming workload and reduce the rate of incoming data to process
Signal backpressure to an upstream component to let it make the decision on how to reduce the workload
Drop incoming timestamped data according to a maximum age criterion
Drop geographically tagged data according to a maximum distance criterion
Use a faster but less precise algorithm that processes all the data at the cost of slightly less accurate results
Only process part of each incoming message, discarding the pieces that are costly to process but least relevant
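Two of the strategies listed above, sampling and backpressure, can be sketched as small building blocks. `SamplingGate` keeps a configurable fraction of incoming items; `BoundedInbox` refuses new items once full, which signals upstream that it should slow down. Both classes are hypothetical. Note that libraries implementing the Reactive Streams specification express backpressure differently: the subscriber requests demand through `Subscription.request(n)` rather than rejecting items.

```python
import random
from collections import deque


class SamplingGate:
    """Admit roughly `keep_ratio` of incoming items and discard the rest."""

    def __init__(self, keep_ratio: float, seed=None):
        if not 0.0 <= keep_ratio <= 1.0:
            raise ValueError("keep_ratio must be within [0, 1]")
        self.keep_ratio = keep_ratio
        self._rng = random.Random(seed)

    def admit(self, item) -> bool:
        # random() is uniform in [0, 1), so keep_ratio=1.0 admits everything.
        return self._rng.random() < self.keep_ratio


class BoundedInbox:
    """Bounded buffer that pushes back on the producer instead of growing.

    `offer` returns False when the buffer is full: that is the backpressure
    signal, and the upstream component decides whether to slow down, retry
    later or apply its own reduction strategy.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = deque()

    def offer(self, item) -> bool:
        if len(self._items) >= self.capacity:
            return False
        self._items.append(item)
        return True

    def poll(self):
        return self._items.popleft() if self._items else None


gate = SamplingGate(keep_ratio=0.5, seed=42)
admitted = sum(gate.admit(i) for i in range(10_000))
print(admitted)  # close to 5000

inbox = BoundedInbox(capacity=2)
print([inbox.offer(msg) for msg in ("m1", "m2", "m3")])  # [True, True, False]
```

Sampling makes the reduction decision locally, while the bounded inbox hands it to the upstream component, which matches the difference between the first two strategies in the list.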
All of these are tradeoffs that involve some reduction in efficiency, and only the application has the knowledge needed to make them. An autonomous component can safely make such a tradeoff without impacting another component for which the same tradeoff would be unacceptable.
Once an application is able to auto-scale first, and to trade off efficiency for availability only as a last resort, it is very well equipped to handle dynamics.