Serverless Craic Ep10 Well Architected Reliability Pillar

Serverless Craic from The Serverless Edge

Treasa Anderson

Jan 26, 2022

In this episode, the team continue their conversation on the AWS Well-Architected Framework with the Reliability Pillar.

The pillar has four sections: foundations, workload architecture, change management, and failure management. If you're building in a traditional way, reliability can be a huge amount of work. But we're serverless heads, and AWS makes a lot of these things much easier.

A lot of service quotas and constraints are baked into the foundations. From a change management perspective, you want to get into a continuous delivery mindset, and the modern tools give you a lot of monitoring. From a failure management perspective, serverless is ephemeral, so it's built for retries. You can design your workload so that those areas are slightly easier to work with.

When you look at the foundations section of the Reliability Pillar, you're probably looking at how to plan capacity without over-provisioning or overspending, and how to scale up effectively. With a traditional setup you put a lot of time into the foundations section, whereas with serverless you still need to look at foundations, but I think it's less intensive.
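To make the foundations point concrete: a common rule of thumb for Lambda is that concurrency is roughly requests per second multiplied by average duration in seconds. The sketch below checks an estimated peak against an account concurrency quota. The function names are hypothetical, and the 1,000 default is an assumption (a commonly cited regional default), so pass in the real quota for your own account.

```python
# Hypothetical helpers for a back-of-the-envelope Lambda concurrency check.


def estimate_concurrency(requests_per_second: int, avg_duration_ms: int) -> int:
    """Concurrency ~= arrival rate x time each request is in flight, rounded up."""
    if requests_per_second < 0 or avg_duration_ms < 0:
        raise ValueError("rate and duration must be non-negative")
    # Integer ceiling division avoids floating-point surprises.
    return (requests_per_second * avg_duration_ms + 999) // 1000


def quota_headroom(requests_per_second: int, avg_duration_ms: int,
                   concurrency_quota: int = 1000) -> int:
    """Spare concurrency under the quota; a negative result means expect throttling.

    The 1,000 default is an assumption: check the quota on your own account.
    """
    return concurrency_quota - estimate_concurrency(requests_per_second, avg_duration_ms)


if __name__ == "__main__":
    print(estimate_concurrency(500, 200))  # 100
    print(quota_headroom(500, 200))        # 900
    print(quota_headroom(2000, 1500))      # -2000
```

Bursts and per-function reserved concurrency complicate the real picture, so treat this as a conversation starter for the foundations review rather than a capacity plan.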

From a change management point of view, adapting to change is built into serverless capabilities and into how managed services operate.

From a failure management point of view, a lot of that is baked in, especially if you've built an event-driven, asynchronous workload using SNS, SQS or EventBridge. Circuit-breaker-style behaviour, retries and dead-letter queues now come out of the box. AWS keeps maturing those capabilities to make it easier for teams to get resilient, reliable behaviour by default.
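The retry-then-dead-letter behaviour is easiest to see in miniature. The sketch below is an in-memory simulation, not the SQS API: it mimics the idea behind a redrive policy's `maxReceiveCount` by retrying a failing message a bounded number of times before parking it in a dead-letter list. `SimulatedQueue` and the handler are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class SimulatedQueue:
    """In-memory stand-in for a queue with a redrive policy (not the real SQS API)."""

    def __init__(self, handler: Callable[[str], None], max_receive_count: int = 3):
        self.handler = handler
        self.max_receive_count = max_receive_count
        self.pending: Deque[Tuple[str, int]] = deque()  # (message, receive count)
        self.processed: List[str] = []
        self.dead_letters: List[str] = []

    def send(self, message: str) -> None:
        self.pending.append((message, 0))

    def drain(self) -> None:
        """Deliver messages until the queue is empty, retrying failures."""
        while self.pending:
            message, receives = self.pending.popleft()
            receives += 1
            try:
                self.handler(message)
                self.processed.append(message)
            except Exception:
                if receives >= self.max_receive_count:
                    # Retries exhausted: park the message for inspection instead of losing it.
                    self.dead_letters.append(message)
                else:
                    # Put it back so it is delivered again later.
                    self.pending.append((message, receives))


if __name__ == "__main__":
    attempts = {}

    def handler(message: str) -> None:
        attempts[message] = attempts.get(message, 0) + 1
        # "flaky" fails once then recovers; "poison" always fails.
        if message == "poison" or (message == "flaky" and attempts[message] < 2):
            raise RuntimeError("downstream error")

    queue = SimulatedQueue(handler, max_receive_count=3)
    for m in ["ok", "flaky", "poison"]:
        queue.send(m)
    queue.drain()
    print(queue.processed)     # ['ok', 'flaky']
    print(queue.dead_letters)  # ['poison']
```

Because a message can be delivered more than once, handlers on the receiving end are usually written to be idempotent.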

There's a lot that AWS has thought about for you on the foundations side, and you can benefit from that. That in turn influences how you assemble your workload architecture, because you've got to work within those constraints.

One of the questions is: how do you design interactions in a distributed system to prevent failures? There's a specific section on designing distributed workloads. That's more than a nod; it's evidence that a lot of customers are moving towards distributed, microservice-based, modern application stacks.
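One common answer to that question is a circuit breaker: after repeated failures, stop calling the struggling dependency for a while so the failure doesn't cascade. This is a minimal, hypothetical sketch with an injectable clock so it can be tested deterministically; the thresholds are illustrative assumptions, not AWS defaults.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # allow a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Trip on too many failures, or re-trip if the half-open trial fails.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


if __name__ == "__main__":
    now = [0.0]
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def failing() -> str:
        raise ConnectionError("timeout")

    for _ in range(2):
        try:
            breaker.call(failing)
        except ConnectionError:
            pass
    print(breaker.state)  # open
    now[0] = 10.0
    print(breaker.state)  # half-open
    print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

Failing fast while the breaker is open gives the dependency room to recover instead of piling more calls onto it.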

Some of the questions look at this: if we auto scale a serverless component, what load or pressure does that put on something that doesn't auto scale? It forces you to think about protection and about where the choke points are. Should you be throttling your workload? Should you be setting limits on how far your serverless workload can scale?
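One way to protect a downstream system that doesn't auto scale is to throttle calls to it. On AWS you'd typically reach for managed levers such as API Gateway throttling or Lambda reserved concurrency; the sketch below only illustrates the token-bucket idea behind that kind of limit, in plain Python with hypothetical names and illustrative numbers.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Top up for elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject, queue, or back off


if __name__ == "__main__":
    now = [0.0]
    bucket = TokenBucket(rate=5, capacity=10, clock=lambda: now[0])
    print(sum(bucket.try_acquire() for _ in range(15)))  # 10
    now[0] = 1.0
    print(sum(bucket.try_acquire() for _ in range(15)))  # 5
```

The first burst of 15 calls gets 10 through (the bucket's capacity); after one second, only the 5 refilled tokens are available, which caps the sustained rate the downstream system sees.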

Those types of questions get to something we're very passionate about: testing for resilience, continuous resiliency, and running test days or game days that tease out where the choke points are. Where are your failure cases? Which downstream systems can't respond, or can't take the load you pass to them?

We all agree that it is easier to do this with serverless, and it's important that the setup is designed properly. But you can get good feedback by testing for this stuff. You've also seen AWS Fault Injection Simulator (FIS) come out and mature. It's easy to use, and I'm hoping to see it evolve to be much more serverless-focused as well. Either way, it's a lot easier to test for resiliency now, so you're not guessing anymore. You have real automation, and you've baked that into your CI/CD pipeline…
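AWS FIS injects faults at the infrastructure level. The same idea also works cheaply at the code level inside a CI test suite: wrap a dependency so it fails on demand, then assert that your resilience behaviour still holds. This is a hypothetical sketch; `call_with_retries` stands in for whatever behaviour your workload actually relies on, and the deterministic "rolls" keep the pipeline run reproducible.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised deliberately by the test harness to simulate a downstream failure."""


def inject_faults(fn: Callable[[], T], failure_rate: float,
                  rng: Callable[[], float] = random.random) -> Callable[[], T]:
    """Wrap fn so each call fails with probability failure_rate."""
    def wrapped() -> T:
        if rng() < failure_rate:
            raise InjectedFault("injected downstream failure")
        return fn()
    return wrapped


def call_with_retries(fn: Callable[[], T], attempts: int = 3) -> T:
    """The resilience behaviour under test: retry a bounded number of times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as err:  # retry on any failure, injected or real
            last_error = err
    assert last_error is not None
    raise last_error


if __name__ == "__main__":
    # Two injected failures, then success: the retry policy should absorb them.
    rolls = iter([0.1, 0.2, 0.9])
    flaky = inject_faults(lambda: "charged", failure_rate=0.5, rng=lambda: next(rolls))
    print(call_with_retries(flaky, attempts=3))  # charged

    # A dependency that is completely down should surface the error, not hide it.
    always_down = inject_faults(lambda: "charged", failure_rate=1.0)
    try:
        call_with_retries(always_down, attempts=3)
    except InjectedFault:
        print("retries exhausted")  # retries exhausted
```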

Check out our book The Value Flywheel Effect
Follow us on X @ServerlessEdge
Follow us on LinkedIn
Subscribe on YouTube
