How to Become a Cloud Engineer - the Ultimate Guide

We talk you through our eight tenets or principles in our ultimate guide to becoming a Cloud Software Engineer.

Jun 14, 2024

Skills you need to become a cloud engineer

We joke that researching our eight tenets took ten years, but they took only 10 minutes to write.

These eight tenets or principles apply to a 'high-performance serverless first team' but could also apply to a high-performance modern cloud team.

1. Chase a business outcome or a KPI

Teams should know what business KPI they're working towards.

You should be able to tap a cloud software engineer on the shoulder and have them tell you what they're working on and what business impact their work will have.

Knowing the KPI acts like a guardrail. It lets you move fast while still making good decisions and understanding your priorities. The only way to do that is by tapping into the product's success.

Use North Stars to track your business success.

A North Star traces the path from profitability or business success to your work, and shows how you're having an impact. North Stars help you to make good decisions and move fast.

Only share these principles or tenets with teams if you also give them guidance on how to achieve or align with them. We conduct North Star workshops to help teams get a good grasp on their KPIs and the North Star for their business.

There's a straightforward thought behind that. You ask a team about KPIs. If the team says, 'I don't know,' you run a North Star workshop. After the North Star workshop, if there is still no KPI, the next step is to ask if the team should be doing this work at all.

It does not mean they are a terrible team. It may mean you are asking them to do the wrong stuff.

2. Be secure by design

Our number two principle is 'Be secure by design.' This principle has worked to secure our development for a long time. Then, AWS came out with 'Secured by Design', so we borrowed it. We don't wait to do security afterwards; we bake it in from the start. It's everyone's job, period/full stop.

Security is difficult to retrofit. Use threat models and get it done early. Try to solve the problem using what you can and what you know.

Bake it into all your engineering practices and pipelines. Shift it all left and help enable teams to be more secure.

Don't say, 'It's too hard, we're not doing it.' Start today and bake it in!

If you align with the business, business success is number one and being secure is number two. Security has a risk profile, so you need to do it right. It can also be an existential risk for businesses, so they need a secure solution. Numbers one and two are in the correct order here.

3. Keep a high throughput of work

Our third principle or tenet is 'Keep a high throughput of work'. We have borrowed from the DORA metrics in Nicole Forsgren's Accelerate book. This principle is about throughput, which is measured by deployment frequency and lead time.

The Accelerate book gives us the language and external validation for what we must say to teams. We can point to the Accelerate book and the DORA metrics as actionable metrics to quantify velocity, development time, deployment frequency, and lead time.
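To make the two throughput metrics concrete, here is a minimal sketch of how a team might calculate them from its own deployment records. The Deployment record, its field names and the choice of a median are our assumptions for illustration, not something prescribed by Accelerate or DORA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one change reaching production."""
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it was running in production


def deployment_frequency(deployments, window_days):
    """Average number of production deployments per day over the window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return len(deployments) / window_days


def median_lead_time(deployments):
    """Median time from commit to production (needs at least one record)."""
    seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    ]
    return timedelta(seconds=median(seconds))


# Illustrative data only: three deployments in one week.
week = [
    Deployment(datetime(2024, 6, 3, 9), datetime(2024, 6, 3, 11)),
    Deployment(datetime(2024, 6, 4, 9), datetime(2024, 6, 4, 13)),
    Deployment(datetime(2024, 6, 6, 9), datetime(2024, 6, 6, 15)),
]
print(deployment_frequency(week, window_days=7))
print(median_lead_time(week))
```

Even a rough calculation like this gives a team a number to question. Why is the lead time hours rather than minutes, and what dependency is in the way?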

For serverless teams, it is vital to make changes fast and frequently, continually learn, and drive observability.

It drives the right behaviour: removing impediments to fast flow and questioning our dependencies. Why can't we be in the elite category? Or why do we have this dependency on this group or another group? And why can't we deploy on demand? Why can't we deploy multiple times a day? It helps teams think in the right way.

Speed is stability

I remember talking to a team on a monthly release cycle who were angry and didn't want to do extra work. The team felt that, since their throughput was one release per month, or 12 a year, there was no point in measuring it. But first, as the team already knew the number, measuring it wouldn't be much work. And second, what would happen if the team got a zero-day security vulnerability? They would have to break everything because they didn't know how to release a fix quickly! They also didn't know if the business wanted anything else for another month.

As Charity Majors says, "Speed is stability." The more frequently you deploy to production, the more you improve your stability. You smooth out the pathways and the error conditions. And you bake it into your pipelines, which means you automate much of the stuff that could go wrong.

4. Reliably run high-stability systems

Many discussions with test teams, QA, and software engineers drive the need for investment in world-class quality and testing capabilities/practices. If you need to be more stable, where's the gap? What scenarios and behaviours have you not covered? Have you missed chaos engineering items? What gaps do you have in your test suites? You need to make sure that stability is there to drive the correct behaviour.

You also need it to drive the right evolution. You can't achieve stability if you've got things in the middle, like handoffs or dumping things over walls. It's about promoting ownership. You must know what you're doing and embrace that approach to get elite scores. The metrics help to modernise, shape and move teams towards that way of working.

5. Rent or reuse with build as a final option

Even with Serverless and SaaS available, engineers with our background are used to going straight to the workspace. With the FORESEE diagram, we find out what we are doing, and it is coding. It's a mindset thing. And it's a very healthy principle to embrace.

It's back to knowing your business purpose and then knowing your business KPIs. If you can achieve business outcomes without writing code, you are at your most optimal. If you can leverage a SaaS offering that does what you need, that's the next best thing. Finally, if you must build, follow a serverless first mindset and approach, using all the service offerings and managed services.

I'm not a big fan of hero developers or a hero mentality. Over the years, with this principle, we've learned to use 'off the shelf'. Less code!

It's about being a democratised engineer rather than a superhero who builds something no one understands.

6. Continuously optimise the total cost

'How much does your cloud cost?' is the best question to ask any team. Good teams will tell you how much their cloud costs are, but loads of teams don't know. It is an excellent measure of a good team.

Good teams also tell you how much they cost and how much it costs to do what you're asking them to do, which gives you good advice and guidance. It gets straight to ROI, and gives you a good projected ROI as well.

How can we mature teams?

A good team will tell you the run cost, and a great team will tell you the total cost. However, outstanding teams will engage in a worst-case development conversation about how much features cost and how much revenue they're bringing in. In other words, how impactful they are to the business.
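As a rough illustration of those three levels of answer, here is a minimal sketch of the arithmetic. We assume that total cost is the cloud run cost plus the cost of the team, and that ROI is revenue minus cost, divided by cost. The function names and figures are hypothetical.

```python
def total_cost(run_cost, team_cost):
    """Total cost, assumed here to be cloud run cost plus team cost."""
    return run_cost + team_cost


def roi(revenue, cost):
    """Return on investment as a ratio: (revenue - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost


# Illustrative monthly figures for a single feature, in any one currency.
run_cost = 2_000    # what a good team can tell you
team_cost = 18_000  # a great team adds this to give the total
revenue = 30_000    # an outstanding team brings this to the conversation

cost = total_cost(run_cost, team_cost)
print(cost)
print(roi(revenue, cost))
```

The calculation is trivial. The hard part is that the team knows the inputs.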

I always add a great question: 'How can we mature the teams?' How can we evolve the team so that they can readily answer new questions? For example, total cost will include carbon footprint and sustainability costs. When your team is optimising total cost, they are not only optimising for financial cost; they need to optimise for carbon footprint, too. They must drive the conversation to find the most ecologically friendly region for their workload.

7. Build event-driven via strong APIs

Building event-driven via strong APIs sounds very easy. But from talking to Sam Dengler, nobody is doing this correctly. We've been talking about this for 20 years. Proper integration is still a mystery to most people.

It is about ensuring you have the right things in the right places and sizes and that you have things that are composable. It's about breaking things up into their smallest constituent parts and changing things as frequently as possible.
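To show what small, composable parts behind a strong API can look like, here is a minimal in-memory sketch. The OrderPlaced event, its fields and the EventBus class are hypothetical names for illustration. In a real serverless system, a managed event bus would take the place of the EventBus class.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Event contract: the 'strong API' shared by producer and consumers."""
    order_id: str
    amount_pence: int

    def __post_init__(self):
        # Reject malformed events at the boundary, before anyone consumes them.
        if not self.order_id:
            raise ValueError("order_id is required")
        if self.amount_pence <= 0:
            raise ValueError("amount_pence must be positive")


class EventBus:
    """In-memory stand-in for a managed event bus (an assumption)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        # The producer never calls consumers directly; it only emits the event.
        for handler in self._handlers.get(type(event), []):
            handler(event)


bus = EventBus()
invoices, emails = [], []

# Two independent consumers. Either can change or deploy without the other.
bus.subscribe(OrderPlaced, lambda e: invoices.append(e.order_id))
bus.subscribe(OrderPlaced, lambda e: emails.append(f"Thanks for order {e.order_id}"))

bus.publish(OrderPlaced(order_id="A-100", amount_pence=2_500))
print(invoices, emails)
```

The producer only knows the event contract, not who is listening. That is what lets each part stay small and change on its own schedule.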

This one requires a lot of evolution and yields different levels of complexity. It also takes time. You should always be thinking about it. Teams new to serverless and that way of working will reinvent what they know.

The principles balance each other.

The principles on improving stability move you into the elite categories and drive you towards loosely coupled event-driven architecture. That gives you more autonomy and freedom and allows you to deploy when you're ready. Because you are event-driven and loosely coupled with strong APIs, that autonomy is baked in architecturally.

You can go fast and be in those elite groups with the right team alignment. Many of these principles balance each other. If you're trying to influence one principle, you must have some of the elements of the other principles in place.

People like to think in layers. When they try to do ‘event-driven,’ they go through layers, but that's not event-driven.

Facilitation practices have come to the fore in the last couple of years. There's lots of good stuff from the DDD community, EventStorming from Alberto Brandolini, and EventBridge Storming. Good hands-on facilitated techniques demystify this and make it more approachable for squads to benefit from an event-driven architecture.

8. Build solutions that fit in their heads

We have borrowed this principle from Dan North. In other words, build systems that are simple enough to fit in your head, which is a nice nod towards Team Topologies and setting proper boundaries. We've seen teams become victims of complex architectures, where there's too much to fit in your head, and the cognitive load breaks people.

This one will evolve. My mantra is 'just enough design' when we are getting teams going. Some teams want to design everything up front and go into vast amounts of detail. But it is better to keep your world small. Focus on what you're doing today. When moving with rapid development and continuous architecture, you should constantly be refactoring and changing.

Limit cognitive burden

You don't design the end state upfront. You have to be prepared to change and move in a different direction. In many serverless projects, we've nuked what we've done after two or three weeks and started again. It's just the way of working. But the point of the principle goes back to domain-driven design, limiting cognitive burden, and making sure your groups and classifications are well defined.

The Team Topologies guys nailed this one by optimising for cognitive burden. And that's where all the other principles really come in. We can design systems that are small, loosely coupled, event-driven, and deployed frequently, which helps reduce cognitive burden. It's not easy to get there. It's hard work, and you have to evolve. You have to iterate and incrementally go after it. But you can start to optimise for solutions that do fit in the heads of your teams.

Other tenets or principles to consider for cloud software engineers

We talk a lot about well-architected. When discussing these principles, we first refer to the modern cloud or serverless because well-architected leverages these principles. If you adopt these principles, you will be well-architected.

You must operate with situational awareness, from Wardley maps or other techniques, to understand the direction, and business KPIs contribute to that understanding. However, there might be something more here: if teams follow these principles, it would be good if they also have situational awareness.

What about Developer Experience?

Situational awareness feeds into these principles, and well-architected underpins those principles as well. Collaboration and working as a broader group are important. Looking at the 8th principle, if you and all the other teams work to well-architected standards, you will have portability and mobility of teams.

Well-architected is the 'how'. These principles are the direction. Well-architected looks at security, costs, operational excellence, reliability, performance, and sustainability, which are threaded through the principles.

Cloud teams with a high work throughput solve the challenge of developer experience. You can only have a high throughput of work and a high deployment frequency if your developer experience is good.

Only accept good developer experience.

Create an environment for success.

Creating an environment for success is a critical enabler.

What I am trying to get to is contribution. To have a good developer experience, developers need to be able to contribute.

Give back through an inner source programme. It is the idea of learning. You need to be learning as a team. There's a curiosity principle there. The team should always question stuff.

Continuous learning applies to the 'optimise total cost' principle. Have you looked at other options and evolved your stack? Are you continuously learning about new features and capabilities that are available?

Be curious with a growth mindset!

Serverless Craic from The Serverless Edge

Check out our book, The Value Flywheel Effect
