How to make well-architected work for organisations (2)
Security, Cost, OpEx, Reliability & Performance (SCORP) Process Cycle – part 2
This article is the second in a series – part 1 introduces the well-architected tool you need to implement the well-architected framework.
How the process works
Now that we have established the need for well-architected in an organisation, and the need for a lightweight process or well-architected tool to drive improvements, let’s deep dive into how that process, the SCORP process, works in practice.
In Figure 1 I have tried to model the process cycle. It is based around two main cadences:
Target a well-architected review (WAR) benchmark each quarter
Target a group SCORP Team Dashboard review each sprint
As you may realise, the quarterly well-architected review (WAR) process is a standard review of the team’s workload with a Solutions Architect. It is intended to be a thorough deep dive into each of the five pillars and is described above.
SCORP Review Process
The SCORP review process is aimed at facilitating more frequent reviews of the team’s operational metrics. That cadence encourages regular cross-team collaboration to share experiences, practices, approaches and learnings in relation to developer operations and production workload performance. “Learning from the mistakes of others”, but also from their successes, of which I can tell you there are many.
The rationale for the frequent 2-week cycle is the following:
Why solve the same problems in every team? A problem shared is a problem halved, and creating situational awareness of Engineering Excellence across all teams is a potent force for good.
Connecting developers and teams. Ideally, we should be working together to gain more economy of scale, with teams helping each other improve.
Maintaining emphasis on the good fight. Issues with operations and performance need to be surfaced regularly, and sometimes that means pushing back or creating space in the development backlog for improvement work.
Building a passion for excellence and fuelling curiosity: getting ideas out into the open, celebrating the small wins, generating pride in a job well done.
The process can also be used as an alignment and well-architected tool, allowing architecture to remain engaged at the team level.
The SCORP Review Process Flow
With the solid rationale above for the 2-week collaboration cycle, we have to make the process work. This is a high-level description of the flow and how it currently works for us:
Each of the ten teams is represented by its Principal Engineer and Delivery Lead/Scrum Master.
The Architect facilitates the session.
If not already fully automated, then before the session, the Principal Engineer will prepare their team’s SCORP report/dashboard for review.
The review lasts around 90 to 120 minutes. Each team takes 10 minutes to talk through key trends, typically focusing on the deltas from previous reports (see the sketch after this list).
The Architect will go through any high-level ‘Notice To All Developers’ (NOTADs). These are things like enterprise impacts, security mandates or changes to the pipelines.
We also review the current DevOps actions for each team and get a quick summary of progress. If items come up during the review that the team wants to research, they follow up afterwards.
Deeper-dive topics that come up frequently (e.g., testing methodology, or a review of analytics tool setup and configuration) get added to the portfolio summary. The Architect or Principal Engineer will then typically set up a future tech share on the topic.
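To illustrate the delta-focused part of the discussion, here is a minimal sketch of how a report’s metrics could be compared between two review cycles. It assumes each SCORP report can be exported as a simple set of named metrics grouped by pillar; the metric names and values are illustrative assumptions, not our actual report fields.

```python
# Minimal sketch: compare two SCORP report snapshots and surface the deltas.
# The pillar grouping mirrors the Well-Architected pillars; the metric names
# and values are illustrative assumptions, not the teams' actual report fields.

from typing import Dict

Report = Dict[str, Dict[str, float]]  # pillar -> metric name -> value

previous: Report = {
    "Security": {"open_critical_findings": 4},
    "Performance": {"p95_response_ms": 820},
    "Cost": {"monthly_spend_usd": 12400},
}

current: Report = {
    "Security": {"open_critical_findings": 2},
    "Performance": {"p95_response_ms": 610},
    "Cost": {"monthly_spend_usd": 11900},
}


def report_deltas(prev: Report, curr: Report) -> Report:
    """Return the change in every metric present in both snapshots, by pillar."""
    deltas: Report = {}
    for pillar, metrics in curr.items():
        for name, value in metrics.items():
            if name in prev.get(pillar, {}):
                deltas.setdefault(pillar, {})[name] = value - prev[pillar][name]
    return deltas


if __name__ == "__main__":
    for pillar, metrics in report_deltas(previous, current).items():
        for name, delta in metrics.items():
            print(f"{pillar:<12} {name:<24} {delta:+.0f}")
```

In practice the numbers come from the teams’ own dashboards, but the principle is the same: the review conversation starts from what has changed since the last cycle, not from the raw figures.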
The role of the well-architected facilitator
The facilitator has an important role. On a purely mechanical level, the facilitator has the responsibility for keeping the cycle on track, no mean feat when you have ten squads. This requires discipline, and the process has to be used wisely, i.e., knowing which topics to park and which to get into. Ideally, the facilitator is an architect with plenty of experience in DevOps and the well-architected tool and framework, but they also need to be in a position of authority and have some influence on the teams’ ability to prioritise their work.
The rationale for the facilitator role
The rationale for this is the following:
The facilitator has to ask questions. They should be comfortable directing the team to look at an area that is perhaps not trending well, i.e., cost, performance response times etc., but realistically, with continuous improvement in mind.
The facilitator should always be attempting to connect the engineers and teams. For example, if one team has an amazing BDD technique and another could benefit from it, the facilitator should suggest they pair up.
They should constantly be evolving the SCORP process. It has to be a value add for all the engineers and teams. If something is not working or adding value, then the facilitator has to address it.
The facilitator should always be attempting to generate interest in what the teams are trying to achieve, whether in the review or post review with product owners or management.
They will celebrate the successes and wins of each team with the group, no matter how small. Progress is progress! Teams and engineers should leave feeling like a million bucks when improving.
They will facilitate a positive sharing environment where all voices get heard. Failures are never negative. They are an opportunity to learn and become better as we do!
It is a challenging role to fulfil, but it is a super important one to ensure the success of the process cycle.
SCORP Process Cycle Day Zero
Having tried to facilitate similar processes earlier in my career, the most significant learning that I had was that “You need to meet the teams where they are at.” Some teams will have been together for an extended period of time and have great DevOps with high levels of automation and insight. Other teams will not; they may be newly formed, or for one reason or another might not have had the time, expertise or capacity to achieve the operational maturity of some of the more experienced teams around them.
You also need to make sure that management and the business are aware of the well-architected tool initiative’s aims and goals. The initiative requires investment from the teams in terms of participation, and doing the right thing sometimes means slowing down, which is not always a straightforward conversation when it comes to meeting delivery timelines.
SCORP Docs
Before entering into the cycle, I had several meet-ups with the Principal Engineers and Delivery Leads for our squads. This is important, as it is aimed at getting the teams and lead engineers to own the SCORP Review Cycle. They have to buy in and feel like they have a stake in the process. During this time, I worked with them to agree on the following things:
Ways of Working Agreement — Frequency of the cycle, Safe Space Commitment, attendance, length of meeting etc. All the typical things you would tend to find in a working agreement.
The SCORP Report Template — The SCORP report template structure was the task that we, the group, spent the most time on. The SCORP report contains all the critical operational metrics relating to team and workload performance. The insights to be collected and reviewed via this template had to be influenceable and impactful for all teams, and structured around the five pillars of the Well-Architected Framework. In the beginning, we agreed that each team would start collecting these metrics on a Confluence wiki page. It might be quite controversial, but I told the teams upfront that having to manually update their wiki page every two weeks with the key metrics was part of a rather devious plan to annoy them into investing in automating these insights. Luckily, this has been the case.
A sample SCORP report/dashboard template for the well-architected tool
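For a sense of the structure, here is a minimal sketch of a SCORP report rendered programmatically rather than maintained by hand. The pillar sections follow the Well-Architected pillars, but the team name, sprint label and individual metrics are illustrative assumptions rather than our actual template.

```python
# Minimal sketch of a SCORP report rendered as a wiki-style markdown page.
# The pillar sections follow the Well-Architected pillars; the team name and
# the individual metrics are illustrative assumptions, not the real template.

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScorpReport:
    team: str
    sprint: str
    # pillar -> metric name -> value (free-form strings keep it wiki-friendly)
    pillars: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the report as one markdown table per pillar."""
        lines = [f"# SCORP Report - {self.team} ({self.sprint})"]
        for pillar, metrics in self.pillars.items():
            lines += [f"\n## {pillar}", "| Metric | Value |", "| --- | --- |"]
            lines += [f"| {name} | {value} |" for name, value in metrics.items()]
        return "\n".join(lines)


report = ScorpReport(
    team="Team A",
    sprint="Sprint 14",
    pillars={
        "Security": {"Open critical findings": "2", "Days since last threat model": "30"},
        "Cost": {"Monthly spend (USD)": "11,900"},
        "Operational Excellence": {"Deployments this sprint": "9", "Failed deployments": "1"},
        "Reliability": {"Availability (30 days)": "99.95%", "Open incidents": "0"},
        "Performance": {"p95 response time (ms)": "610"},
    },
)

print(report.to_markdown())
```

The point is the per-pillar grouping: whichever metrics a team chooses to track, every report rolls up the same way at every review, which keeps the 10-minute walkthroughs comparable across teams.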
Wrapping Up
Becoming a High Performing Team takes time and effort. It requires investment in your craft, learning from your own successes and failures, but also from the successes and failures of others. The process is certainly not perfect, and it will evolve and change, but that is what I would expect as the software industry is constantly changing and evolving. To borrow a term from Jeff Bezos, we are well on our way to creating our own operational flywheel.
SCORP Operational Improvements
Now that we are reviewing our trends regularly, we are beginning to see some positive changes. I wish that I could share more quantifiable data, and I will endeavour to do that in future write-ups, but in qualitative terms, all teams involved in my SCORP Review Cycle have made significant operational improvements. I have summarised some of my favourite and most impactful ones below:
The majority of teams have produced automated dashboards (DataDog, Splunk etc.). We are now tracking significantly more data points than we were three months ago.
We have seen a significant focus on performance improvements. All squads have reduced response times through activities such as performance tuning.
The collaboration around test automation has been phenomenal. Once we saw that integration testing was an issue across all teams, it became a safe topic of discussion, and the innovation began. BDD is now starting to take hold across the teams.
Security is front and centre. We have seen much more investment in activities such as threat modelling and the facilitation of mitigations of identified threats.
From an operational excellence perspective, we have had a sub-community form around observability and workload release confidence.
The progress is clear to see in subsequent WAR reviews.
Conclusion
As an architect, I couldn’t be happier or prouder of the teams’ contributions. We are well on our way to meeting the goals of my original hypothesis. I have never been in any doubt about what we could begin to achieve through a bit of investment in continuous improvement, but seeing it in reality is quite rewarding. I would recommend that any technical leader implement a SCORP Review Process Cycle based on the well-architected tool to drive engineering excellence within their organisation.
Originally published on The Serverless Edge