
Resilience is Part of the Product, Not an Afterthought

Resilience must be designed into products from the start, not added later. Build systems to detect, contain, and recover from failures, making resilience a core feature.


Resilience is not a nice-to-have. It is not a department. It is not something you bolt on later if you get around to it. Resilience is part of the product. If you are serious about delivering value, you design resilience deliberately from day one. Any other approach is just gambling with your business and adding to your technical debt.

Real resilience is not about having good people with pagers. It is not about heroes. Heroes emerge when systems lack resilience. They hoard work, avoid transparency, and justify cutting corners by claiming they are “doing whatever it takes.” In reality, they introduce silent risks, undermine teamwork, and erode quality standards.

If your resilience depends on a hero, you are not resilient. You are vulnerable and you just have not been exposed yet.

Resilience is a Core Feature

Resilience must be treated like any other core feature. It must be designed, built, and continuously improved. It must be part of your product definition, your architecture, and your engineering culture. It must be owned by the same people who build the product. At Microsoft, the Azure DevOps engineering teams did exactly that. They engineered resilience into every layer of their system; it was not handed off to a separate Ops team or left to wishful thinking. Engineers owned their live site experience end-to-end, from ideation to validation, including all of the design, build, test, release, and run in between.

Incidents were expected, contained, and learned from, not blamed on individuals. They did not hope for resilience. They built it.

If they did have an incident, they would own it, not just fix the problem and sweep it under the rug.

Build for Containment, Not Perfection

Every serious product needs resilience capabilities: telemetry, rapid roll-forward, observability, and risk containment.

Without telemetry, you cannot see what is happening. Without rapid roll-forward, you cannot respond fast enough. Without observability, you cannot understand why things are happening. Without risk containment, small failures turn into major outages.
If you have to shut down your entire platform to fix one feature, you have already failed.

Microsoft’s teams built telemetry into everything. They measured customer experience directly — failed or slow user minutes — not just server uptime. They tuned alerts to detect real-world impact. They used safe deployment rings with deliberate bake times to catch problems early. They separated deployment from exposure using feature flags, and stopped cascading failures with circuit breakers and throttling.
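To make the circuit breaker idea concrete, here is a minimal sketch in Python. It is not Microsoft's implementation; the class name, the thresholds, and the injectable `clock` parameter are all assumptions chosen for illustration. After a run of consecutive failures the breaker stops calling the dependency for a cool-down period, so a struggling service is not hammered further and callers fail fast.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    # max_failures and reset_after are illustrative defaults, not recommendations.
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency is failing; call skipped")
            # Cool-down elapsed: allow one trial call (half-open).
            # A single further failure will reopen the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function, and treat `CircuitOpenError` as a signal to serve a degraded response instead of waiting on a dependency that is known to be failing.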

Failures were not exceptional. Failures were normal.
Resilience was not improvised. It was engineered.
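The combination of deployment rings, bake times, and feature flags described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how Azure DevOps implements it: the `RingRollout` class, the ring names, and the single `healthy` signal are all hypothetical. The code is already deployed everywhere; the flag only controls which rings are exposed to it.

```python
import time


class RingRollout:
    """Exposes an already-deployed feature one ring at a time."""

    def __init__(self, rings, bake_seconds, clock=time.monotonic):
        self.rings = list(rings)
        self.bake_seconds = bake_seconds
        self.clock = clock
        self.exposed = []            # rings that currently see the feature
        self.last_exposed_at = None

    def is_enabled(self, ring):
        return ring in self.exposed

    def advance(self, healthy):
        """Expose the next ring if the previous one has baked and is healthy."""
        if len(self.exposed) == len(self.rings):
            return None  # already exposed everywhere
        if self.exposed:
            baked = self.clock() - self.last_exposed_at >= self.bake_seconds
            if not (baked and healthy):
                return None
        next_ring = self.rings[len(self.exposed)]
        self.exposed.append(next_ring)
        self.last_exposed_at = self.clock()
        return next_ring

    def kill(self):
        """Hide the feature everywhere without redeploying."""
        self.exposed.clear()
        self.last_exposed_at = None


# Hypothetical ring names and a 24-hour bake time.
rollout = RingRollout(["internal", "early_adopters", "everyone"], bake_seconds=24 * 3600)
rollout.advance(healthy=True)  # exposes "internal" first
```

In practice the `healthy` argument would come from telemetry on the rings already exposed, so a problem seen by internal users stops the rollout before it reaches everyone.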

Treat Resilience as a First-Class Investment

Resilience is not free, but the cost of neglecting it is far higher. Downtime kills customer trust. Outages cost revenue. Slow recovery wrecks morale. Ignoring resilience is gambling with your business.

Treat resilience like a feature. Design it. Engineer it. Continuously improve it. Put it in your Definition of Done. Make it part of every code review, every architecture discussion, every release decision. If you are not actively designing for resilience, you are designing for fragility whether you mean to or not.

Build for failure. Measure resilience empirically. Improve relentlessly.
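One way to measure resilience empirically is a customer-centred metric such as the failed or slow user minutes mentioned earlier. The sketch below shows one plausible reading of that metric, not Microsoft's actual definition: a user-minute counts as impacted if that user had at least one failed or slow request in that minute. The function name, the tuple layout, and the 2000 ms threshold are assumptions.

```python
from collections import defaultdict


def impacted_user_minutes(requests, slow_ms=2000):
    """Return (impacted, total) user-minutes.

    Each request is a (user_id, epoch_seconds, duration_ms, succeeded) tuple.
    """
    buckets = defaultdict(bool)  # (user, minute) -> was this user-minute impacted?
    for user_id, ts, duration_ms, succeeded in requests:
        key = (user_id, int(ts // 60))
        bad = (not succeeded) or duration_ms > slow_ms
        buckets[key] = buckets[key] or bad
    impacted = sum(1 for was_bad in buckets.values() if was_bad)
    return impacted, len(buckets)


sample = [
    ("alice", 0, 100, True),
    ("alice", 30, 5000, True),   # slow request, same minute as the one above
    ("alice", 70, 100, True),    # next minute, fine
    ("bob", 10, 100, False),     # failed request
]
impacted, total = impacted_user_minutes(sample)  # 2 impacted out of 3 user-minutes
```

Unlike server uptime, this number moves when real users are hurt, even if every server reports healthy.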

Pragmatic Steps to Build Resilience

You do not need permission to start. You do not need to fix everything at once. You just need to move with intent.

Failure is Inevitable. Your Response is Optional.

You will never eliminate failure. That is not the goal.
The goal is to ensure that failures are small, contained, quickly detected, and rapidly recovered without compromising your product or your business.

If you want resilience, build it deliberately. Make it part of your product. Treat it with the same seriousness as security, scalability, and usability. Anything less is just gambling that the next crisis will not be the one that takes you down.

Resilience is not heroism. Resilience is system design.
Own it as you would any other critical feature. Because it is one.
