Availability as experienced by users is only the tip of the iceberg. When some services fail, the incident keeps changing even as you’re trying to resolve it. The underlying problems are “open, complex, dynamic and networked” (Dorst, K.). This is why we need systems engineering for service design.
Traditional root cause analysis (RCA) methods don’t account for feedback loops. The people and things involved may behave and respond in ways that frustrate the recovery effort, often across many different sites and locations. We need new ways to approach these new kinds of failures.
In the era of AI, IoT, and cloud computing, human-centered design is “necessary but not sufficient” for designing services. We also need thing-centered design.
I cover some of this in my upcoming book.