
In software development, failures are inevitable. Whether the cause is a bug in the code, an unexpected system state, or invalid input, a failure that isn't managed well can cripple an application. Two common strategies for handling failures are failing safely (recovering and carrying on as best you can) and failing fast (halting as soon as something is wrong). Which one is better? In this post, we'll look at the benefits of failing fast and at how it can make your systems more robust.

  1. When it comes to handling failures in the user interface (frontend), failing safely is crucial. The frontend is the part of the system closest to the end user, and crashing the UI is the worst possible response to a failure. Instead, handle failures gracefully: serve cached data, show a partial response, or offer a retry option. As a last resort, display a generic error message. It is still a far better experience than a crash, and it helps prevent churn and uninstalls.
  2. Everywhere outside the frontend, failing fast is the better strategy for improving system robustness. Halting execution as soon as an invalid state is encountered makes bugs much easier to find: when a system visibly signals that something is wrong, everyone involved is more likely to notice. Even with logging in place, an error that "blows up in your face" is harder to miss than a warning buried in a log, and easier to debug. Instead of tracing back through every execution step to find where the program drifted into an invalid state, you get a clear indication of where the failure occurred, so you can pinpoint the issue and fix it quickly, saving time and effort.
  3. Allowing execution to continue after reaching an invalid state can lead to cascading failures. Unless every possible continuation is carefully controlled, the system can wander into unknown territory and produce further invalid states. This domino effect can grow into a larger failure that does more damage than stopping immediately would have. Failing fast acts as a barrier against cascading failures and preserves the integrity of the system.
  4. Failing safely usually requires extra code paths to handle each possible scenario, which increases the complexity of the system and the cognitive load on developers, and that in turn makes mistakes more likely. Failing fast, by contrast, promotes predictable, deliberate programming. When invalid states are rejected at the point where they appear, the code that follows can rely on the state it has already checked, so you can work with a simpler mental model of the system. That clarity lets you write code with more confidence, which reduces mistakes and improves overall code quality.
  5. Assertive programming is a practice that fits naturally with the fail-fast approach. You place assertions in the code that continuously check the system's state, and the program crashes if an assertion's condition is not met. Writing assertions forces you to validate your assumptions as you code, which adds a layer of confidence and catches potential problems early. For example, suppose we sell an age-restricted item. An assertion can ensure that only customers aged 18 or over are served:
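import assert from "node:assert"; // assumes a Node.js runtime
const adult = generateAdult();
// Halt before the sale if generateAdult() ever returns a minor.
assert(adult.age >= 18, `expected an adult, got age ${adult.age}`);
sellItemTo(adult);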

The check is simple, but it is a strong defense against unexpected scenarios: if a bug ever causes generateAdult() to return a minor, the program halts before the sale instead of quietly completing it.
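
To make point 1 concrete, here is a minimal JavaScript sketch of failing safely on the frontend. It is not taken from any particular framework: `loadFeed`, `fetchFresh`, `cache`, and `onError` are hypothetical names, and `cache` is assumed to be any object with Map-style `get`/`set` methods. The function prefers fresh data, falls back to the last cached copy, and only then returns a generic error with a retry option.

```javascript
// Sketch of failing safely on the frontend: prefer fresh data, fall back to
// the last cached copy, and only then show a generic error with a retry.
// fetchFresh, cache, and onError are hypothetical; cache is any object with
// Map-style get/set methods.
async function loadFeed(fetchFresh, cache, onError = console.error) {
  try {
    const items = await fetchFresh();
    cache.set("feed", items); // remember the last good response
    return { status: "fresh", items };
  } catch (err) {
    onError(err); // failing safely still means reporting the failure
    const cached = cache.get("feed");
    if (cached !== undefined) {
      return { status: "cached", items: cached };
    }
    // Last resort: a generic message plus a retry option, never a crash.
    return {
      status: "error",
      message: "Something went wrong. Please try again.",
      retry: () => loadFeed(fetchFresh, cache, onError),
    };
  }
}
```

In a UI component, `status` would decide whether to render the list, a "showing saved data" notice, or the error message with a retry button. Note that the failure is still reported through `onError`; failing safely hides the crash from the user, not from the developers.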
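
Point 2's claim, that failing fast tells you where things went wrong, can be illustrated with a small hypothetical example: reading a port number from configuration. The function names are invented for illustration. The lenient version lets a bad value through as `NaN`, which only causes trouble later and somewhere else; the strict version throws at the exact point where the invalid value enters the program, with a message naming the input.

```javascript
// Hypothetical example: reading a port number from configuration.

// Lenient style: an invalid value slips through as NaN and causes trouble
// later, far from where it entered the program.
function parsePortLenient(raw) {
  return Number(raw);
}

// Fail-fast style: reject the invalid value where it first appears,
// with an error that names the bad input.
function parsePortStrict(raw) {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${JSON.stringify(raw)}`);
  }
  return port;
}

console.log(parsePortLenient("http")); // NaN, and no error
try {
  parsePortStrict("http");
} catch (err) {
  console.log(err.message); // invalid port: "http"
}
```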
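
The cascading failure described in point 3 can be sketched with a hypothetical order-total calculation (the price table, SKUs, and functions are all made up for illustration). When a price is missing, the lenient version keeps going: `undefined * qty` is `NaN`, and every later step inherits it. The fail-fast version stops at the first unknown SKU, before anything downstream runs.

```javascript
// Hypothetical price table; "gadget" is deliberately missing.
const prices = new Map([["widget", 5]]);

// Continuing past the invalid state: the missing price becomes undefined,
// the sum becomes NaN, and every later step (tax, invoice, payment) inherits it.
function orderTotalLenient(items) {
  let total = 0;
  for (const { sku, qty } of items) total += prices.get(sku) * qty;
  return total;
}

// Failing fast: stop at the first unknown SKU, before anything downstream runs.
function orderTotalStrict(items) {
  let total = 0;
  for (const { sku, qty } of items) {
    const price = prices.get(sku);
    if (price === undefined) throw new Error(`no price for SKU ${sku}`);
    total += price * qty;
  }
  return total;
}

const order = [{ sku: "widget", qty: 2 }, { sku: "gadget", qty: 1 }];
const withTax = orderTotalLenient(order) * 1.2;
console.log(withTax); // NaN: the bad state has already spread into the tax step
```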
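
Point 4's contrast between defensive branches and a single upfront check might look like this in practice. This is a hypothetical sketch; the user shape and function names are assumptions. In the fail-safe style, each consumer guards against a bad user and invents its own fallback; in the fail-fast style, `parseUser` validates once at the boundary, and the code after it stays branch-free.

```javascript
// Fail-safe style: every consumer guards against a possibly-bad user,
// and each guard invents its own fallback.
function greetingDefensive(user) {
  if (!user || typeof user.name !== "string") return "Hello, guest";
  return `Hello, ${user.name}`;
}
function initialsDefensive(user) {
  if (!user || typeof user.name !== "string" || user.name.length === 0) return "?";
  return user.name[0].toUpperCase();
}

// Fail-fast style: validate once at the boundary and throw on bad input;
// code past this point can assume a well-formed user.
function parseUser(data) {
  const valid =
    data !== null &&
    typeof data === "object" &&
    typeof data.name === "string" &&
    data.name.length > 0;
  if (!valid) throw new TypeError("user must have a non-empty string name");
  return { name: data.name };
}
const greeting = (user) => `Hello, ${user.name}`;
const initials = (user) => user.name[0].toUpperCase();

const user = parseUser({ name: "ada" });
console.log(greeting(user), initials(user)); // Hello, ada A
```

The defensive functions each need their own checks and fallbacks, which have to be kept consistent by hand; the fail-fast versions carry one invariant, established in one place.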

Failures will always happen, but how we handle them makes a significant difference to the robustness of our systems. Outside the frontend, where failing safely is crucial, failing fast offers several advantages: it helps us find and fix bugs quickly, prevents cascading failures, simplifies our mental models, and pairs naturally with assertive programming. By embracing a fail-fast approach, we can build more resilient systems and deliver better experiences to end users. So the next time you decide how your code should react to an invalid state, consider letting it fail fast.
