Recent years have seen growing concern among policymakers and the public about the “explainability” of artificial intelligence systems. As AI becomes more advanced and is applied to domains like healthcare, hiring, and criminal justice, some are calling for these systems to be more transparent and interpretable. The fear is that the “black box” nature of modern machine learning models makes them unaccountable and potentially dangerous.
While the desire for AI explainability is understandable, its importance is often overstated. The term itself is ill-defined: exactly which criteria make a system explainable remains unclear. More importantly, a lack of explainability does not necessarily make an AI system unreliable or unsafe.
It’s true that even the creators of state-of-the-art deep learning models cannot fully articulate how these models transform inputs into outputs. The intricacies of a neural network trained on millions of examples are simply too complex for a human mind to fully grasp. But the same could be said of countless other technologies we use every day.
We don’t completely understand the quantum mechanical interactions underlying chemical manufacturing processes or semiconductor fabrication. Yet that doesn’t stop us from benefiting from the pharmaceuticals and microchips produced with this partial knowledge. What we care about is that the outputs reliably accomplish their objectives.
When it comes to high-stakes AI systems, we should focus first and foremost on testing them to validate their performance and to ensure they behave as intended. Probing a criminal sentencing algorithm to understand exactly how it combines hundreds of features is less important than assessing how accurately it predicts recidivism among formerly incarcerated people.
An emerging field called AI interpretability aims to open up the black box of deep learning to some extent. Research in this area has yielded techniques for identifying which input features are most salient in determining a model’s predictions, and for characterizing how information flows through the layers of an artificial neural network. Over time, we will gain a clearer picture of how these models process data to arrive at outputs.
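To make the idea of feature salience concrete, here is a minimal sketch of one such technique, permutation importance: shuffle a single input feature and measure how much the model’s accuracy drops. The choice of technique is an illustration, not something the research described above is limited to, and every name in it (`toy_model`, `accuracy`, `permutation_importance`) and the synthetic data are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Average drop in accuracy when each feature column is shuffled in turn.

    The model is treated as a black box: it is only ever called, never inspected.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle feature j across rows, leaving the other features intact.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [
                row[:j] + (value,) + row[j + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for an opaque model; it secretly uses only feature 0."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data (an assumption for illustration): two random features per row,
# with labels generated by the toy model itself.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [toy_model(row) for row in rows]

scores = permutation_importance(toy_model, rows, labels, n_features=2)
print(scores)
```

In this toy setup, shuffling the feature the model ignores leaves accuracy unchanged, while shuffling the feature it relies on degrades accuracy substantially. The score says which inputs matter to the predictions, not how the model combines them, which is the sense in which such techniques open the black box only “to some extent.”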
However, we shouldn’t expect AI systems to ever be totally explainable in the way a simple equation or a decision tree might be. The most powerful models will likely always involve some level of irreducible complexity. And that’s okay. Much of human knowledge is tacit and hard to verbalize: a chess grandmaster can’t fully explain his strategic intuition, and a skilled painter can’t fully articulate her source of inspiration. What matters is that the results of their efforts are valued, by themselves and by others.
Indeed, we must be careful not to fetishize explainability to the detriment of other priorities. An AI that can be readily interpreted by a human is not necessarily more robust or reliable than a black box model. There can even be trade-offs between performance and explainability. Michael Jordan may not be able to explain the intricate details of how his muscles, nerves, and bones coordinated to execute a slam dunk from the free throw line. Yet he was able to perform this impressive feat regardless.
Ultimately, an AI system should be evaluated based on its real-world impact. A hiring model that is opaque but more accurate at predicting employee performance is preferable to a transparent rule-based model that recommends weaker candidates. A tumor detection algorithm that can’t be explained but catches cancers more reliably than doctors is worth deploying. We should strive to make AI systems interpretable where possible, but not at the cost of the benefits they deliver.
Of course, this doesn’t mean AI should be unaccountable. Developers should test AI systems extensively, validate their real-world performance, and strive to align them with human values, especially before unleashing them on the broader world. But we shouldn’t let abstract notions of explainability become a distraction, let alone an obstacle, to realizing the immense potential of artificial intelligence to improve our lives.
With appropriate precautions taken, even a black box model can be a powerful tool for good. In the end, it’s the output that matters, not whether the process that delivered the output can be explained.