no trust in black box ai

I'm a software guy, and have been for a while. I've had the pleasure of witnessing or studying many a software failure, and even causing a few. It comes with the job. When a software system fails, we open it up, take a look at how it works, make a patch, then close 'er up and release a new version. Done, more or less, usually. This is possible because the "how it works" part - the computer program - is generally available for inspection and modification. This is especially true in the free/open-source part of the industry, where all the program source code is available to end-users.

Enter the brave new world of neural-network or "machine learning" based AI - the new hotness of the last decade or so. All of the above goes out the window. There is nothing to open, because all there is is an array of numbers, millions or billions of them. There is no finding out how it works. All the developers know, roughly, is that after tremendous amounts of automated computation, the neural nets generally produce the expected outputs for most of the inputs they were tested on. That means that if something goes wrong, there is usually no practical engineering fix, other than running a new batch of automated computation and hoping the new network learns by itself how to avoid the problematic case.
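To make the "array of numbers" point concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption: the layer sizes, the random weights, and the names `forward` and `parameter_count` are made up, and a real network's weights would come from training rather than from `random.gauss`. The point is what you get when you "open it up": a list of floats that says nothing about what rule the network computes.

```python
import random

# Hypothetical toy two-layer network: its entire "program" is these weight arrays.
# Sizes and values are illustrative only; real models have millions to billions
# of such numbers, produced by training rather than written by hand.
random.seed(0)
N_IN, N_HIDDEN, N_OUT = 4, 8, 2

W1 = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
W2 = [[random.gauss(0, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]
b2 = [0.0] * N_OUT


def relu(v: float) -> float:
    """Standard rectified-linear activation."""
    return max(0.0, v)


def forward(x: list[float]) -> list[float]:
    """Compute the network's output scores for one input vector."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def parameter_count() -> int:
    """Total number of weights and biases, i.e. everything there is to 'inspect'."""
    return sum(len(r) for r in W1) + len(b1) + sum(len(r) for r in W2) + len(b2)


print(parameter_count())  # 4*8 + 8 + 8*2 + 2 = 58
print(W1[0])  # "opening it up": just a list of floats
print(forward([1.0, 0.5, -0.3, 2.0]))
```

Scale those 58 numbers up to billions and you have the inspection problem described above: the model runs fine, but reading its weights tells you roughly nothing about why it answers the way it does.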

The result is something that looks like it works, but we don't really know how & why. What could go wrong?

Tesla autopilots are based on computer vision, which uses a bunch of neural networks that consume video data from several cameras around the car. Some researchers successfully fooled the neural nets into misinterpreting speed limit signs by putting little stickers on real signs. We know there have been a bunch of autopilot crashes where the software was just not good enough: road markings were weird, or construction equipment confused the thing. Hey, don't worry, a later version probably corrects some of those problems, maybe!
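The sticker attack targeted a real car's cameras and neural nets; the Python sketch below does not reproduce it. It swaps in the simplest textbook relative, a fast-gradient-sign-style perturbation against a hypothetical linear "sign classifier" with random weights, to show why a small, deliberate nudge to every input pixel can flip the output. For a linear score the gradient is just the weight vector, so shifting each pixel by a tiny `eps` against the sign of its weight moves the score by `eps` times the sum of the absolute weights, and that sum grows with the number of pixels.

```python
import random

# Hypothetical stand-in for an image classifier: a linear score over DIM "pixels".
# score > 0 -> one label (say "35 mph"), score < 0 -> another (say "85 mph").
# Weights and input are random; this only illustrates the arithmetic.
random.seed(1)
DIM = 1000
w = [random.gauss(0, 1) for _ in range(DIM)]
x = [random.uniform(-1, 1) for _ in range(DIM)]  # clean "pixels" in [-1, 1]


def score(v):
    """Linear classifier score: dot product of weights and pixels."""
    return sum(wi * vi for wi, vi in zip(w, v))


def sgn(t):
    """Sign of t as -1, 0, or 1."""
    return (t > 0) - (t < 0)


s = score(x)
l1 = sum(abs(wi) for wi in w)  # ||w||_1

# Fast-gradient-sign-style step: moving every pixel by eps against sign(w)
# (in the direction that opposes the current score) changes the score by
# exactly eps * ||w||_1. Choose eps just large enough to cross zero.
# No clipping back to [-1, 1], to keep the arithmetic exact.
eps = 1.01 * abs(s) / l1
x_adv = [xi - eps * sgn(s) * sgn(wi) for xi, wi in zip(x, w)]

print(f"clean score {s:+.3f}, adversarial score {score(x_adv):+.3f}")
print(f"per-pixel change: {eps:.4f} on a pixel range of width 2")
```

The label flips even though no single pixel moves by more than `eps`, which for inputs like these is typically tiny compared with the pixel range. Real attacks on deep networks are messier, but the lesson is the same: the decision can hinge on directions in input space nobody designed or reviewed.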

And then how about this bit. Researchers proved mathematically that it is possible to deliberately mistrain a neural network to give it undetectable backdoors: undesired responses that the adversary can trigger on demand. The implications are grave: the chain of custody for artifacts of the neural-network training process needs to be pristine. Ken Thompson's classic "Reflections on Trusting Trust" paper describes the analogous problem for normal software - but at least normal software is inspectable.

Oh, and how about some politics? That always helps! Have you heard of "bias in AI"? This refers to the phenomenon whereby neural networks, computing accurately from their input data, can produce politically undesirable outputs. Very real inequalities in real life naturally show up, and golly gee howdy, they must be suppressed by the AI system operators, or else. So Google plays with its search engine to "diversify" its hits; OpenAI adds fake keywords for the same purpose; and many more examples are all around the economy. On the bright side, these technical measures tend to be crude and leave visible trauma in the computer system: more like electroshock treatment than relearning. As the political winds change, at least they can be removed. The fragility of such mechanisms is the exception that proves the rule of neural network unfixability. To an end-user, in either case, it's still opaque.

Finally, if you are just tired of sleeping, check this out. An image-synthesizing neural network you can play with has apparently invented its own monster. It was not in the training set but something new, and the network hallucinated a name for it: crungus. Check out the whole thread or the #crungus hashtag. While it is possible that the operator of this system planted this as an easter egg of sorts, akin to the researchers' hidden backdoors above, it could also be a genuine emergent artifact of the giant neural network behind this service. There is literally no way to know how many such hidden concepts the thing has inside its billions of numbers - how many are real, how many are fake, how many are ... messages from the occult? This is unusable for serious purposes.

Normal software can and often does become incredibly complicated, requiring discipline and aptitude to develop and troubleshoot, and yeah, things still go wrong. But neural networks represent a leap off the cliff: don't even try to engineer the thing. Putting such systems into roles of safety-critical judgement is a dereliction of duty. For what it's worth, symbolic AI systems like Cyc are explicitly engineered, but there is no free lunch: they are expensive and complicated.