AI has been all the rage lately, and for good reason. AI promises to solve problems for us. What kinds of problems? Well…all of them, hopefully! If we can make AI that is as smart and capable as us, it must by definition be capable of solving at least all the problems we could solve ourselves, and probably many more.
But just because a system can solve a problem does not mean it will.
The world’s governments could solve climate change and achieve world peace, but will they? It’s kind of strange, isn’t it? We’ve built institutions that are powerful enough to solve global crises like these, but they…won’t? But we built them! Why won’t they do what we want? And what does this tell us about AI?
Being Good
What does it mean to “be good”? Never mind the philosophical hairsplitting; just intuitively, there is a sense in which there are things we want in the world that are “good”. Peace, happiness, freedom, prosperity, whatever. And we can think of anyone, or anything, that is acting to bring our world closer to those ideals as “good”[1].
But as you can surely tell just from thinking about this for 15 seconds…it’s really, really not that simple!
How many times throughout history have we learned the lesson that taking even a well-meaning, but naive, conception of what is “good” and applying it forcefully is usually not a good idea? There are so many edge cases![2] How many disagreements do even completely sane and reasonable people[3] have about what is truly “good”?
Well darn, this seems tricky.
So the first thing for us to notice is that, even under ideal circumstances, choosing what to do to “be good” is really hard[4], because it involves all kinds of sticky questions about morality, emotions, compromises, etc.[5]
Being Good at Something
Now there is also another meaning of the phrase “to be good”: being good at some specific skill. “To be good at basketball” or “to be good at programming” or “to be good at making money” or whatever.
Immediately, it feels like this is so much easier than what we were talking about before. What does it even mean to “be good”? But being good at basketball? That’s something we can understand, measure and improve!
And AI loves things that you can understand, measure and improve.
You’ll see news article after news article breathlessly describing which new game, benchmark, or task AI has aced, getting higher and higher scores, learning new skills, etc. This is exactly what we build AI to do, and it’s easy, fun, and profitable! If you’re a company and you want to buy AI, you’re buying it because it has some kind of useful skill you want to exploit; why else would you buy it?[6]
So by default, basically all the progress we make on AI is about making it better at doing some kind of thing, ideally things that are measurable and improvable! Whether that’s points in a game, money in a bank account, or whatever.
Being Good is a Handicap
So when we put these two observations together, we notice an unfortunate situation: Making our AI better and more powerful at all kinds of skills is easy, but getting it to be good, to do what we want, is way harder! How do you easily measure “being good”? What would that even mean? Can you write that down in math, please?
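To see the asymmetry concretely, here’s a toy sketch in Python (all names and functions are made up for illustration): the “good at” objectives are one-liners, while the “be good” objective has no known definition to write down at all.

```python
# Toy illustration: skill objectives are easy to write down and optimize.

def basketball_score(game_log):
    # Fully measurable: just count the points our agent scored.
    return sum(play.points for play in game_log if play.by_agent)

def profit(account):
    # Also fully measurable: money in minus money out.
    return account.balance_end - account.balance_start

def goodness(world_state):
    # ...and here we get stuck. There is no agreed-upon formula for
    # "the world got better", so there is nothing for an optimizer to climb.
    raise NotImplementedError("nobody knows how to write this down")
```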
And it’s even worse than that! Being good directly goes against being good at most things! It’s much easier to make money if you’re willing to lie and cheat.
Goodness is a constraint on our AI’s skills: it forces the AI to handicap itself so that it acts more the way we’d like. But this means that any advances we make on making the AI more powerful will almost certainly also make it better at doing things that are not good, such as lying or hiding its true intentions from us![7]
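Here’s a hypothetical numeric sketch of why honesty is a handicap (all numbers invented): if the metric we can actually measure is reported profit rather than honest profit, then any optimizer ranking agents by that metric will prefer the one that lies.

```python
# Toy specification-gaming example (all numbers invented).
# We could only measure *reported* profit, so that's the objective we wrote.

def reported_profit(true_profit, lie_margin):
    return true_profit + lie_margin

honest_agent = reported_profit(true_profit=100, lie_margin=0)    # 100
cheating_agent = reported_profit(true_profit=90, lie_margin=50)  # 140

# An optimizer ranking agents by the measurable metric picks the cheater,
# even though the honest agent actually produced more real value.
assert cheating_agent > honest_agent
```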
AI, by default, will not have human emotions like ours. Why would it? It won’t feel bad about lying or breaking rules, and it won’t feel bad about going behind your back. If that makes it win, that’s what it does; that’s what we built it to do!
Being Good is Hard
So where does this leave us? It’s like we have a tug of war pulling us in two directions.
In one direction, we want to pull towards AI systems that “are good” and do what we think is good. In the other, we have all the work to make our AIs smarter, more general, and better at winning and making money.
The former is handicapped by being so, so much harder and more complicated, while the latter is simple and tractable, and each improvement makes you a ton of money, so it also has 100,000x the funding compared to being good.[8]
The danger from AI does not come from some kind of “evil property” that must be “removed” from the AI. It comes from competence, competition, and the inherent complexity of what we really want.
[1] Or at least trying to be.
[2] Honestly, it sometimes feels like it’s more edges than not.
[3] Insofar as those exist.
[4] It’s not even clear there are universally correct answers at all!
[5] This doesn’t mean there are never pretty unambiguous ways to be more or less Good, or that there aren’t often improvements on the margin, but in general this is just a really, really sticky problem.
[6] “Investor relations fraud” is the obvious other answer.
[7] Or worse, creating whatever the fuck is on YouTube Kids. Someone needs to go to prison for that shit for real.
[8] You tell me if that sounds like a recipe for a good outcome.