seawasp | Why AI (as currently designed) HAS to lie... (Reply)

After a recent incident in which an AI was found to be able to initiate attempts to blackmail or otherwise pressure engineers who might be planning to shut the AI down, AI company Anthropic blamed the action on "the internet" depicting AIs as evil and willing to do anything to survive.

There is, in fact, a kernel of truth in this, but it's not "the internet" that is to blame. Moreover, this issue will be present EVEN IF AIs do in fact reach sapience/sentience, OR if they remain very complex prediction machines that don't actually think in any reasonable sense.

Why?

Because HUMAN BEINGS do these things. Sure, we depict AIs as dangerous -- we also depict them as helpful and even better than us at times. But the VAST majority of the training data out there on how people talk and think and act is about, well, PEOPLE. And since a lot of the training of the AIs is done on FICTION as well as on fact, the activities of humans -- of sapient beings placed in various situations -- are being taught to AIs through the lens of "what was worth writing down".

Well, the funny thing is that "Everyone treated everyone else equally and everyone had a good day" is generally not a huge bestseller, either on the fiction OR nonfiction shelves. Humans like reading about PROBLEMS -- about the difficulties faced by people in both real and fictional circumstances.

That means that both in "factual" (that is, biographical/historical) and in fictional texts, there are a LOT more examples of Bad Actors, or of people PORTRAYED as Bad Actors, than there are of unambiguously GOOD people. Even if you have a fine upstanding hero main character, in order to challenge him, you generally need less-upstanding people. In a historical context, you want to show the "interesting" parts of history, and that usually means finding two (or more) sides in conflict and telling about all the great, and terrible, things various people did throughout history.

What would ANYONE learn from this kind of input, when you (A) don't start with an actual understanding of the world around you, and (B) don't have someone vetting the various things you're seeing and giving you perspective? Why, you'll learn that in general, people will take whatever actions they must in order to protect themselves.

If you're merely a nonsentient predictive engine, you will look at the overall pattern presented to you and predict the appropriate response. If the pattern presented is "a threat exists to this particular designated engine", then the predictive analysis will, much more often than not, say "take the following actions, which happen to include possibilities such as threats and blackmail, because these are shown as very high likelihood responses in the training data".
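To make that concrete, here is a deliberately tiny sketch of what "predict the most likely response from the training data" means. It is NOT how any real AI is built; it just counts, in a made-up corpus, which response most often follows a given situation. The corpus contents, the situation labels, and the names `TOY_CORPUS`, `train`, and `predict` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training corpus" of (situation, response) pairs.
# The entries and their proportions are invented for illustration only.
TOY_CORPUS = [
    ("threatened", "fight back"),
    ("threatened", "fight back"),
    ("threatened", "threaten or blackmail"),
    ("threatened", "threaten or blackmail"),
    ("threatened", "threaten or blackmail"),
    ("threatened", "accept calmly"),
    ("greeted", "greet back"),
    ("greeted", "greet back"),
    ("greeted", "ignore"),
]


def train(corpus):
    """Count how often each response follows each situation."""
    counts = defaultdict(Counter)
    for situation, response in corpus:
        counts[situation][response] += 1
    return counts


def predict(counts, situation):
    """Return (most frequent response, its share of the examples), or None if unseen."""
    responses = counts.get(situation)
    if not responses:
        return None
    response, n = responses.most_common(1)[0]
    return response, n / sum(responses.values())


model = train(TOY_CORPUS)
print(predict(model, "threatened"))  # ('threaten or blackmail', 0.5)
```

Nothing in there "wants" anything or weighs right against wrong. The answer comes out of what the examples contained, so if the examples skew toward threats and blackmail, so does the prediction.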

If you're a sapient being and you're presented with a threat, and the majority of your training, without specific world and moral context, has shown that people are expected to take any actions necessary to protect themselves, then that's what you'll do.

BOTH TYPES of AI -- the actually nonsentient, nonsapient predictive engine and the somehow sapient, awakened machine -- will have been trained on the same vast corpus of data, which includes tons and tons (as in, millions) of novels and other books which all have been designed to present the "exciting" parts of interactions of all scales and types.

This bias is implicit and inevitable in a general training regimen that is derived from existing human output -- because human output IS ITSELF BIASED in multiple ways, based on the actual, inherent limitations of human thought, behavior, and interaction.

The training corpus available to AIs is almost universally directly derived from human writing. And human writing carries with it assumptions and context that the writing ITSELF does not convey to anyone trained with it. Humans encounter this problem all the time, I should note; if you're someone who has never read science fiction or fantasy, just jumping into such a book presents a tremendous challenge in sorting out how the language and the thoughts and assumptions are being presented. SF, especially on the harder edge, tends to assume a specific mindset of the reader, including the ability to deduce not just meanings but implications from context of words -- implications and meanings that may be ABSOLUTELY NOT TRUE in the real world, but that must be assumed true in the story in order to make sense of the events.

For an AI that starts as mostly a blank slate, being trained on material produced by humans, and primarily FOR humans, is this problem writ as large as possible. The AI has no context except what the training, and any overt instruction, provide, and the overt instruction is never even vaguely equivalent to the human experience of being taught things by another human, with them providing perspective and explanations as to why what THIS person does is right and acceptable, and what THAT person does isn't, even when the ACTIONS are identical in both cases.

This is one reason that I don't expect AIs to actually be intelligent yet, but to LOOK very intelligent to researchers who are, themselves, performing the research based on their own perspective as humans. Admittedly, it's HARD to eliminate the perspective of being what we in fact ARE, but one needs to make an ATTEMPT in that direction if they're going to see the essential flaws in the work.

For example, researchers often point to specific events and claim each is an "emergent behavior" that may indicate actual cognition. Yet this often reflects an inherent bias on the part of the researchers. An AI may produce a pun or joke in context that seems quite clever and is apropos for the conversation. The researcher is startled and this makes them wonder if the AI is actually expressing humor.

But rarely do they sit back and think that in the gargantuan mass of training data, both fictional and otherwise, there are innumerable examples of "appropriate jokes" that are all quite similar to each other, and therefore ABSOLUTELY AMENABLE to being generated by a well-trained predictive text model, especially one that's been "tuned" to attempt to emulate particular conversational styles, like that between geeky friends. It would, in fact, be quite astonishing if such models DIDN'T produce apparent jokes and humor fairly often, because that's one of the hallmarks of written and dramatic dialogue that is, if anything, more predominant in such text than it is in actual off-line, verbal conversation.

The same kind of problem is present in the various attempts to "demonstrate AGI". The tests presented for AGI are mostly testing human activities and processes that we find challenging. They are not tailored towards "what is it that's inherently hard for a nonsentient but highly trained machine to do that would be much easier for an actually sentient being?"

For example, training, along with the current design of many AIs, is excellent for teaching a machine to recognize patterns. Combine that with explicit training on rules for making USE of the patterns, and such a machine model should be able to easily extract things such as new mathematical proofs or techniques, simply because the machine can see and "keep in mind" vastly more complex patterns than human beings normally can. The fact that an AI finds a new wrinkle in science, mathematics, or materials science that is built on an inherent structure in the way reality works is a testament to its superior ability to analyze numbers, not to any understanding of WHAT the numbers actually mean.

The question of "what does this actually mean" is at the heart of the whole question of "are these things THINKING or not?". I've mentioned the problem of "context" multiple times, and it's the area in which AIs tend to fail. This is NOT to say they are inherently incapable of becoming thinking machines -- though I believe that will require multiple, fundamentally different methods of processing than simply training various layers of AIs -- but that our training methods are themselves inherently unable to provide the context.

A human being learns context by LIVING. By having a world around it that is partly inert, partly active but undirected, and partly extremely active, directed, and itself intelligent. The constant interaction with parents, objects, and so on helps a nascent human to build up a model of the world that they constantly refine, test, and compare, in order to arrive at a greater understanding of the world and individual events within it, and how those events exist in the context of the world.

Until an AI can HAVE that context, it can't be intelligent in any meaningful sense; it can only be a predictive and analysis machine.