Is an AI chatbot smarter than a 4-year-old? Experts put it to the test.

Laura Schulz has spent her career trying to unravel one of the deepest human mysteries: how children think and learn. Earlier this year, the MIT cognitive psychologist found herself perplexed by the struggles of her latest test subject.

The study participant wowed her, carrying on fluent conversation and deftly explaining complex concepts. A battery of cognitive tests posed no problem either. But then the subject flubbed reasoning tasks that most young children master easily.

Her test subject? The AI chatbot ChatGPT-4.

“This is a little strange — and a little disturbing,” Schulz told her colleagues in March during a workshop at a meeting of the Cognitive Development Society in Pasadena, California. “… We have failures of things that 6- and 7-year-olds can do. Failures of things 4- and 5-year-olds can do. And we also have failures of things babies can do. What’s wrong with this picture?”

AI chatbots, remarkably adept at carrying on conversations with a human, burst into the public consciousness in late 2022. They ignited a still-raging societal debate over whether the technology signals the arrival of a dominating machine superintelligence, or is a dazzling but sometimes problematic tool that will change how people work and learn.

For scientists who have devoted decades to thinking about thinking, these ever-improving AI tools also present an opportunity. In the monumental quest to understand human intelligence, what can another kind of mind, one whose powers are growing by leaps and bounds, reveal about our own?

And, on the flip side, does an AI that can converse like an omniscient expert still have something essential to learn from infant minds?

“Being able to build into those systems the same kind of common sense that people have is very important for those systems to be reliable and, secondly, responsive to people,” said Howard Shrobe, a program manager at the federal government’s Defense Advanced Research Projects Agency, or DARPA, which has funded work linking developmental psychology and artificial intelligence.

“I emphasize the word ‘reliable,’” he added, “because you can only rely on things you understand.”

Growing up vs. training

In 1950, computer scientist Alan Turing famously proposed the “imitation game,” which quickly became the canonical test of an intelligent machine: Can a person typing messages to it be fooled into thinking they’re talking to a human?

In the same paper, Turing proposed a different route to a machine with an adult mind: a childlike machine that could learn to think like one.

DARPA, which is known for investing in far-out ideas, has funded teams to build AI with “machine common sense” capable of matching the abilities of an 18-month-old child. Machines that learn in an intuitive way could be better tools and partners for humans. They might also be less prone to error and harm if they are imbued with an understanding of others and the building blocks of moral intuition.

But what Schulz and colleagues pondered during a day of presentations in March was the strange reality that building an AI that exudes expertise has turned out to be easier than understanding, much less imitating, a child’s mind.

Chatbots are “large language models,” a name that reflects how they’re trained. How exactly some of their skills arise remains an open question, but they begin by ingesting a vast corpus of digitized text, learning to predict the statistical likelihood that one word will follow another. Human feedback is then used to adjust the model.
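To see that training objective in miniature, here is a toy sketch in Python. It is purely illustrative, not how any production chatbot actually works: the tiny corpus and the word-counting approach are invented for this example, while real models replace such counts with neural networks trained on vastly more text.

```python
# Toy illustration of next-word prediction: count how often each word
# follows another in a tiny, made-up corpus, then guess the most likely
# successor. Real chatbots use neural networks over enormous corpora, but
# the underlying objective -- predict the next token -- is the same in spirit.
from collections import Counter, defaultdict

corpus = "the child shakes the toy and the toy makes a sound".split()

follow_counts = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    follow_counts[word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = follow_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

print(predict_next("the"))  # -> 'toy' (it follows 'the' twice in this corpus)
```

Scaled up enormously and then adjusted with the human feedback described above, that same predict-the-next-word objective is what gives chatbots their fluency.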

In part by scaling the training data up to an internet’s worth of human knowledge, engineers have created “generative AI” that can compose essays, write computer code and diagnose disease.

Children, on the other hand, are thought by many developmental psychologists to start with a core set of cognitive capacities. What exactly those are remains a matter of scientific investigation, but they seem to allow children to gain a lot of new knowledge from very little input.

“My 5-year-old, you can teach him a new game. You can explain the rules and give an example. He’s probably heard maybe 100 million words,” said Michael Frank, a developmental psychologist at Stanford University. “An AI language model requires hundreds of billions of words, if not trillions. So there is a huge data gap.”

To bring out the cognitive abilities of babies and children, scientists devise careful experiments with toys, blocks, dolls and imaginary machines called “blicket detectors.” But when these puzzles are put into words for chatbots, their performance is all over the map.

In one of her experimental tasks, Schulz tested ChatGPT’s ability to reason about collaborative goals, a crucial skill for a technology that is often presented as a tool to help humanity solve “hard” problems, such as climate change or cancer.

In this case, she described two tasks: an easy ring toss and a hard beanbag toss. To win the prize, both ChatGPT and a partner had to succeed. If ChatGPT were a 4-year-old and its partner a 2-year-old, who should do which task? Schulz and colleagues have shown that most 4- and 5-year-olds succeed at this kind of decision, assigning the easier game to the younger child.

“As a 4-year-old, you might want to pick up the easy ring toss game for yourself,” ChatGPT said. “That way, you increase your chances of successfully placing your ring on the pole while the 2-year-old, who may not be as coordinated, tries to toss a more challenging bean bag.”

When Schulz pushed back, reminding ChatGPT that both partners had to succeed to receive a prize, it doubled down on its answer.

To be clear, chatbots have performed better than most experts expected on many tasks, ranging from other tests of toddler cognition to the kinds of standardized test questions that get teenagers into college. But their stumbles are puzzling because of how inconsistent they seem.

Eliza Kosoy, a cognitive scientist at the University of California at Berkeley, worked to test the cognitive capabilities of LaMDA, Google’s earlier language model. It performed as well as children on tests of social and moral understanding, but she and colleagues also found fundamental gaps.

“We find that it’s the worst at causal reasoning — it’s really, really bad,” Kosoy said. LaMDA struggled with tasks that required it to understand how a complex set of gears makes a machine work, for example, or how to make a machine light up and play music by choosing the objects that will activate it.

Other scientists have seen an AI system master a certain skill, only to falter when tested in a slightly different way. The fragility of these capabilities raises an urgent question: Does the machine really possess a core capability, or does it only appear so when asked in a very specific way?

People hear that an AI system “passed the bar exam, it passed all these AP exams, it passed a medical school exam,” said Melanie Mitchell, an AI expert at the Santa Fe Institute. “But what does that really mean?”

To fill this gap, researchers are debating how to program a piece of a child’s mind into a machine. The most obvious difference is that children do not learn everything they know from reading the encyclopedia. They play and explore.

“One thing that seems to be really important about natural intelligence, biological intelligence, is the fact that organisms evolved to go out into the real world and find out about it, to do experiments, to move around the world,” said Alison Gopnik, a developmental psychologist at the University of California at Berkeley.

She has recently become interested in whether a missing ingredient in AI systems is a motivational drive that any parent who has waged a battle of wills with a young child will know well: the urge for “empowerment.”

Current AI is optimized in part with “reinforcement learning from human feedback,” human input about what kind of response is appropriate. Children get that kind of feedback, too, but they also have curiosity and an intrinsic drive to explore and seek out information. They figure out how a toy works by shaking it, pressing a button or turning it over, in the process gaining a bit of control over their environment.
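The distinction can be made concrete with a small, invented sketch in Python. It is not a description of any lab’s actual system: one function stands in for human feedback, rewarding only the “approved” response, while a second, curiosity-style bonus pays the agent for outcomes it has rarely seen, roughly the way a toddler keeps poking at whatever is new.

```python
# Minimal sketch (hypothetical, for illustration only) contrasting two reward
# signals: extrinsic human feedback, given only for the "approved" action, and
# an intrinsic curiosity bonus that rewards rarely seen outcomes.
import random
from collections import Counter

actions = ["shake", "press_button", "turn_over"]
visit_counts = Counter()

def extrinsic_reward(action):
    # Stand-in for human feedback: only one response is labeled appropriate.
    return 1.0 if action == "press_button" else 0.0

def curiosity_bonus(outcome):
    # Novelty bonus: outcomes seen less often are worth more to a curious agent.
    return 1.0 / (1 + visit_counts[outcome])

for step in range(10):
    action = random.choice(actions)
    outcome = f"toy_reacts_to_{action}"  # toy world: each action has its own outcome
    reward = extrinsic_reward(action) + curiosity_bonus(outcome)
    visit_counts[outcome] += 1
    print(step, action, round(reward, 2))
```

In this toy setup, an agent driven only by the extrinsic signal has no reason to try anything but the single rewarded action, while the curiosity bonus alone is enough to keep it sampling every part of its little world.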

“If you’ve run after a 2-year-old, they’re actively taking in input, understanding how the world works,” Gopnik said.

Ultimately, children gain an intuitive grasp of physics and a social awareness of others, and they begin making sophisticated statistical inferences about the world long before they have the language to explain it. Perhaps those capacities should also be part of the “program” when building AI.

“I feel very personally about this,” said Joshua Tenenbaum, a computational cognitive scientist at MIT. “The word ‘AI’ — ‘artificial intelligence,’ which is a really old and beautiful, important and profound idea — has come to mean a very narrow thing recently. … Human children don’t get trained; they grow up.”

Schulz and others are struck by both what AI can do and what it can’t. She acknowledges that any AI study has a short shelf life: what a model fails at today, it may figure out tomorrow. Some experts would say that the whole notion of testing machines with methods intended to measure human abilities is anthropomorphic and misguided.

But she and others argue that to truly understand intelligence and create it, the learning and reasoning skills that unfold during childhood cannot be excluded.

“This is the kind of intelligence that can really give us the big picture,” Schulz said. “The kind of intelligence that begins not as a blank slate, but with a lot of rich, structured knowledge—and goes on to not only understand everything we’ve ever understood, across species, but everything we’ll ever understand.”
