Scientists Thought This Would Make AI Worse but It Made It Smarter
It turns out that training an AI in a perfectly controlled environment may help it perform better in chaotic, real-world settings.
MIT researchers found that AI agents trained in a noise-free simulation often outperformed those trained in noisy conditions—even when tested in unpredictable environments. This “indoor training effect” defies conventional AI training wisdom and could lead to breakthroughs in how artificial intelligence is prepared for real-world tasks.
Unexpected AI Training Success in Unfamiliar Environments
A home robot trained in a factory to handle household chores may struggle with tasks like scrubbing the sink or taking out the trash when placed in a real kitchen. The unfamiliar environment can disrupt its performance.
To prevent this, engineers typically design training simulations that closely resemble the real-world settings where the AI will operate.
However, researchers from MIT and other institutions have discovered that this approach isn’t always the most effective. Surprisingly, training an AI in a completely different, more controlled environment can sometimes lead to better performance.
Their findings suggest that AI agents trained in a predictable, noise-free environment often outperform those trained in a more variable, noisy setting — even when tested in that same noisy environment.
The researchers call this unexpected phenomenon the indoor training effect.
“If we learn to play tennis in an indoor environment where there is no noise, we might be able to more easily master different shots. Then, if we move to a noisier environment, like a windy tennis court, we could have a higher probability of playing tennis well than if we started learning in the windy environment,” explains Serena Bono, a research assistant in the MIT Media Lab and lead author of a paper on the indoor training effect.
The Indoor-Training Effect: Unexpected Gains from Distribution Shifts in the Transition Function. Credit: MIT Center for Brains, Minds, and Machines
Testing the Theory: Atari Games and Surprising Results
The researchers studied this phenomenon by training AI agents to play Atari games, which they modified by adding some unpredictability. They were surprised to find that the indoor training effect consistently occurred across Atari games and game variations.
They hope these results fuel additional research toward developing better training methods for AI agents.
“This is an entirely new axis to think about. Rather than trying to match the training and testing environments, we may be able to construct simulated environments where an AI agent learns even better,” adds co-author Spandan Madan, a graduate student at Harvard University.
Bono and Madan collaborated on the paper with Ishaan Grover, a graduate student at MIT; Mao Yasueda, a graduate student at Yale; Cynthia Breazeal, a professor of media arts and sciences at MIT and head of the Personal Robotics Group in the MIT Media Lab; Hanspeter Pfister, the An Wang Professor of Computer Science at Harvard; and Gabriel Kreiman, a professor at Harvard Medical School. Their research will be presented at the Association for the Advancement of Artificial Intelligence (AAAI) Conference.
Training Troubles: Why AI Struggles in New Spaces
The researchers set out to explore why reinforcement learning agents tend to perform so poorly when tested in environments that differ from their training space.
Reinforcement learning is a trial-and-error method in which the agent explores a training space and learns to take actions that maximize its reward.
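As a purely illustrative sketch of that trial-and-error loop, the tabular Q-learning update below nudges an action's estimated value toward the reward just observed. The study's agents use deep reinforcement learning rather than a lookup table, but the core reward-driven update is the textbook idea; all names here are my own.

```python
from collections import defaultdict

# Illustrative tabular Q-learning update (the study's agents are deep RL;
# this is just the textbook core of reward-driven trial and error).
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Nudge Q[s][a] toward the observed reward plus the discounted
    value of the best action available in the next state."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

Q = defaultdict(lambda: defaultdict(float))
q_update(Q, "start", "right", 1.0, "goal")  # reward of 1 for moving right
```

Repeated over many episodes, updates like this are how an agent "learns to take actions that maximize its reward."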
The team developed a technique to explicitly add a certain amount of noise to one element of the reinforcement learning problem, the transition function. The transition function defines the probability that an agent will move from one state to another, given the action it chooses.
If the agent is playing Pac-Man, a transition function might define the probability that ghosts on the game board will move up, down, left, or right. In standard reinforcement learning, the AI would be trained and tested using the same transition function.
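A hypothetical sketch of such a ghost-movement transition function (the function names and the mixing scheme are illustrative, not the paper's code): each ghost move is sampled from a base direction distribution, and a noise parameter blends that distribution with a uniformly random one.

```python
import random

ACTIONS = ["up", "down", "left", "right"]

def ghost_step(base_probs, noise=0.0, rng=random):
    """Sample one ghost move. `noise` in [0, 1] blends the base
    distribution with a uniform one:
    p_i = (1 - noise) * base_i + noise * 0.25.
    (Illustrative scheme, not the paper's exact noise model.)"""
    probs = [(1 - noise) * p + noise * 0.25 for p in base_probs]
    return rng.choices(ACTIONS, weights=probs, k=1)[0]

# Noise-free training game: ghosts follow the base distribution exactly.
move_clean = ghost_step([0.4, 0.4, 0.1, 0.1], noise=0.0)

# Noisy test game: the same ghosts, but moves are partly random.
move_noisy = ghost_step([0.4, 0.4, 0.1, 0.1], noise=0.3)
```

Setting `noise=0.0` during training and a positive value at test time reproduces, in miniature, the mismatch the researchers studied.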
The researchers added noise to the transition function with this conventional approach and, as expected, it hurt the agent’s Pac-Man performance.
But when the researchers trained the agent with a noise-free Pac-Man game, then tested it in an environment where they injected noise into the transition function, it performed better than an agent trained on the noisy game.
“The rule of thumb is that you should try to capture the deployment condition’s transition function as well as you can during training to get the most bang for your buck. We really tested this insight to death because we couldn’t believe it ourselves,” Madan says.
Injecting varying amounts of noise into the transition function let the researchers test many environments, but it didn't create realistic games. The more noise they injected into Pac-Man, the more likely ghosts were to randomly teleport to different squares.
To see if the indoor training effect occurred in normal Pac-Man games, they adjusted underlying probabilities so ghosts moved normally but were more likely to move up and down, rather than left and right. AI agents trained in noise-free environments still performed better in these realistic games.
“It was not only due to the way we added noise to create ad hoc environments. This seems to be a property of the reinforcement learning problem. And that was even more surprising to see,” Bono says.
AI Learning Patterns: A Surprising Discovery
When the researchers dug deeper in search of an explanation, they saw some correlations in how the AI agents explore the training space.
When both AI agents explore mostly the same areas, the agent trained in the non-noisy environment performs better, perhaps because it is easier for the agent to learn the rules of the game without the interference of noise.
If their exploration patterns are different, then the agent trained in the noisy environment tends to perform better. This might occur because the agent needs to understand patterns it can’t learn in the noise-free environment.
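One way to make that intuition concrete is to compare how much of the state space two agents visit in common. The overlap coefficient below is an illustrative metric of my own choosing, not necessarily the statistic the authors computed: it sums, over every visited state, the smaller of the two agents' visitation frequencies.

```python
from collections import Counter

def visitation_overlap(states_a, states_b):
    """Overlap between two agents' state-visitation distributions,
    computed as sum_i min(p_i, q_i); 1.0 means identical exploration.
    (Illustrative metric, not necessarily the paper's measure.)"""
    ca, cb = Counter(states_a), Counter(states_b)
    na, nb = len(states_a), len(states_b)
    support = set(ca) | set(cb)
    return sum(min(ca[s] / na, cb[s] / nb) for s in support)

# Agents that visited mostly the same squares score close to 1.0.
score = visitation_overlap([1, 2, 3, 3], [1, 2, 3, 4])  # 0.75
```

Under the article's account, a high overlap like this is the regime where the noise-free-trained agent wins; a low overlap favors the agent trained with noise.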
“If I only learn to play tennis with my forehand in the non-noisy environment, but then in the noisy one I have to also play with my backhand, I won’t play as well in the non-noisy environment,” Bono explains.
Future Implications: Leveraging the Indoor Training Effect
Looking ahead, the researchers plan to investigate whether the indoor training effect applies to more complex reinforcement learning environments and other AI techniques, such as computer vision and natural language processing. They also aim to develop training environments that take advantage of this effect, potentially improving AI performance in unpredictable real-world settings.
Reference: “The Indoor-Training Effect: unexpected gains from distribution shifts in the transition function” by Serena Bono, Spandan Madan, Ishaan Grover, Mao Yasueda, Cynthia Breazeal, Hanspeter Pfister and Gabriel Kreiman, 8 January 2025, Computer Science > Machine Learning. arXiv:2401.15856

