
Gamers Turn to Super Mario for AI Benchmarking

by admin

Did you think Pokémon was the ultimate challenge for AI? A group of researchers claims that Super Mario Bros. poses an even greater test.

On Friday, the Hao AI Lab, a research organization affiliated with the University of California San Diego, put AI to the test in live gameplay of Super Mario Bros. Anthropic’s Claude 3.7 led the performance, followed closely by Claude 3.5. In contrast, Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o faced significant challenges.

It’s important to note that this wasn’t the exact version of Super Mario Bros. that debuted in 1985. The game was run in an emulator and incorporated within a framework called GamingAgent, which granted the AIs control over Mario.

Super Mario Bros. AI benchmark. Image credits: Hao Lab

GamingAgent, developed internally by the Hao AI Lab, fed each model basic instructions, such as "If an obstacle or enemy is present, move/jump left to avoid," along with in-game screenshots. The model responded with controls for Mario in the form of Python code.
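The article does not publish GamingAgent's actual code, but the loop it describes (screenshot in, Python control code out, executed in the emulator) can be sketched roughly as follows. All names here (`fake_model`, `press`, the prompt text) are illustrative stand-ins, not the framework's real API:

```python
# Hypothetical sketch of a GamingAgent-style control loop.
# A real implementation would call an LLM API and drive an emulator;
# here the model is stubbed out and press() calls are merely parsed.
import re

PROMPT = (
    "If an obstacle or enemy is present, move/jump left to avoid. "
    "Reply only with Python code that calls press(button, frames)."
)

def fake_model(prompt: str, screenshot: bytes) -> str:
    """Stand-in for the LLM call; returns control code as text."""
    return "press('right', 8)\npress('A', 12)"

def run_step(screenshot: bytes, actions: list) -> list:
    """Ask the model for code, then keep only whitelisted press() calls
    instead of exec()-ing untrusted model output directly."""
    code = fake_model(PROMPT, screenshot)
    for button, frames in re.findall(r"press\('(\w+)',\s*(\d+)\)", code):
        actions.append((button, int(frames)))
    return actions

actions = run_step(b"<png bytes>", [])
```

Parsing the returned code against a small whitelist, rather than executing it blindly, is one plausible way such a framework could keep model output from doing anything other than pressing buttons.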

Despite these supports, Hao stated that the game compelled each model to “learn” intricate tactics and devise gameplay strategies. Interestingly, the lab observed that reasoning models like OpenAI’s o1, which systematically address problems to find solutions, performed worse than non-reasoning models, even though they usually excel in other benchmarks.

Researchers noted that one key issue for reasoning models is their slower decision-making process—typically taking seconds to choose an action. In Super Mario Bros., timing is critical; a mere second can be the difference between successfully jumping over an obstacle or falling to a game-ending demise.

For decades, games have served as benchmarks for AI. However, some experts have raised doubts about the validity of correlating AI performance in gaming with advancements in technology. Unlike real-world scenarios, games are often abstract and simplified, providing an essentially limitless dataset for AI training.

Still, AI's impressive showings on gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member of OpenAI, recently called an "evaluation crisis."

“I’m unsure what [AI] metrics we should focus on at this point,” he expressed in a post on X. “In summary, my feeling is I’m uncertain about the current capabilities of these models.”

At least we can enjoy watching AI play Mario.

