Anthropic Utilizes Pokémon as a Benchmark for Its Latest AI Model

by admin

Anthropic has employed Pokémon as a benchmark for its latest AI model. Yes, you read that correctly.

In a blog post published on Monday, Anthropic said it tested its newest model, Claude 3.7 Sonnet, on the classic Game Boy game Pokémon Red. The company equipped the model with basic memory, screen pixels as input, and function calls to press buttons, which let it navigate and play Pokémon continuously.
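Anthropic has not published the harness itself, but the three pieces it describes (memory, pixel input, button-press function calls) suggest a simple loop. The Python sketch below is purely illustrative: `StubEmulator`, `Agent`, and `scripted_policy` are hypothetical names, the 4x4 "screen" stands in for real Game Boy pixels, and the scripted policy stands in for the call to the model.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List

BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")
Frame = List[List[int]]  # hypothetical screen: rows of pixel values


class StubEmulator:
    """Stand-in for a Game Boy emulator; it only tracks a player position."""

    def __init__(self) -> None:
        self.x, self.y = 0, 0
        self.presses = 0

    def screen(self) -> Frame:
        # 4x4 frame with the player's tile marked as 1.
        frame = [[0] * 4 for _ in range(4)]
        frame[self.y][self.x] = 1
        return frame

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.presses += 1
        if button == "right":
            self.x = min(self.x + 1, 3)
        elif button == "left":
            self.x = max(self.x - 1, 0)
        elif button == "down":
            self.y = min(self.y + 1, 3)
        elif button == "up":
            self.y = max(self.y - 1, 0)


@dataclass
class Agent:
    # choose_button is where a real harness would query the model (assumption).
    choose_button: Callable[[Frame, List[str]], str]
    # "Basic memory": here, just the last few button presses.
    memory: Deque[str] = field(default_factory=lambda: deque(maxlen=8))

    def step(self, emulator: StubEmulator) -> str:
        frame = emulator.screen()                          # pixel input
        button = self.choose_button(frame, list(self.memory))
        emulator.press(button)                             # "function call"
        self.memory.append(button)
        return button


def scripted_policy(frame: Frame, memory: List[str]) -> str:
    # Placeholder policy: walk right to the edge, then walk down.
    row = next(r for r in frame if 1 in r)
    return "right" if row.index(1) < len(row) - 1 else "down"


def run(steps: int):
    emulator = StubEmulator()
    agent = Agent(choose_button=scripted_policy)
    for _ in range(steps):
        agent.step(emulator)
    return emulator, agent


if __name__ == "__main__":
    emulator, agent = run(6)
    print(emulator.x, emulator.y, list(agent.memory))
```

Each button press counts as one action, so a run like the one Anthropic reports would simply be this loop repeated tens of thousands of times with a real emulator and a real model in place of the stubs.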

One notable feature of Claude 3.7 Sonnet is its capacity for “extended thinking.” Like OpenAI’s o3-mini and DeepSeek’s R1, the model can reason through hard problems by spending more compute and more time on them.

This new ability proved advantageous in Pokémon Red.

Compared with an earlier version, Claude 3.0 Sonnet, which struggled to progress beyond Pallet Town, the game's starting point, Claude 3.7 Sonnet defeated three Pokémon gym leaders and earned their badges.

Image: Anthropic Pokémon Red. Photo credit: Anthropic

It is not yet clear how much compute Claude 3.7 Sonnet needed to reach these milestones, or how long each one took. Anthropic said only that the model performed 35,000 actions to reach the last of the three gym leaders, Surge.

It likely won’t be long before some innovative developer uncovers the specifics.

Pokémon Red is more of a playful benchmark than a rigorous one, but there is a long tradition of using games to benchmark AI. In recent months, a number of new apps and platforms have appeared that test models’ game-playing abilities on titles ranging from Street Fighter to Pictionary.

Compiled by Techarena.au.