At a recent media event, OpenAI CEO Sam Altman remarked on how quickly AI's "IQ" has been improving in recent years.
“Broadly speaking — and this isn’t a scientifically backed assertion, just my personal sentiment — it feels like we are advancing by approximately one standard deviation of IQ each year,” Altman remarked.
Altman is not the only one using IQ as a yardstick for AI progress. AI enthusiasts on social media have also given IQ tests to AI models and ranked the resulting scores.
Many experts, however, argue that IQ is a poor and often misleading measure of an AI system's capabilities.
"While it might seem enticing to apply the same assessments we use for human intelligence to gauge capabilities and progress, it's akin to comparing apples and oranges," Sandra Wachter, a researcher studying technology and regulation at Oxford, told TechCrunch.
In his remarks, Altman equated IQ with intelligence. But IQ tests are relative, not objective, measures of certain kinds of intelligence. There is broad consensus that while IQ tests capture logic and abstract reasoning reasonably well, they fall short on practical intelligence, the ability to apply knowledge in the real world, and they offer only a snapshot in time.
"IQ is a contested method for measuring human abilities, reflecting what scientists believe constitutes human intelligence," Wachter said. "But one cannot use the same metric to describe the capabilities of AI. A car may outperform humans in speed, and a submarine excels at diving, yet this does not mean that either vehicle is more intelligent. You're mistakenly equating one dimension of performance with a far richer and more intricate definition of human intelligence."
To succeed on an IQ test, whose origins some historians trace to eugenics (the widely discredited idea that human traits can be improved through selective breeding), a person needs a strong working memory and familiarity with Western cultural norms. That leaves room for bias, which is why some psychologists have called IQ tests "ideologically tainted mechanical representations" of intelligence.
An AI model scoring well on an IQ test often says more about the flaws of the test than about the model's capabilities, according to Os Keyes, a doctoral candidate at the University of Washington studying ethical AI.
“These tests can be easily manipulated if you have virtually unlimited memory and perseverance,” said Keyes. “IQ tests offer a very narrow framework for evaluating cognition, sentience, and intelligence — something we’ve understood since before the advent of digital computing.”
AI systems likely have an edge on IQ tests to begin with, given their vast memory and expansive internalized knowledge. Models are typically trained on publicly available internet data, which often includes sample questions from IQ tests.
“Tests often exhibit repetitive patterns — a reliable method to enhance your IQ is simply to practice IQ tests, which is precisely what these [models] have accomplished,” noted Mike Cook, a research fellow specializing in AI at King’s College London. “Unlike AI, I cannot absorb information with perfect clarity or process it without any noise or signal degradation.”
Ultimately, as Cook pointed out, IQ tests were designed for humans, to gauge general problem-solving skills. That makes them ill-suited to evaluating a technology that approaches problems in fundamentally different ways.
“A crow might adeptly use a tool to retrieve food from a container, yet that does not imply it could gain admission to Harvard,” Cook explained. “When I tackle a math problem, my mind is also handling the ability to read the question clearly, avoiding distractions like remembering errands I need to run afterward, or whether it’s too chilly in the room. In essence, human cognition grapples with significantly more factors while solving any problem — whether it’s an IQ test or not — and does so with less assistance [than AI].”
All of this underscores the need for better ways of testing AI, Heidy Khlaaf, chief AI scientist at the AI Now Institute, told TechCrunch.
"In the history of computing, we have refrained from comparing computing capabilities with human abilities precisely because computation inherently implies that systems can achieve tasks beyond what humans can do," Khlaaf noted. "The trend of directly comparing system performance with human capabilities is both recent and highly contested, and it is bound up with the controversy over the perpetually shifting benchmarks used to evaluate AI systems."
Compiled by Techarena.au.