Exploring TTT Models: The Potential New Frontier in Generative AI Technology

by admin

The search for new AI architectures is gaining momentum as the dominant form, the transformer, runs into mounting challenges.

Transformers sit at the core of OpenAI’s video-generation tool Sora and of leading text generators such as Anthropic’s Claude, Google’s Gemini, and OpenAI’s GPT-4, yet they are hitting limits, especially around computational efficiency.

Transformers are not designed to process large volumes of data efficiently on conventional hardware, and that inefficiency is driving significant, possibly unsustainable growth in power consumption as companies scale up infrastructure to meet the demands of transformer-based systems.

A promising recent development in AI architecture is test-time training (TTT), developed over roughly eighteen months by a collaborative research team spanning Stanford, UC San Diego, UC Berkeley, and Meta. The team argues that TTT models can process far more data than transformers while consuming significantly less compute.

Understanding the Hidden State in Transformers

At the heart of every transformer is the “hidden state,” essentially a growing store of data that serves as the model’s memory. As a transformer processes input, it adds entries to the hidden state to keep track of what it has seen, much like noting down words or phrases from a book as it reads.

Yu Sun, a Stanford post-doc and a collaborator on the TTT research, likens the hidden state to the brain of the transformer: it is what gives the model its well-known capacity for in-context learning.

While the hidden state is integral to what makes transformers so powerful, it is also a significant limitation: because it keeps expanding as the model reads more input, even answering a simple question about data it has already processed means scanning back across everything in that state, which is computationally expensive.
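To make the limitation concrete, here is a minimal sketch (in Python with NumPy) of a single attention head whose hidden state is a key/value cache that grows with every token it processes. It is an invented illustration, not the architecture of any particular production model, and every dimension and weight in it is made up.

# Minimal, purely illustrative sketch: a single attention head whose
# "hidden state" is a key/value cache that grows with every token seen.
# Dimensions and weights are made up; this is not any production model.
import numpy as np

d = 64                                        # embedding size (illustrative)
rng = np.random.default_rng(0)
W_q = rng.standard_normal((d, d)) / np.sqrt(d)
W_k = rng.standard_normal((d, d)) / np.sqrt(d)
W_v = rng.standard_normal((d, d)) / np.sqrt(d)

keys, values = [], []                         # the ever-growing "lookup table"

def attend(x):
    """Process one token; the work grows with the number of cached entries."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    keys.append(k)
    values.append(v)
    K, V = np.stack(keys), np.stack(values)   # shape (t, d) after t tokens
    scores = K @ q / np.sqrt(d)               # t dot products for this token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

for token in rng.standard_normal((1000, d)):
    out = attend(token)
# After 1,000 tokens the cache holds 1,000 entries; token 1,001 must be
# compared against all of them, which is why cost climbs as context grows.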

To tackle this, Sun and his colleagues proposed replacing the hidden state with another machine learning model, a deep reengineering that amounts to nesting one model inside another.

The idea may sound convoluted, but the core is straightforward: unlike a transformer’s ever-expanding lookup table, the internal model of a TTT architecture does not grow as it ingests more data. Instead, it encodes what it processes into fixed-size representations known as weights, which is what keeps TTT models performant and scalable no matter how much data they handle.
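As a rough illustration of that idea, the sketch below replaces the growing cache from the previous example with a small linear model whose fixed-size weight matrix is updated by one gradient step per incoming token. It is a toy interpretation of the general principle, not the researchers’ published architecture; the loss, learning rate, and dimensions are all invented for the example.

# Toy sketch of the general idea: memory is a small model with fixed-size
# weights W, and each incoming token updates W with one gradient step
# instead of being appended to a growing cache. This is an invented
# illustration, not the architecture described in the TTT paper.
import numpy as np

d = 64                                   # embedding size (illustrative)
rng = np.random.default_rng(0)
W = np.zeros((d, d))                     # fixed-size hidden state: never grows
lr = 0.1                                 # inner-loop learning rate (made up)

def write(x):
    """One test-time training step: nudge W to reconstruct the token x."""
    global W
    err = W @ x - x                      # simple self-supervised target
    W -= lr * np.outer(err, x)           # gradient step on 0.5 * ||W x - x||^2
                                         # cost is O(d^2), independent of how
                                         # many tokens have been seen so far

def read(q):
    """Query the compressed memory with the current token."""
    return W @ q

for token in rng.standard_normal((1000, d)):
    write(token)
# Whether the model has seen a thousand tokens or a million, W stays d-by-d,
# which is the scalability argument made for TTT-style models.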

Sun envisions TTT models eventually processing vast arrays of data—ranging from text and images to audio and video—much more efficiently than current models can manage.

“Our approach enables us to discuss a book comprehensively with far less computational strain than rereading it multiple times,” explains Sun. He contrasts this with the limitations of transformer-based large video models like Sora, which are constrained by their simplistic ‘lookup table brain’ to short snippets of video. The ultimate aim is to create systems capable of processing extensive video content, akin to a human’s visual experience through life.

Questioning the Viability of TTT Models

Whether TTT models will eventually outpace transformers remains to be seen.

The transition to TTT models is not straightforward, and with only two small-scale models built for initial studies, comparing TTT to established transformer models is challenging at this stage.

Mike Cook, a senior lecturer at King’s College London who was not involved in the TTT research, expresses cautious interest. He recalls an old professor’s joke that any problem in computer science can be solved by adding another layer of abstraction, and notes that nesting a neural network inside a neural network reminds him of exactly that.

Nonetheless, the surge in research into alternatives to transformers underscores the dire need for breakthroughs in AI architecture.

Mistral, an AI startup, recently released Codestral Mamba, a model built on state space models (SSMs), another alternative to transformers that appears to be more computationally efficient and to scale to larger volumes of data. AI21 Labs is exploring SSMs as well, as is Cartesia, which pioneered some of the earliest SSMs and the Mamba architecture that Codestral Mamba is named after.

Success in these ventures could further democratize generative AI, broadening its accessibility and applications, for better or worse.

Compiled by Techarena.au.