Inception, a new Palo Alto-based company founded by Stanford professor Stefano Ermon, says it has developed a novel AI model based on "diffusion" technology, which it calls a diffusion-based large language model, or "DLM" for short.
The generative AI models receiving the most attention today fall into two broad categories: large language models (LLMs) and diffusion models. LLMs, built on the transformer architecture, are used primarily for text generation. Diffusion models, by contrast, generate images, video, and audio, and power systems such as Midjourney and OpenAI's Sora.
According to Inception, its model offers the capabilities of traditional LLMs, including code generation and question answering, while being significantly faster and cheaper to run.
Ermon told TechCrunch that he has long studied how to apply diffusion models to text in his Stanford lab. His research started from the premise that conventional LLMs are slower than diffusion technology because of the way they generate output.
With LLMs, he explained, “you cannot produce the second word until the first one is generated, and similarly, the third word cannot be produced until the first two are available.”
Ermon sought a way to apply a diffusion approach to text because diffusion models work differently from sequential LLMs: they begin with a rough estimate of the data they are generating (an image, for example) and then refine the entire output at once.
He theorized that generating and adjusting substantial chunks of text in parallel could be feasible with diffusion models. After years of experimentation, Ermon and one of his students made a significant breakthrough, which they discussed in a research paper published the previous year.
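As a toy illustration only (not Inception's actual method, whose details are not public), the contrast between the two generation styles can be sketched like this, with random word choices standing in for a trained model's predictions:

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]  # toy vocabulary

def generate_autoregressive(length):
    """LLM-style generation: each token is produced only after
    all earlier tokens exist, so the work is inherently sequential."""
    tokens = []
    for _ in range(length):
        # A real model would condition this choice on `tokens` so far.
        tokens.append(random.choice(VOCAB))
    return tokens

def generate_diffusion_style(length, steps=3):
    """Diffusion-style generation: start with a noisy draft of the
    WHOLE sequence, then refine every position in parallel each step."""
    draft = [random.choice(VOCAB) for _ in range(length)]  # rough guess
    for _ in range(steps):
        # Refine all positions at once; a real model would apply a
        # learned denoising update here rather than resampling.
        draft = [random.choice(VOCAB) for _ in draft]
    return draft
```

The key structural difference is that the diffusion-style loop touches every position on every step, so each refinement pass can run in parallel across the sequence, whereas the autoregressive loop cannot start position *n* before position *n − 1* is done.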
Recognizing the transformative potential of this advancement, Ermon established Inception last summer and brought on two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, as co-leaders of the company.
While Ermon chose not to disclose details about Inception’s funding, sources indicate that the Mayfield Fund has made an investment.
Inception has already secured several clients, including unnamed Fortune 100 companies, by addressing their urgent need for reduced latency in AI systems and increased processing speeds, according to Ermon.
"Our models are able to utilize GPUs far more effectively," Ermon stated, referring to the chips widely used to train and run AI models. "I believe this is a significant development, as it could revolutionize how language models are constructed."
Inception offers an API along with options for on-premises and edge device deployments, including support for model fine-tuning and a variety of pre-built DLMs for distinct applications. The company claims that its DLMs can operate up to 10 times faster than conventional LLMs while incurring a tenth of the cost.
A spokesperson for the company remarked to TechCrunch, “Our ‘small’ coding model rivals [OpenAI’s] GPT-4o mini while exceeding it in speed by more than tenfold. Our ‘mini’ model outperforms small open-source alternatives like [Meta’s] Llama 3.1 8B and can achieve over 1,000 tokens per second.”
In industry terminology, "tokens" are the small chunks of text, typically words or word fragments, that a model reads and writes. A sustained rate of 1,000 tokens per second would indeed be impressive, provided Inception's claims are validated.
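To put the claimed throughput in perspective, here is a back-of-the-envelope calculation; the ~100 tokens-per-second baseline is an illustrative assumption, not a figure from the article:

```python
def generation_time_seconds(num_tokens, tokens_per_second):
    """Seconds needed to stream num_tokens at a given throughput."""
    return num_tokens / tokens_per_second

# At the claimed 1,000 tokens/sec, a 2,000-token answer streams in ~2 s.
claimed = generation_time_seconds(2000, 1000)

# At an assumed ~100 tokens/sec baseline (illustrative), it takes ~20 s.
baseline = generation_time_seconds(2000, 100)
```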
Compiled by Techarena.au.