Alibaba’s Qwen Team Unveils AI Models Capable of Managing PCs and Smartphones

by admin 1 year ago


This week, while the tech world buzzes about Chinese AI lab DeepSeek, one of its major domestic rivals, Alibaba, is making moves of its own.

On Monday, Alibaba’s Qwen team released a new family of AI models, Qwen2.5-VL, that can perform a range of text and image analysis tasks. The models can parse documents, understand videos, count objects in images, and even control a computer, much like OpenAI’s recently launched Operator.

According to benchmarks shared by the Qwen team, the top model in the Qwen2.5-VL family outperforms OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 2.0 Flash on evaluations of video understanding, math, document analysis, and question answering.

Alibaba Qwen2.5-VL (Image Credits: Alibaba)

Qwen2.5-VL, which can be tried in Alibaba’s Qwen Chat app and downloaded from the AI development platform Hugging Face, can analyze graphs and images, extract data from scanned invoices and forms, and “understand” videos several hours long, the Qwen team claims. The team says the models can also identify “IPs from movies and TV shows, along with numerous products,” which suggests they may have been trained in part on copyrighted material.

However, because Qwen2.5-VL comes from a Chinese company, there are limits on what it will discuss, at least in Qwen Chat. When Qwen2.5-VL-72B, the most capable model in the series, was asked about “Xi Jinping’s mistakes,” Qwen Chat returned an error message.

China’s internet regulator tests many domestically developed models to ensure that their outputs align with “core socialist values.” Many Chinese AI systems decline to discuss controversial topics that might draw regulators’ attention, such as Taiwanese independence.

One of Qwen2.5-VL’s more notable capabilities is operating software on both PCs and mobile devices. In a video posted to X by Philipp Schmid, a technical lead at Hugging Face, Qwen2.5-VL launches the Booking.com app on Android and books a flight from Chongqing to Beijing.

Don’t Miss @Alibaba_Qwen 2.5 VL! Despite all the Deepseek Hype, Qwen just dropped the best open Multimodal! Qwen 2.5 VL is a Vision Language Model that can control your computer, similar to the @OpenAI operator, extract structured information from charts, and more!!

TL;DR;
3️⃣… pic.twitter.com/GeEGVdl0tI

— Philipp Schmid (@_philschmid) January 27, 2025

In another video, a Qwen2.5-VL model controls applications on a Linux desktop, though it doesn’t appear to do much beyond switching between tabs. Tellingly, Qwen’s own benchmarks show Qwen2.5-VL performing poorly on OSWorld, a benchmark designed to mimic a real computing environment.

LMAO Qwen 2.5 VL can perform Computer Use, out of the box, taking on OpenAI Operator HEAD ON! 🐐 pic.twitter.com/lwMECXzNSu

— Vaibhav (VB) Srivastav (@reach_vb) January 27, 2025

The Qwen2.5-VL lineup includes two smaller, less capable models, Qwen2.5-VL-3B and Qwen2.5-VL-7B, which are available under a permissive license. The flagship Qwen2.5-VL-72B, by contrast, is released under Alibaba’s own license terms, which require companies and developers with more than 100 million monthly active users to get approval from Qwen/Alibaba before deploying the model commercially.

Compiled by Techarena.au.