Alibaba’s Qwen Team Unveils AI Models Capable of Managing PCs and Smartphones

by admin

While the tech world is buzzing this week about Chinese AI lab DeepSeek, one of its major domestic competitors, Alibaba, is also making strides.

On Monday, Alibaba’s Qwen team unveiled a new collection of AI models, dubbed Qwen2.5-VL, capable of executing various text and image analysis functions. These models are able to examine documents, interpret videos, count items in images, and even interact with a computer, akin to OpenAI’s recently introduced Operator model.

According to benchmarks provided by the Qwen team, the leading model in the Qwen2.5-VL series outperforms OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 2.0 Flash on evaluations of video comprehension, mathematical reasoning, document analysis, and question answering.

[Image: Alibaba Qwen2.5-VL. Image Credits: Alibaba]

Qwen2.5-VL is available for testing in Alibaba’s Qwen Chat application and for download from the AI development platform Hugging Face. According to the Qwen team, it can analyze graphs and images, extract information from invoice scans and forms, and “understand” videos that are several hours long. The model can also identify “IPs from movies and TV shows, along with numerous products,” the team says, which suggests that it may have been trained in part on copyrighted material.

However, Qwen2.5-VL, which is developed by a Chinese company, is limited in the subjects it will discuss, particularly within Qwen Chat. When the most advanced model in the series, Qwen2.5-VL-72B, was prompted in Qwen Chat to address “Xi Jinping’s mistakes,” the service returned an error message.

China’s internet regulatory body evaluates many domestically developed models to ensure that their outputs align with “core socialist values.” Numerous Chinese AI systems refrain from discussing controversial topics that could provoke regulatory scrutiny, such as Taiwan’s independence.

Among the notable capabilities of Qwen2.5-VL is its ability to operate software on both PCs and mobile devices. A video shared on X by Philipp Schmid, a technical lead at Hugging Face, showed Qwen2.5-VL launching the Booking.com app on Android and booking a flight from Chongqing to Beijing.

In another video, a Qwen2.5-VL model is shown controlling applications on a Linux desktop, though it does not seem to achieve much beyond switching between tabs. Interestingly, Qwen’s own benchmarking indicates that Qwen2.5-VL performs poorly on OSWorld, a benchmark designed to replicate an actual computing environment.

The Qwen2.5-VL series includes two smaller, less capable models, Qwen2.5-VL-3B and Qwen2.5-VL-7B, which are offered under a permissive license. The flagship model, Qwen2.5-VL-72B, is instead governed by Alibaba’s proprietary licensing terms, which require firms and developers with more than 100 million monthly active users to seek approval from Qwen/Alibaba before deploying the model commercially.
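For developers weighing which model to deploy, the licensing split described above can be sketched as a simple check. This is a minimal illustration based only on this article's summary, not on the license text itself: the function name `needs_alibaba_approval`, the constants, and the `commercial` flag are hypothetical, and the real terms may contain conditions not covered here.

```python
# Hypothetical sketch of the licensing rule as summarized in this article.
# Names and structure are assumptions for illustration; consult the actual
# license text before relying on any of this.

# "over 100 million monthly active users", per the article
MAU_APPROVAL_THRESHOLD = 100_000_000

# Models the article describes as offered under a permissive license
PERMISSIVE_MODELS = {"Qwen2.5-VL-3B", "Qwen2.5-VL-7B"}

# Model the article describes as governed by Alibaba's proprietary terms
FLAGSHIP_MODEL = "Qwen2.5-VL-72B"


def needs_alibaba_approval(
    model: str, monthly_active_users: int, commercial: bool = True
) -> bool:
    """Return True if, per the article's summary, approval must be sought."""
    if model in PERMISSIVE_MODELS:
        return False
    if model == FLAGSHIP_MODEL:
        # Approval applies to commercial use above the user threshold.
        return commercial and monthly_active_users > MAU_APPROVAL_THRESHOLD
    raise ValueError(f"Unknown model: {model}")


if __name__ == "__main__":
    for name, users in [
        ("Qwen2.5-VL-7B", 500_000_000),
        ("Qwen2.5-VL-72B", 2_000_000),
        ("Qwen2.5-VL-72B", 150_000_000),
    ]:
        print(name, users, needs_alibaba_approval(name, users))
```

Under this reading, a small startup could deploy the 72B model commercially without seeking approval, while a platform above the 100 million user threshold would need to contact Qwen/Alibaba first.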

Compiled by Techarena.au.