
Apple Claims Its Approach to Training Its AI Models Was ‘Responsible’

by admin

Apple has released a research document outlining the architecture behind Apple Intelligence, which is set to introduce a series of generative AI capabilities to its iOS, macOS, and iPadOS systems in the forthcoming months.

The document pushes back on accusations, prompted by a recent report, that Apple used ethically questionable methods to train its models. The company emphasizes its commitment to user privacy, stating that its training data came from publicly accessible content and licensed data and did not include personal user information.

“[Our] pre-training dataset comprises data licensed from publishers, curated datasets that are publicly available or open source, and information gathered by our Applebot web crawler,” the company details. “Highlighting our dedication to privacy, we assure no Apple user’s private data was involved in our data mix.”

Proof News reported in July that Apple used a dataset called The Pile, which includes subtitles from numerous YouTube videos, to train models designed for on-device processing. The YouTube creators whose subtitles were included were not aware of this use and had not consented to it. Apple later clarified that it had no plans to use those models for any AI features in its products.

The research paper covers the Apple Foundation Models (AFM), which Apple first unveiled at WWDC 2024, and states that the data used to train them was sourced responsibly, at least by Apple’s own criteria.

The training data for the AFM models included publicly accessible web data as well as licensed content from publishers that Apple does not name. According to The New York Times, Apple negotiated with publishers such as NBC, Condé Nast, and IAC over deals worth more than $50 million to use their news archives for training. The models were also trained on open source code hosted on GitHub, in languages such as Swift, Python, C, Objective-C, C++, JavaScript, Java, and Go.

Training on code without explicit permission, even open source code, is contentious. Developers have raised concerns that the licenses of some codebases might not explicitly permit AI training. Apple counters that it selected only code under permissive licenses such as MIT, ISC, or Apache.

To improve the AFM models’ mathematical capabilities, Apple included math problems and solutions from various online sources, including forums and educational websites. The company says the datasets it used were of high quality, legally permissible for model training, and free of sensitive content.

The training data for the AFM models amounts to roughly 6.3 trillion tokens. For context, that is less than half of the 15 trillion tokens Meta used for its top-tier text-generating model, Llama 3.1 405B.

In addition to its primary data sources, Apple used human feedback and synthetic data to refine the AFM models further and reduce undesirable behavior, such as generating harmful content.

“Our models are designed to assist users with daily tasks across Apple devices, embodying the company’s foundational values and adhering to responsible AI principles throughout their development,” Apple says.

The paper contains no major revelations, a deliberate choice to avoid competitive disadvantage and potential legal exposure. It touches briefly on the ongoing debate over whether scraping public web data is legal under the fair use doctrine, and on the legal challenges that lie ahead.

Apple acknowledges that webmasters can block its crawler from collecting their data, but recognizes that this does not help individual content creators whose work appears on platforms that do not block Apple’s crawler.

As the discussions and legal battles over generative AI training practices continue, Apple positions itself as aiming for ethical standards while navigating the complex legal landscape.

Compiled by Techarena.au.