AI firms resist transparency about their learning resources
Under the EU's "AI Act", AI Developers Will Have to Reveal Their Training Data, but Many Companies Aren't Keen
In the future, AI developers will be forced to disclose the data they use to train their artificial intelligence systems, as required by the EU's "AI Act". The EU Commission aims to safeguard artists' rights, but many companies are unwilling to comply.
Matthieu Riouf, CEO of Photoroom, a software company offering AI-supported image editing, likens the secrecy around AI data to a chef's secret recipe. "It's like cooking," he says. "There's a part of the recipe that the best chefs don't reveal: the 'je ne sais quoi', which makes it different."
The European "AI Act" mandates AI developers to make their training data accessible for external verification. The law also requires developers to provide a detailed summary of the data used for training. However, the specifics of these reports remain unclear.
The European AI Office plans to release guidelines for these reports in early 2025, following discussions with interested parties. Companies are resisting this move, claiming competitors could gain an advantage. The EU Commission acknowledges the need for AI developers to maintain business secrecy, but the law also allows content providers to take legal action against companies that use their work without permission.
Relying on Gargantuan Databases
Since the launch of ChatGPT about a year and a half ago, Generative AI has made headlines. These programs can generate text, images, or audio files based on minimal instructions. They rely on vast databases, which are likely to contain content from the internet.
AI developers have faced lawsuits from authors, musicians, and filmmakers alleging copyright infringement, claiming their work was used without permission for AI training. Concerns have also been raised about Meta using content and personal data from Facebook and Instagram for this purpose.
OpenAI recently withdrew the speech output for the latest ChatGPT variant after actress Scarlett Johansson complained that the "Sky" voice sounded too similar to hers. She had previously declined an offer from the company to lend her voice to "Sky".
OpenAI also faced criticism when its CTO, Mira Murati, declined to confirm in a newspaper interview whether the new video AI "Sora" was trained on YouTube videos. YouTube says doing so would violate its terms of service.
Innovation Before Regulation
To head off further copyright lawsuits and ease growing political pressure, several technology companies have entered into licensing agreements with publishers, music labels, online platforms, and TV broadcasters in recent months. However, these voluntary commitments were not enough, according to EU parliamentarian Dragos Tudorache, who played a significant role in drafting the "AI Act".
Transparency reports serve as mandatory control instruments, says Tudorache. "They must be detailed enough for Scarlett Johansson, Beyoncé, or whoever to know if their work, their songs, their voice, their art, or their science was used in the training of the algorithm."
The French government warns against excessive regulation, fearing that Europe could fall behind in critical AI technology. "Europe must finally understand that one must first create innovations before regulating," said French Finance Minister Bruno Le Maire at the Viva Technology conference in Paris in May. "Otherwise, there's a risk of regulating technologies that we don't master, or regulating them poorly because we don't master them."