AI content cannibalization problem, Threads a loss leader for AI data? – Cointelegraph Magazine

ChatGPT eats cannibals

ChatGPT hype is starting to wane, with Google searches for “ChatGPT” down 40% from its peak in April, while web traffic to OpenAI’s ChatGPT website has been down almost 10% in the past month.

This is only to be expected. However, GPT-4 users are also reporting that the model seems considerably dumber (but faster) than it was previously.

One theory is that OpenAI has broken it up into multiple smaller models trained in specific areas that can act in tandem, but not quite at the same level.

But a more intriguing possibility may also be playing a role: AI cannibalism.

The web is now swamped with AI-generated text and images, and this synthetic data gets scraped up to train AIs, creating a self-reinforcing feedback loop. The more AI-generated data a model ingests, the less coherent and lower in quality its output becomes. It’s a bit like making a photocopy of a photocopy: the image gets progressively worse.

While GPT-4’s official training data ends in September 2021, it clearly knows a lot more than that, and OpenAI recently shuttered its web browsing plugin.

A new paper from scientists at Rice and Stanford universities came up with a cute acronym for the issue: Model Autophagy Disorder, or MAD.

“Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease,” they said.

Essentially, the models start to lose the rarer, less well-represented data, and their outputs converge on an ever narrower range of material in an ongoing process. The good news is this means the AIs now have a reason to keep humans in the loop, if we can work out a way to identify and prioritize human content for the models. That’s one of OpenAI boss Sam Altman’s plans with his eyeball-scanning blockchain project, Worldcoin.
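
For readers who want to see the effect the researchers describe, here is a minimal Python sketch of a closed training loop. It is a toy illustration under stated assumptions, not the method used in the Rice and Stanford paper: the 51-token "language", the sample size, and the `autophagous_loop` function with its `fresh_fraction` parameter are all made up for this example. Each generation "trains" by counting token frequencies in text sampled from the previous generation, optionally blended with a share of fresh real data.

```python
import random
from collections import Counter


def autophagous_loop(real_probs, generations, sample_size, fresh_fraction=0.0, seed=0):
    """Return how many distinct tokens the model can still produce after each generation."""
    rng = random.Random(seed)
    tokens = list(real_probs)
    model = [real_probs[t] for t in tokens]  # generation 0 learns the real distribution
    support = [sum(p > 0 for p in model)]

    for _ in range(generations):
        # The current model writes the "web" that the next model scrapes.
        synthetic = Counter(rng.choices(tokens, weights=model, k=sample_size))
        # The next model is the empirical frequency of that scrape,
        # optionally blended with a share of fresh real data.
        model = [
            (1 - fresh_fraction) * synthetic[t] / sample_size
            + fresh_fraction * real_probs[t]
            for t in tokens
        ]
        support.append(sum(p > 0 for p in model))
    return support


# Hypothetical toy "language": one very common token and 50 rare ones.
real = {"common": 0.5}
real.update({f"rare_{i}": 0.01 for i in range(50)})

closed_loop = autophagous_loop(real, generations=30, sample_size=100)
with_fresh = autophagous_loop(real, generations=30, sample_size=100, fresh_fraction=0.2)

print("synthetic only:", closed_loop[0], "->", closed_loop[-1], "distinct tokens")
print("20% fresh data:", with_fresh[0], "->", with_fresh[-1], "distinct tokens")
```

In the synthetic-only run, a rare token that happens to be missing from one generation's sample can never come back, so the vocabulary can only shrink. Blending in some real data each round keeps every token alive in this toy setup, which loosely mirrors the paper's point about needing fresh real data.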

Is Threads just a loss leader to train AI models?

Twitter clone Threads is a bit of a weird move by Mark Zuckerberg as it cannibalizes users from Instagram. The photo-sharing platform makes up to $50 billion a year but stands to make around a tenth of that from Threads, even in the unrealistic scenario that it takes 100% market share from Twitter. Big Brain Daily’s Alex Valaitis predicts it will either be shut down or reincorporated into Instagram within 12 months, and argues the real reason it was launched now “was to have more text-based content to train Meta’s AI models on.”

ChatGPT was trained on huge volumes of data from Twitter, but Elon Musk has taken various unpopular steps to prevent that from happening in the future (charging for API access, rate limiting, etc).

Zuck has form in this regard, as Meta’s image recognition AI software SEER was trained on a billion photos posted to Instagram. Users agreed to that in the privacy policy, and more than a few have noted the Threads app collects data on everything possible, from health data to religious beliefs and race. That data will inevitably be used to train AI models such as Facebook’s LLaMA (Large Language Model Meta AI).

Musk, meanwhile, has just launched an OpenAI competitor called xAI that will mine Twitter’s data for its own LLM.

Various permissions required by social apps (CounterSocial)

Religious chatbots are fundamentalists

Who would have guessed that training AIs on religious texts and speaking in the voice of God would turn out to be a terrible idea? In India, Hindu chatbots masquerading as Krishna have been consistently advising users that killing people is OK if it’s your dharma, or duty.

At least five chatbots trained on the Bhagavad Gita, a 700-verse scripture, have appeared in the past few months, but the Indian government has no plans to regulate the tech, despite the ethical concerns.

“It’s miscommunication, misinformation based on religious text,” said Mumbai-based lawyer Lubna Yusuf, co-author of The AI Book. “A text gives a lot of philosophical value to what they are trying to say, and what does a bot do? It gives you a literal answer and that’s the danger here.”

AI doomers versus AI optimists

The world’s foremost AI doomer, decision theorist Eliezer Yudkowsky, has released a TED talk warning that superintelligent AI will kill us all. He’s not sure how or why, because he believes an AGI will be so much smarter than us we won’t even understand how and why it’s killing us — like a medieval peasant trying to understand the operation of an air conditioner. It might kill us as a side effect of pursuing some other objective, or because “it doesn’t want us making other superintelligences to compete with it.”

He points out that “Nobody understands how modern AI systems do what they do. They are giant inscrutable matrices of floating point numbers.” He does not expect “marching robot armies with glowing red eyes” but believes that a “smarter and uncaring entity will figure out strategies and technologies that can kill us quickly and reliably and then kill us.” The only thing that could stop this scenario from occurring is a worldwide moratorium on the tech backed by the threat of World War III, but he doesn’t think that will happen.

In his essay “Why AI will save the world,” a16z’s Marc Andreessen argues this sort of position is unscientific: “What is the testable hypothesis? What would falsify the hypothesis? How do we know when we are getting into a danger zone? These questions go mainly unanswered apart from ‘You can’t prove it won’t happen!’”

Microsoft co-founder Bill Gates released an essay of his own, titled “The risks of AI are real but manageable,” arguing that from cars to the internet, “people have managed through other transformative moments and, despite a lot of turbulence, come out better off in the end.”

“It’s the most transformative innovation any of us will see in our lifetimes, and a healthy public debate will depend on everyone being knowledgeable about the technology, its benefits, and its risks. The benefits will be massive, and the best reason to believe that we can manage the risks is that we have done it before.”

Data scientist Jeremy Howard has released his own paper, arguing that any attempt to outlaw the tech or keep it confined to a few large AI models will be a disaster, comparing the fear-based response to AI to the pre-Enlightenment age when humanity tried to restrict education and power to the elite.

OpenAI’s code interpreter

GPT-4’s new code interpreter is a terrific upgrade that allows the AI to generate code on demand and actually run it. So anything you can dream up, it can generate the code for and run. Users have been coming up with various use cases, including uploading company reports and getting the AI to generate useful charts of the key data, converting files from one format to another, creating video effects and transforming still images into video. One user uploaded an Excel file of every lighthouse location in the U.S. and got GPT-4 to create an animated map of the locations.
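
To give a flavor of the file-conversion use case, here is a short Python sketch of the kind of script a tool like this might plausibly write and run when asked to turn a CSV file into JSON. It is not actual output from GPT-4: the `csv_to_json` helper, the file names and the two placeholder rows are all invented for illustration, and it uses only Python's standard library.

```python
import csv
import json
import tempfile
from pathlib import Path


def csv_to_json(csv_path, json_path):
    """Convert a CSV file with a header row into a JSON array of objects."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))  # one dict per row, keyed by column name
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, indent=2)
    return rows


# Placeholder data standing in for an uploaded file (not real lighthouse records).
workdir = Path(tempfile.mkdtemp())
csv_file = workdir / "lighthouses.csv"
csv_file.write_text(
    "name,latitude,longitude\n"
    "Example Light A,10.0,20.0\n"
    "Example Light B,30.5,40.25\n",
    encoding="utf-8",
)
json_file = workdir / "lighthouses.json"

converted = csv_to_json(csv_file, json_file)
print(json_file.read_text(encoding="utf-8"))
```

Note that `csv.DictReader` reads every value as a string, so a real conversion would likely need an extra step to turn coordinates into numbers before mapping them.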

All killer, no filler AI news

— Research from the University of Montana found that artificial intelligence scores in the top 1% on a standardized test for creativity. The Scholastic Testing Service gave GPT-4’s responses to the test top marks in creativity, fluency (the ability to generate lots of ideas) and originality.

— Comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey are suing OpenAI and Meta for copyright violations, for training their respective AI models on the trio’s books.

— Microsoft’s AI Copilot for Windows will eventually be amazing, but Windows Central found the insider preview is really just Bing Chat running via the Edge browser, and it can just about switch Bluetooth on.

— Anthropic’s ChatGPT competitor Claude 2 is now available free in the UK and U.S., and its context window can handle 75,000 words of content compared with ChatGPT’s 3,000-word maximum. That makes it fantastic for summarizing long pieces of text, and it’s not bad at writing fiction.

Video of the week

Indian satellite news channel OTV News has unveiled its AI news anchor named Lisa, who will present the news several times a day in a variety of languages, including English and Odia, for the network and its digital platforms. “The new AI anchors are digital composites created from the footage of a human host that read the news using synthesized voices,” said OTV managing director Jagi Mangat Panda.

Andrew Fenton

Based in Melbourne, Andrew Fenton is a journalist and editor covering cryptocurrency and blockchain. He has worked as a national entertainment writer for News Corp Australia, on SA Weekend as a film journalist, and at The Melbourne Weekly.

