Authors Accuse OpenAI of Using Pirate Sites to Train ChatGPT - Piracy News and Crypto Updates - InviteHawk - The #1 Trusted Source for Free Tracker Invites

Authors Accuse OpenAI of Using Pirate Sites to Train ChatGPT

Generative AI is a revolutionary technology that's expected to change society as we know it but, in parallel, it raises many copyright infringement concerns. This week, book authors Paul Tremblay and Mona Awad filed a lawsuit against OpenAI, accusing the company of using pirated books to train its ChatGPT models.

Generative AI models such as ChatGPT have captured the imagination of millions of people, offering a glimpse of what an AI-assisted future might look like.

The new technology also brings up novel copyright questions. Several rightsholders are worried that their work is being used to train AI without any form of compensation, for example.

How these and other copyright questions will be dealt with is not entirely clear. Governments around the world are taking different approaches, with the U.S. Congress recently stating that it doesn’t plan to overreact. Meanwhile, rightsholders don’t intend to stand idly by.

Authors Sue OpenAI for Copyright Infringement

This week, authors Paul Tremblay and Mona Awad filed a class action lawsuit against OpenAI, accusing ChatGPT’s parent company of copyright infringement and of violating the DMCA, among other things. According to the authors, ChatGPT was partly trained on their copyrighted works, without permission.

The proof for this claim is seemingly simple. The authors never gave OpenAI permission to use their works, yet ChatGPT can provide accurate summaries of their writings. This information must have come from somewhere.

“Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works,” the complaint reads.

Pirate Training

While these types of claims are not new, this week’s lawsuit alleges that OpenAI used pirate websites as training input. This potentially includes Z-Library, a shadow library of millions of pirated books that’s at the center of a criminal prosecution by the U.S. Department of Justice.

OpenAI hasn’t disclosed the datasets that ChatGPT is trained on, but an older paper references two databases: “Books1” and “Books2”. The first contains roughly 63,000 titles and the second around 294,000 titles.

These numbers are meaningless in isolation. However, the authors note that OpenAI must have used pirated resources, as legitimate databases with that many books don’t exist.

“The only ‘internet-based books corpora’ that have ever offered that much material are notorious ‘shadow library’ websites like Library Genesis (aka LibGen), Z-Library (aka Bok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems.”

Based on these data points, the complaint concludes that OpenAI committed copyright infringement. As compensation, the plaintiffs demand statutory damages, which can reach $150,000 per work. Additional damages for the alleged removal of copyright management information, in violation of the DMCA, are also on the table.

AI, Piracy and Copyright

There is no direct evidence that OpenAI used pirate sites to train ChatGPT. That said, it is no secret that some AI projects have trained on pirated material in the past, as an excellent summary from Search Engine Journal highlights.

The mainstream media has picked up this issue too. The Washington Post previously reported that the “C4 data set,” which Google and Facebook used to train their AI models, included Z-Library and various other pirate sites.

“At least 27 other sites identified by the U.S. government as markets for piracy and counterfeits were present in the data set,” the article added.

The present lawsuit will be closely watched by AI enthusiasts and rightsholders. It may result in OpenAI having to disclose some of its training data, which would be interesting in its own right.

Even if it transpires that ChatGPT was trained with pirated books, the court would still have to decide whether that amounted to copyright infringement. Some experts believe that this type of AI training can be considered fair use.

Fair use protects transformative uses of copyrighted works that don’t compete with the original content. According to several experts, that defense could well apply to AI training cases.
