      Meta’s “massively multilingual” AI model translates up to 100 languages, speech or text

      news.movim.eu / ArsTechnica · Tuesday, 22 August, 2023 - 19:57 · 1 minute

    An illustration of a person holding up a megaphone to a head silhouette (credit: Getty Images)

    On Tuesday, Meta announced SeamlessM4T, a multimodal AI model for speech and text translations. As a neural network that can process both text and audio, it can perform text-to-speech, speech-to-text, speech-to-speech, and text-to-text translations for "up to 100 languages," according to Meta. Its goal is to help people who speak different languages communicate with each other more effectively.

    Continuing its relatively open approach to AI, Meta is releasing SeamlessM4T under a research license (CC BY-NC 4.0) that allows developers to build on the work. The company is also releasing SeamlessAlign, which Meta calls "the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments." That will likely kick-start the training of future translation AI models from other researchers.

    Among the features of SeamlessM4T touted on Meta's promotional blog, the company says that the model can perform speech recognition (you give it audio of speech, and it converts it to text), speech-to-text translation (it translates spoken audio to a different language in text), speech-to-speech translation (you feed it speech audio, and it outputs translated speech audio), text-to-text translation (similar to how Google Translate functions), and text-to-speech translation (feed it text and it will translate and speak it out in another language). Each of the text translation functions supports nearly 100 languages, and the speech output functions support about 36 output languages.
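
    The blog post frames these capabilities as five distinct tasks spread across two output modalities. The sketch below only illustrates that task taxonomy; the class and helper names are hypothetical and are not Meta's actual release code or API.

        # Illustrative only: a hypothetical request type mapping SeamlessM4T's five tasks
        # to their output modality. Not Meta's actual API.
        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class TranslationRequest:
            task: str                         # "asr", "s2tt", "s2st", "t2tt", or "t2st"
            src_lang: str                     # e.g. "eng"
            tgt_lang: str                     # e.g. "fra"
            text: Optional[str] = None        # input for text-based tasks
            audio_path: Optional[str] = None  # input for speech-based tasks

        def output_modality(req: TranslationRequest) -> str:
            """Text tasks cover roughly 100 languages; speech output covers about 36, per Meta."""
            if req.task in {"s2st", "t2st"}:
                return "audio"
            if req.task in {"asr", "s2tt", "t2tt"}:
                return "text"
            raise ValueError(f"unknown task: {req.task}")

        print(output_modality(TranslationRequest("s2st", "eng", "fra", audio_path="hello.wav")))  # audio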

    Read 6 remaining paragraphs | Comments

      Crypto botnet on X is powered by ChatGPT

      news.movim.eu / ArsTechnica · Tuesday, 22 August, 2023 - 13:21

    An illustration of a robot and word balloons (credit: sakchai vongsasiripat/Getty Images)

    ChatGPT may well revolutionize web search, streamline office chores, and remake education, but the smooth-talking chatbot has also found work as a social media crypto huckster.

    Researchers at Indiana University Bloomington discovered a botnet powered by ChatGPT operating on X—the social network formerly known as Twitter—in May of this year.

    The botnet, which the researchers dub Fox8 because of its connection to cryptocurrency websites bearing some variation of the same name, consisted of 1,140 accounts. Many of them seemed to use ChatGPT to craft social media posts and to reply to each other’s posts. The auto-generated content was apparently designed to lure unsuspecting humans into clicking links through to the crypto-hyping sites.

    Read 15 remaining paragraphs | Comments

      OpenAI discontinues its AI writing detector due to “low rate of accuracy”

      news.movim.eu / ArsTechnica · Wednesday, 26 July, 2023 - 19:51 · 1 minute

    An AI-generated image of a slot machine in a desert (credit: Midjourney)

    On Thursday, OpenAI quietly pulled its AI Classifier, an experimental tool designed to detect AI-written text. The decommissioning, first noticed by Decrypt, occurred with no major fanfare and was announced through a small note added to OpenAI's official AI Classifier webpage:

    As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated.

    Released on January 31 amid clamor from educators about students potentially using ChatGPT to write essays and schoolwork, OpenAI's AI Classifier always felt like a performative Band-Aid on a deep wound. From the beginning, OpenAI admitted that its AI Classifier was not "fully reliable," correctly identifying only 26 percent of AI-written text as "likely AI-written" and incorrectly labeling human-written works 9 percent of the time.
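
    To put those numbers in perspective, here is a quick back-of-the-envelope calculation; the balanced pool of 1,000 AI-written and 1,000 human-written essays is our assumption for illustration, not a figure from OpenAI.

        # What a 26% detection rate and a 9% false-positive rate imply in practice.
        # The 1,000/1,000 balanced sample below is assumed purely for illustration.
        ai_written, human_written = 1000, 1000

        true_positives = 0.26 * ai_written      # AI-written essays correctly flagged: 260
        false_positives = 0.09 * human_written  # human-written essays wrongly flagged: 90
        flagged = true_positives + false_positives

        precision = true_positives / flagged    # ~0.74, so roughly 1 in 4 flags is a false accusation
        missed = ai_written - true_positives    # 740 AI-written essays pass undetected

        print(f"precision={precision:.2f}, missed={missed:.0f} of {ai_written}")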

    As we've pointed out on Ars, AI writing detectors such as OpenAI's AI Classifier, Turnitin, and GPTZero simply don't work with enough accuracy to rely on them for trustworthy results. The methodology behind how they work is speculative and unproven, and the tools are currently routinely used to falsely accuse students of cheating.

    Read 5 remaining paragraphs | Comments

      Pocket assistant: ChatGPT comes to Android

      news.movim.eu / ArsTechnica · Wednesday, 26 July, 2023 - 15:08

    An OpenAI logo on top of an AI-generated background (credit: OpenAI)

    On Tuesday, OpenAI released an official ChatGPT app for Android, now available in the Google Play Store in four countries: the US, India, Bangladesh, and Brazil, with more coming soon. The app is a client for OpenAI's language model family: the GPT-3.5 and GPT-4 models run in the cloud and return results to your Android device. It also integrates OpenAI's Whisper model for speech recognition.
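
    The Android app itself is closed source, but every request it makes is a cloud round trip to those hosted models. Below is a minimal sketch of the equivalent call through OpenAI's public Python library of that period; the model choice and prompt are our own illustrative picks, not the app's internals.

        # Minimal sketch of a cloud-backed chat request using the openai Python package
        # (0.27-era interface). Model and prompt are illustrative.
        import os
        import openai

        openai.api_key = os.environ["OPENAI_API_KEY"]

        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # GPT-4 is available to eligible API accounts
            messages=[{"role": "user", "content": "Explain the difference between RAM and storage in one paragraph."}],
        )
        print(response["choices"][0]["message"]["content"])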

    ChatGPT, launched in November 2022, is a conversational AI language model interface. As an AI assistant, it can help with summarization, text composition, and analysis. OpenAI bills its use cases as a way to seek "instant answers," "tailored advice," "creative inspiration," "professional input," and "learning opportunities."

    However, as we've noted in the past, ChatGPT is occasionally prone to confabulation (that is, making things up)—especially the GPT-3.5 model—so it's not entirely trustworthy as a factual reference. It can come in handy as a way to analyze data you provide yourself, though, so long as you're familiar with the subject matter and can validate the results.

    Read 3 remaining paragraphs | Comments

      Major AI companies form group to research, keep control of AI

      news.movim.eu / ArsTechnica · Wednesday, 26 July, 2023 - 13:16

    The four companies say they launched the Frontier Model Forum to ensure "the safe and responsible development of frontier AI models." (credit: Financial Times)

    Four of the world’s most advanced artificial intelligence companies have formed a group to research increasingly powerful AI and establish best practices for controlling it, as public anxiety and regulatory scrutiny over the impact of the technology increase.

    On Wednesday, Anthropic, Google, Microsoft and OpenAI launched the Frontier Model Forum, with the aim of “ensuring the safe and responsible development of frontier AI models.”

    In recent months, the US companies have rolled out increasingly powerful AI tools that produce original content in image, text or video form by drawing on a bank of existing material. The developments have raised concerns about copyright infringement and privacy breaches, as well as fears that AI could ultimately replace humans in a range of jobs.

    Read 11 remaining paragraphs | Comments

      New ChatGPT feature remembers “custom instructions” between sessions

      news.movim.eu / ArsTechnica · Monday, 24 July, 2023 - 20:14

    An AI-generated image of a chatbot in front of library shelves (credit: Benj Edwards / Stable Diffusion)

    On Thursday, OpenAI announced a new beta feature for ChatGPT that allows users to provide custom instructions that the chatbot will consider with every submission. The goal is to prevent users from having to repeat common instructions between chat sessions.

    The feature is currently available in beta for ChatGPT Plus subscription members, but OpenAI says it will extend availability to all users over the coming weeks. As of this writing, the feature is not yet available in the UK and EU.

    The Custom Instructions feature lets users set individual preferences or requirements that the AI model will then consider when generating responses. Instead of starting each conversation anew, ChatGPT can now be instructed to remember specific user preferences across multiple interactions.
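
    OpenAI has not said how the feature is implemented under the hood, but its effect resembles prepending a standing system message to every request. The sketch below emulates that behavior with the public API; the stored instruction text and the helper function are our own, not part of OpenAI's feature.

        # Emulating "custom instructions" by reusing one system message across requests.
        # This approximates the feature's effect; it is not OpenAI's implementation.
        import os
        import openai

        openai.api_key = os.environ["OPENAI_API_KEY"]

        CUSTOM_INSTRUCTIONS = "Keep answers short and suitable for a third-grade science class."

        def ask(prompt: str) -> str:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": CUSTOM_INSTRUCTIONS},  # reused on every call
                    {"role": "user", "content": prompt},
                ],
            )
            return response["choices"][0]["message"]["content"]

        print(ask("Why is the sky blue?"))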

    Read 9 remaining paragraphs | Comments

      Musk announces new AI company that seeks to “understand the universe”

      news.movim.eu / ArsTechnica · Wednesday, 12 July, 2023 - 19:20 · 1 minute

    Elon Musk speaks via video link at the opening ceremony of the World Artificial Intelligence Conference (WAIC) in Shanghai on July 6, 2023 (credit: REBECCA BAILEY/AFP via Getty Images)

    On Wednesday, Elon Musk formally announced the formation of xAI, a company aimed at understanding "the true nature of the universe" that will draw from a heavy bench of industry veterans to take on OpenAI's popular chatbot ChatGPT. Musk has criticized OpenAI publicly in the past.

    The inception of xAI dates back to March when Musk and Jared Birchall, the operator of Musk's family office, incorporated a business named "X.AI" in Nevada, according to Bloomberg. In keeping with the unconventional name, the company's website domain name is "x.ai". In April, reports emerged of Musk's ongoing dialogues with investors from Tesla and SpaceX regarding the potential funding of a new AI startup. Around that time, we reported that Musk purchased thousands of GPUs from Nvidia for AI use.

    The new company announced on its website that Musk will lead its core team himself, and it will rely on the talents of an array of AI industry veterans from tech giants such as Google's DeepMind, Microsoft, and Tesla, as well as academic institutions like the University of Toronto.

    Read 5 remaining paragraphs | Comments

      New ChatGPT rival, Claude 2, launches for open beta testing

      news.movim.eu / ArsTechnica · Tuesday, 11 July, 2023 - 19:43

    Anthropic Claude 2 logo (credit: Anthropic)

    On Tuesday, Anthropic introduced Claude 2, a large language model (LLM) similar to ChatGPT that can craft code, analyze text, and write compositions. Unlike the original version of Claude launched in March, users can try Claude 2 for free on a new beta website. It's also available as a commercial API for developers.
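
    For developers, the commercial API works like other hosted LLM endpoints. A minimal sketch, assuming the Anthropic Python SDK's completions interface from that period, with a prompt of our own choosing:

        # Minimal sketch of calling Claude 2 through the Anthropic Python SDK (mid-2023 interface).
        # The prompt is illustrative; ANTHROPIC_API_KEY is read from the environment.
        from anthropic import AI_PROMPT, HUMAN_PROMPT, Anthropic

        client = Anthropic()
        completion = client.completions.create(
            model="claude-2",
            max_tokens_to_sample=300,
            prompt=f"{HUMAN_PROMPT} Summarize the plot of Hamlet in two sentences.{AI_PROMPT}",
        )
        print(completion.completion)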

    Anthropic says that Claude is designed to simulate a conversation with a helpful colleague or personal assistant and that the new version addresses feedback from users of the previous model: "We have heard from our users that Claude is easy to converse with, clearly explains its thinking, is less likely to produce harmful outputs, and has a longer memory."

    Anthropic claims that Claude 2 demonstrates advancements in three key areas: coding, math, and reasoning. "Our latest model scored 76.5% on the multiple choice section of the Bar exam, up from 73.0% with Claude 1.3," they write. "When compared to college students applying to graduate school, Claude 2 scores above the 90th percentile on the GRE reading and writing exams, and similarly to the median applicant on quantitative reasoning."

    Read 6 remaining paragraphs | Comments

      Sarah Silverman sues OpenAI, Meta for being “industrial-strength plagiarists”

      news.movim.eu / ArsTechnica · Monday, 10 July, 2023 - 19:42 · 6 minutes

    Comedian and author Sarah Silverman (credit: Jason Kempin / Staff | Getty Images North America)

    On Friday, the Joseph Saveri Law Firm filed US federal class-action lawsuits on behalf of Sarah Silverman and other authors against OpenAI and Meta, accusing the companies of illegally using copyrighted material to train AI language models such as ChatGPT and LLaMA.

    Other authors represented include Christopher Golden and Richard Kadrey, and an earlier class-action lawsuit filed by the same firm on June 28 included authors Paul Tremblay and Mona Awad. Each lawsuit alleges violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.

    The Joseph Saveri Law Firm is no stranger to press-friendly legal action against generative AI. In November 2022, the same firm filed suit over GitHub Copilot for alleged copyright violations. In January 2023, the same legal group repeated that formula with a class-action lawsuit against Stability AI, Midjourney, and DeviantArt over AI image generators. The GitHub lawsuit is currently on a path to trial, according to lawyer Matthew Butterick. Procedural maneuvering in the Stable Diffusion lawsuit is still underway with no clear outcome yet.

    In a press release last month, the law firm described ChatGPT and LLaMA as "industrial-strength plagiarists that violate the rights of book authors." Authors and publishers have been reaching out to the law firm since March 2023, lawyers Joseph Saveri and Butterick wrote, because authors "are concerned" about these AI tools' "uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books."

    The most recent lawsuits from Silverman, Golden, and Kadrey were filed in a US district court in San Francisco. Authors have demanded jury trials in each case and are seeking permanent injunctive relief that could force Meta and OpenAI to make changes to their AI tools.

    Meta declined Ars' request to comment. OpenAI did not immediately respond to Ars' request to comment.

    A spokesperson for the Saveri Law Firm sent Ars a statement, saying, "If this alleged behavior is allowed to continue, these models will eventually replace the authors whose stolen works power these AI products with whom they are competing. This novel suit represents a larger fight for preserving ownership rights for all artists and other creators."

    Accused of using “flagrantly illegal” data sets

    Neither Meta nor OpenAI has fully disclosed what's in the data sets used to train LLaMA and ChatGPT. But lawyers for authors suing say they have deduced the likely data sources from clues in statements and papers released by the companies or related researchers. Authors have accused both OpenAI and Meta of using training data sets that contained copyrighted materials distributed without authors' or publishers' consent, including by downloading works from some of the largest e-book pirate sites.

    In the OpenAI lawsuit, authors alleged that based on OpenAI disclosures, ChatGPT appeared to have been trained on 294,000 books allegedly downloaded from "notorious 'shadow library' websites like Library Genesis (aka LibGen), Z-Library (aka Bok), Sci-Hub, and Bibliotik." Meta has disclosed that LLaMA was trained on part of a data set called ThePile, which the other lawsuit alleged includes “all of Bibliotik,” and amounts to 196,640 books.

    On top of allegedly accessing copyrighted works through shadow libraries, OpenAI is also accused of using a "controversial data set" called BookCorpus.

    BookCorpus, the OpenAI lawsuit said, "was assembled in 2015 by a team of AI researchers for the purpose of training language models." This research team allegedly "copied the books from a website called Smashwords that hosts self-published novels, that are available to readers at no cost." These novels, however, are still under copyright and allegedly "were copied into the BookCorpus data set without consent, credit, or compensation to the authors."

    Ars could not immediately reach the BookCorpus researchers or Smashwords for comment. [Update: Dan Wood, COO of Draft2Digital—which acquired Smashwords in March 2022—told Ars that the Smashwords "store site lists close to 800,000 titles for sale," with "about 100,000" currently priced at free.

    "Typically, the free book will be the first of a series," Wood said. "Some authors will keep these titles free indefinitely, and some will run limited promotions where they offer the book for free. From what we understand of the BookCorpus data set, approximately 7,185 unique titles that were priced free at the time were scraped without the knowledge or permission of Smashwords or its authors." It wasn't until March 2023 when Draft2Digital "first became aware of the scraped books being used for commercial purposes and redistributed, which is a clear violation of Smashwords’ terms of service," Wood said.

    "Every author, whether they have an internationally recognizable name or have just published their first book, deserve to have their copyright protected," Wood told Ars. "They also should have the confidence that the publishing service they entrust their work with will protect it. To that end, we are working diligently with our lawyers to fully understand the issues—including who took the data and where it was distributed—and to devise a strategy to ensure our authors’ rights are enforced. We are watching the current cases being brought against OpenAI and Meta very closely."]

    “Numerous questions of law” raised

    Authors claim that by utilizing "flagrantly illegal" data sets, OpenAI allegedly infringed copyrights of Silverman's book The Bedwetter, Golden’s Ararat, and Kadrey’s Sandman Slim. And Meta allegedly infringed copyrights of the same three books, as well as "several" other titles from Golden and Kadrey.

    It seems obvious to authors that their books were used to train ChatGPT and LLaMA because the tools "can accurately summarize a certain copyrighted book." Although sometimes ChatGPT gets some details wrong, its summaries are otherwise very accurate, and this suggests that "ChatGPT retains knowledge of particular works in the training data set and is able to output similar textual content," the authors alleged.

    It also seems obvious to authors that OpenAI and Meta knew that their models were "ingesting" copyrighted materials because all the copyright-management information (CMI) appears to have been "intentionally removed," authors alleged. That means that ChatGPT never responds to a request for a summary by citing who has the copyright, allowing OpenAI to "unfairly profit from and take credit for developing a commercial product based on unattributed reproductions of those stolen writing and ideas."

    "OpenAI knew or had reasonable grounds to know that this removal of CMI would facilitate copyright infringement by concealing the fact that every output from the OpenAI Language Models is an infringing derivative work, synthesized entirely from expressive information found in the training data," the OpenAI complaint said.

    Among "numerous questions of law" raised in these complaints was a particularly prickly question: Is ChatGPT or LLaMA itself an infringing derivative work based on perhaps thousands of authors' works?

    Authors are already upset that companies seem to be unfairly profiting off their copyrighted materials, and the Meta lawsuit noted that any unfair profits currently gained could further balloon, as "Meta plans to make the next version of LLaMA commercially available." In addition to other damages, the authors are asking for restitution of alleged profits lost.

    "Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by plain­tiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation," Saveri and Butterick wrote in their press release.

    Read on Ars Technica | Comments