
      TurboTax-maker Intuit offers an AI agent that provides financial tips

      news.movim.eu / ArsTechnica · Wednesday, 6 September, 2023 - 22:19 · 1 minute

    [Image: Piggy bank on a laptop computer with a robotic hand. (credit: Getty Images)]

    On Wednesday, TurboTax-maker Intuit launched an AI assistant called "Intuit Assist" that can provide AI-generated financial recommendations and assist with decision-making when using the company's software, Reuters reports. Intuit Assist uses a custom large language model platform called GenOS, and it is available now to all TurboTax customers and select users of Intuit's other products, including Credit Karma, QuickBooks, and Mailchimp, with a wider rollout planned in the coming months.

    "Consumers will find it easier than ever to manage and improve their financial lives," the company writes on its promotional website. "They’ll be able to get personalized recommendations throughout the year, with actions they can take to maximize their tax refund and accurately file taxes in record time with TurboTax. And they’ll be given the tools to make smart money decisions throughout their financial journey with Credit Karma."

    Intuit also sees Intuit Assist as a way to level the playing field for small and medium-sized businesses, which often lack the resources of larger companies. The AI assistant will reportedly help shorten the time it takes to file taxes and provide faster access to refunds, as well as offer personalized financial advice. Intuit Chief Data Officer Ashok Srivastava told Reuters that the company's AI models "competed favorably" against other AI systems in internal accuracy tests.



      Meta’s “massively multilingual” AI model translates up to 100 languages, speech or text

      news.movim.eu / ArsTechnica · Tuesday, 22 August, 2023 - 19:57 · 1 minute

    [Image: An illustration of a person holding up a megaphone to a head silhouette. (credit: Getty Images)]

    On Tuesday, Meta announced SeamlessM4T, a multimodal AI model for speech and text translations. As a neural network that can process both text and audio, it can perform text-to-speech, speech-to-text, speech-to-speech, and text-to-text translations for "up to 100 languages," according to Meta. Its goal is to help people who speak different languages communicate with each other more effectively.

    Continuing its relatively open approach to AI, Meta is releasing SeamlessM4T under a research license (CC BY-NC 4.0) that allows developers to build on the work. The company is also releasing SeamlessAlign, which Meta calls "the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments." That dataset will likely kick-start the training of future translation AI models from other researchers.

    On its promotional blog, Meta touts SeamlessM4T's ability to perform speech recognition (you give it audio of speech, and it converts it to text), speech-to-text translation (it translates spoken audio to a different language in text), speech-to-speech translation (you feed it speech audio, and it outputs translated speech audio), text-to-text translation (similar to how Google Translate functions), and text-to-speech translation (feed it text, and it will translate and speak it in another language). Each of the text translation functions supports nearly 100 languages, and the speech output functions support about 36 output languages.
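    The five tasks above can be summarized as input/output modality pairs. The sketch below simply restates the article's task list as data; it is illustrative only and is not the SeamlessM4T API.

```python
# Translation tasks described in the article, as (input, output) modality
# pairs. Illustrative data only -- not the SeamlessM4T library interface.
TASKS = {
    "speech recognition": ("speech", "text"),
    "speech-to-text translation": ("speech", "text"),
    "speech-to-speech translation": ("speech", "speech"),
    "text-to-text translation": ("text", "text"),
    "text-to-speech translation": ("text", "speech"),
}

def tasks_with_speech_output() -> list[str]:
    """Tasks limited to the ~36 supported speech output languages."""
    return [name for name, (_, out) in TASKS.items() if out == "speech"]
```

    Per the article, the two tasks returned by `tasks_with_speech_output()` are capped at roughly 36 output languages, while text-output tasks support nearly 100.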



      New ChatGPT feature remembers “custom instructions” between sessions

      news.movim.eu / ArsTechnica · Monday, 24 July, 2023 - 20:14

    [Image: An AI-generated image of a chatbot in front of library shelves. (credit: Benj Edwards / Stable Diffusion)]

    On Thursday, OpenAI announced a new beta feature for ChatGPT that allows users to provide custom instructions that the chatbot will consider with every submission. The goal is to prevent users from having to repeat common instructions between chat sessions.

    The feature is currently in beta for ChatGPT Plus subscribers, but OpenAI says it will extend availability to all users over the coming weeks. As of this writing, the feature is not yet available in the UK and EU.

    The Custom Instructions feature functions by letting users set their individual preferences or requirements that the AI model will then consider when generating responses. Instead of starting each conversation anew, ChatGPT can now be instructed to remember specific user preferences across multiple interactions.
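    Conceptually, this works like a saved preamble that is applied to every new chat so the user never retypes it. The sketch below models that idea with a chat-style message list; it is a conceptual illustration, not OpenAI's implementation.

```python
# Conceptual sketch: Custom Instructions as a persistent preamble applied to
# every new session. The dict shape mirrors common chat-message lists; this
# is not how ChatGPT implements the feature internally.
CUSTOM_INSTRUCTIONS = "I am a Python developer. Keep answers short."

def new_session(first_message: str) -> list[dict]:
    """Open a chat with the stored instructions already in place."""
    return [
        {"role": "system", "content": CUSTOM_INSTRUCTIONS},
        {"role": "user", "content": first_message},
    ]

session = new_session("Explain generators.")
```

    Every call to `new_session` starts from the same stored instructions, which is the behavior the feature promises: preferences persist across conversations instead of being restated each time.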



      OpenAI launches GPT-4 API for everyone

      news.movim.eu / ArsTechnica · Monday, 10 July, 2023 - 19:50 · 1 minute

    [Image credit: OpenAI]

    On Thursday, OpenAI announced that all paying API customers now have access to the GPT-4 API. It also introduced updates to chat-based models, announced a shift from the Completions API to the Chat Completions API, and outlined plans for deprecation of older models.

    Generally considered OpenAI's most powerful API product, the GPT-4 API first launched in March but remained in closed testing until now. Through the API, developers can integrate OpenAI's large language model (LLM) into their own products for uses such as summarization, coding assistance, analysis, and composition. The model runs remotely on OpenAI's servers and delivers output to other apps over the Internet.

    OpenAI says the GPT-4 API with 8K context is accessible to existing developers who have a successful payment history, with plans to open access to new developers by the end of July. And in a move to distance itself from older GPT-3-style models, OpenAI has also opted to begin retiring "Completions API" models in favor of newer Chat Completions API models. Since its March launch, OpenAI says, Chat Completions API models have come to account for 97 percent of the company's API GPT usage.
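    A minimal sketch of what such an integration looks like, using only the Python standard library: the endpoint and JSON shape follow OpenAI's public Chat Completions API, but the key below is a placeholder and the request is only built, not sent.

```python
# Sketch of a Chat Completions API call with the standard library only.
# Endpoint and payload shape follow OpenAI's public API docs; the API key is
# a placeholder and nothing is sent over the network in this sketch.
import json
import urllib.request

API_KEY = "sk-..."  # placeholder; load from an environment variable in practice

def build_request(prompt: str, model: str = "gpt-4") -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Summarize this article in one sentence.")
# urllib.request.urlopen(req) would send it; the model runs remotely on
# OpenAI's servers and returns JSON containing the completion.
```

    In practice most developers use OpenAI's client libraries rather than raw HTTP, but the request above is the wire format those libraries produce.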



      Sarah Silverman sues OpenAI, Meta for being “industrial-strength plagiarists”

      news.movim.eu / ArsTechnica · Monday, 10 July, 2023 - 19:42 · 6 minutes

    [Image: Comedian and author Sarah Silverman. (credit: Jason Kempin / Staff | Getty Images North America)]

    On Friday, the Joseph Saveri Law Firm filed US federal class-action lawsuits on behalf of Sarah Silverman and other authors against OpenAI and Meta, accusing the companies of illegally using copyrighted material to train AI language models such as ChatGPT and LLaMA.

    Other authors represented include Christopher Golden and Richard Kadrey, and an earlier class-action lawsuit filed by the same firm on June 28 included authors Paul Tremblay and Mona Awad. Each lawsuit alleges violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.

    The Joseph Saveri Law Firm is no stranger to press-friendly legal action against generative AI. In November 2022, the same firm filed suit over GitHub Copilot for alleged copyright violations. In January 2023, the same legal group repeated that formula with a class-action lawsuit against Stability AI, Midjourney, and DeviantArt over AI image generators. The GitHub lawsuit is currently on track for trial, according to lawyer Matthew Butterick. Procedural maneuvering in the Stable Diffusion lawsuit is still underway with no clear outcome yet.

    In a press release last month, the law firm described ChatGPT and LLaMA as "industrial-strength plagiarists that violate the rights of book authors." Authors and publishers have been reaching out to the law firm since March 2023, lawyers Joseph Saveri and Butterick wrote, because authors "are concerned" about these AI tools' "uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books."

    The most recent lawsuits from Silverman, Golden, and Kadrey were filed in a US district court in San Francisco. Authors have demanded jury trials in each case and are seeking permanent injunctive relief that could force Meta and OpenAI to make changes to their AI tools.

    Meta declined Ars' request to comment. OpenAI did not immediately respond to Ars' request to comment.

    A spokesperson for the Saveri Law Firm sent Ars a statement, saying, "If this alleged behavior is allowed to continue, these models will eventually replace the authors whose stolen works power these AI products with whom they are competing. This novel suit represents a larger fight for preserving ownership rights for all artists and other creators."

    Accused of using “flagrantly illegal” data sets

    Neither Meta nor OpenAI has fully disclosed what's in the data sets used to train LLaMA and ChatGPT. But lawyers for authors suing say they have deduced the likely data sources from clues in statements and papers released by the companies or related researchers. Authors have accused both OpenAI and Meta of using training data sets that contained copyrighted materials distributed without authors' or publishers' consent, including by downloading works from some of the largest e-book pirate sites.

    In the OpenAI lawsuit, authors alleged that based on OpenAI disclosures, ChatGPT appeared to have been trained on 294,000 books allegedly downloaded from "notorious 'shadow library' websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik." Meta has disclosed that LLaMA was trained on part of a data set called ThePile, which the other lawsuit alleged includes "all of Bibliotik" and amounts to 196,640 books.

    On top of allegedly accessing copyrighted works through shadow libraries, OpenAI is also accused of using a "controversial data set" called BookCorpus.

    BookCorpus, the OpenAI lawsuit said, "was assembled in 2015 by a team of AI researchers for the purpose of training language models." This research team allegedly "copied the books from a website called Smashwords that hosts self-published novels, that are available to readers at no cost." These novels, however, are still under copyright and allegedly "were copied into the BookCorpus data set without consent, credit, or compensation to the authors."

    Ars could not immediately reach the BookCorpus researchers or Smashwords for comment. [Update: Dan Wood, COO of Draft2Digital—which acquired Smashwords in March 2022—told Ars that the Smashwords "store site lists close to 800,000 titles for sale," with "about 100,000" currently priced at free.

    "Typically, the free book will be the first of a series," Wood said. "Some authors will keep these titles free indefinitely, and some will run limited promotions where they offer the book for free. From what we understand of the BookCorpus data set, approximately 7,185 unique titles that were priced free at the time were scraped without the knowledge or permission of Smashwords or its authors." It wasn't until March 2023 that Draft2Digital "first became aware of the scraped books being used for commercial purposes and redistributed, which is a clear violation of Smashwords’ terms of service," Wood said.

    "Every author, whether they have an internationally recognizable name or have just published their first book, deserve to have their copyright protected," Wood told Ars. "They also should have the confidence that the publishing service they entrust their work with will protect it. To that end, we are working diligently with our lawyers to fully understand the issues—including who took the data and where it was distributed—and to devise a strategy to ensure our authors’ rights are enforced. We are watching the current cases being brought against OpenAI and Meta very closely."]

    “Numerous questions of law” raised

    Authors claim that by utilizing "flagrantly illegal" data sets, OpenAI allegedly infringed copyrights of Silverman's book The Bedwetter, Golden’s Ararat, and Kadrey’s Sandman Slim. And Meta allegedly infringed copyrights of the same three books, as well as "several" other titles from Golden and Kadrey.

    It seems obvious to authors that their books were used to train ChatGPT and LLaMA because the tools "can accurately summarize a certain copyrighted book." Although sometimes ChatGPT gets some details wrong, its summaries are otherwise very accurate, and this suggests that "ChatGPT retains knowledge of particular works in the training data set and is able to output similar textual content," the authors alleged.

    It also seems obvious to authors that OpenAI and Meta knew that their models were "ingesting" copyrighted materials because all the copyright-management information (CMI) appears to have been "intentionally removed," authors alleged. That means that ChatGPT never responds to a request for a summary by citing who has the copyright, allowing OpenAI to "unfairly profit from and take credit for developing a commercial product based on unattributed reproductions of those stolen writing and ideas."

    "OpenAI knew or had reasonable grounds to know that this removal of CMI would facilitate copyright infringement by concealing the fact that every output from the OpenAI Language Models is an infringing derivative work, synthesized entirely from expressive information found in the training data," the OpenAI complaint said.

    Among "numerous questions of law" raised in these complaints was a particularly prickly question: Is ChatGPT or LLaMA itself an infringing derivative work based on perhaps thousands of authors' works?

    Authors are already upset that companies seem to be unfairly profiting off their copyrighted materials, and the Meta lawsuit noted that any unfair profits currently gained could further balloon, as "Meta plans to make the next version of LLaMA commercially available." In addition to other damages, the authors are asking for restitution of alleged profits lost.

    "Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by plain­tiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation," Saveri and Butterick wrote in their press release.



      Anthropic’s Claude AI can now digest an entire book like The Great Gatsby in seconds

      news.movim.eu / ArsTechnica · Friday, 12 May, 2023 - 15:44

    [Image: An AI-generated image of a robot reading a book. (credit: Benj Edwards / Stable Diffusion)]

    On Thursday, AI company Anthropic announced it has given its ChatGPT-like Claude AI language model the ability to analyze an entire book's worth of material in under a minute. This new ability comes from expanding Claude's context window to 100,000 tokens, or about 75,000 words.

    Like OpenAI's GPT-4 , Claude is a large language model (LLM) that works by predicting the next token in a sequence when given a certain input. Tokens are fragments of words used to simplify AI data processing, and a "context window" is similar to short-term memory—how much human-provided input data an LLM can process at once.
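    The article's two figures are consistent with a common rule of thumb for English text, sketched below. The 0.75 words-per-token ratio is an approximation, not an exact rule.

```python
# Back-of-the-envelope check of the article's numbers: a 100,000-token
# context window at roughly 0.75 words per token comes out to about
# 75,000 words. The ratio is a rough heuristic for English text.
def window_in_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Estimate how many English words fit in a context window."""
    return int(tokens * words_per_token)

# window_in_words(100_000) -> 75000
```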

    A larger context window means an LLM can consider larger works like books or participate in very long interactive conversations that span "hours or even days," according to Anthropic.



      OpenAI peeks into the “black box” of neural networks with new research

      news.movim.eu / ArsTechnica · Thursday, 11 May, 2023 - 21:25

    [Image: An AI-generated image of robots looking inside an artificial brain. (credit: Stable Diffusion)]

    On Tuesday, OpenAI published a new research paper detailing a technique that uses its GPT-4 language model to write explanations for the behavior of neurons in its older GPT-2 model, albeit imperfectly. It's a step forward for "interpretability," which is a field of AI that seeks to explain why neural networks create the outputs they do.

    While large language models (LLMs) are conquering the tech world, AI researchers still don't know a lot about their functionality and capabilities under the hood. In the first sentence of OpenAI's paper, the authors write, "Language models have become more capable and more widely deployed, but we do not understand how they work."

    For outsiders, that likely sounds like a stunning admission from a company that not only depends on revenue from LLMs but also hopes to accelerate them to beyond-human levels of reasoning ability.



      The AI race heats up: Google announces PaLM 2, its answer to GPT-4

      news.movim.eu / ArsTechnica · Thursday, 11 May, 2023 - 19:20

    [Image: The Google PaLM 2 logo. (credit: Google)]

    On Wednesday, Google introduced PaLM 2, a family of foundational language models comparable to OpenAI's GPT-4. At its Google I/O event in Mountain View, California, Google revealed that it already uses PaLM 2 to power 25 products, including its Bard conversational AI assistant.

    As a family of large language models (LLMs), PaLM 2 has been trained on an enormous volume of data and performs next-word prediction, outputting the most likely text in response to a human-entered prompt. PaLM stands for "Pathways Language Model," and "Pathways" is a machine-learning technique created at Google. PaLM 2 follows up on the original PaLM, which Google announced in April 2022.
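    The prediction loop itself is simple to illustrate. In the toy sketch below, a hand-written lookup table stands in for the model's learned distribution; real models like PaLM 2 score every token in a large vocabulary with a neural network, so only the greedy loop shape carries over.

```python
# Toy next-word prediction: a fixed lookup table replaces the learned
# distribution of a real LLM. Only the generate-one-word-at-a-time loop
# resembles how models like PaLM 2 actually operate.
NEXT_WORD = {"the": "cat", "cat": "sat", "sat": "down"}

def generate(prompt: str, max_new_words: int = 3) -> str:
    words = prompt.split()
    for _ in range(max_new_words):
        nxt = NEXT_WORD.get(words[-1])  # "most likely" next word
        if nxt is None:  # no continuation known; stop early
            break
        words.append(nxt)
    return " ".join(words)

# generate("the") -> "the cat sat down"
```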

    According to Google, PaLM 2 supports over 100 languages and can perform "reasoning," code generation, and multilingual translation. During his 2023 Google I/O keynote, Google CEO Sundar Pichai said that PaLM 2 comes in four sizes: Gecko, Otter, Bison, and Unicorn. Gecko is the smallest and can reportedly run on a mobile device. Aside from Bard, PaLM 2 is behind AI features in Docs, Sheets, and Slides.



      Warning of AI’s danger, pioneer Geoffrey Hinton quits Google to speak freely

      news.movim.eu / ArsTechnica · Monday, 1 May, 2023 - 19:26 · 1 minute

    [Image: Geoffrey Hinton, chief scientific adviser at the Vector Institute, speaks during The International Economic Forum of the Americas (IEFA) Toronto Global Forum in Toronto, Ontario, Canada, on Thursday, Sept. 5, 2019. (credit: Getty Images / Benj Edwards)]

    According to the New York Times, AI pioneer Dr. Geoffrey Hinton has resigned from Google so he can "speak freely" about potential risks posed by AI. Hinton, who helped create some of the fundamental technology behind today's generative AI systems, fears that the tech industry's drive to develop AI products could result in dangerous consequences—from misinformation to job loss or even a threat to humanity.

    "Look at how it was five years ago and how it is now," the Times quoted Hinton as saying. "Take the difference and propagate it forwards. That’s scary."

    Hinton's resume in the field of artificial intelligence extends back to 1972, and his accomplishments have influenced current practices in generative AI. In 1987, Hinton, David Rumelhart, and Ronald J. Williams popularized backpropagation, a key technique for training neural networks that is used in today's generative AI models. In 2012, Hinton, Alex Krizhevsky, and Ilya Sutskever created AlexNet, which is commonly hailed as a breakthrough in machine vision and deep learning, and it arguably kickstarted our current era of generative AI. In 2018, Hinton won the Turing Award, which some call the "Nobel Prize of Computing," along with Yoshua Bengio and Yann LeCun.
