
      Meta’s “massively multilingual” AI model translates up to 100 languages, speech or text

      news.movim.eu / ArsTechnica · Tuesday, 22 August, 2023 - 19:57 · 1 minute

    Image: an illustration of a person holding up a megaphone to a head silhouette (credit: Getty Images)

    On Tuesday, Meta announced SeamlessM4T, a multimodal AI model for speech and text translation. As a neural network that can process both text and audio, it can perform text-to-speech, speech-to-text, speech-to-speech, and text-to-text translations for "up to 100 languages," according to Meta. Its goal is to help people who speak different languages communicate with each other more effectively.

    Continuing its relatively open approach to AI, Meta is releasing SeamlessM4T under a research license (CC BY-NC 4.0) that allows developers to build on the work. The company is also releasing SeamlessAlign, which Meta calls "the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments." The dataset will likely kick-start the training of future translation AI models from other researchers.

    On its promotional blog, Meta says the model can perform five tasks: speech recognition (you give it audio of speech, and it converts it to text), speech-to-text translation (it translates spoken audio to a different language in text), speech-to-speech translation (you feed it speech audio, and it outputs translated speech audio), text-to-text translation (similar to how Google Translate functions), and text-to-speech translation (feed it text and it will translate and speak it out in another language). Each of the text translation functions supports nearly 100 languages, and the speech output functions support about 36 output languages.
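The five tasks listed above differ only in input modality, output modality, and whether the language changes. A minimal Python sketch of that breakdown follows; the `Task` class, the `TASKS` list, and the `tasks_for` helper are hypothetical names used for illustration and are not part of SeamlessM4T's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One capability from the article's list (illustrative, not Meta's API)."""
    name: str
    input_modality: str    # "speech" or "text"
    output_modality: str   # "speech" or "text"
    translates: bool       # False: output stays in the source language


# The five tasks as described in the article.
TASKS = [
    Task("speech recognition", "speech", "text", False),
    Task("speech-to-text translation", "speech", "text", True),
    Task("speech-to-speech translation", "speech", "speech", True),
    Task("text-to-text translation", "text", "text", True),
    Task("text-to-speech translation", "text", "speech", True),
]


def tasks_for(input_modality: str, output_modality: str) -> list[str]:
    """Return the names of the tasks matching an input/output modality pair."""
    return [
        t.name
        for t in TASKS
        if t.input_modality == input_modality
        and t.output_modality == output_modality
    ]


if __name__ == "__main__":
    for task in TASKS:
        kind = "translation" if task.translates else "transcription"
        print(f"{task.name}: {task.input_modality} -> {task.output_modality} ({kind})")
```

Laid out this way, speech recognition and speech-to-text translation share the same modalities and differ only in whether the output language changes.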


      OpenAI launches GPT-4 API for everyone

      news.movim.eu / ArsTechnica · Monday, 10 July, 2023 - 19:50 · 1 minute

    Image credit: OpenAI

    On Thursday, OpenAI announced that all paying API customers now have access to the GPT-4 API. It also introduced updates to chat-based models, announced a shift from the Completions API to the Chat Completions API, and outlined plans for deprecation of older models.

    Generally considered OpenAI's most powerful API product, the GPT-4 API first launched in March but has been under closed testing until now. Through the API, developers can integrate OpenAI's large language model (LLM) into their own products for uses such as summarization, coding assistance, analysis, and composition. The model runs remotely on OpenAI's servers and provides output to other apps over the Internet.

    OpenAI says the GPT-4 API with 8K context is accessible to existing developers who have a successful payment history, with plans to open access to new developers by the end of July. And in a move to distance itself from older GPT-3-style models, OpenAI has also opted to begin retiring "Completions API" models in favor of newer Chat Completions API models. Since its March launch, the Chat Completions API has grown to account for 97 percent of OpenAI's API GPT usage, the company says.


      ChatGPT and Whisper APIs debut, allowing devs to integrate them into apps

      news.movim.eu / ArsTechnica · Wednesday, 1 March, 2023 - 19:54

    Image: an abstract green artwork created by OpenAI (credit: OpenAI)

    On Wednesday, OpenAI announced the availability of developer APIs for its popular ChatGPT and Whisper AI models that will let developers integrate them into their apps. An API (application programming interface) is a set of protocols that allows different computer programs to communicate with each other. In this case, app developers can extend their apps' abilities with OpenAI technology for an ongoing fee based on usage.

    Introduced in late November, ChatGPT generates coherent text in many styles. Whisper, a speech-to-text model that launched in September, can transcribe spoken audio into text.

    In particular, demand for a ChatGPT API has been huge, which led to the creation of an unauthorized API late last year that violated OpenAI's terms of service. Now, OpenAI has introduced its own API offering to meet the demand. Compute for the APIs will happen off-device and in the cloud.
