
A team of former Google and Meta researchers trained a large model comparable to GPT-4 in just 4 months, with alumni from Tsinghua University and Beihang University participating.

Author: Li Ranran | Published: 2024-04-26

Another serious player has joined the large-model poker table.

Trial link: https://chat.reka.ai/auth/login

Reka AI, founded by former leaders from DeepMind, Google Brain, and FAIR, has released their latest multimodal large model, Reka Core, which rivals GPT-4 in all aspects!

Its performance is on par with GPT-4 and Gemini Ultra on several key test sets.

Furthermore, it supports mixed input of image, video, and audio data, a capability that, among mainstream large-model products, is currently found only in Gemini, and its multimodal performance is even slightly stronger than Gemini Ultra's.

Specifically, the main technological highlights of Reka Core are reflected in several aspects:

- Multimodal capability: strong contextual understanding of images, videos, and audio, making it one of only two fully multimodal models on the market alongside Gemini.
- 128K context window.
- Extremely strong reasoning ability.

Reka Core has super strong reasoning ability (including language and mathematics), making it suitable for tasks requiring complex analysis.

In addition to the large-scale Reka Core, the team has previously released two small open-source models, Reka Flash and Reka Edge.

What is most astonishing is that this model, which performs on par with GPT-4, was developed by a team of only 22 people—

Almost all of the members work remotely, with close to half being of Asian descent, distributed in locations including California, Seattle, London, Zurich, Hong Kong, and Singapore.

In their own words, this "small but fierce" team has made significant contributions to many breakthroughs in the field of artificial intelligence over the past decade.

According to the CTO, the model was developed by a team of 20 people over the past 4 months, since 90% of their computing power was only in place at the end of last December.

With the release of Reka Core, they have stepped from behind the scenes to center stage, showing for the first time that a "small team" can build a "big model"!

A large model that understands "The Three-Body Problem"

In the official demonstration, Reka Core interpreted the classic scene from the first episode of the Netflix series "The Three-Body Problem".

In the video, a person is in a dimly lit room, shining a flashlight on the wall. There are many numbers and equations written in red and black ink on the wall. The person seems to be studying these numbers very seriously, and then turns to the camera to speak, mentioning a countdown and a series of murders that seem to be related in some way.

It's hard to say whether someone who hasn't read "The Three-Body Problem" would be able to capture all these details comprehensively from watching this video. Additionally, Reka Core naturally understands and integrates the scene changes, the characters' actions, and the multimodal information such as sound in the video.

When the subsequent "countdown" segment is fed to it, it not only clearly understands the information in the video, but also actively connects the countdown in this visual segment with the countdown mentioned in the audio of the previous clip.

The countdown in the video creates a sense of unease and impending danger. This may be related to the series of murders mentioned by the man. It could be related to a timed bomb or a deadline. The character may need to do something before this deadline to avoid catastrophic results or solve a puzzle. Of course, the countdown may also represent the limitation of time, the struggle that the character has to face in a chaotic environment.

In addition to its strong multimodal understanding ability, Reka's coding ability is also very powerful.

The official demonstration includes Python code, generated by Reka Core, that visualizes the "three-body problem":
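The article does not reproduce the generated code itself, but a minimal sketch of what such a three-body visualization typically involves is shown below: numerically integrating Newtonian gravity for three bodies (here with the classic figure-eight initial conditions in normalized units) and collecting the trajectories for plotting. All function names and parameter values are illustrative, not Reka Core's actual output.

```python
import numpy as np

G = 1.0  # gravitational constant in normalized units

def accelerations(pos, masses):
    """Pairwise gravitational accelerations for n bodies, positions of shape (n, 2)."""
    n = len(masses)
    acc = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pos[j] - pos[i]
                dist = np.linalg.norm(r)
                acc[i] += G * masses[j] * r / dist**3
    return acc

def simulate(pos, vel, masses, dt=0.001, steps=5000):
    """Leapfrog (velocity Verlet) integration; returns trajectory (steps, n, 2)."""
    traj = np.empty((steps, *pos.shape))
    acc = accelerations(pos, masses)
    for t in range(steps):
        vel = vel + 0.5 * dt * acc
        pos = pos + dt * vel
        acc = accelerations(pos, masses)
        vel = vel + 0.5 * dt * acc
        traj[t] = pos
    return traj

# Figure-eight initial conditions, a classic periodic three-body solution
masses = np.ones(3)
pos = np.array([[-0.97000436,  0.24308753],
                [ 0.97000436, -0.24308753],
                [ 0.0,         0.0       ]])
vel = np.array([[ 0.46620368,  0.43236573],
                [ 0.46620368,  0.43236573],
                [-0.93240737, -0.86473146]])

traj = simulate(pos, vel, masses)
# The (x, y) curves in `traj` can then be drawn, e.g. with matplotlib.
```

The leapfrog integrator is used here because it conserves momentum and (approximately) energy, which keeps the chaotic three-body orbits from drifting as quickly as a naive Euler step would.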

It can also accurately identify the actors in "The Three-Body Problem", and Reka Core automatically associates them with characters they have played in other works.

It also makes well-reasoned inferences about details such as the specific model of the helicopter on screen and the location of the Large Hadron Collider.

Technical details

Training Data

According to the official description, the training data for Reka's three models includes public datasets and proprietary/licensed datasets, with a knowledge cutoff of November 2023.

The data ingested by the models includes text, images, videos, and audio clips. The two smaller open-source models, Reka Flash and Reka Edge, were trained on approximately 5 trillion and 4.5 trillion tokens of data, respectively.

Approximately 25% of the pre-training data is related to code, 30% is related to STEM. About 25% of the data is obtained from web crawling.

Model Structure

The overall architecture of the model is shown in the above figure, which is a modular encoder-decoder architecture. It supports text, image, video, and audio inputs, but currently only supports text output.

The backbone Transformer is based on the "Noam" architecture. From an architectural perspective, it is similar to the PaLM architecture, but without parallel layers.
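The article does not elaborate on what the "Noam" architecture entails, but Transformers in this family (named after Noam Shazeer and also used by PaLM) are generally associated with components such as RMSNorm instead of LayerNorm and SwiGLU feed-forward layers. Below is a minimal numpy sketch of those two components in a pre-norm residual sublayer; the dimensions are toy values and attention is omitted, so this is an illustration of the building blocks, not Reka's actual implementation.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by root-mean-square, with no mean subtraction."""
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: silu(x @ w_gate) gates (x @ w_up), then projects down."""
    silu = lambda z: z / (1.0 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

# Toy dimensions for illustration
d_model, d_ff, seq = 8, 16, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(seq, d_model))
w_gate = rng.normal(size=(d_model, d_ff)) * 0.1
w_up = rng.normal(size=(d_model, d_ff)) * 0.1
w_down = rng.normal(size=(d_ff, d_model)) * 0.1

# One pre-norm residual sublayer pass (the attention sublayer would look analogous)
y = x + swiglu_ffn(rms_norm(x, np.ones(d_model)), w_gate, w_up, w_down)
```

In a "parallel layers" design (as in PaLM), the attention and feed-forward sublayers would both read the same normalized input and their outputs would be summed; the sequential form sketched above, where one sublayer feeds the next, matches the article's note that Reka's backbone drops the parallel-layer trick.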

Dataset Performance

According to the official dataset performance, Reka Core is already on par with GPT-4, and the smaller open-source model Reka Flash's multimodal capabilities are also comparable to Gemini Pro 1.5.

Based on feedback from human raters comparing several mainstream models, Reka Core's multimodal test score exceeds that of Claude 3 Opus, but lags behind GPT-4V by a small margin.

After this test, the Reka team also had Reka Core play the role of a human rater, evaluating each model's output, and its scores closely matched the human ratings.

In the same text-based test with human participants, Reka Core's performance was surpassed only by GPT-4 Turbo and Claude 3 Opus.
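The article does not say how these blind pairwise human preferences were aggregated into scores, but such comparisons are commonly summarized with Elo-style ratings; the sketch below shows the standard Elo update, with model names and match outcomes purely illustrative.

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Standard Elo update: score_a is 1 for an A win, 0.5 for a draw, 0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# Hypothetical example: model_a wins two of three pairwise comparisons
ratings = {"model_a": 1000.0, "model_b": 1000.0}
for score in (1, 1, 0):
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"], score)
```

Because each update moves the two ratings by equal and opposite amounts, the total rating mass stays fixed and only the relative ordering of models carries information.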

Team Member Introduction

CEO/Co-founder Dani Yogatama

He was born in Indonesia and graduated with a Ph.D. from CMU in 2015. He briefly worked at Baidu's Silicon Valley AI Lab before joining DeepMind, where he worked until 2022. He is now the CEO of Reka AI and also an associate professor in the Computer Science Department at the University of Southern California.

Before founding Reka AI, he had a successful research career and contributed to several well-known papers.

Chief Technology Officer/Co-founder Yi Tay

He comes from Singapore and has served as the technical lead of Google Research and a senior research scientist at Google Brain. During his tenure at Google, he made contributions to many large-scale model projects, such as PaLM, UL2, Flan-{PaLM/UL2/T5}, LaMDA/Bard, and MUM.

In addition to being a highly successful deep learning scientist and entrepreneur, he is also an amateur classical pianist, and obtained an Associate Diploma in Classical Piano Performance from Trinity College London in 2012.

Co-founder Qi Liu

He obtained his Ph.D. from the University of Oxford and worked as a researcher at FAIR; now, in addition to being a co-founder of Reka AI, he serves as an assistant professor in the Department of Computer Science at the University of Hong Kong.

Che Zheng

He graduated from Tsinghua University with a bachelor's degree and from CMU with a master's degree. Before joining Reka AI, he worked at Kuaishou and Google.

Zhongkai Zhu

Before joining Reka AI, he worked at Meta AI, Microsoft, and Tesla, and graduated from Beihang University with a bachelor's degree.


Copyright © 2024 newsaboutchina.com