Chinese AI unicorn’s 34B LLM outperforms larger Llama 2 and Falcon models

November 7, 2023

01.AI, the Chinese startup founded by veteran AI expert and investor Kai-Fu Lee, has released a 34-billion-parameter large language model (LLM) that outperforms the 70-billion-parameter Llama 2 and the 180-billion-parameter Falcon, the open-source models built by Meta Platforms, Inc., and the Technology Innovation Institute in Abu Dhabi, respectively.

Dubbed Yi-34B, the new AI model supports Chinese and English and can be fine-tuned for a variety of use cases. The startup also offers a smaller option with 6 billion parameters that performs worse, but still respectably, on widely used AI/ML model benchmarks.

Eventually, the company, which hit unicorn status within eight months of its launch, plans to double down on these models and launch a commercial offering capable of taking on OpenAI, the current generative AI market leader by number of users.

The strategy highlights a global trend in which companies are developing generative AI models geared primarily toward their home markets.

Lee founded 01.AI in March with a mission to contribute to the AI 2.0 era, in which large language models could enhance human productivity and empower people to create significant economic and societal shifts.

“The team behind 01.AI firmly believes that the new AI 2.0 driven by foundation model breakthrough is revolutionizing technology, platforms, and applications at all levels. We predict that AI 2.0 will create a platform opportunity ten times larger than the mobile internet, rewriting all software and user interfaces. This trend will give rise to the next wave of AI-first applications and AI-empowered business models, fostering AI 2.0 innovations over time,” the company writes on its website.

According to reports, Lee was quick to assemble a technology team that includes AI experts from companies like Google, Huawei and Microsoft Research Asia, and to stockpile the chips required for training 01.AI’s Yi series of models.

The initial funding for the effort was led by Sinovation Ventures, which Lee chairs, along with Alibaba’s cloud unit. However, the exact amount raised remains unclear at this stage.

The company's first public release introduced two bilingual (English/Chinese) base models with 6B and 34B parameters, both trained with a 4K sequence length that can be extended to 32K at inference time. A subsequent release of the models came with a 200K context length.

On Hugging Face, the base 34B model stood out by outperforming much larger pre-trained base LLMs, including Llama 2-70B and Falcon-180B.

For example, on benchmarks covering common-sense reasoning and reading comprehension, the 01.AI model delivered scores of 80.1 and 76.4, while Llama 2 trailed with scores of 71.9 and 69.4. Even on the MMLU (massive multitask language understanding) benchmark, the Chinese model did better with a score of 76.3, while the Llama and Falcon models scored 68.9 and 70.4, respectively.
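To make the size of those gaps concrete, the short Python sketch below subtracts each rival's score from Yi-34B's. It uses only the figures quoted in this article; the benchmark key names and the `margins` helper are illustrative choices, not part of any official evaluation tooling. Falcon's reasoning and reading-comprehension scores are not quoted here, so they are left empty rather than guessed.

```python
# Scores as quoted in the article. Falcon-180B's reasoning and reading
# comprehension scores are not given there, so they stay None.
SCORES = {
    "Yi-34B": {"common_sense_reasoning": 80.1, "reading_comprehension": 76.4, "mmlu": 76.3},
    "Llama 2-70B": {"common_sense_reasoning": 71.9, "reading_comprehension": 69.4, "mmlu": 68.9},
    "Falcon-180B": {"common_sense_reasoning": None, "reading_comprehension": None, "mmlu": 70.4},
}


def margins(baseline: str, reference: str = "Yi-34B") -> dict:
    """Return reference-minus-baseline point gaps for benchmarks both models report."""
    gaps = {}
    for bench, ref_score in SCORES[reference].items():
        base_score = SCORES[baseline][bench]
        if ref_score is not None and base_score is not None:
            # Round to one decimal to match the precision of the quoted scores.
            gaps[bench] = round(ref_score - base_score, 1)
    return gaps


if __name__ == "__main__":
    for model in ("Llama 2-70B", "Falcon-180B"):
        print(model, margins(model))
```

On the quoted numbers, Yi-34B leads Llama 2-70B by roughly 7 to 8 points on each of the three benchmarks and leads Falcon-180B by about 6 points on MMLU.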

A smaller model that delivers better performance could save compute resources for end users, allowing them to fine-tune the model and build applications targeting different use cases cost-effectively. According to the company, all models in its current Yi series are fully open for academic research. However, teams that want free commercial use will first have to obtain the necessary permissions.
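A rough back-of-the-envelope sketch shows why parameter count matters for compute budgets. It assumes weights are stored at 16-bit precision (2 bytes per parameter), which this article does not state, and it counts weights only, ignoring activations, caches and optimizer state. The parameter counts are the nominal sizes quoted above, and `weight_memory_gb` is a hypothetical helper name.

```python
# Assumption: 16-bit weights (2 bytes each). Not stated in the article.
BYTES_PER_PARAM_FP16 = 2

# Nominal parameter counts, in billions, as quoted in the article.
PARAMS_BILLIONS = {"Yi-6B": 6, "Yi-34B": 34, "Llama 2-70B": 70, "Falcon-180B": 180}


def weight_memory_gb(params_billions: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in PARAMS_BILLIONS.items():
        print(f"{name}: ~{weight_memory_gb(size):.0f} GB of weights")
```

Under these assumptions, Yi-34B needs about 68 GB for its weights, compared with about 140 GB for Llama 2-70B and about 360 GB for Falcon-180B. Actual requirements depend on the precision, hardware and serving setup.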

The current offerings from Lee’s startup are attractive options for global organizations serving customers in China, which can use the models to build chatbots that answer in both English and Chinese. Going forward, the company plans to expand these efforts by adding support for more languages to the open-source models. It also plans to launch a bigger commercial LLM targeting OpenAI’s GPT series, although not much has been revealed about the project so far.

Notably, 01.AI is not the only AI company focusing on specific languages and markets with LLMs. Just last month, Chinese giant Baidu announced the release of its ERNIE 4.0 LLM and previewed a whole host of new applications built atop it, including Qingduo, a creative platform that aims to rival Canva and Adobe Creative Cloud.

Similarly, Korean giant Naver is offering HyperCLOVA X, its next-generation LLM, which was trained on 6,500 times more Korean data than ChatGPT. It is particularly useful for localized experiences, where it can understand not only natural Korean-language expressions but also laws, institutions and cultural context relevant to Korean society. India’s Reliance Industries is also working with Nvidia to build a large language model trained on the nation’s diverse languages and tailored for different applications.

Author: Shubham Sharma
Source: Venturebeat
Reviewed By: Editorial Team
