Numenta has researched the brain for 17 years, and now it finally has a product that it hopes can make AI up to 100 times more efficient.
The Redwood City, California-based company — started by computing pioneers Jeff Hawkins and Donna Dubinsky — is unveiling its neuroscience-based AI commercial solution, the Numenta Platform for Intelligent Computing (NuPIC).
It is built on two decades of neuroscience research, and it is based on the theory of the brain and intelligence that Hawkins wrote about in his 2021 book A Thousand Brains.
And interestingly, in a crossover with gaming, Numenta has teamed up with Gallium Studios, a game startup founded by gaming pioneers Will Wright (co-creator of The Sims) and Lauren Elliott (co-creator of Where in the World is Carmen Sandiego). Gallium Studios is working on Proxi, and it chose Numenta as its AI partner because of the fundamental challenges it faced in incorporating AI into its game while prioritizing user trust and privacy.
With NuPIC, Gallium Studios can achieve high performance running LLMs on CPUs, utilizing both generative and non-generative models as needed. With full control over models and data on-premises, Gallium Studios anticipates that Numenta’s cutting-edge neuroscience-driven research will enable the development of simulated AI players that continuously learn, adapt, and behave intelligently.
A new software platform
NuPIC leverages Numenta’s architecture, data structures, and algorithms to enable the efficient deployment of large language models (LLMs) on CPUs. The platform aims to deliver disruptive performance, substantial cost savings, and privacy, security, and control features. Importantly, NuPIC is designed to be accessible to developers and software engineers, requiring no deep learning expertise, said Numenta CEO Subutai Ahmad in an interview with VentureBeat.
Most LLMs rely on graphics processing units (GPUs), something that has turned graphics chip maker Nvidia into an AI powerhouse over the years. But Numenta has teamed up with Intel, the maker of x86-based central processing units (CPUs), because its software takes advantage of the flexible programming model of CPUs compared with the monolithic model of GPUs, Ahmad said. The idea is to bring down the costs of LLMs by shifting much of the processing to CPUs.
“We recognize that we’re in a wave of AI confusion. Everyone wants to reap the benefits, but not everyone knows where to start or how to achieve the performance they need to put LLMs into production,” said Ahmad. “The only platform based on the Thousand Brains Theory of Intelligence, NuPIC delivers performance results that elevate CPUs to be the ideal platform for running LLMs. With our optimized inference server, model library, and training module, you can select the right models for your unique business needs, fine-tune them on your data, and run them at extremely high throughput and low latency on CPUs, significantly faster than on an Nvidia A100 GPU — all with utmost security and privacy.”
Furthermore, NuPIC ensures security and privacy for businesses, Ahmad said. Among its features, NuPIC enables consistently high-throughput, low-latency inference using only CPUs, eliminating the need for complex and costly GPU infrastructure.
And unlike alternative solutions that require sending internal data to external software-as-a-service (SaaS) offerings, NuPIC operates entirely within the customer’s infrastructure, either on-premises or via private cloud on major cloud providers. This approach gives customers complete control over data and models, ensuring consistent behavior, reducing costs, and enhancing data compliance.
NuPIC’s flexible model library also offers a range of production-ready models, including BERT and GPT-style models. Customers can optimize for accuracy or speed and create customized versions of existing models to suit their needs.
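Numenta has not published NuPIC’s internals, but the accuracy-versus-speed trade-off Ahmad describes can be sketched with off-the-shelf tools. The snippet below is a minimal illustration, not Numenta’s actual method: it uses PyTorch dynamic quantization to trade a little BERT accuracy for faster CPU inference, and the model name and task are stand-ins.

```python
# Illustrative only: NuPIC's internals are not public. This shows a generic
# accuracy-vs-speed trade-off for CPU inference using dynamic quantization,
# not Numenta's actual optimizations.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in for a model from a library
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Convert the linear layers to int8, which typically speeds up CPU
# inference at a small cost in accuracy.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("NuPIC runs language models on CPUs.", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits.shape)  # (1, num_labels)
```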
And NuPIC empowers customers to swiftly prototype LLM-based solutions without requiring extensive machine learning expertise. Backed by a dedicated team of AI experts, NuPIC facilitates seamless deployment of LLMs in production. Delivered as a Docker container, customers can leverage standard MLOps tools and processes to iterate and scale their AI solutions.
These unique features translate into significant business advantages, Ahmad said. NuPIC allows customers to leverage the power of LLMs on easily accessible CPUs, achieving remarkable throughput and latency improvements of 10 to 100 times on Intel 4th Gen Xeon Scalable Processors.
NuPIC enables the selection of the right LLM, fine-tuning with custom data, easy scalability, and the ability to handle larger models without expanding budgets. Most importantly, NuPIC empowers organizations to maintain complete control over their data, ensuring privacy and building trust.
Numenta is currently offering access to NuPIC to a limited number of enterprise customers. The company has about 20 employees and has been privately funded through both internal and external sources.
A long journey
Ahmad has been working on the technology with Hawkins since 2005. The idea was to understand how the brain operates so efficiently, and then mimic those capabilities in computer science. Many have tried that and failed, including IBM with its brain-inspired research. But Hawkins came up with a unique theory.
Hawkins first started the Redwood Neuroscience Institute, and he wrote a book called On Intelligence, which debuted in 2004.
In that book, Hawkins argued that parts of the brain, such as memory, work with a kind of hierarchy, particularly a temporal hierarchy. You can remember things that happen in a time sequence, which explains the ease with which you can remember music. The brain works like a prediction machine, taking lessons from the past and making guesses about the future.
Taking theory into practice
Now Hawkins believes that perhaps 100,000 cortical columns operate in your brain as if they were independent brains within a larger overall system. The different cortical columns collaborate as you think.
“We’ve always felt there were more fundamental things to learn from neuroscience,” Ahmad said. “At this point, we feel we now have a complete framework for how the basics of intelligence are implemented in the neocortex. The neuroscience field has exploded over the last 20 to 30 years. We think it’s about time we take that and turn that understanding into real algorithms and implement them in AI systems.”
NuPIC exists as a software application that can run on any x86-compatible CPU. That means it can run on Intel and Advanced Micro Devices CPUs, but not on Arm-based chips at the moment. Intel has validated that the technology works, Ahmad said.
“We’ve been around for a long time, we’ve been deep into doing neuroscience research and really trying to understand deeply how the brain works. So, Jeff published a book called The Thousand Brains Theory of Intelligence, which came out two years ago. That really encapsulates that research side and what we’ve learned from the neuroscience.”
Bill Gates lauded the book as one of the five best of 2021. And Numenta investigated how that theory could impact practical AI systems.
“It turns out the first place we can take these learnings of neuroscience is to make transformer models — these large language models (LLMs), or GPT models — up to 100 times more efficient,” Ahmad said.
As an example, you can show a person a picture of a cat, and they will learn that it’s a cat right away. An AI model, by contrast, has to see thousands and thousands of images of cats before it can recognize one. Numenta started building a suite of algorithms that mimic the brain’s shortcuts.
“The trick was learning how to map that knowledge of the brain from an engineering perspective to existing hardware systems,” Ahmad said. “Once we figured out how to map it to the hardware systems, we could actually run it at scale, rather than build our own brain hardware.”
The company has proved that the technology works commercially, and it is generating revenue.
Ahmad noted the brain is extraordinarily efficient, using only perhaps 20 watts of power, whereas deep learning systems require large numbers of GPUs. By switching to CPUs, the processing can become much more efficient, he said.
“We think this is a watershed moment,” he said. “People can use it on commodity servers and CPUs. They don’t need to get special purpose GPU systems. Once it’s on CPUs, there’s so much flexibility you have, whereas with GPUs, it’s very hard to program them to be flexible.”
He said you can have multiple models running at the same time without needing to operate with large batches. NuPIC also doesn’t need to run in a cloud service, so it can offer better privacy, security, and control.
Docker containers are an easy way to run software without a complex installation process.
“Enterprises can save huge amounts of money and the price performance is unparalleled,” he said.
Ahmad said that the CPU focus makes sense because high-end GPUs are pretty much sold out for the next 12 to 18 months, due to a lack of manufacturing capacity as the AI revolution takes off. GPUs are also relatively inflexible, often doing the same kind of calculations in parallel, in contrast to CPUs.
“That was fundamentally important to us to enable us to do these more innovative algorithms,” said Ahmad. “The brain doesn’t just do tons of dense matrix multiplications. It selectively decides what you want to compute, and when you want to compute it, and how you want to allocate computation, because it’s all metabolic energy in the brain. So, it’s developed a lot of smart strategies. But to do that, you need to be able to write those algorithms in a flexible way. So, CPUs are inherently better as well. CPUs are way more flexible than GPUs.”
One of the tricks the brain uses is avoiding computation altogether, rather than doing lots of useless or repetitive work, Ahmad said.
“That’s the idea that we have imported into transformers,” he said.
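Numenta’s published research on sparse neural networks gives a flavor of what “avoiding computation” can mean in practice. Below is a minimal, hypothetical sketch of a k-winner-take-all activation in the spirit of that work, not NuPIC’s actual code: only the k strongest activations survive, so downstream layers can skip most of the arithmetic.

```python
# Hypothetical sketch of a k-winner-take-all activation, in the spirit of
# Numenta's published work on sparse networks; this is not NuPIC's code.
# Only the k largest activations survive; the rest are zeroed, so
# downstream layers can skip most of their multiply-adds.
import numpy as np

def k_winners(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest entries of x and zero out everything else."""
    out = np.zeros_like(x)
    top_k = np.argpartition(x, -k)[-k:]  # indices of the k largest values
    out[top_k] = x[top_k]
    return out

activations = np.array([0.1, 2.3, -0.4, 1.7, 0.05, 0.9])
print(k_winners(activations, k=2))  # [0.  2.3  0.  1.7  0.  0. ]
```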
To me, some of that sounds like AI being used in computer graphics. Nvidia calls it DLSS (deep learning super sampling), where AI speeds up graphics processing: if one patch of an image is green, the next pixel is very likely to be green too, so the system assumes it is and skips a lot of calculation. That’s a case of AI and graphics working together.
You can take existing LLMs and deploy them in Numenta’s optimized inference server running on CPUs. Then you can write applications on top of that. In addition, NuPIC has a training module, so you can fine-tune your models to be more specific to your applications. Since it is delivered in Docker containers, it can run on the customer’s infrastructure, such as Gallium Studios’ own systems.
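NuPIC’s actual API has not been published, so the container name, port, and endpoint in this sketch are illustrative placeholders. What it shows is the general workflow the article describes: start a containerized inference server on your own hardware, then call it locally from your application.

```python
# Hypothetical client sketch: NuPIC's real API is not public, so the image
# name, port, and endpoint below are placeholders. The workflow itself,
# running a container on your own infrastructure and calling a local HTTP
# endpoint, is the part being illustrated.
#
# Start the (hypothetical) container first, e.g.:
#   docker run -p 8000:8000 example/nupic-inference-server
import requests

resp = requests.post(
    "http://localhost:8000/v1/infer",  # placeholder endpoint
    json={"model": "bert-base", "text": "Player asked about the quest."},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```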
Gallium Studios and the future of games and AI
This is enabling a future of games and AI where you have characters modeled on you, dubbed “proxis” in the Gallium Studios game.
“Our latest game, Proxi, is an expansive interactive world populated by your personal memories and connections. We turned to Numenta because of fundamental challenges we faced in incorporating AI – not only to deliver the best experience possible to our players, but also to ensure that we never jeopardize the trust and privacy they place in us,” said Lauren Elliott, CEO of Gallium Studios, in a statement. “With NuPIC, we can run LLMs with incredible performance on CPUs and use both generative and non-generative models as needed. And, because everything is on-prem, we have full control of models and data. Over time, Numenta’s cutting-edge neuroscience-driven research will enable us to build simulated AI players that continuously learn, adapt, and behave in truly intelligent fashion. We are excited by the possibilities.”
Those proxis simulate you to some extent. Running them means running lots of models at the same time, which is hard to do on GPUs but practical on CPUs.
“The way that we are doing our product is a perfect fit for their game. And I personally think this is going to be true for lots of different games that want to incorporate AI,” Ahmad said. “They may see two orders of magnitude performance improvement, depending on the exact model, and a huge price performance difference.”