Cerebras Systems, manufacturer of the world's largest processor, has broken the record for the most complex AI model trained on a single device.
Using a CS-2 system powered by the company's wafer-sized chip (WSE-2), Cerebras can now train AI models with up to 20 billion parameters thanks to new software-level optimizations.
The company says the breakthrough will solve one of the most frustrating problems for AI engineers: the need to partition large-scale models across thousands of GPUs. The result is an opportunity to dramatically reduce the time needed to develop and train new models.
Cerebras brings AI to the masses
In subdisciplines such as natural language processing (NLP), model performance correlates linearly with the number of parameters. In other words, the bigger the model, the better the end result.
Large-scale AI development has traditionally involved spreading a model across a large number of GPUs or accelerators, either because there are too many parameters to fit in memory or because compute performance is insufficient to handle the training workload.
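To illustrate the memory pressure at play, here is a rough back-of-the-envelope sketch in Python. The precision choices and optimizer layout (fp16 weights, fp32 master weights, fp32 Adam state) are illustrative assumptions rather than details from Cerebras, but they show why a 20-billion-parameter model overwhelms a single conventional GPU:

```python
# Rough estimate of the training memory footprint of a 20B-parameter model.
# Assumptions (illustrative, not from the article): fp16 weights, an fp32
# master copy of the weights, and fp32 Adam optimizer state (momentum +
# variance) -- a common mixed-precision training setup.

PARAMS = 20e9  # 20 billion parameters

weights_fp16 = PARAMS * 2      # 2 bytes per fp16 weight
master_fp32 = PARAMS * 4       # 4 bytes per fp32 master weight
adam_state = PARAMS * 4 * 2    # fp32 momentum + fp32 variance

total_gb = (weights_fp16 + master_fp32 + adam_state) / 1e9
print(f"~{total_gb:.0f} GB for weights and optimizer state alone")
# -> ~280 GB, well beyond the 40-80 GB of memory on a single GPU,
# which is why such models are normally partitioned across many devices.
```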
“This process is painful, often taking months to complete,” Cerebras explained. “To make matters worse, the process is unique to each network and compute cluster pair, so the work is not portable across different compute clusters or neural networks. It is entirely bespoke.”
Although the most complex models consist of well over 20 billion parameters, the ability to train relatively large-scale AI models on a single CS-2 device eliminates these bottlenecks for many, accelerating development for existing players and democratizing access for those who previously could not participate in the space.
“Cerebras’ ability to bring large language models to the masses with easy and cost-effective access opens up an exciting new era in AI. It gives organizations that can’t spend tens of millions of dollars easy and inexpensive access to major-league NLP,” said Dan Olds, chief research officer at Intersect360 Research.
“It will be interesting to see the new applications and discoveries that CS-2 customers make while training GPT-3 and GPT-J class models on large datasets.”
Furthermore, Cerebras has hinted that its CS-2 system may be able to handle even larger models in the future, with “up to trillions of parameters”. Chaining together multiple CS-2 systems, meanwhile, could pave the way for AI networks larger than the human brain.