
Last year, the International Energy Agency estimated that data centers, including those that run artificial intelligence systems, consume roughly 2 percent of global electricity demand. That's already as much energy as a small country like the Bahamas uses. And these numbers are only predicted to grow.
“Over the last 10 years, we have seen a 10,000 times explosion in the number of operations for a given AI model,” explained Naveen Verma, the Ralph H. and Freda I. Augustine Professor of Electrical and Computer Engineering. “We've similarly seen a 10,000 times explosion in the amount of data that these AI models are using.”
Together, this explosion in operations and data has put an astronomical strain on the computing systems currently used to run AI models: graphics processing units, or GPUs. And the strain isn't only on processing. The GPUs themselves can cost exorbitant amounts of money, up to hundreds of thousands of dollars.
In response, Verma has dedicated his research to designing computing systems that are both cost-effective and energy-efficient for a world in which AI is poised to dominate the computing we all collectively do. “The kind of compute that AI is doing is on an enormous, unprecedented scale and it's also not very well aligned to the kinds of computers we have today,” said Verma.
The answer, according to Verma, is to build entirely new systems, from the bottom up and the top down, that allow for more energy-efficient, cost-effective ways to run AI models. “By leveraging fundamentally different physics, we can better align with the computations AI is doing,” said Verma. “Our work goes all the way from building the circuits that leverage physics in large-scale chips, to building the software.”
Unlocking innovation
On Feb. 25, Verma gave a talk at the Center for Statistics and Machine Learning in which he described how he and his colleagues design and test new chips, then build software stacks on top of them to run AI models. The seminar was part of the center’s ongoing lunchtime faculty seminar series.
Much of the energy expended by computing systems running AI code is spent accessing stored data. The systems read that data from physical cells, which store it as bits. The cells are arranged in large arrays and accessed one row at a time. “Accessing the data across these arrays limits us,” said Verma. “Then, doing computation on the accessed data limits us.”
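To make that bottleneck concrete, here is a minimal Python sketch (an illustration, not Verma's actual hardware) that counts how many memory rows a conventional architecture must read to finish a single matrix-vector product, the core operation in most AI models. The `conventional_matvec` function and the 1,024-by-1,024 layer size are hypothetical choices made for illustration.

```python
import numpy as np

def conventional_matvec(weights, x):
    """Illustrative 'conventional' matrix-vector product: each output
    requires reading one full row of stored weights out of the memory
    array before any arithmetic can happen."""
    rows_read = 0
    out = np.zeros(weights.shape[0])
    for i, row in enumerate(weights):   # one memory-row access per output
        rows_read += 1
        out[i] = row @ x                # compute only after the data is moved
    return out, rows_read

W = np.random.randn(1024, 1024)   # hypothetical layer of stored weights
x = np.random.randn(1024)         # hypothetical input activations
_, reads = conventional_matvec(W, x)
print(f"memory rows read for one matrix-vector product: {reads}")  # 1024
```

Every one of those row reads moves data across the chip before any computing happens, which is where much of the energy goes.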
In the lab, Verma and his collaborators are working to tackle the problems of compute and memory access simultaneously with a new computer architecture. Traditional architectures lose efficiency because they read the rows of memory cells lined up on a silicon chip one by one. Verma and his colleagues have leveraged a different type of physics to create an in-memory computing architecture, which effectively accesses all the data stored in the memory at once and communicates out only the results of the computation, not the individual bits of stored data. “So far, we’ve seen 30 times better energy efficiency than the best chips you can build using standard digital computing architectures,” said Verma.
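By contrast, the sketch below models the in-memory idea at a high level; it is an assumption-laden illustration rather than the chip's actual circuitry. The `InMemoryArray` class is a hypothetical stand-in: the weights stay resident in the array, the multiply-accumulate is modeled as happening in place, and only the vector of results crosses the memory boundary.

```python
import numpy as np

class InMemoryArray:
    """Hypothetical model of an in-memory computing array: the stored
    weights never leave the array, and the multiply-accumulate is
    treated as happening in place across all rows at once."""

    def __init__(self, weights):
        self.weights = weights      # data stays resident in the array

    def matvec(self, x):
        # Model of in-place accumulation: every stored row contributes
        # simultaneously, and only the result vector is communicated out.
        return self.weights @ x

array = InMemoryArray(np.random.randn(1024, 1024))  # ~1 million stored weights
y = array.matvec(np.random.randn(1024))
print("values communicated out of the array:", y.size)  # 1024 results, not the stored bits
```

In this toy accounting, the only traffic leaving the array is the 1,024 results, rather than the million-plus stored weight values a conventional read-then-compute design would have to move.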
There’s incredible innovation potential in AI, said Verma, which makes the promise of running it more efficiently all the more exciting. “The capabilities of AI systems are improving and going in these directions that we never imagined,” said Verma. “What my team thinks about with these computing systems is not just making it efficient, but making it properly programmable in order to completely unshackle all of that innovation.”