The machine learning landscape has come under fresh scrutiny with the latest release of the MLPerf inferencing benchmarks. Published on September 11, the benchmarks test the inferencing capabilities of different computer systems across seven categories. For the first time, large language models (LLMs) such as GPT-J were included in the test. Nvidia’s Grace Hopper “superchip” made its debut, while Intel’s Habana Gaudi2 continued to make progress. This review examines the benchmark results in detail.


The Role of MLPerf: The ‘Olympics of Machine Learning’

MLPerf is a series of benchmark tests that examine seven distinct aspects of machine learning. These aspects are image recognition, medical-imaging segmentation, object detection, speech recognition, natural language processing, a recommender system, and the new addition, LLMs. The benchmarks evaluate how well an already trained neural network performs inferencing tasks on different computing systems.

The term “inferencing” here refers to a system’s capability to apply learned patterns to new data: the network has already been trained, and the benchmark measures how quickly and efficiently it can produce predictions.
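As a minimal illustration (a toy sketch, not MLPerf’s actual test harness), inference in PyTorch means running an already-trained model in evaluation mode on unseen inputs, with gradient tracking disabled; the tiny `nn.Sequential` model here stands in for a real network such as GPT-J:

```python
import torch
import torch.nn as nn

# Toy "trained" model standing in for a real network like GPT-J.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()  # switch to inference mode (affects dropout, batch norm, etc.)

new_data = torch.randn(3, 4)  # a batch of 3 previously unseen samples

with torch.no_grad():  # no gradients needed: we only apply learned weights
    predictions = model(new_data)

print(predictions.shape)
```

The two lines that distinguish inference from training are `model.eval()` and `torch.no_grad()`: weights are frozen and applied, never updated.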


GPT-J and The Importance of LLMs

LLMs, or large language models such as GPT-J and GPT-3, are gaining importance in the AI domain. GPT-J has 6 billion parameters, while GPT-3 boasts 175 billion. The decision to test GPT-J, a relatively small model, was deliberate: according to David Kanter, the executive director at MLCommons, the goal was to make the benchmark achievable for a broader range of computing solutions.
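A back-of-envelope calculation shows why the smaller model broadens access. The parameter counts come from the article; the 2-bytes-per-parameter (fp16) assumption is ours, not a figure from the benchmark results:

```python
# Rough weight-memory estimate for the two models named in the article.
# Assumes 2 bytes per parameter (fp16 weights) -- an assumption, not a
# figure reported by MLPerf.
def weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in GiB."""
    return num_params * bytes_per_param / 2**30

for name, params in [("GPT-J", 6e9), ("GPT-3", 175e9)]:
    print(f"{name}: ~{weight_gib(params):.0f} GiB of fp16 weights")
```

Under these assumptions, GPT-J’s weights fit comfortably on a single modern accelerator, whereas a GPT-3-class model requires many devices just to hold the weights.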


Nvidia’s Dominance Continues

In version 3.1 of the MLPerf inferencing benchmarks, Nvidia maintained its leadership position. The company introduced its new Grace Hopper superchip, which combines an Arm-based 72-core CPU with an H100 GPU. Most H100-based systems pair the GPU with Intel Xeon or AMD Epyc CPUs, making Grace Hopper a significant departure.

The Grace Hopper machine outperformed its nearest rival, an Nvidia DGX H100 computer, in all categories by 2 to 14 percent. Memory access appears to be a critical advantage, as the Grace Hopper’s proprietary C2C link enables the GPU to access up to 480 gigabytes of CPU memory directly.


Intel’s Progress with Habana Gaudi2

Intel’s Habana Gaudi2 accelerator showed promise, trailing Nvidia’s fastest machine by between 8 and 22 percent on the LLM task. “In inferencing, we are at almost parity with H100,” stated Jordan Plawner, senior director of AI products at Intel, who positioned Habana chips as the only viable alternative to Nvidia’s in-demand H100.


Data Center Efficiency and The Road Ahead

Qualcomm’s Cloud AI 100 chips also made a strong showing, particularly in the power-consumption benchmarks. The outlook for Nvidia is promising as well: the company recently announced a new software library, TensorRT-LLM, which could potentially double the H100’s performance on GPT-J.


Conclusion

The latest MLPerf inferencing benchmarks provide valuable insights into the state of machine learning and data-center computing. Nvidia continues to lead, but the competition, particularly from Intel, is intensifying. As technology evolves, these benchmarks serve as critical indicators for industry stakeholders and researchers alike.


The data affirms the swift progress being made in the machine learning domain. However, it also indicates that the race is far from over. With every benchmark, we move a step closer to understanding which systems are best suited for the complex and demanding tasks that the future of AI will inevitably bring.
