Nigel Toon, chief executive and co-founder of Graphcore, describes the start-up’s approach, which is still under wraps, as a general way to compute high-dimensional probability models. Over time it will come in configurations suitable for the variety of today’s neural network workloads, spanning convolutional and recurrent networks in both training and inference modes.
“We thought about the problem from first principles…looking at the broad set of machine learning approaches to make sure we are not excluding any,” Toon said.
“People think they need different hardware approaches, but at their core they are fundamentally attacking the same problems, so we think we can train better than a GPU and be efficient for inference, too,” he said.
The approach makes sense for a start-up that needs to address the broadest possible market with variations of a single design. With its heft, Intel has the luxury of segmenting the market with a broader portfolio of chips.
Figure 1: Intel is working on an end-to-end offering in machine learning, supporting seven AI software frameworks.
“Intel has a full solution, from the network edge to the data centre, including math kernels and open sourced libraries,” said Barry Davis, a general manager for accelerated computing, in a briefing for the supercomputing event that also pointed to the Nov. 17 AI event.
In an effort to cover the waterfront, Intel is “working on” support for seven machine learning frameworks, including the Neon software it acquired with Nervana, Davis said. By contrast, start-up Graphcore will support just two, including Google’s TensorFlow, when it debuts its first products next year.
The Nervana offering alone will include its Neon framework as part of an “end-to-end solution focused on the enterprise with solution blueprints and reference platforms,” said Davis, suggesting Intel will follow its PC model of selling both chips and everything needed to go around them.
Separately, Intel’s Arria FPGA PCIe card, which ships next year, will quadruple performance per watt on so-called scoring or inference jobs, Davis said. Intel’s Altera division will provide tools so users can avoid the complex process of writing RTL code, expertise that “not everyone has in house,” he said.
The new Knights Mill version of Xeon Phi coming next year will be optimised for the toughest AI jobs such as training neural nets. It will support mixed-precision modes, likely including the 16-bit precision work becoming widely adopted to speed the job of getting results when combing through large data sets.
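As a rough illustration of why mixed-precision modes pair 16-bit values with a wider accumulator, the sketch below sums the same numbers two ways. It is not Intel's implementation and says nothing about how Knights Mill works internally; the function names are hypothetical, and the rounding is emulated in software with Python's standard `struct` module.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_all_fp16(values):
    """Hypothetical helper: inputs AND the running total are kept in 16 bits."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


def sum_mixed(values):
    """Hypothetical helper: 16-bit inputs, but a 32-bit running total."""
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + to_fp16(v))
    return acc


values = [0.01] * 10_000  # the exact sum is 100
naive = sum_all_fp16(values)
mixed = sum_mixed(values)
print(f"all-fp16 total: {naive}")
print(f"mixed total:    {mixed:.3f}")
```

In this sketch the all-16-bit total stops growing once each addend falls below half the spacing between representable values, while the mixed version stays close to 100. That is the general motivation for keeping accumulations in a wider format even when the data itself is stored in 16 bits.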
Following the interconnects
Despite its many offerings, Intel’s server customers complain the chip giant is slowing down in its pace of innovation. Intel’s latest Xeon chips on display at the supercomputing show provide evidence of the trend.
Intel is debuting a new 14nm Broadwell-class Xeon E5 2699A with a 55MB L3 cache and 22 cores running at 2.4GHz. It sports a mere 4.8% gain over the prior chip on the Linpack benchmark popular in high performance computing.
Server makers also worry they will not get enough choice. They fear Intel will tie together its best Xeon, FPGA, Phi and networking processors, along with its upcoming 3D XPoint memories, using its proprietary Omnipath, QuickPath Interconnect and other links.
Intel is planning various integrated products but “there will also be ways to do it separately—there will be choice,” said Charlie Wuischpard, general manager of Intel’s high performance platform group.
However, Intel currently has no plans to support open interconnects such as CCIX for accelerators and Gen-Z for memory, both recently launched by companies including Dell and Hewlett Packard Enterprise.
“We are keeping an eye on these groups [but] we think we will sit on the sidelines and see how their adoption goes—they are not per se that interesting to us,” said Wuischpard.
So far, Intel has 50 large deployments for the discrete version of its Xeon Phi accelerator. This week it starts shipping a version with Omnipath, an Intel link that is an alternative to InfiniBand and Ethernet.
Intel is also integrating Omnipath on Skylake, its next 14nm Xeon processor. The company is demoing the chip for the first time at the supercomputing event. The processors will ship next year and also be available in versions without Omnipath.
Omnipath is currently used in more than half of all servers supporting 100 Gbit/second links, said Wuischpard. Some of the “largest supercomputers in many countries are using it,” he said.
Omnipath was used in 28 systems in the November 2016 Top 500 supercomputer list, twice the number using high-end InfiniBand EDR. It delivers 9% higher application performance at 37% lower fabric costs than InfiniBand EDR, Intel claims.
“We’ve been accused of giving away the technology, but our main competitor [InfiniBand backer Mellanox] shows 72% margins; that leaves us a lot of room to add value,” he said, noting Omnipath can fan out more connections than InfiniBand. “So you need fewer switches and a less expensive topology. Higher radix switches are the wave of the future and we are on the front edge of that,” he added.
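The arithmetic behind the "fewer switches" argument can be sketched with a simple port count for a two-tier, non-blocking fat tree. The figures used here are assumptions for illustration and do not come from the article: 36 ports per switch for the lower-radix fabric, 48 ports for the higher-radix one, and a 648-node cluster. The function name is hypothetical, and the result is a lower bound that ignores cabling symmetry and spare ports.

```python
import math


def two_tier_switch_count(nodes: int, radix: int) -> int:
    """Lower bound on switches for a non-blocking leaf/spine fat tree.

    Each leaf switch uses half its ports for nodes and half for uplinks;
    spine switches use all of their ports to terminate those uplinks.
    """
    down_ports = radix // 2
    max_nodes = radix * down_ports  # capacity of a two-tier tree
    if nodes > max_nodes:
        raise ValueError(f"{nodes} nodes exceed two-tier capacity {max_nodes}")
    leaves = math.ceil(nodes / down_ports)
    uplinks = leaves * down_ports  # one uplink per node-facing port
    spines = math.ceil(uplinks / radix)
    return leaves + spines


NODES = 648  # assumed cluster size, chosen to fill a 36-port two-tier tree
low_radix = two_tier_switch_count(NODES, radix=36)   # assumed port count
high_radix = two_tier_switch_count(NODES, radix=48)  # assumed port count
print(f"36-port switches needed: {low_radix}")
print(f"48-port switches needed: {high_radix}")
```

Under these assumptions the higher-radix fabric needs fewer switches for the same node count, and it can also reach more nodes before a third switching tier is required, which is the cost argument Wuischpard is making.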