The annual Hot Chips event in California reflects a semiconductor industry moving rapidly in multiple directions with a lot of energy pouring into machine learning, smarter cars and novel sensors.
A handful of talks showed the massive mobile sector is still doing great work, though its advances seem more incremental. Microprocessors, the traditional centre of the event, were still very much present, with Intel describing Skylake, IBM detailing Power 9 and AMD unveiling its Zen x86 core.
Heated competition in cloud computing continued to drive significant silicon efforts. Established players such as Intel showed how they are becoming more vertically integrated through acquisitions, while upstarts such as China’s Phytium emerged.
Perhaps the best news of all came from a handful of talks by young people in or emerging from academia. Their smarts and energy are fuel for the future.
The intense effort around perfecting convolutional neural networks for machine learning needs to start taking power efficiency into account, said David Moloney, chief technology officer at Movidius.
The current focus on training neural networks is driving demand for power-hogging servers packing multiple GPUs like one Facebook recently announced. Moloney made the case that in many applications, more gains in performance, latency and power are won in inference engines like Movidius’ chips placed on the edge of the network close to sensors.
Clearly, his call to action served his company’s agenda, but it also seemed to have in mind the best interests of the machine-learning sector which is rapidly evolving without a sense of the power issues plaguing semiconductors.
Two years ago, Baidu principal architect Jian Ouyang gave a talk on how FPGAs provide good performance for accelerating machine learning at much lower power consumption than GPUs. This year he returned to describe a card China’s Web giant is now using at levels of 10,000 units or more.
The software-defined accelerator is based on an eight-lane PCIe 3.0 card using a 20nm Xilinx KU115 FPGA and 8-32Gbytes of memory. It parses SQL constructs into five types that hardware processing elements in the FPGA can speed up by 8-55x depending on the application.
He described the SDA as a general-purpose accelerator that is more power efficient than a GPU for sorting, especially with complex data types. Although the cards have C++ APIs, they are programmed using an RTL flow rather than OpenCL, which he suggested makes timing closure difficult in designs that use many look-up tables.
Overall, “CPUs are improving more slowly than big data is growing, so we need acceleration to bridge the gap,” Ouyang said, showing a 10x speed up on one application.
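The SDA approach Ouyang described boils down to a dispatch pattern: parse a query into operator types, route the ones the FPGA supports to the card, and fall back to the CPU for the rest. The sketch below illustrates that pattern only; the operator names and tables are assumptions (Baidu’s actual five construct types were not enumerated here), and `fpga_sort` is a stand-in for a real driver call.

```python
# Toy dispatch for a software-defined accelerator: operator types the
# card supports go to the FPGA path, everything else runs on the host.
# Operator names and tables are illustrative assumptions, not Baidu's
# actual five construct types.

def fpga_sort(rows):
    # Stand-in for a driver call that offloads the sort to the card;
    # here it is just a host-side sort so the sketch is runnable.
    return sorted(rows)

ACCELERATED = {"sort": fpga_sort}                        # hypothetical
CPU_FALLBACK = {"sort": sorted,
                "distinct": lambda r: sorted(set(r))}    # hypothetical

def execute(op, rows):
    impl = ACCELERATED.get(op)
    if impl is not None:
        return impl(rows)          # accelerated path
    return CPU_FALLBACK[op](rows)  # host path

print(execute("sort", [3, 1, 2]))      # [1, 2, 3]
print(execute("distinct", [2, 2, 1]))  # [1, 2]
```

The point of the pattern is that the decision of what runs where lives in software, so new operator types can be moved onto the card without changing the query front end.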
__Figure 1:__ *The board from DeePhi Technology attacks a space where deep learning experts and RTL engineers currently butt heads. (Source: DeePhi)*
With the ink on his bachelor’s degree barely dry, Song Yao came to Hot Chips with a novel approach to running machine-learning algorithms on FPGAs. The boards and automated compilation tools of his start-up, DeePhi Technology, attack a space where deep learning experts and RTL engineers currently butt heads.
The start-up’s tools are positioned as an alternative to using OpenCL which can require a month’s work. By contrast, DeePhi’s tool generates instructions in as little as a minute for algorithm designers who don’t understand RTL, delivering higher performance and efficiency, Yao claimed.
Neural networking algorithms are still evolving rapidly, Yao said. So DeePhi is rolling out one board for convolutional neural nets and imaging apps and another for Long Short-Term Memory (LSTM) nets that target uses such as voice recognition.
The young co-founder and chief executive aims to ship the boards in October and roll out a whole new generation of CNN products late this year. He made the case that for the inference side of deep learning, FPGAs deliver good performance at much lower power than GPUs.
When it comes to machine learning in self-driving cars, Google feels the need for speed in silicon. In a keynote talk, Daniel L. Rosenband, who leads the compute team for Google’s self-driving car, called for a 14/16nm processor that could crank 16-bit floating point jobs at five tera-operations/second.
“That’s a compelling number,” he told attendees. “The CPU continues to be a really important part…We try to cram in as much compute as we can to give our software team more to work with…so we use the best chips we can get—maximum performance is the main goal,” he said.
So far Google has developed at least three generations of electronics for its test cars, the latest one fitting into a briefcase-sized box with automotive connectors. But like the sealed unit, Rosenband, who developed chips at start-ups Sandburst and MetaRAM, is mum on what’s inside.
The version the team is working on for next year will be 50 times faster than the 2012 box, which was aimed at slow neighbourhood roads and looked like a homemade desktop PC. In 2015, a car targeting use on city streets “needed a rack—it was fun to put this together but it was not so nice to debug,” he joked.
Large boxes of expensive chips have been fine for the prototypes Google has built to date. But the company increasingly sees the need on the horizon for sleeker boxes and lower-cost silicon to deliver real products, he said.
Radar and lidar sensors are the other two technical keys to smart cars. “A number of companies are working on pretty clever photo detectors to generate point clouds,” he said.
Generating useful 360-degree 3D maps in real time continues to pose tough problems in computer vision. To date the Google cars have driven nearly two million miles on highways, city streets and test tracks to learn from situations that sometimes stump even human eyes.
“For every object we detect we have a probability distribution of how it will move through the scene, and based on that we have a plan for the best trajectory for where the vehicle should go,” he said.
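The pattern Rosenband describes, a motion distribution per detected object feeding a trajectory choice, can be sketched as sampling-based planning: draw possible future positions for each object from its distribution, then pick the candidate ego trajectory with the lowest conflict risk. Everything below is a toy illustration, not Google’s planner; the Gaussian motion model, the lateral-offset candidates and the numbers are all assumptions.

```python
# Toy sampling-based planner: per-object Gaussian motion models and a
# handful of candidate lateral positions. Illustrative only -- none of
# these models or parameters come from Google's actual system.
import random

random.seed(0)

def predict_positions(obj, n=200):
    """Sample future lateral positions for one object, modelled here
    as a Gaussian around (current position + drift)."""
    x, drift, sigma = obj
    return [random.gauss(x + drift, sigma) for _ in range(n)]

def collision_prob(lane, samples, half_width=1.0):
    """Fraction of sampled object positions overlapping the lane."""
    hits = sum(1 for s in samples if abs(s - lane) < half_width)
    return hits / len(samples)

def best_lane(objects, candidate_lanes):
    """Choose the candidate with the lowest worst-case collision
    probability across all detected objects."""
    predictions = [predict_positions(o) for o in objects]
    def risk(lane):
        return max(collision_prob(lane, p) for p in predictions)
    return min(candidate_lanes, key=risk)

# One object at x=0 drifting right: the planner should steer left.
objects = [(0.0, 0.5, 0.3)]
print(best_lane(objects, [-2.0, 0.0, 2.0]))  # -2.0
```

The worst-case-over-objects scoring is one simple way to turn “a probability distribution of how it will move” into “a plan for the best trajectory”; real planners score full trajectories over time, not single lateral offsets.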
Nvidia, one of many companies angling to get a ride in self-driving cars, described its Parker SoC. Used in its Drive PX 2 board, the chip packs a variant of its new Pascal GPU, two custom 64-bit ARM cores and four ARM A57 cores. It comes from the Tegra line initially targeting mobile phones but now shifting gears into automotive, outfitted for the ride with Ethernet AVB and dual CAN bus peripherals.
The automotive theme at Hot Chips took a novel turn with a talk by start-up Clear Motion. Its GenShock active shock absorber uses electronically controlled actuators to push and pull on a car chassis in ways that even out an otherwise bumpy ride. The company’s secret sauce is all in software riding on off-the-shelf sensors and a 32-bit controller.
__Figure 2:__ *GenShock active shock absorber (Source: Clear Motion)*
Start-up Sentons described a novel touchscreen sensor using 500kHz ultrasonic waves geared for smartphones and a range of other devices. It uses piezoelectric arrays embedded in flexible circuits on either side of an active area to create an acoustic sensing field.
A 19.4mm² chip made in 65nm uses a licensed microcontroller and Tensilica DSP to simultaneously calculate x, y and z-force coordinates. It works across distances of about 12 inches on glass, metal and plastic surfaces – including devices under water.
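One way a piezo-based sensor can recover a touch position from waves arriving at transducers on either side of a panel is time-difference-of-arrival. The one-dimensional sketch below illustrates the idea only; the wave speed, panel length and sensor layout are made-up assumptions, not Sentons’ actual design.

```python
# 1-D time-difference-of-arrival localisation sketch: a touch at
# position x scatters waves toward sensors at both ends of a panel;
# the arrival-time difference pins down x.
# Wave speed and panel length are illustrative assumptions.

V = 3000.0   # assumed wave speed in the substrate, m/s
L = 0.15     # assumed panel length, m (roughly phone-sized)

def arrival_times(x):
    """Travel times from a touch at x to the sensors at 0 and L."""
    return x / V, (L - x) / V

def locate(t_left, t_right):
    """Invert the time difference: t_left - t_right = (2x - L) / V."""
    return (V * (t_left - t_right) + L) / 2

t0, t1 = arrival_times(0.04)
print(round(locate(t0, t1), 3))  # 0.04
```

A second axis of sensors gives y the same way, while z-force would come from signal amplitude rather than timing; that split is one plausible reading of the x, y and z-force outputs described above, not a confirmed detail.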
The sensor is simpler to make and lower cost than the capacitive touch sensors in today’s mainstream smartphones, especially considering the kind of force sensors Apple recently layered into the iPhone 6S, said chief executive Samuel Sheng.
He showed his sensors making the sides and back of a smartphone case active areas.
“If you want to take a selfie, you just squeeze the phone and don’t have to go hunting for a button—it’s the number one use case, so far,” he said.