The venture funding supports release of Lightmatter's Envise AI accelerator.
Lightmatter, the MIT spinout building AI accelerators with a silicon photonics computing engine, announced a Series B funding round, raising an additional $80 million. The company’s proprietary silicon photonics technology manipulates coherent light inside a chip to perform calculations very quickly while using very little power (see our primer: How Does Optical Compute Work?).
“This Series B will get us through our early access programs with our customers and fund our team growing,” Lightmatter CEO Nick Harris said in an interview. “We’re building a lot of our go-to-market team… it’s about taking the company from being an engineering organization that had to invent all this stuff, to being a company that’s focused on delivering it to customers – yield, reliability, margins, and the business side of things.”
Following the unveiling of its photonic computing proof-of-concept device last summer, the company announced its Envise AI accelerator. The system is a general-purpose AI inference accelerator for data centers. Its form factor is a 4U server blade incorporating 16 Envise chips. The board also has two AMD Epyc 7002 series host processors, two laser modules, standard power supplies, fans and I/O.
The 4U blade consumes about 3 kW maximum, which enables Envise to compare favorably with alternative approaches on computing density. One reason is that racks can be filled with the Lightmatter systems (alternative solutions may hit the rack’s power limit before the rack is full, so slots are often left empty).
Two Lightmatter Envise blades can process about 1.2 million ResNet-50 inferences per second, equivalent to about 240 inferences per second per Watt (the per-Watt number is somewhat diluted by the power consumption of the host processors and the rest of the system, Harris explained). Measured against the BERT-base and DLRM benchmarks, each board’s 8-GB on-chip SRAM (total for 16 chips), plus more than 100-Tbps bandwidth access to SRAM, comes into play.
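As a rough sanity check, the throughput and efficiency figures above imply a total power draw that can be compared with the stated per-blade maximum. The sketch below takes the quoted numbers at face value; the variable names are illustrative and not from Lightmatter.

```python
# Approximate figures quoted in the article.
inferences_per_second = 1.2e6          # ResNet-50, two Envise blades
inferences_per_second_per_watt = 240   # quoted efficiency
blade_max_power_w = 3_000              # "about 3 kW maximum" per 4U blade
blades = 2

# Power implied by dividing throughput by efficiency.
implied_power_w = inferences_per_second / inferences_per_second_per_watt

# Upper bound if both blades ran at their stated maximum.
max_power_w = blades * blade_max_power_w

print(f"Implied power: {implied_power_w / 1000:.1f} kW")
print(f"Stated maximum for {blades} blades: {max_power_w / 1000:.1f} kW")
```

On these numbers the implied draw is about 5 kW, which sits within the roughly 6 kW maximum for two blades.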
Each Envise 4U server blade also offers up to 6.4 Tbps of optical interconnect for multi-blade scale-out, 1 TB of DDR4 DRAM, and 3 TB of solid-state memory.
Inside the chip
As with the proof-of-concept device, two dies are stacked inside each Envise chip: a 12-nm ASIC which orchestrates data flow to the photonic cores, plus the silicon photonics chip itself.
The ASIC provides 500 MB on-chip SRAM alongside control electronics which send commands to the optical processor, configuring the ADCs and DACs. An optical interconnect fabric, 400 Gbps per chip, allows large arrays of Envise chips to be connected for large computing clusters. The 256 RISC cores can offload certain tasks from the photonic chip.
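The per-chip figures here line up with the per-blade totals quoted earlier. A minimal check, assuming decimal units (1 GB = 1,000 MB, 1 Tbps = 1,000 Gbps) and 16 chips per blade as stated in the article:

```python
CHIPS_PER_BLADE = 16               # Envise chips per 4U blade
SRAM_PER_CHIP_MB = 500             # on-chip SRAM per ASIC
INTERCONNECT_PER_CHIP_GBPS = 400   # optical interconnect per chip

# Aggregate the per-chip figures across one blade.
blade_sram_gb = CHIPS_PER_BLADE * SRAM_PER_CHIP_MB / 1000
blade_interconnect_tbps = CHIPS_PER_BLADE * INTERCONNECT_PER_CHIP_GBPS / 1000

print(f"SRAM per blade: {blade_sram_gb:g} GB")
print(f"Optical interconnect per blade: {blade_interconnect_tbps:g} Tbps")
```

This reproduces the 8 GB of SRAM and 6.4 Tbps of optical interconnect cited for each blade.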
“This really means that it’s general purpose, it’s future-proof,” Harris said. “If someone comes up with new operators that we hadn’t thought of — we hard code a lot of things into the chip, just like everyone else — if you come up with something new, we’ll catch it with the RISC cores.”
The photonic chip has 256 x 256 photonic arithmetic units, configured as a dual-core processor.
“The two cores can operate completely independently of each other and they can be fused in different ways,” Harris said. “You could stack them to do a long matrix, or you could put them side by side for a square.”
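The article does not describe how the two cores are actually fused. As a generic linear-algebra illustration only, the sketch below shows two standard ways that a pair of independent matrix-vector units could share one larger product: splitting the matrix by rows, or splitting it by columns and summing partial results. The function names are hypothetical, and plain Python lists stand in for the photonic hardware.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product; stands in for one compute core."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def split_by_rows(matrix, vector):
    """Each core handles half of the rows; outputs are concatenated."""
    half = len(matrix) // 2
    return matvec(matrix[:half], vector) + matvec(matrix[half:], vector)

def split_by_columns(matrix, vector):
    """Each core handles half of the columns; partial sums are added."""
    half = len(vector) // 2
    left = matvec([row[:half] for row in matrix], vector[:half])
    right = matvec([row[half:] for row in matrix], vector[half:])
    return [l + r for l, r in zip(left, right)]

matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
vector = [1, 0, -1, 2]

# Both decompositions must agree with the undivided product.
reference = matvec(matrix, vector)
print(reference)
print(split_by_rows(matrix, vector))
print(split_by_columns(matrix, vector))
```

Both decompositions give the same result as the single large product; which one suits a workload depends on the shape of the matrix being mapped.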
Moving to a production device (fabricated by GlobalFoundries) brought other challenges.
“You don’t appreciate it until you try to build something really complicated, but we simplified the heck out of the thing – anywhere we could reduce complexity, even at the cost of a little bit of performance, we would do that just to reduce risk and increase yield,” said Harris. “Between the Hot Chips announcement and today, the architecture we’re using is something that’s really tailored towards production, and it’s been simplified. That’s really the big difference here.”
Harris said Lightmatter is also working hard on design-for-test strategies.
The company’s Idiom software stack compiles and executes neural network models for Envise hardware. It also offers tools for debug, profiling and model optimization. Idiom can automatically detect the configuration of the Envise chips and inform the compiler, allowing it to generate programs that run efficiently across very large scale systems.
Lightmatter is positioning Envise as an inference accelerator, though it is capable of handling minor training updates.
“If you have a model that’s deployed and you get new data overnight, you can train on Envise and update the model parameters,” Harris said. He explained that while training a large model from scratch might take ten epochs (an epoch is one complete pass through the dataset during training), fine-tuning a model might take only one epoch, so it could be handled efficiently with Envise. Full-scale training is technically possible with Envise, but the current system doesn’t have the stacks of 3D HBM DRAM required to do it efficiently.
Asked whether Lightmatter’s roadmap includes an HBM version of the system for training applications, Harris said: “I think that would be very interesting.”
Aside from training, future Lightmatter accelerators have three main performance dials to turn.
First, since optical processing doesn’t incur inductance and capacitance, the arithmetic units can be clocked at much higher frequencies – “20 GHz is probably possible,” Harris said.
Second, more than one color of light can be used. Shining multiple colors into the chip can effectively allow processing of multiple inferences simultaneously. This is a way to rapidly increase computing density. Over time, the technology could migrate to shorter wavelengths of light in order to further miniaturize chips.
“Right now we’re at one color and doing just fine,” Harris said. “I think we’re going to go to two colors soon, and that will result in a 2X performance increase, and there’ll be a big boost in the energy efficiency.”
Third, the number of cores can be increased. As with digital processors, more cores mean more processing power. There are a number of different strategies for this, Harris said, including chiplets.
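To see how the three dials combine, here is a back-of-the-envelope sketch that assumes each dial scales throughput linearly and independently. That is an idealization, not a Lightmatter specification; the only data point from the article is Harris's statement that two colors would give a 2X increase. The function and parameter names are hypothetical.

```python
def relative_throughput(clock_scale=1.0, colors=1, core_scale=1.0):
    """Throughput relative to a one-color baseline.

    Assumes ideal, independent linear scaling with clock frequency,
    number of colors (wavelengths), and number of cores.
    """
    return clock_scale * colors * core_scale

# Two colors alone: the 2X case Harris describes.
print(relative_throughput(colors=2))

# Hypothetical combination: double the clock, two colors, double the cores.
print(relative_throughput(clock_scale=2.0, colors=2, core_scale=2.0))
```

Real gains would depend on factors this model ignores, such as memory bandwidth and data converter speed.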
Lightmatter has also been targeting communications bottlenecks. The startup has developed a wafer-scale device, Passage, which is effectively a programmable photonic interconnect through which chips can communicate at high speed. Up to 48 heterogeneous compute chips (CPUs, GPUs, memory, ASIC accelerators, etc.) would be mounted on the wafer-scale device using standard chip-on-wafer technologies; high bandwidth communication is then enabled optically through the silicon via photonic elements including lasers, modulators and photodetectors within Passage.
The 8” x 8” silicon device fabricated at GlobalFoundries packs 40 photonic lanes into the same space required for each optical fiber.
Passage is a combination of switch and interconnect. Being reconfigurable means connections can be made or broken without an engineer having to physically plug and unplug optical fiber cables in the data center.
“Imagine you have a data center and you want to do AI training,” Harris said. “We could take arrays of, for example, Google’s TPUs, or Nvidia’s GPUs, or Lightmatter’s processors, and put them on top of Passage, increase the interconnect bandwidth by a factor of a hundred, and reduce the interconnect energy costs by 10X. You’d start to see these racks get collapsed into Passage platforms.”
The goal is a configurable-topology supercomputer that can take advantage of optical interconnects without the packaging cost and complexity that usually comes with optical devices or the yield issues that come with attaching optical fiber to chips.
With 40x the bandwidth of optical fibers, Passage’s silicon photonics channels could also be used to increase data rates or, conversely, to run more channels in parallel at a lower data rate. Because the energy required scales non-linearly with data rate, the latter approach yields significant energy savings.
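The article does not say how energy scales with data rate, only that the relationship is non-linear. As an illustration under an assumed power law, where per-lane power is proportional to the data rate raised to a hypothetical exponent `alpha`, the sketch below compares one fast lane with several slower lanes carrying the same aggregate bandwidth.

```python
def relative_link_power(lanes, alpha):
    """Total power of `lanes` parallel lanes relative to a single lane
    carrying the same aggregate bandwidth.

    Assumes per-lane power is proportional to rate ** alpha, where
    alpha is a hypothetical exponent (alpha > 1 means faster lanes
    cost disproportionately more power).
    """
    per_lane_rate = 1.0 / lanes   # each lane carries 1/lanes of the traffic
    return lanes * per_lane_rate ** alpha

# Linear scaling: splitting traffic across lanes saves nothing.
print(relative_link_power(lanes=40, alpha=1.0))

# Quadratic scaling (assumed): 40 slower lanes use 1/40 of the power.
print(relative_link_power(lanes=40, alpha=2.0))
```

The 40-lane figure comes from the article; the exponent is purely an assumption chosen to show the direction of the effect.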
Founded in 2017, Lightmatter has so far raised a total of $113 million. The latest round was led by Viking Global Investors with participation from Hewlett Packard Enterprise, Lockheed Martin and SIP Global Partners plus returning investors GV, Matrix Partners, and Spark Capital. The company employs 75 people with its headquarters in Boston and an office in Mountain View, Calif.
Samples of Envise will be available by the end of 2021.
This article was originally published on EE Times.
Sally Ward-Foxton covers AI technology and related issues for EETimes.com and all aspects of the European industry for EETimes Europe magazine. Sally has spent more than 15 years writing about the electronics industry from London, UK. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more. She holds a Master’s degree in Electrical and Electronic Engineering from the University of Cambridge.