YOKOHAMA, Japan — ITD Lab, a 20-person startup based here, is pitching “Intelligent Stereo Camera (ISC)” technology for applications ranging from ADAS, autonomous vehicles, and drones to construction machines and industrial robots.

Keiji Saneyoshi


ITD Lab’s secret weapon is CTO Keiji Saneyoshi, a father of Subaru’s EyeSight. Saneyoshi was the principal engineer behind Subaru’s original stereo vision system, “Stereo Range Imager.” Saneyoshi’s team at Subaru then developed the first driver-assist system to use only stereo cameras to detect objects such as vehicles, pedestrians, cyclists, and motorcyclists.

After leaving Subaru in 1998, Saneyoshi moved into academia, serving as a professor at Tokyo Institute of Technology (TIT) until early 2017. It was at TIT that he and his team further advanced the original stereo-vision algorithms. They founded ITD Lab in May 2016.

Advanced stereo-vision technology is clearly Saneyoshi and ITD Lab’s forte. The technology offers “distance estimation,” which a mono camera alone can’t do.
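For readers unfamiliar with why two cameras yield distance, the textbook relation for a rectified stereo pair is Z = f × B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity (parallax) in pixels. The sketch below illustrates only that general relation. The function name and all numbers are hypothetical and do not represent ITD Lab’s algorithm or camera parameters.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Textbook rectified-stereo relation Z = f * B / d (illustrative, not ITD Lab's method)."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical values chosen for illustration only.
near = depth_from_disparity(focal_px=1400.0, baseline_m=0.35, disparity_px=49.0)
far = depth_from_disparity(focal_px=1400.0, baseline_m=0.35, disparity_px=10.0)
print(f"near object: {near:.1f} m, far object: {far:.1f} m")
```

Because distance is inversely proportional to disparity, a one-pixel error matters far more for distant objects than for near ones, which is why the accuracy of the parallax image is central to any stereo system.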

However, with sensor fusion being the current rage in the automotive industry, ITD Lab’s push for stereo vision for ADAS might seem counter-intuitive. Although stereo vision might have once been seen as a differentiating technology critical to ADAS, many Tier Ones and automakers focused on highly automated vehicles tend to regard stereo vision as less relevant. A consensus has developed that monovision’s limitations have been solved by combining a mono camera and radar.

Saneyoshi acknowledges the advancements in radar-monovision. However, he remains confident that ITD Lab’s stereo vision, without having to fuse radar data, can deliver superior performance in 3D distance measurement, self-location recognition, collision prevention and object tracking, lane recognition, and road boundary recognition, running at 60 frames per second and up to 160 frames per second.

Akihiro Ogura, president of ITD Lab and a former student of Saneyoshi, told us that other advantages include “automatic calibration” by absorbing manufacturing errors and attachment errors in installation. The goal for ITD Lab’s technology is to offer “the utmost safety” at a lower cost, noted Ogura. His company’s solution, used for ADAS, will be less costly than that of Intel/Mobileye, he added.

ITD Lab provides its advanced stereo-vision algorithms, the startup’s crown jewel, in a black box. The program ships as code stored in ROM, which is used to boot up Altera’s Cyclone FPGA.

ITD Lab demonstrates its Intelligent Stereo Camera unit, held on the left. (Photo: EE Times)



EyeSight pedigree
Although a startup, ITD Lab faces few credibility issues thanks to a technology pedigree associated with Subaru’s EyeSight.

Phil Magney, founder and principal at VSI Labs, told us, “EyeSight is a pretty solid solution and has been around for some time now.” He added, “Subaru got lots of accolades early on for introducing the forward-facing ADAS solution, which became the cornerstone of Subaru’s safety lineup.”

Based on the Stereo Range Imager that Saneyoshi developed at Subaru, ITD Lab claims further improvements in such areas as automatic calibration, parallax image accuracy, and calculation performance (parallel process optimization).

Some industry observers, however, point out that stereo vision is no longer the only answer to ADAS. Speaking of Subaru’s EyeSight, Mike Demler, a senior analyst at The Linley Group, observed, “Subaru is apparently using the dual cameras for distance estimation in its adaptive cruise control and autonomous braking/throttle control, which is an unusual choice.” He noted, “Radar is more popular for those functions. It’s also more accurate and more reliable [because] it works regardless of lighting conditions and precipitation.”

Acknowledging that the only reason to use stereo cameras is for 3D modeling and distance estimation, Demler speculated that Saneyoshi chose the stereo-vision approach over a camera-radar system because he is just more familiar with it.

Saneyoshi, however, maintains that the stereo-vision approach — by keeping an optical flow throughout its process — can make ADAS simpler and easier because it doesn’t require sensor fusion.

Magney partly agreed with Saneyoshi. “Using stereo vision as the primary sensor and not having to worry about radar,” he said, can allow you to “package all your processing in the stereo-vision module so everything is set up pretty well in advance of deployment.” Nevertheless, Magney added, “It is more common to couple a mono camera with radar to give you reasonable measurement regarding the depth of an object or the movement of an object.”

Indeed, radar is getting better all the time. It can see through weather better than a camera, as Magney pointed out. On the other hand, radar’s biggest problem remains “false positives,” he added.

As momentum builds for sensor fusion, it’s possible that ITD’s moment for stereo-vision technology might have already passed. However, CEO Ogura, while fully aware of the uphill battle that he faces, disagrees. He said: “Give us a chance to demonstrate our technology. You won’t be disappointed.”

Others pursuing stereo vision
Notably, ITD Lab isn’t alone in pursuing stereo-vision technology. Companies such as Autoliv, Subaru/EyeSight, and Ambarella (VisLab) offer stereo-vision solutions. Demler said that AImotive gave him a demo using stereoscopic cameras for higher-level autonomous driving. Magney also noted that, besides those companies mentioned, “a few Tier Ones are still working on stereo and would support an RFQ for stereo vision even if they don’t currently list it as part of their lineup.”

Comparing its technology with other stereo-vision solutions, Saneyoshi stressed that ITD Lab’s advantage lies in two factors. “Ours offers much clearer object-edge detection, and our processing speed is much faster — roughly 10 times faster than Subaru EyeSight.”

Below is the first image captured by a CMOS image sensor.

Image captured by a CMOS image sensor. (Source: ITD Lab)


Showing the second image captured by competitors’ stereo-vision technology, Saneyoshi pointed out that edges between objects are blurred and more ambiguous, which could result in some objects disappearing into the background.

Image captured by ITD Lab’s stereo vision. (Source: ITD Lab)


The third image, below, captured by ITD Lab’s stereo vision, offers sharper edges on each object, enhancing its ability to avoid objects with steering, he added.

Image captured by ITD Lab’s competitor’s stereo vision technology. (Source: ITD Lab)



Doing it all on SoC
Separating ITD Lab from other computer vision processor companies is Saneyoshi’s insistence on using FPGAs for his team’s vision accelerators. He believes that FPGAs are ideal not only for ITD Lab’s stereo-vision algorithms but also for running CNNs and DNNs in the future. “FPGA’s parallel-processing pipeline is made for deep learning,” he noted.

Although ITD Lab has yet to offer deep learning on its Intelligent Stereo Camera unit, it is planning to add AI functions on “a low-cost FPGA,” according to Ogura. Macnica/Fuji Electronics, ITD Lab’s distributor, is bringing in a third-party deep-learning solution, according to Macnica/Fuji Electronics’ general manager, Sachihiko Asakura.

Magney said, “Not having AI is not a deal-breaker for ADAS as this is largely a deterministic application. However, for more advanced applications (such as automated driving), you need to be working on AI to stay relevant.”

ITD Lab’s Saneyoshi maintains that stereo-vision processing is ideal for detecting generic objects without training. In contrast, monaural vision can fail to detect an object when it encounters something that it has not been trained on, he noted. “For example, Volvo’s self-driving technology reportedly struggled to identify kangaroos in the road,” he said. Kangaroos’ movements in mid-jump confused the mono camera’s vision processing.

Demler disagreed. “Monaural object detection and recognition algorithms are well-proven. Isn’t it interesting that all the ImageNet contests run on mono images?” he asked.

“Neural-network inference engines are classifiers. They calculate the probability that an object is from a pre-trained class. As we now know, the Uber system apparently did detect the woman with the bicycle, but their software is brain-dead.”

Referring to the Uber accident, Saneyoshi countered that the question is whether Mobileye’s vision processor detected the pedestrian soon enough.

To demonstrate the power of today’s ADAS technology, Mobileye ran its software on a video feed from a TV monitor running police video of the incident. In a blog post last March, Intel’s senior vice president, Amnon Shashua, concluded:


“Despite the suboptimal conditions, where much of the high dynamic range data that would be present in the actual scene was likely lost, clear detection was achieved approximately one second before impact.”

ITD Lab’s Saneyoshi did the same, running his software on a video feed. According to Saneyoshi, ITD’s Intelligent Stereo Camera detected the woman with the bicycle 2.23 seconds prior to the accident.

The first column shows an image captured by a monaural vision processor when it first detected the object at 1.29 seconds prior to the accident. The second column shows images captured by ITD Lab's stereo vision. The third column shows what was seen in a video feed from a TV monitor running police video of the incident.
(Source: ITD Lab)


ITD Lab is rolling out its Intelligent Stereo Camera chip, a ROM chip that stores FPGA code designed to process parallax image data, together with a reference design of a circuit board, including optics. At least one large Tier One is evaluating the Intelligent Stereo Camera, according to the company. Asked about the power consumption of the unit, Saneyoshi said, “Three watts.”

— Junko Yoshida, Chief International Correspondent, EE Times