Application architectures for image processing
Offloading software to hardware is not inherently difficult, but resource, timing, and other issues can frustrate a first-time user. New platforms and compilation techniques have made it easier (but not easy) to offload and reconfigure software code to run in hardware. This article is written by experienced professionals on the tools, IP, and platform side of FPGAs, and is aimed at software developers interested in sampling software-to-hardware compilation. It explains the architectural choices that need to be made and the HLL (high-level language) tools approach to software-to-hardware compilation.
The disruption of microprocessor-only architectures appears to be increasing. The growth in density, speed, and portability of video/imaging systems is taxing the ability of conventional processors to keep up. GPU, FPGA, and ASSP alternatives are displacing them in certain design types, for good reason. At the "bleeding edge" of frame rates and resolution, the sheer amount of math to be processed strains system architectures. On paper, the math points to parallel processing as the solution. FPGAs are safe, non-exotic parallel processors, within the budget and skill levels of most teams. Software developers are used to HLLs, but often cannot practically use 100% of FPGA features. HLL-to-FPGA cross-compilation can often populate gates more quickly than hand coding. HLLs may also do things "smarter". A brilliant hand coder will always beat an average HLL coder in terms of Quality of Results (QoR), but there are thousands of the latter and only a handful of the former.
Different types of system architecture are emerging, including SoC (System-on-Chip), CPU co-processing, and "line-speed" processing.
Real-time video processing
In this architecture, code attempts to run in real time; i.e., data goes out as fast as it comes in. A typical application is image analysis, such as 4K TV, where no "judgement" is applied against the image. The architecture is stream oriented: pure video-in to video-out. Performance is achieved by designing optimal filtering processors in FPGA fabric and then replicating enough of them to achieve the necessary throughput. Typically, clock speeds (and therefore heat and power) remain relatively low.
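As a rough illustration of the "create enough of them" sizing step, the sketch below estimates how many identical filter units a stream would need. The 3840x2160 at 60 frames/s format, the 150 MHz fabric clock, and the one-pixel-per-clock unit are illustrative assumptions, not figures from this article, and blanking intervals are ignored.

```python
def filter_units_needed(width, height, fps, clock_hz, pixels_per_clock=1):
    """Estimate how many identical filter units keep up with a video stream.

    Assumes each unit accepts `pixels_per_clock` pixels every clock cycle
    and counts active pixels only (no blanking).
    """
    pixel_rate = width * height * fps            # pixels per second entering the design
    per_unit_rate = clock_hz * pixels_per_clock  # pixels per second one unit can absorb
    # Integer ceiling division: round up to a whole number of units.
    return -(-pixel_rate // per_unit_rate)


if __name__ == "__main__":
    # Hypothetical operating point: 4K UHD at 60 frames/s, 150 MHz fabric clock.
    print(3840 * 2160 * 60)                                   # 497664000 pixels per second
    print(filter_units_needed(3840, 2160, 60, 150_000_000))   # 4
```

Under these assumptions, four modestly clocked units cover the stream, which is consistent with the point that throughput comes from replication and not from a high clock rate.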
Real-time video analysis is a variant of the above, but is typically an offload architecture. Again, the application takes video in and may or may not pass video out, as close to real time as possible. However, in the middle of the flow is an application that makes some sense of the image.
In the case of a UAV (unmanned aerial vehicle) or machine (vision) inspection, the application may perform object recognition. The image is manipulated in some way by means of a transform, convolution, or filter. Pipelining is almost invariably used, so there is some latency. It's also safe to assume that there is a lot of data to handle and no time to store it for later use.
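One common way to filter a stream without storing whole frames is a line-buffered 3x3 window. The following is a minimal software model of that idea, not the authors' design: the function name `stream_conv3x3` and the buffer layout are hypothetical, and the kernel is applied without flipping (correlation style, identical to convolution for symmetric kernels).

```python
def stream_conv3x3(pixels, width, kernel):
    """Yield 3x3 filter results for pixels arriving in raster order.

    Only two rows are buffered, never a full frame. Results are produced
    for 'valid' positions only (no border padding).
    """
    line1 = [0] * width                      # previous row (y-1)
    line2 = [0] * width                      # row before that (y-2)
    window = [[0] * 3 for _ in range(3)]     # sliding 3x3 neighbourhood
    for i, p in enumerate(pixels):
        x, y = i % width, i // width
        # New right-hand column: the same x position from rows y-2, y-1, y.
        column = (line2[x], line1[x], p)
        for r in range(3):
            window[r][0] = window[r][1]      # shift the window one pixel left
            window[r][1] = window[r][2]
            window[r][2] = column[r]
        line2[x] = line1[x]                  # age the line buffers
        line1[x] = p
        if x >= 2 and y >= 2:                # window is completely filled
            yield sum(kernel[r][c] * window[r][c]
                      for r in range(3) for c in range(3))


if __name__ == "__main__":
    image = list(range(16))                  # 4x4 test frame, values 0..15
    box = [[1, 1, 1]] * 3                    # simple box-sum kernel
    print(list(stream_conv3x3(image, 4, box)))   # [45, 54, 81, 90]
```

In this model the first result cannot appear until two full rows plus three pixels have arrived, which is the kind of pipeline latency described above.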
Common configurations in deploying FPGAs include at least three general categories, largely depending on skill level and task type. These categories are as follows:
1. FPGAs as a true "field of gates"
2. FPGAs + special function blocks such as DSPs
3. FPGAs + special function blocks such as DSPs + an on-board MCU
Each has its role in accelerating image processing. The field of gates works well when the design is of a consistent type that can be highly parallelized; starfield analysis falls into this category. DSP blocks can add simple processing, but the most typical use is to offload a CPU using the Communicating Sequential Processes (CSP) model.
Figure 1: Communicating sequential processes (CSP) model.
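The CSP idea can be mimicked in software with independent processes that share nothing and communicate only over channels. The sketch below is an approximation under stated assumptions: threads stand in for hardware processes, small bounded `queue.Queue` objects stand in for channels (strict CSP channels are unbuffered), and the stage names and threshold operation are hypothetical.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker passed down the channels


def producer(out_ch, frames):
    """Source process: push each frame into the first channel."""
    for frame in frames:
        out_ch.put(frame)
    out_ch.put(SENTINEL)


def threshold_stage(in_ch, out_ch, level):
    """Filter process: binarize each frame, then forward it."""
    while True:
        frame = in_ch.get()
        if frame is SENTINEL:
            out_ch.put(SENTINEL)
            break
        out_ch.put([1 if p >= level else 0 for p in frame])


def consumer(in_ch, results):
    """Sink process: count the bright pixels in each frame."""
    while True:
        frame = in_ch.get()
        if frame is SENTINEL:
            break
        results.append(sum(frame))


def run_pipeline(frames, level=128):
    a, b = queue.Queue(maxsize=2), queue.Queue(maxsize=2)  # bounded channels
    results = []
    stages = [
        threading.Thread(target=producer, args=(a, frames)),
        threading.Thread(target=threshold_stage, args=(a, b, level)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[0, 200, 255], [10, 20, 30], [128, 127, 129]]))  # [2, 0, 2]
```

Because each stage touches only its own channels, any one of them could in principle be moved into hardware without changing the others, which is what makes the model attractive for offloading a CPU.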