Developing heterogeneous multi-core SoCs for mobiles
Applications fall broadly into two dissimilar classes: high performance computing (HPC) and consumer. HPC apps may run long simulations on very large data sets with extreme precision and accuracy requirements. Consumer apps have much less stringent accuracy requirements but must operate in real time or near real time while still handling relatively large data sets.
The mobile context is dominated by video-rate apps that manipulate or analyze visual data at a low level, with a relatively small amount of higher-level code. These apps are inherently heterogeneous: they contain layers of functions that can be divided between the CPU array and the GPU (which is classed as a single core but in fact consists of a large array in itself). Distributing the work across the available resources is how they achieve the best efficiency, meaning higher frame rate, lower power, more responsiveness, or all three.
One consequence of the emergence of this class of applications is that the purpose and nature of the camera pipeline (ISP) is changing. Instead of being aimed primarily at image production, it is being redefined as a vision processor, usually as part of a heterogeneous trio in cooperation with the CPU and GPU. Application examples include video conferencing with face beautification, where the majority of the workload can either be handled by the GPU or shared between the GPU and the ISP. Video encoding can be a CPU task or can be offloaded onto dedicated encoder hardware that we call a Video Processing Unit (VPU). In this scenario, the objectives are to maintain consistent frame rates while keeping to a power budget appropriate for extended use on a mobile device.
A retail analytics application from Vadaro uses broadly similar low-level tasks but shows another requirement: running multiple kernels simultaneously on the GPU (to detect multiple customers). In another app, Find Exact/Find Similar, the vision-specific tasks are delegated to the GPU, leaving the CPU free for database searching and results manipulation.
Figure: A vision platform bringing together ISP, GPU, VPU, and CPU technologies.
These three outcomes (higher frame rate, lower power, and free CPU cycles) are the primary benefits sought by mobile developers, and they are available through heterogeneous multi-core processing. But how can this be quantified, and what can be achieved by all apps?
Three application examples give us some data points. A basic image filtering application run on a dual-CPU system with a PowerVR SGX540 GPU (figure) shows that moving the vast majority of the work to the GPU can achieve a performance gain of 95% with a power reduction of 25%.
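To see what those two figures mean together, it can help to combine them into energy spent per frame. The sketch below is only a back-of-the-envelope calculation, not a measurement from the article. It assumes that a "performance gain of 95%" means 1.95 times the baseline frame rate and that the 25% power reduction applies to average power while the filter runs. The function name `energy_per_frame_ratio` is made up for illustration.

```python
def energy_per_frame_ratio(perf_gain_pct: float, power_reduction_pct: float) -> float:
    """Energy per frame relative to the baseline (1.0 means unchanged).

    Assumption: energy per frame = average power / frame rate, so the
    ratio is (relative power) / (relative frame rate).
    """
    speedup = 1.0 + perf_gain_pct / 100.0            # 95% gain -> 1.95x frame rate
    power_ratio = 1.0 - power_reduction_pct / 100.0  # 25% reduction -> 0.75x power
    return power_ratio / speedup


# Figures quoted for the image filtering example.
ratio = energy_per_frame_ratio(95.0, 25.0)
print(f"Energy per frame: {ratio:.3f} of baseline ({(1 - ratio) * 100:.1f}% lower)")
# prints "Energy per frame: 0.385 of baseline (61.5% lower)"
```

Under these assumptions, each filtered frame costs a little over a third of the baseline energy, which is why frame rate and power are worth reporting together rather than separately.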