A group of enthusiasts is proposing a new set of graphics instructions designed for 3D graphics and media processing. These new instructions are built on the RISC-V base vector instruction set. They will add support for new graphics-specific data types as layered extensions, in the spirit of the core RISC-V instruction set architecture (ISA). Vectors, transcendental math, pixel and texture operations, and Z/frame buffer operations are supported. The result can be a fused CPU-GPU ISA. The group is calling it RV64X because instructions will be 64 bits long (32 bits will not be enough to support a robust ISA).
The world has plenty of GPUs to choose from, so why this one? Because, says the group, commercial GPUs are less effective at meeting unusual needs such as dual-phase 3D frustum clipping, adaptable HPC (arbitrary bit depth FFTs) and hardware SLAM. They believe collaboration provides flexible standards, reduces the 10 to 20 man-year effort otherwise needed, and will help with cross-verification to avoid mistakes.
The team says their motivation and goals are driven by the desire to create a small, area-efficient design with custom programmability and extensibility. It should offer low-cost IP ownership and development, and not compete with commercial offerings. It can be implemented in FPGA and ASIC targets and will be free and open source. The initial design will be targeted to low-power microcontrollers. It will be Khronos Vulkan-compliant, and over time support other APIs (OpenGL, DirectX and others).
The final hardware will be a RISC-V core with a GPU functional unit. To the programmer it will look like a single piece of hardware with 64-bit long instructions coded as scalar instructions. The programming model is apparent SIMD, that is, the compiler generates SIMD from prefixed scalar opcodes. It will include a variable-issue, predicated SIMD back end, a vector front end, precise exceptions, branch shadowing and much more. There won’t be any need for an RPC/IPC calling mechanism to send 3D API calls to/from unused CPU memory space to GPU memory space and vice versa, says the team. And it will be available in 16-bit fixed point (ideal for FPGAs), as well as 32-bit floating point (ASICs or FPGAs).
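To make the idea of a predicated SIMD back end concrete, here is a minimal Python sketch of what predication means in general: each lane of a vector operation is paired with a mask bit, and only lanes whose bit is set receive a new result. The function name `predicated_add` and the list-based lane model are illustrative assumptions, not part of the RV64X proposal, which has not published this level of detail.

```python
def predicated_add(dst, a, b, mask):
    """Lane-wise add under a predicate mask (illustrative model only).

    Lane i receives a[i] + b[i] when mask[i] is true; otherwise it keeps
    the old value dst[i]. Returns a new list and leaves the inputs untouched.
    """
    if not (len(dst) == len(a) == len(b) == len(mask)):
        raise ValueError("all operands must have the same number of lanes")
    return [x + y if m else d for d, x, y, m in zip(dst, a, b, mask)]


# Hypothetical 4-lane example: only lanes 0 and 2 are enabled.
old = [0, 0, 0, 0]
result = predicated_add(old, [1, 2, 3, 4], [10, 20, 30, 40],
                        [True, False, True, False])
print(result)
```

In hardware, the masked-off lanes would typically skip the write-back rather than compute and discard, but the visible effect on the destination register is the same as in this model.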
The design will employ the Vblock format (from the Libre GPU effort):
The design will employ scalars (8-, 16-, 24- and 32-bit fixed and floats), as well as transcendentals (sincos, atan, pow, exp, log, rcp, rsq, sqrt, etc.). The vectors (RV32-V) will support 2-4 element vector operations (8, 16 or 32 bits per element), along with specialized instructions for a general 3D graphics rendering pipeline for points, pixels and texels (essentially special vectors).
Matrices of 2 × 2, 3 × 3, and 4 × 4 will be supported as a native data type, along with memory structures to support them for attribute vectors, and will essentially be represented as a 4 × 4 matrix.
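As a rough illustration of what 16-bit fixed-point scalar arithmetic involves, the Python sketch below models a signed Q8.8 format (8 integer bits, 8 fractional bits). The Q8.8 split, the saturation behavior and the helper names are all assumptions made for this example; the article does not say how RV64X divides its fixed-point bits or handles overflow.

```python
FRAC_BITS = 8            # assumed Q8.8 layout, not taken from the RV64X spec
SCALE = 1 << FRAC_BITS
INT16_MIN, INT16_MAX = -32768, 32767


def saturate(raw: int) -> int:
    """Clamp a raw integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, raw))


def to_fixed(x: float) -> int:
    """Convert a float to its nearest Q8.8 raw value, saturating on overflow."""
    return saturate(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a Q8.8 raw value back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q8.8 values.

    The full product has 16 fractional bits, so shift right by FRAC_BITS
    to return to Q8.8, then saturate to 16 bits.
    """
    return saturate((a * b) >> FRAC_BITS)


product = fixed_mul(to_fixed(1.5), to_fixed(2.0))
print(to_float(product))
```

The appeal for FPGAs is that the multiply reduces to an integer multiply plus a shift, which maps directly onto common DSP blocks without a floating-point unit.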
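One plausible reading of "represented in a 4 × 4 matrix" is that smaller matrices are stored in the upper-left corner of a 4 × 4 container. The Python sketch below shows that layout, padding the remainder with identity entries so the padded matrix leaves any extra vector components unchanged. The identity padding and the function name `embed_in_4x4` are assumptions for illustration; the article does not describe the actual storage scheme.

```python
def embed_in_4x4(m):
    """Place a square 2x2, 3x3 or 4x4 matrix in the upper-left of a 4x4.

    Unused positions are filled from the identity matrix (an assumed
    convention), so the extra rows and columns act as a pass-through.
    """
    n = len(m)
    if n not in (2, 3, 4) or any(len(row) != n for row in m):
        raise ValueError("expected a square matrix of size 2, 3 or 4")
    out = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for i in range(n):
        for j in range(n):
            out[i][j] = float(m[i][j])
    return out


for row in embed_in_4x4([[1, 2], [3, 4]]):
    print(row)
```

A uniform 4 × 4 container means one register layout and one set of multiply instructions can serve all three sizes, at the cost of some wasted storage for the smaller ones.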
Among the advantages of a fused CPU-GPU ISA are the ability to implement a standard graphics pipeline in microcode, support for custom shaders, and the option to implement ray-tracing extensions. It also supports vectors for numerical simulations, with 8-bit integer data types for AI and machine learning.
Custom rasterizers, such as those for splines, SubDiv surfaces and patches, can be implemented.
The design will be flexible enough that it can implement custom pipeline stages, custom geometry/pixel/frame buffer stages, custom tessellators and custom instancing operations.
The RV64X reference implementation will include:
The design is meant to be scalable as indicated below.
The RV64X design has several novel ideas including fused unified CPU-GPU ISA, configurable registers for custom data types, and user-defined SRAM based micro-code for application-defined custom hardware extensions for:
The same design serves as either a stand-alone graphics microcontroller or a scalable shader unit, and data formats support FPGA-native or ASIC implementations.
Why is there a need for open graphics?
The developers think most graphics processors cover the high end, such as gaming, high-frequency trading, computer vision and machine learning. They believe the ecosystem lacks a scalable graphics core for more mainstream applications such as kiosks, billboards, casino gaming, toys, robotics, appliances, wearables, industrial human-machine interfaces, infotainment and automotive gauge clusters. Meanwhile, specialty programming languages must be used to program GPU cores for OpenGL, OpenCL, CUDA, DirectCompute and DirectX.
A graphics extension for RISC-V would resolve the scalability and multi-language burdens, enabling a higher level of use-case innovation.
This is a very early spec, still in development and subject to change based on stakeholder and industry input. The team will establish a discussion forum. An immediate goal is building a sample implementation with an instruction set simulator, and an FPGA implementation using open-source IP and custom IP designed as an open-source project. Demos and benchmarks are being designed. Developers interested in participating should contact Atif Zafar.
As for the Libre-RISC-V 3D GPU, the organization’s goal is to design a hybrid CPU, VPU, and GPU. It is not, as widely reported, a “dedicated exclusive GPU.” The option exists to create a stand-alone GPU product. Their primary goal is to design a complete all-in-one processor SoC that happens to include a Libre-licensed VPU and GPU.
The population of GPU suppliers is increasing. We now have over a dozen.
|Apple||Libre-RISC-V 3D GPU||Qualcomm|
One application not listed as a potential user of a free, flexible, small GPU is cryptocurrency mining.
If it is the goal of the RISC-V community to emulate the IP suppliers such as Arm and Imagination, then we can expect to see DSP, ISP and DP designs. There is at least one Open DSP proposal; perhaps it can be brought into the RISC-V community.
It will take at least two years before any hardware implementations emerge. One of the most logical candidates for adopting this design is Xilinx, which is now using Arm’s Mali in its Zynq design. We would also expect to see several implementations come out of China.
— Jon Peddie, a pioneer in the graphics industry, is president of Jon Peddie Research.