Supernova collapse simulated on a GPU
Visualization plays a key role in helping scientists understand large amounts of information. This role can range from the discovery of problems within the data that may not be evident in basic numerical studies to the development of new hypotheses and the presentation of results. While advances in computational power have led to the discovery and understanding of many phenomena, the available computing resources are often unable to process the resulting data sets in an efficient, interactive manner. Unfortunately, this limitation, coupled with ever-increasing data volumes, often leads to situations in which results are inadequately explored.
However, over the last several years, driven primarily by the entertainment industry, commodity graphics hardware has seen rapid enhancements in terms of performance and programmability. Performance improvements have been significant enough that the graphics processor now has more computing power and memory bandwidth than the CPU. This has led to our study of techniques that leverage the power of the GPU for improving the performance of visualization applications and for general-purpose computation.
As part of the Scout project toward a hardware-accelerated system for quantitatively driven visualization and analysis, we have devised a software environment and programming language that lets scientists write simple, expressive data-parallel programs. These programs compute derived values and directly control the mapping from data values to the pixels of a final rendered image.
This is all accomplished within an integrated development environment that provides on-the-fly compilation of code and interactive exploration of the rendered results. Scout has achieved computational rates that are roughly 20 times faster than a 3GHz Intel Xeon EM64T processor without the use of streaming SIMD extensions, and approximately four times faster than SIMD-enabled, fully optimized code. As an example of what can be accomplished in this environment, we rendered two ranges of computed entropy values from a core-collapse supernova simulation produced by the Terascale Supernova Initiative.
The first entropy range was partially clipped away to reveal the turbulent structure of the supernova's core, and the second (more transparent) entropy range isolated the details of the shock front.
Both ranges of entropy values were colored by the corresponding velocity magnitude values within the simulation. The entropy and velocity magnitude values, which were stored on a 256 x 256 x 256 computational grid, were computed in approximately 0.22s using an Nvidia Quadro 3400 card.
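The per-cell mapping described above can be illustrated with a minimal sketch. This is plain Python, not Scout code, and every name and number in it (`shade_cell`, `colormap`, the entropy ranges, the opacities, `vmax`) is hypothetical and chosen only for illustration. It shows the general idea: compute a derived value (velocity magnitude), select cells by entropy range, and map each selected cell to a color and opacity.

```python
import math

def velocity_magnitude(vx, vy, vz):
    """Derived value: Euclidean length of the velocity vector at one cell."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)

def colormap(t):
    """Hypothetical blue-to-red ramp for t in [0, 1]; returns (r, g, b)."""
    t = min(max(t, 0.0), 1.0)
    return (t, 0.0, 1.0 - t)

def shade_cell(entropy, vx, vy, vz, ranges, vmax):
    """Map one cell to RGBA. `ranges` is a list of (lo, hi, opacity) tuples."""
    for lo, hi, opacity in ranges:
        if lo <= entropy <= hi:
            r, g, b = colormap(velocity_magnitude(vx, vy, vz) / vmax)
            return (r, g, b, opacity)
    return (0.0, 0.0, 0.0, 0.0)  # outside every range: fully transparent

# Illustrative values only; none of these come from the simulation.
RANGES = [(1.0, 2.0, 0.9), (4.0, 5.0, 0.2)]  # opaque range, transparent range
cells = [
    (1.5, 3.0, 4.0, 0.0),   # entropy in the first range
    (3.0, 1.0, 0.0, 0.0),   # entropy in neither range
    (4.5, 0.0, 0.0, 10.0),  # entropy in the second range
]
pixels = [shade_cell(e, vx, vy, vz, RANGES, vmax=10.0) for e, vx, vy, vz in cells]
```

Each cell is shaded independently of its neighbors, which is the property that makes this kind of computation a natural fit for data-parallel hardware.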
Although those results are promising, several challenges remain for the successful use of the GPU as a general-purpose computational resource. There are four major disadvantages.
The first is that moving data between the CPU and the GPU can be time-consuming enough to overwhelm the advantages of the GPU's computational power. The introduction of PCI Express has the potential to greatly reduce the impact of this limitation. This will, however, require a commitment from the graphics hardware vendors to fully utilize the capabilities of the new interface.
The second disadvantage is that developing software for the GPU can be complex compared to programming the CPU. This is primarily due to a restrictive programming model, a lack of virtualization of hardware resources and the need to map algorithms into the graphics-centric and data-parallel form required by the hardware and the supporting graphics API.
The Scout language hides a large portion of these issues from the end-user, but the hardware restrictions are still of considerable concern and are an area of active research.
The third disadvantage of the GPU is a lack of floating-point precision. Although the latest hardware from Nvidia now supports a partial IEEE 32-bit floating-point format, hardware from ATI is limited to 24 bits of precision. It is likely to take years before double-precision floating-point values are supported in graphics hardware, and it is possible that they never will be. This limitation can have a substantial impact on certain calculations.
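A small sketch can make the precision concern concrete. It uses Python's standard `struct` module to round values to 32-bit IEEE floats on the CPU; it does not emulate any particular GPU or the 24-bit format, and the helper name `to_f32` is our own. A 32-bit float has a 24-bit significand, so it cannot represent every integer above 2^24 = 16,777,216, which happens to be the number of cells in a 256 x 256 x 256 grid.

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit IEEE value."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = 16777216.0  # 2**24, also 256**3

# In 64-bit arithmetic the increment is preserved.
f64_result = big + 1.0

# Rounded to 32 bits, the increment is lost entirely.
f32_result = to_f32(big + 1.0)

lost = f64_result - f32_result  # size of the rounding error
```

A running sum or counter kept in 32-bit precision would stop growing at this point, which is one way such a limit can affect a calculation over a grid of this size.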
The final disadvantage is the relatively small amount of memory available on the graphics card. Current memory sizes range from 128MB to 640MB, which is clearly not adequate to process large data sets in an interactive fashion.
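A back-of-the-envelope calculation shows how quickly a grid fills that memory. The sketch below assumes 32-bit values and, purely for illustration, four scalar fields per cell (entropy plus three velocity components); the actual storage layout used in the simulation is not stated here.

```python
BYTES_PER_VALUE = 4      # assumption: each value is a 32-bit float
CELLS = 256 ** 3         # 256 x 256 x 256 grid = 16,777,216 cells
MIB = 1024 ** 2

def fields_mib(n_fields):
    """Memory needed to hold n_fields scalar fields on the grid, in MiB."""
    return n_fields * CELLS * BYTES_PER_VALUE / MIB

one_field = fields_mib(1)    # a single scalar field
four_fields = fields_mib(4)  # assumed: entropy + three velocity components
```

Under these assumptions a single scalar field takes 64 MiB, so two fields alone would fill a 128MB card before any space is set aside for intermediate results or the rendered image.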
Despite those disadvantages, we believe that the performance numbers, the rapid rate of innovation from the graphics hardware vendors and the recent announcement of support for multiple GPUs in a single desktop system show that the study of the GPU's impact on general-purpose computing is a viable area for continued research.
Also, the GPU can provide scientists with a substantial resource for their desktop systems that can be leveraged to provide interactive data exploration and analysis. We are actively exploring the use of hundreds of GPUs in parallel, within a cluster-based environment, to address memory limitations and explore the scalability of such systems.
Finally, working with GPUs can provide insight into the future of computer architectures. Streaming architectures and the growing trend by leading CPU vendors of using multicore and multithreaded processors suggest that more parallelism may be available on future commodity systems. In particular, it seems reasonable to predict that GPU-like cores will be found in the CPUs of the future, or that future GPUs will acquire more general-purpose functionality.
- Patrick McCormick
Researcher, Advanced Computing Lab
Computer and Computational Sciences Division
Los Alamos National Laboratory