The Future is GPGPU
Since the early 1990s the x86 line of CPUs has dominated the computer world. Although it has hardly entered popular consciousness, that is all changing. A revolution is afoot, as significant as the change from mainframes to clustered or networked systems some twenty years ago. It is time to welcome our new GPU overlords who, in the November 2010 list of the world's supercomputers, have taken 1st, 3rd and 4th positions. In the #1 position, the Chinese "Tianhe-1A" achieved a peak computing rate of 2.507 petaFLOPS, a massive 43% faster than the former leader, the CPU-based Jaguar, and it did so with approximately 25% fewer processors. Indeed, according to Nvidia, had the "Tianhe-1A" been built on a traditional CPU-only architecture it would have required an additional 50,000 CPUs, twice the floor space, and three times the power to reach the same performance.
Like most innovations, GPUs have precursor technologies which provide a context for understanding their development. In this particular case the historical precedent was the early video controller or video card. However, it is not the graphics processing as such that is important here, but rather the capacity of such processors to perform floating-point operations through their own internal parallel architecture. This gives rise to the field of General-Purpose Computing on Graphics Processing Units, or GPGPU.
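To make the programming model concrete, the following sketch (a hypothetical example, not code from any of the packages mentioned here) adds two floating-point vectors in CUDA C: each GPU thread computes a single element, so hundreds of thousands of additions proceed in parallel on the card.

// vecadd.cu -- minimal CUDA sketch of data-parallel floating-point work.
// Compile with: nvcc -o vecadd vecadd.cu
#include <stdio.h>
#include <cuda_runtime.h>

/* Each GPU thread adds one pair of elements. */
__global__ void vecadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* global thread index */
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;                 /* about a million elements */
    size_t bytes = n * sizeof(float);

    /* Host (CPU) arrays. */
    float *h_a = (float *) malloc(bytes);
    float *h_b = (float *) malloc(bytes);
    float *h_c = (float *) malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    /* Device (GPU) arrays in the card's own dedicated memory. */
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **) &d_a, bytes);
    cudaMalloc((void **) &d_b, bytes);
    cudaMalloc((void **) &d_c, bytes);

    /* Copy the inputs across, run the kernel, copy the result back. */
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecadd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %.1f, c[n-1] = %.1f\n", h_c[0], h_c[n - 1]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}

Compiled with nvcc, the same source runs unchanged whether the card has a few dozen or a few hundred cores; the runtime maps the blocks of threads onto whatever hardware is present.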
However, not all is sweetness and light. There is still a dearth of software written to utilise GPUs, but where it has been written (e.g., the CUDA version of NAMD) it is very good; indeed, blazingly fast. So whilst software development still needs to be undertaken in this area, hardware developments make it clear as day where the future lies. Rob Farber, with characteristic bluntness, spells it out:
Legacy applications and research efforts that do not invest in multi-threaded software will not benefit from modern multi-core processors, because single-threaded and poorly scaling software will not be able to utilize extra processor cores. As a result, computational performance will plateau at or near current levels, placing the projects that depend on these legacy applications at risk of both stagnation and loss of competitiveness.... To put this in very concrete terms, any teenager (or research effort) from Beijing, China, to New Delhi, India, can purchase a teraflop-capable graphics processor and start developing and testing massively parallel applications.
Of course, GPUs are either on-board or (usually of better quality) on a separate card with its own dedicated memory and so forth. It should be obvious that this implies some additional latency and processing as data moves between the host and the card. So perhaps it comes as no surprise to discover that Nvidia plans to integrate its GPUs with CPUs based on the ARM RISC architecture, while AMD is pursuing a "Fusion architectural approach that marries x86 CPUs with ATI GPUs on-chip."
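As a rough illustration of that transfer overhead, the hedged sketch below uses CUDA events to time a host-to-device copy against a trivial on-card kernel. The actual figures depend entirely on the bus and the card, but on a discrete card the copy across the bus is often the dominant cost for small amounts of work, which is precisely what on-chip CPU-GPU integration aims to reduce.

// xfer.cu -- sketch of measuring bus transfer time vs. on-card compute time.
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= 2.0f;                      /* trivial on-card work */
}

static float elapsed_ms(cudaEvent_t start, cudaEvent_t stop)
{
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    return ms;
}

int main(void)
{
    const int n = 1 << 24;                 /* 16M floats, 64 MB */
    size_t bytes = n * sizeof(float);
    float *h = (float *) malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc((void **) &d, bytes);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   /* across the bus */
    cudaEventRecord(t1);
    scale<<<(n + 255) / 256, 256>>>(d, n);             /* on the card */
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    printf("host->device copy: %.2f ms, kernel: %.2f ms\n",
           elapsed_ms(t0, t1), elapsed_ms(t1, t2));

    cudaEventDestroy(t0); cudaEventDestroy(t1); cudaEventDestroy(t2);
    cudaFree(d); free(h);
    return 0;
}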