This is pretty interesting considering your basic CPU does something like 30 GFLOPS (around 16 GFLOPS per POWER6 core, 10 GFLOPS for an Itanium core). A Cell board like this does 180 GFLOPS.
(Don't take this as a benchmark or anything. These are just raw peak numbers.)
Some NVIDIA cards do something like 500 GFLOPS (technically the G80 has 128 fp32 ALUs @ 1350 MHz with MADD, which works out to about 350 GFLOPS), the R600 is supposed to hit around 500, and a Realizm 800 (dual Wildcat VPUs) about 700 GFLOPS :-). So yeah, with 16 or so of these cards used right, you could score yourself a place on the TOP500 supercomputer list. "Hey, my 4 graphics stations can beat your 1000-node Xeon cluster!"
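For what it's worth, these peak figures are just ALU count times clock times flops per instruction. A quick sanity check in Python, using the G80 numbers above (a MADD counts as 2 flops):

```python
def peak_gflops(alus, clock_ghz, flops_per_alu_per_cycle):
    """Theoretical peak = ALUs x clock (GHz) x flops issued per ALU per cycle."""
    return alus * clock_ghz * flops_per_alu_per_cycle

# G80: 128 fp32 ALUs at 1.35 GHz, each issuing a MADD (2 flops) every cycle
print(round(peak_gflops(128, 1.35, 2), 1))  # -> 345.6 GFLOPS
```

Real sustained throughput is of course a different story; this is the marketing-brochure ceiling.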
And this is no joke: since the GF8 series and the whole NVIDIA CUDA thing, NVIDIA has also started making... erm... servers.
The NVIDIA Tesla S870 GPU computing system peaks at something like 2 TFLOPS.
Meanwhile, one of those "low-powered MIPS64 CPUs" in a SiCortex machine does about 1 GFLOPS :-). But they build clusters of up to 5832 of them.
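The SiCortex numbers multiply out nicely too. A back-of-the-envelope sketch, assuming the roughly 1 GFLOPS per CPU from above holds flat across the whole machine:

```python
nodes = 5832          # largest SiCortex configuration mentioned above
gflops_per_cpu = 1.0  # rough per-CPU peak from the post
aggregate_tflops = nodes * gflops_per_cpu / 1000
print(aggregate_tflops)  # -> 5.832, i.e. ~5.8 TFLOPS aggregate
```

So a hall of slow cores still lands in the same ballpark as a rack of Tesla boxes, at least on paper.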
PCI-E Cell accelerator board:
- Cell BE processor at 2.8 GHz
- More than 180 GFLOPS in PCI Express accelerator card
- PCI Express x16 interface with raw data rate of 4 GB/s in each direction
- Gigabit Ethernet interface
- 1 GB XDR DRAM (2 channels, 512 MB each)
- 4 GB DDR2 (2 channels, 2 GB each)
- Optional MultiCore Plus™ SDK software
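The "more than 180 GFLOPS" figure falls out of the same kind of arithmetic. A sketch, assuming the usual Cell BE layout of 8 SPEs with 4-wide fp32 SIMD and a fused multiply-add per cycle (the PPE would add a bit on top; the exact breakdown is my assumption, not from the spec sheet above):

```python
spes = 8            # SPEs on a Cell BE (assumed standard layout)
simd_width = 4      # fp32 lanes per SPE
flops_per_lane = 2  # fused multiply-add counts as 2 flops
clock_ghz = 2.8     # clock of the board above

peak = spes * simd_width * flops_per_lane * clock_ghz
print(peak)  # -> 179.2 GFLOPS from the SPEs alone, close to the quoted 180
```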
A Wildcat Realizm 800:
There's an awesome potential HPC market here... GPUs, PlayStation 3s with Cells, Cell PCI-E cards... exploited properly, they can make some pretty fast clusters. See Folding@Home, for example, where GPUs account for 58.3 and PS3s for 18.1 average computations per client.