Erm, if you're not technically minded, you should probably just click back.
I'm trying to figure out the theoretical processing differences between Cell and the PowerPC 970FX.
Both briefs and technical specifications are available on IBM.com right now, so again, if you're not willing to look (even remotely), you should press back. I know some people might be just as interested in this stuff as I am.
From what I understand, the Cell and the 970FX are clearly both still 64-bit processors, yet the architecture confuses me a little. As far as I can tell, each SPE in the Cell is basically a single processing unit.
Like an ALU (the Weitek math coprocessor springs to mind). It has no direct access to main memory; in fact, it seems to have no memory access beyond its own 256 KB local store, with data moved in and out by DMA. It does have 128 registers, each 128 bits wide.
They are SIMD, with a maximum element size of 2x 64-bit floating point or 4x 32-bit integer. It doesn't seem to deal with quadwords as a single element, which is a little odd.
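To make the register layout concrete: a 128-bit register is just 16 bytes, and the same bits can be read as 2 doubles, 4 32-bit integers, and so on. Below is a minimal Python sketch using the standard `struct` module to reinterpret 16 bytes under different lane layouts. It models only the bit layout, not any actual SPE or VMX instruction; the `LAYOUTS` table and `view_register` helper are hypothetical names, and big-endian byte order (`>`) is assumed because both chips are big-endian PowerPC designs.

```python
import struct

REGISTER_BYTES = 16  # one 128-bit SIMD register

# Hypothetical lane layouts for a 128-bit register:
# name -> (struct format, lane count, lane width in bits)
LAYOUTS = {
    "2 x f64": (">2d", 2, 64),
    "4 x f32": (">4f", 4, 32),
    "4 x i32": (">4i", 4, 32),
    "8 x i16": (">8h", 8, 16),
    "16 x i8": (">16b", 16, 8),
}

def view_register(raw: bytes, layout: str):
    """Reinterpret 16 raw bytes as the lanes of the given layout."""
    if len(raw) != REGISTER_BYTES:
        raise ValueError("a 128-bit register holds exactly 16 bytes")
    fmt, _, _ = LAYOUTS[layout]
    return struct.unpack(fmt, raw)

# Every layout fills exactly 128 bits, no more and no less.
for name, (fmt, lanes, bits) in LAYOUTS.items():
    assert struct.calcsize(fmt) == REGISTER_BYTES == lanes * bits // 8

# Pack two doubles, then look at the very same bits as four 32-bit integers.
raw = struct.pack(">2d", 1.0, -2.5)
print(view_register(raw, "2 x f64"))  # (1.0, -2.5)
print(view_register(raw, "4 x i32"))  # (1072693248, 0, -1073479680, 0)
```

The point of the second `print` is that lane width is only an interpretation: the register holds 128 bits either way, and the instruction chosen decides how they are split.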
The thing is that each SPE can in turn do 4 instructions per cycle per thread. So no matter the complexity, across all 8 SPEs this equates to around 32 instructions per cycle in total.
When you compare this to the 970FX core, which has a total of 128 128-bit registers, divided into 64 floating-point and 64 general (integer or floating-point), the overall difference is minimal in terms of how much data can be processed in a single thread.
So you're still looking at 16 instructions per cycle per thread, from what I understand at least. That means a dual-core variant should be able to match the theoretical speed, no?
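The instructions-per-cycle comparison reduces to simple multiplication. A small sketch, taking the post's own figures as inputs (4 instructions per cycle per SPE, 8 SPEs, 16 per cycle per 970FX thread); these are assumptions read from IBM's briefs, not measured values, and the function names are hypothetical.

```python
# Peak instructions-per-cycle comparison using the figures quoted in the post.
# These per-unit numbers are assumptions, not measured values.
CELL_SPES = 8
CELL_INSTR_PER_SPE = 4        # per cycle, per thread (as quoted)
PPC970FX_INSTR_PER_CORE = 16  # per cycle, per thread (as quoted)

def cell_ipc(spes: int = CELL_SPES, per_spe: int = CELL_INSTR_PER_SPE) -> int:
    """Aggregate instructions per cycle across all SPEs."""
    return spes * per_spe

def ppc_ipc(cores: int, per_core: int = PPC970FX_INSTR_PER_CORE) -> int:
    """Aggregate instructions per cycle for a multi-core 970FX-style chip."""
    return cores * per_core

for cores in (1, 2):
    ratio = ppc_ipc(cores) / cell_ipc()
    print(f"{cores}-core 970FX: {ppc_ipc(cores)} vs Cell {cell_ipc()} -> {ratio:.2f}x")
```

With these inputs a single core comes out at half the Cell's 32 instructions per cycle and a dual core matches it exactly, which is the basis for the dual-core question.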
If you look at the VMX design:
970FX: 10 (9 effective) operations per VMX unit, with 2x VMX units.
Cell: 8 SPEs at 4 operations each, with 1x VMX unit.
So we're looking at, what:
970FX: capable of 20 (18) vector operations per cycle
Cell: capable of 32 vector operations per cycle
At, let's say, 2.0 GHz for both, this would mean roughly:
970FX: 84 (75) GFLOPS theoretical
Cell: 134 GFLOPS theoretical
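The per-cycle totals and GFLOPS estimates can be checked as peak = units × ops per unit per cycle × clock × flops per op. The per-unit counts below are the figures quoted in the post; the flops-per-operation factor is an assumption the post doesn't state. Counting a fused multiply-add as 2 flops gives 80 (72) and 128 GFLOPS at 2.0 GHz, a little under the rough figures quoted, while counting 1 flop per op halves both. The `Chip` class is a hypothetical helper for illustration.

```python
# Peak-throughput arithmetic for the figures in the post. Per-unit counts are
# the post's own reading of the specs; flops_per_op is an assumption.
from dataclasses import dataclass

@dataclass(frozen=True)
class Chip:
    name: str
    units: int          # VMX units (970FX) or SPEs (Cell)
    ops_per_unit: int   # vector ops per unit per cycle, as quoted

    def ops_per_cycle(self) -> int:
        return self.units * self.ops_per_unit

    def peak_gflops(self, clock_ghz: float, flops_per_op: int) -> float:
        # ops/cycle * (clock in GHz = 1e9 cycles/s) * flops/op -> GFLOPS
        return self.ops_per_cycle() * clock_ghz * flops_per_op

CHIPS = [
    Chip("970FX", units=2, ops_per_unit=10),
    Chip("970FX (effective)", units=2, ops_per_unit=9),
    Chip("Cell", units=8, ops_per_unit=4),
]

CLOCK_GHZ = 2.0
for flops_per_op in (1, 2):  # 2 = counting a fused multiply-add as two flops
    for chip in CHIPS:
        print(f"{chip.name}: {chip.ops_per_cycle()} ops/cycle, "
              f"{chip.peak_gflops(CLOCK_GHZ, flops_per_op):.0f} GFLOPS "
              f"at {flops_per_op} flop(s)/op")
```

Whichever flops-per-op convention is used, the Cell-to-970FX ratio stays at 32 / 20 = 1.6, since the factor multiplies both sides equally.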
Seems about right. The overall processing power here seems to be about 2x, though. Something else I'm kind of wondering: considering these are purely math operations, the logic is still done by the host CPU, right?
So logic-wise, software would run no differently on the Cell's host CPU than it would on the standard version. Actually, given that the host also needs to act as the data manager, it would have more limited capacity for logic processing, no?
From what I can tell, a dual core should be able to achieve a similar processing speed, provided it's clocked the same.
So it's more than likely that the 940 triple core would outperform the Cell at an equal clock speed by around a third?
Maybe I read the chart wrong, though. It could be that it only has 1x VMX; if that's the case, then you're looking at a triple core that effectively equals the Cell's speed, yet its logic would be far quicker.
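The multi-core scenarios can be compared the same way. This sketch assumes each core of a hypothetical multi-core part keeps the 970FX's quoted 10 (9 effective) vector ops per VMX unit, takes the Cell at the quoted 32 ops per cycle, and treats clock speed as equal so it cancels out of the ratio. The `vector_ops` helper and scenario labels are illustrative.

```python
# Compare hypothetical multi-core 970FX-style parts against the Cell at equal
# clock speed. Per-unit counts are the figures quoted in the post (assumptions).
CELL_OPS_PER_CYCLE = 8 * 4   # 8 SPEs x 4 ops each
OPS_PER_VMX = 10             # 970FX figure as quoted
OPS_PER_VMX_EFFECTIVE = 9    # the "effective" figure as quoted

def vector_ops(cores: int, vmx_per_core: int, ops_per_vmx: int) -> int:
    """Vector ops per cycle for a chip built from identical cores."""
    return cores * vmx_per_core * ops_per_vmx

scenarios = [
    ("dual core, 2x VMX", 2, 2),
    ("triple core, 2x VMX", 3, 2),
    ("triple core, 1x VMX", 3, 1),
]
for label, cores, vmx in scenarios:
    peak = vector_ops(cores, vmx, OPS_PER_VMX)
    eff = vector_ops(cores, vmx, OPS_PER_VMX_EFFECTIVE)
    # Clock speed is assumed equal on both sides, so only per-cycle counts matter.
    print(f"{label}: {peak} ({eff}) ops/cycle, "
          f"{peak / CELL_OPS_PER_CYCLE:.2f}x ({eff / CELL_OPS_PER_CYCLE:.2f}x) vs Cell")
```

Under these assumptions the triple core with a single VMX unit lands at 30 (27) vector ops per cycle against the Cell's 32, i.e. roughly equal, while the general-purpose logic would still run on three full cores.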