Walmart Security
« #18 : March 21, 2008, 03:39:21 PM »
HERE IS A TIDBIT TO HELP >>>
As the x86 architecture is now fully in its 64-bit phase (and who would have thought of it in 1980, looking at the clumsy 8086?), it can address lots of memory if given a chance to - surely that's good for supercomputing, large databases, future super-duper games or 3-D simulations.
The current dual-CPU Xeon64 chipsets provide four to eight slots of RAM, which, using 2 GB registered DDR2-400 DIMMs, gives you up to 16 GB of on-board memory - not bad for a start. For dual Opterons, same story - using 4 GB registered DDR DIMMs, you get 32 GB of RAM right now. On quad-Opteron and quad-CPU Potomac XeonMP systems there are usually four channels of memory, each with four DIMM sockets; if some kind of bridging is used, the capacity doubles at the cost of higher latency.
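For quick reference, here is a back-of-the-envelope Python sketch of those on-board capacities; the slot counts and DIMM sizes are simply the ones given above, used as illustrative assumptions:

# Quick capacity check for the configurations described above.
# Slot counts and DIMM sizes are the ones named in the text, nothing more.

def capacity_gb(dimm_slots, dimm_size_gb):
    """Total on-board RAM for a given slot count and DIMM size."""
    return dimm_slots * dimm_size_gb

# Dual Xeon64: up to 8 slots of 2 GB registered DDR2-400
print("dual Xeon64 :", capacity_gb(8, 2), "GB")      # 16 GB
# Dual Opteron: 8 slots of 4 GB registered DDR
print("dual Opteron:", capacity_gb(8, 4), "GB")      # 32 GB
# Quad Opteron / quad XeonMP: 4 channels x 4 DIMM sockets, 4 GB DIMMs
print("quad system :", capacity_gb(4 * 4, 4), "GB")  # 64 GB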
But what if we need even more memory, yet no more CPUs? After all, many large computing jobs may be happy with a certain fixed amount of computing power but want as much RAM as possible - database searches, proteomics, high-resolution weather models or computational chemistry come to mind.
On Xeons, well, we could either build memory controllers with more channels, use bridges translating one memory channel into two, or wait for the FB-DIMM generation with more channels anyway.
What about Opterons? The integrated dual-channel memory controller limits you to four DIMMs per CPU at DDR400 timing, or up to eight DIMMs at DDR333/266 timing (see the HP ProLiant DL585). This way, a four-way Opteron could have 64 GB of DDR400 or 128 GB of slower DDR memory on board. Then what?
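In the same spirit, the four-way totals in one line per case (4 GB registered DIMMs assumed throughout):

# Four-way Opteron totals from the per-controller DIMM limits above:
# 4 DIMMs per socket at DDR400, or 8 DIMMs per socket at DDR333/266.
SOCKETS = 4
DIMM_GB = 4
print("DDR400    :", SOCKETS * 4 * DIMM_GB, "GB")   # 64 GB
print("DDR333/266:", SOCKETS * 8 * DIMM_GB, "GB")   # 128 GB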
Well, each 8xx series Opteron CPU has three HT channels (currently supported at 1 GHz, for an 8 GB/s data rate per channel). In a quad-CPU configuration, let's say two channels go to the two neighbouring CPUs, so one channel is free on each CPU. Let's say then that one channel on CPU 0 and one channel on CPU 2 go to the I/O through respective PCI-X and PCI-E HT bridges and tunnels (sounds as if we're talking about a highway). This gives us 16 GB/s of total I/O bandwidth, more than enough for any current dual-GPU workstation, server or even a 'distributed shared memory' tight cluster with, say, multiple Quadrics rails.
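To keep the link budget straight, here is a small sketch of that square topology; the 8 GB/s-per-link figure is the one quoted above:

# HT link budget for a square quad-Opteron layout as described above:
# 3 links per 8xx CPU at 8 GB/s each, two of them used as coherent links
# to the neighbouring CPUs, with CPU 0 and CPU 2 feeding the I/O bridges.
HT_LINKS_PER_CPU = 3
GBPS_PER_LINK = 8
CPUS = 4

spare_links = CPUS * (HT_LINKS_PER_CPU - 2)   # one spare link per CPU = 4
io_links = 2                                  # CPU 0 and CPU 2 -> PCI-X / PCI-E
free_links = spare_links - io_links           # CPU 1 and CPU 3 stay unused

print("total I/O bandwidth :", io_links * GBPS_PER_LINK, "GB/s")    # 16 GB/s
print("unused HT bandwidth :", free_links * GBPS_PER_LINK, "GB/s")  # 16 GB/s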
So, one channel on CPU 1 and another on CPU 3 stay free - 16 GB/s of unused bandwidth. What if those two channels could connect to a large daughtercard (maybe in a dual-channel HTX slot format) with nice memory controller circuitry that takes in those two HT channels on one side and provides an extra eight 64-bit buses of DDR2-400 memory on the other? At four DIMM sockets per bus, that gives us an extra 32 DIMM sockets - with 4 GB DIMMs, that's an extra 128 GB of RAM, and if using bridges/translators you could further double the number of channels and DIMMs, for a total of 256 GB of extra RAM on top of the usual on-board memory.
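Putting numbers on the daughtercard (four DIMM sockets per bus and 4 GB DIMMs assumed, as above; 3.2 GB/s is simply what a 64-bit DDR2-400 bus delivers):

# The proposed daughtercard, in numbers.
HT_FEED_GBPS = 2 * 8            # two spare HT links feeding the card
BUSES = 8                       # 64-bit DDR2-400 buses on the card
DIMMS_PER_BUS = 4
DIMM_GB = 4
DDR2_400_GBPS = 3.2             # per 64-bit bus

sockets = BUSES * DIMMS_PER_BUS           # 32 DIMM sockets
extra_gb = sockets * DIMM_GB              # 128 GB
bridged_gb = 2 * extra_gb                 # 256 GB with channel bridging

print("extra DIMM sockets :", sockets)
print("extra RAM          :", extra_gb, "GB /", bridged_gb, "GB bridged")
print("HT feed to card    :", HT_FEED_GBPS, "GB/s")
print("raw DRAM bandwidth :", BUSES * DDR2_400_GBPS, "GB/s")  # HT side is the limit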
Now, this memory would naturally have higher access latency for the on-board CPUs compared to their own RAM (probably an extra ~200 ns), but the bandwidth would be about the same; in fact, two CPUs could access such a RAM bank in parallel at full HT speed without contention, thanks to the many memory channels behind it. If latency reduction is a must, a local SRAM cache of, say, 64 to 128 MB could optionally sit in front of the two HT channels.
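As a rough feel for what that cache would buy, here is a crude average-latency model; only the ~200 ns figure comes from the text above, while the local-DRAM latency, SRAM latency and hit rates are purely made-up illustrative numbers:

# Crude average-latency model for daughtercard accesses with an optional
# on-card SRAM cache. Only the ~200 ns penalty is from the text above;
# everything else here is an illustrative assumption.
LOCAL_DRAM_NS = 80                   # assumed latency to a CPU's own DIMMs
FAR_DRAM_NS = LOCAL_DRAM_NS + 200    # daughtercard DRAM, per the ~200 ns penalty
SRAM_HIT_NS = 120                    # assumed latency when the on-card SRAM cache hits

def avg_latency_ns(hit_rate):
    """Average latency to daughtercard memory for a given SRAM hit rate."""
    return hit_rate * SRAM_HIT_NS + (1.0 - hit_rate) * FAR_DRAM_NS

for hr in (0.0, 0.5, 0.9):
    print("SRAM hit rate", hr, "->", round(avg_latency_ns(hr)), "ns average")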
Any of the four CPUs on board would need at most two HT hops to reach the memory controller on the daughtercard, so, in an optimised design, the speed penalty would be low enough to treat this extra memory as a linear extension of main RAM, without needing NUMA-style "near" and "far" memory tricks. An optimised quad-socket (up to eight CPU cores) Opteron board with good cooling could fit this daughtercard on top of the motherboard and still keep the whole thing comfortably within a 3U chassis.
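A quick check of that two-hop claim on the assumed square layout (the card hanging off the spare links of CPU 1 and CPU 3):

# Hop-count check for the square quad-Opteron layout assumed above, with the
# daughtercard controller ("card") on the spare links of CPU 1 and CPU 3.
from collections import deque

links = {0: [1, 3], 1: [0, 2, "card"], 2: [1, 3], 3: [2, 0, "card"],
         "card": [1, 3]}

def hops(src, dst):
    """Breadth-first search over the HT links, returns the hop count."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))

for cpu in range(4):
    print("CPU", cpu, "->", hops(cpu, "card"), "HT hop(s) to the daughtercard")
# CPUs 1 and 3: one hop; CPUs 0 and 2: two hops - never more than two.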
In the near future, with new Opteron sockets and more and faster HT 2.0 channels (after all, AMD could easily put up to six HT channels on next-generation high-end Opterons for greater SMP, I/O and memory scaling), this approach would make even more sense.
And for now, just imagine: 192 GB of RAM with very respectable bandwidth in a standard 3U quad-CPU box! A great deal for memory-intensive HPC or database clusters - and hey, this much RAM will probably be enough even for the near-future 64-bit MS Office, no matter how bloated that one is expected to be... µ
We're still on 2-channel motherboards, so RAM addressing is split into halves...
The quad will have 2 channels with 4 threads,
and each CPU will grab memory from the shared channel.
If you have 2 GB on a 2-channel board,
two CPUs will have to share addressing from 1 GB of RAM.
If the CPUs are on a 4-channel board, they will not have to share.
I think there is a diagram that will show the 1X00000000 addressing and the assignments...
It's going to be easier than writing a book, and it will show both the 2- and 4-channel cases...
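A tiny sketch of that 2-channel vs 4-channel split (the 64-byte cache-line interleaving is just an assumption for illustration, not a statement about any specific board):

# How a given amount of RAM divides across memory channels, and which channel
# a physical address lands on under simple interleaving. The 64-byte
# interleave granularity is an illustrative assumption.
INTERLEAVE_BYTES = 64

def gb_per_channel(total_gb, channels):
    return total_gb / channels

def channel_for(addr, channels):
    return (addr // INTERLEAVE_BYTES) % channels

# 2 GB on a 2-channel board: each channel backs 1 GB, so CPUs hitting the
# same channel are sharing that 1 GB worth of addresses and bandwidth.
print("2-channel board:", gb_per_channel(2, 2), "GB per channel")
print("4-channel board:", gb_per_channel(2, 4), "GB per channel")

for addr in (0x00, 0x40, 0x80, 0xC0):
    print(hex(addr), "-> channel", channel_for(addr, 2), "of 2,",
          "channel", channel_for(addr, 4), "of 4")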
But pay attention to this paragraph:
The current dual-CPU Xeon64 chipsets provide four to eight slots of RAM, which, using 2 GB registered DDR2-400 DIMMs, gives you up to 16 GB of on-board memory - not bad for a start. For dual Opterons, same story - using 4 GB registered DDR DIMMs, you get 32 GB of RAM right now. On quad-Opteron and quad-CPU Potomac XeonMP systems there are usually four channels of memory, each with four DIMM sockets; if some kind of bridging is used, the capacity doubles at the cost of higher latency.
So I can show you this in a graph...