Thursday, January 15, 2009

No more 1000 cores on a chip for you

Really? No more exponential growth in performance forever? You mean, no more exponential growth forever, such as that of real estate, or the Dow Jones in 2008, or the Nasdaq in 2001? Apparently so: with too many cores, the memory subsystem cannot cope with the enormous load, which in turn leads to poor performance.

How many is too many? Anything past about 8.

What to say... I am not surprised. Interestingly, however, message passing is seen as useful in this context.
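
For the curious, here is a minimal sketch in Go of why message passing helps (my own toy example, not from the article): each worker keeps its slice of the data private and sends only its result over a channel, so the cores exchange a handful of explicit messages instead of contending for the same memory.

package main

import "fmt"

// sum adds its slice of the data and sends the partial result
// over a channel, instead of updating a shared accumulator
// that every core would fight over.
func sum(part []int, results chan<- int) {
	total := 0
	for _, v := range part {
		total += v
	}
	results <- total // the only inter-worker communication
}

func main() {
	const workers = 8
	data := make([]int, 1<<20) // 1,048,576 ints, divisible by 8
	for i := range data {
		data[i] = i % 10
	}

	results := make(chan int, workers)
	chunk := len(data) / workers
	for w := 0; w < workers; w++ {
		go sum(data[w*chunk:(w+1)*chunk], results)
	}

	grand := 0
	for w := 0; w < workers; w++ {
		grand += <-results
	}
	fmt.Println("sum:", grand)
}

The point is that the shared-memory traffic is replaced by eight small messages, no matter how large the data set is.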

3 comments:

Dale said...

Andres, I thought the limit would have been more like...um...10?

Dale

Peter William Lount said...

I'd take the reports of such a low ceiling on the number of cores with a grain of salt, er, sand. I/O pathway design and capabilities make a big difference.

The Tile64 chip already has 64 cores and is a little speed demon. Tilera expects the architecture to expand to over 1,000 cores per chip.

Larrabee also defies the linked article's downbeat assumptions, with almost linear scaling to 32+ cores and about 90% efficiency at 48 cores.

The mesh network between the cores of the Tile64 chip takes hardware message passing to a new level. In addition, the cores can bypass memory altogether and pass data as message packets directly to other cores in the mesh. To top it off, the chip has an insane amount of I/O capability built in for communicating with the outside world. A possible archetype for what mainstream chips will look like?

I/O pathways will need to keep expanding along with processing performance and the number of cores. New pathway designs will make a difference, as Tilera and Intel are demonstrating.

Andres said...

Peter,

Reading about Larrabee shows that the linear performance increase is due to a highly parallelizable problem domain, i.e., graphics. This is no different from our current GPU approaches, where we throw dozens of pixel pipes at the problem to speed up rendering.

However, this does not necessarily mean that every problem can be cut into little pieces, each requiring little data and little communication between cores, in a way that avoids the associated overhead. I would expect simulations with a huge number of interacting particles to resist that kind of decomposition.
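
To make that concrete, here is a deliberately naive all-pairs toy in Go (mine, not anyone's production code): each of n particles interacts with every other particle, so even if you split the outer loop across cores, every core still needs all n positions on every step, and the communication does not shrink as you add cores.

package main

import (
	"fmt"
	"math"
)

// forces computes a softened 1/r^2 attraction between every
// pair of particles on a line. Splitting the outer loop across
// cores divides the arithmetic, but each core still has to read
// all n positions every step.
func forces(pos []float64) []float64 {
	const eps = 1e-9 // softening term to avoid division by zero
	n := len(pos)
	f := make([]float64, n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			if i == j {
				continue
			}
			d := pos[j] - pos[i]
			r2 := d*d + eps
			f[i] += d / (r2 * math.Sqrt(r2)) // direction d, magnitude ~1/r^2
		}
	}
	return f
}

func main() {
	pos := []float64{0, 1, 3, 6}
	fmt.Println(forces(pos))
}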

Andres.