Sun Microsystems SPARC T5220 User Manual


The Evolution of Chip Multithreading (CMT)

Sun Microsystems, Inc.

estate to build increasingly complex processors, with instruction-level parallelism (ILP) as a goal. Today these traditional processors employ very high frequencies along with a variety of sophisticated tactics to accelerate a single instruction pipeline, including:

• Large caches
• Superscalar designs
• Out-of-order execution
• Very high clock rates
• Deep pipelines
• Speculative pre-fetches

While these techniques have produced faster processors with impressive-sounding multiple-gigahertz frequencies, they have largely resulted in complex, hot, and power-hungry processors that are not well suited to the types of workloads often found in modern datacenters. In fact, many datacenter workloads are simply unable to take advantage of the hard-won ILP provided by these processors. Applications with heavy shared-memory use and high simultaneous user or transaction counts are typically more focused on processing a large number of simultaneous threads (thread-level parallelism, TLP) than on running a single thread as quickly as possible (ILP).

Making matters worse, the majority of ILP in existing applications has already been extracted, and further gains promise to be small. In addition, with higher clock speeds each successive processor generation has seemingly demanded more power than the last, and microprocessor frequency scaling has leveled off in the 2-3 GHz range as a result. Pipelined superscalar processors require more power as they scale, so this approach is ultimately limited by the ability to cool the processors.
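The link between clock frequency and power can be illustrated with the standard first-order model for dynamic power in CMOS logic. This model is general background and does not come from this manual; the symbols are introduced here only for illustration: $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

Because reaching a higher $f$ generally also requires a higher $V$, dynamic power under this model tends to grow faster than linearly with clock frequency, which is consistent with the cooling limit described above.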

Chip Multiprocessing with Multicore Processors

To address these issues, many in the microprocessor industry have used the transistor budget provided by Moore's Law to group two or even four conventional processor cores on a single physical die, creating multicore processors (or chip multiprocessors, CMP). The individual processor cores introduced by many CMP designs have no greater performance than previous single-processor chips and, in fact, have been observed to run single-threaded applications more slowly than single-core processor versions. However, aggregate chip performance increases because multiple programs (or multiple threads) can be accommodated in parallel (thread-level parallelism).
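The trade-off between per-thread speed and aggregate throughput can be sketched with simple arithmetic. The example below is an idealized illustration only: the function name and the numbers (four cores, each running at 0.8 of a baseline single-core rate) are hypothetical and are not taken from this manual, and the model assumes fully independent threads with no shared bottlenecks.

```python
def aggregate_throughput(cores: int, per_core_rate: float) -> float:
    """Idealized aggregate throughput for independent threads.

    Assumes every core is kept busy and that nothing shared
    (memory, interconnect) limits scaling. Real chips fall short of this.
    """
    return cores * per_core_rate


# Hypothetical figures: a baseline single-core chip at rate 1.0,
# versus a four-core chip whose individual cores are slower (0.8).
single_core = aggregate_throughput(cores=1, per_core_rate=1.0)
quad_core = aggregate_throughput(cores=4, per_core_rate=0.8)

# A single thread runs slower on the quad-core part (0.8 < 1.0),
# yet the chip as a whole completes more work per unit time.
print(f"single-core aggregate: {single_core:.1f}")
print(f"quad-core aggregate:   {quad_core:.1f}")
```

Under these assumptions the four-core chip delivers more total work even though each thread runs more slowly. In practice the gain is smaller than this ideal whenever the cores contend for shared resources.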

Unfortunately, most currently available (or soon to be available) chip multiprocessors simply replicate cores from existing (single-threaded) processor designs. This approach typically yields only slight improvements in aggregate performance because it ignores key performance issues such as memory speed and hardware thread context switching. As
