C states, c1e on remote memory access – Dell PowerEdge 1655MC User Manual


Optimal BIOS settings for HPC with Dell PowerEdge 12th generation servers


The bar graphs plot the performance with Logical Processor enabled relative to Logical
Processor disabled. The 16-server cluster has a total of 256 physical cores. With Logical Processor
enabled, tests were performed using only 256 cores as well as all 512 logical cores. The
scheduling of cores was left to the operating system. The text in each bar indicates the percentage
of additional power consumed with Logical Processor enabled. The secondary y-axis plots the
energy efficiency of Logical Processor enabled relative to disabled. A marker value greater than one
indicates that Logical Processor enabled was more energy efficient by that factor.

The graph shows that the impact of this option is highly application specific. For HPL and
WRF, this option should be disabled.

For the benchmark case tested, NAMD performance with Logical Processor enabled and 256 cores is
similar to the performance with Logical Processor disabled; however, the energy efficiency is
better with this setting disabled. Recall that these results are plotted with rating as the
performance metric, whereas NAMD results are typically reported as days/ns. With rating as the
metric, as shown in Figure 8, the test with 512 cores and Logical Processor enabled appears to do
significantly worse in both performance and energy efficiency. However, when the metric used for
comparison is days/ns, Logical Processor enabled with 512 cores performs 12% better than
Logical Processor disabled. As noted in the text in the bar, Logical Processor enabled at 512 cores
consumes 20% more power than when this setting is disabled.
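
As a rough illustration of how the two NAMD percentages above combine into a relative energy-efficiency value, the sketch below assumes that energy efficiency is performance divided by power, so the relative value is the performance ratio divided by the power ratio. That definition and the function name `relative_energy_efficiency` are assumptions made for this example. Because Figure 8 plots rating rather than days/ns, the result is not expected to match the marker in the graph.

```python
def relative_energy_efficiency(perf_ratio: float, power_ratio: float) -> float:
    """Energy efficiency of 'enabled' relative to 'disabled'.

    Assumption: efficiency = performance / power, so the relative value is
    (perf_enabled / perf_disabled) / (power_enabled / power_disabled).
    """
    if perf_ratio <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / power_ratio


# NAMD at 512 cores, using the days/ns-based comparison quoted in the text:
# 12% better performance and 20% more power with Logical Processor enabled.
namd = relative_energy_efficiency(perf_ratio=1.12, power_ratio=1.20)
print(f"{namd:.2f}")  # 0.93, i.e. below one: less energy efficient than disabled
```

Under this assumption, a performance gain only improves energy efficiency when it exceeds the accompanying increase in power draw.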

MILC and LU benefit from Logical Processor being enabled when all 512 cores are used. With only 256
cores in use, Logical Processor should be disabled for better performance and energy efficiency.

Fluent shows similar performance with 256 cores irrespective of whether Logical Processor
(Hyper-Threading) is enabled; however, the energy efficiency is better with it disabled. A 9-11%
performance improvement is measured with 512 cores, but the energy efficiency for the
truck_poly_14m case is 7% lower (a relative value of 0.93).

For applications (like Fluent) that have core-based licenses, the increased license costs with more
cores typically outweigh the marginal benefits of Logical Processor. In general it is recommended
that Logical Processor be disabled. The benefit of this feature should be tested for the specific
applications the cluster runs and tuned accordingly.

5.6. C States, C1E on remote memory access

Section 3.1 describes the C States and C1E BIOS options and Figure 4 shows the impact of these
options on power consumption when the system is idle. Options that conserve power tend to have
an associated performance penalty as the system transitions in and out of sleep states.

This section examines the performance impact of C States and C1E on memory bandwidth when not all
processor cores are in use (i.e., some are idle). With C States and C1E enabled, idle cores are
likely to be in a sleep state. On Sandy Bridge-EP based systems, when the cores clock down, the
other components on the processor chip, collectively called the uncore, also clock down. This
saves power when the cores are idle but lowers memory bandwidth for remote accesses, since the
remote memory controller runs at a lower frequency. Figure 9 graphs this performance impact on
remote memory bandwidth.
