Thursday, October 21, 2010

Overclocking Intel’s Xeon E5620: Quad-Core 32 nm At 4+ GHz


Meet the Xeon E5620. This processor runs at 2.4 GHz by default, and it Turbo Boosts up to 2.66 GHz. It’s a quad-core chip, but it retains all of Westmere-EP’s 12 MB shared L3 cache. Hyper-Threading is turned on, so you get up to eight threads in flight at a time, and the chip’s QPI operates at 5.86 GT/s.

From left to right: Core i7-970, Core i7-930, and Xeon E5620

Performance-wise, you can’t expect much out of a 2.4 GHz part. However, Xeons are binned notoriously generously, putting power and reliability ahead of all else. The 32 nm Xeon E5620 has a VID range of 0.75-1.35 V and sports a modest 80 W TDP.

The only Achilles heel this thing has is an 18x multiplier that’s locked. Oh, were it not for locked ratios, this $389 Xeon would be such a beast. Sigh.

Nevertheless, I got my hands on a pair of E5620s and shot for the moon, hoping to derive a bit of enthusiast value from a decidedly business-class processor designed for dual-socket servers and workstations. With a bit of help from Asus (not every X58 motherboard supports Xeon CPUs), I managed to build a fast, stable gaming box that doesn’t require high-end cooling or eyebrow-raising BIOS settings.

The first thing to remember about dropping a 2P processor in a desktop platform is that not every motherboard recognizes Xeon CPUs. Asus’ Rampage III Formula does (the company says it tries to give all of its relevant RoG platforms this capability). I’ve also heard good things about certain EVGA platforms, though I don’t have any of the company’s motherboards in-house to cross-check. Should you decide to follow this path, double-check that your board will, in fact, take a Xeon processor.

Limitations

Now, right upfront, we know that there is a practical limit where Intel’s reference clock, known as the BCLK, gets hung up. That ceiling is generally in the 220 MHz range. Multiply that number by the E5620’s highest supported ratio, 19x, and you get a fairly feasible 4.18 GHz operating frequency. That’d be nearly a 1.8 GHz overclock—not bad…not bad at all.
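
The arithmetic behind that ceiling can be sketched in a few lines of Python. This is only an illustration: the function and constant names are made up for the example, and the 2,400 MHz stock figure is the chip's nominal 2.4 GHz rating rather than a measured value.

```python
# Core clock on this platform is the reference clock (BCLK) times the
# CPU multiplier. All names below are illustrative, not from any tool.

def core_clock_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Return the CPU core frequency in MHz for a BCLK and multiplier."""
    return bclk_mhz * multiplier

STOCK_CLOCK_MHZ = 2400   # nominal 2.4 GHz rating
BCLK_CEILING_MHZ = 220   # practical BCLK limit cited in the text
MAX_RATIO = 19           # highest supported ratio, per the text

ceiling_mhz = core_clock_mhz(BCLK_CEILING_MHZ, MAX_RATIO)
gain_mhz = ceiling_mhz - STOCK_CLOCK_MHZ

# Prints: 4.18 GHz, 1.78 GHz over stock
print(f"{ceiling_mhz / 1000:.2f} GHz, {gain_mhz / 1000:.2f} GHz over stock")
```

With a locked multiplier, BCLK is the only input left to raise, which is why everything else tied to it gets dragged along.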

Remember, though, that upping the BCLK from 133 to 220 MHz throws a lot of other frequencies out of whack. Intel arms the Xeon with a handful of divisors to help narrow the range of clock rates you can use, but they’re fairly limited. For example, you’ll only want to use the 800 or 1066 MT/s memory ratios. Similarly, QPI needs to be set to 4.8 or 5.86 GT/s. Although the Asus Rampage III Formula motherboard I used for this experiment supports more aggressive settings, using them prevents the platform from booting. Not that we would have wanted to anyway—setting each field to the lowest possible value gives us the most headroom for an aggressive overclock.

As you start pushing frequencies other than the core clock beyond their specifications, it often becomes necessary to goose voltage levels, too. This will almost always be the case for Intel’s Xeon E5620, Core i7-930, or Core i7-970—the three CPUs on our bench today.

Working Around Them

A 221 MHz BCLK setting was already pushing my Xeon E5620 sample fairly hard for a 4.2 GHz clock rate. Even with the lowest DDR3-800 and 4.8 GT/s ratios set, I was forcing a 7976 MT/s QPI rate. This was actually doable with a 1.4 V CPU voltage, 1.425 V QPI/DRAM voltage, and 1.35 V IOH voltage. Those sound fairly high, but our Xeon processor handled them well, never exceeding 75 degrees Celsius with eight threads active in Prime95.

Stable settings for 4 GHz

Dialing in 4.3 GHz required a 226 MHz BCLK setting—well beyond where this board wanted to go. That was an 8156 MT/s QPI data rate, with memory clocked at DDR3-1359 (not a problem for my 2000 MT/s Patriot Sector 7 kit), and a 3625 MHz uncore frequency. At this point, I had to pull out a couple of tricks. An increased PCIe clock (110 MHz) was needed to even boot up. Moreover, the QPI Link Data Rate had to be set to Slow Mode or, again, the machine simply wouldn’t boot. Be careful with PCIe voltage adjustments, though. After bumping up the PCIe frequency and IOH/ICH PCIe voltages, I fried the onboard Intel gigabit Ethernet controller. It simply wouldn't show up in Windows afterward.
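
To see why the lowest ratios still end up so far out of spec, here is a rough Python sketch of how the BCLK-derived rates scale. It assumes each setting's label (4.8 GT/s, DDR3-800) describes the rate at the stock 133 MHz reference clock and that rates scale linearly with BCLK; the function name is hypothetical. Under those assumptions it reproduces the QPI and memory figures quoted above.

```python
# Rates derived from BCLK scale with it. A setting labelled for the stock
# 133 MHz reference clock runs proportionally faster at a higher BCLK.
# Assumption: linear scaling from the labelled (nominal) rate.

STOCK_BCLK_MHZ = 133  # stock reference clock as quoted in the text

def scaled_rate(nominal_rate: float, bclk_mhz: float) -> int:
    """Effective rate of a setting labelled `nominal_rate` at stock BCLK."""
    return round(nominal_rate * bclk_mhz / STOCK_BCLK_MHZ)

QPI_NOMINAL_MTS = 4800  # the "4.8 GT/s" QPI setting
MEM_NOMINAL_MTS = 800   # the "DDR3-800" memory setting

for bclk in (221, 226):
    qpi = scaled_rate(QPI_NOMINAL_MTS, bclk)
    mem = scaled_rate(MEM_NOMINAL_MTS, bclk)
    print(f"BCLK {bclk} MHz: core {bclk * 19} MHz, "
          f"QPI {qpi} MT/s, memory DDR3-{mem}")
```

At 226 MHz, the nominally slowest QPI setting is running well above the chip's rated 5.86 GT/s, which fits with QPI being the part of the platform that gives out first.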

Cranking the BCLK up to 231 MHz, yielding 4.4 GHz, might have even been viable. Unfortunately, no combination of voltages, differential amplitudes, or clock skews could lock in stability with an 8337 MT/s QPI data rate. Had this CPU been unlocked, though, I’m confident it would have handled 4.4 GHz without a problem.

My goal wasn’t to find the most extreme settings possible before popping a processor, though. So I dialed things back a bit for this comparison.

Test Hardware
Processors
Intel Xeon E5620 (Westmere-EP) 2.4 GHz, LGA 1366, 5.86 GT/s QPI, 12 MB Shared L3, Hyper-Threading enabled, Power-savings enabled

Intel Core i7-970 (Gulftown) 3.2 GHz, LGA 1366, 4.8 GT/s QPI, 12 MB Shared L3, Hyper-Threading enabled, Power-savings enabled

Intel Core i7-930 (Bloomfield) 2.8 GHz, LGA 1366, 4.8 GT/s QPI, 8 MB Shared L3, Hyper-Threading enabled, Power-savings enabled
Motherboard
Asus Rampage III Formula (LGA 1366) Intel X58/ICH10R, BIOS 0402
Memory
Patriot 6 GB (3 x 2 GB) DDR3-2000, PV236G2000LLK @ 7-7-7-20 and 1.65 V
Hard Drive
Intel SSDSA2M160G2GC 160 GB SATA 3Gb/s
Graphics
Nvidia GeForce GTX 480
Power Supply
Cooler Master UCP-1000 W
System Software And Drivers
Operating System
Windows 7 Ultimate 64-bit
DirectX
DirectX 11
Graphics Driver
GeForce 258.96

We’re using a single GeForce GTX 480 here in order to take GPU bottlenecks out of the equation.

Also, Patriot’s 2000 MT/s Viper II kit gives us ample headroom for elevated data rates; the Xeon pushed these modules to ~DDR3-1700.

Benchmarks and Settings

Audio Encoding

iTunes

Version: 10.0.1 (64-bit), Audio CD ("Terminator II" SE), 53 min., Default format AAC

Video Encoding

TMPGEnc 4.7

Version: 4.7.3.292, Import File: "Terminator II" SE DVD (5 Minutes), Resolution: 720x576 (PAL) 16:9

DivX 6.9.2

Encoding mode: Insane Quality, Enhanced Multi-Threading, Enabled using SSE4, Quarter-pixel search

Xvid 1.2.2

Display encoding status=off

MainConcept Reference 2.0

MPEG2 to H.264, MainConcept H.264/AVC Codec, 28 sec HDTV 1920x1080 (MPEG2), Audio: MPEG2 (44.1 KHz, 2 Channel, 16-Bit, 224 Kb/s), Mode: PAL (25 FPS), Profile: Tom’s Hardware Settings for Qct-Core

HandBrake 0.9.4
Version 0.9.4, convert first .vob file from The Last Samurai to .mp4, High Profile

Applications

Autodesk 3ds Max 2010 (64-bit)

Version: 2010 Service Pack 1, Render Space Flyby Scene at 1920x1080 (HDTV)

WinRAR 3.90

Version 3.90 (64-bit), Benchmark: THG-Workload (334 MB)

7-Zip

Version 4.65, Built-in Benchmark

Adobe Photoshop CS5
Radial Blur, Shape Blur, Median, Polar Coordinates filters
Grisoft AVG Anti-Virus 11.0
Version: 10.0.1120, Benchmark: Scan 334 MB Folder of ZIP/RAR compressed files

Synthetic Benchmarks and Settings

3DMark Vantage

Version: 1.02, GPU and CPU scores

PCMark Vantage

Version: 1.00, System, Memories, TV and Movies, and Productivity benchmarks, Windows Media Player 10.00.00.3646

SiSoftware Sandra 2010

CPU Test=CPU Arithmetic/Multimedia, Memory Test=Bandwidth Benchmark

Games
Metro 2033
High Quality Settings, AAA / 4xAF, 4xAA / 16xAF, vsync off, 1280x1024 / 1680x1050 / 1920x1200 / 2560x1600, DirectX 11, Steam Version, Built-In Benchmark
Just Cause 2
High Quality Settings, No AA / 2xAF, 4xAA / 16xAF, vsync off, 1680x1050 / 1920x1200 / 2560x1600, Desert Sunrise Benchmark, Steam Version
Call of Duty: Modern Warfare 2
Ultra High Settings, No AA / No AF, 4xAA / No AF, 1280x1024 / 1680x1050 / 1920x1200 / 2560x1600, Second Sun, 45 second sequence, Fraps
DiRT 2
High / Ultra High Settings, No AA / No AF, 8xAA / No AF, 1280x1024 / 1680x1050 / 1920x1080 / 2560x1600, In-Game Benchmark, Steam Version

The results in CoD are telling for one reason: this is perhaps the most processor-bound game in our suite, so the fact that performance differences are minimal sets the tone for the rest of our gaming tests.

Of course, the largest gaps are visible at 1280x1024. All three 4 GHz platforms are fairly similar, though. The only laggard is the 2.4 GHz Xeon, and by the time we hit 2560x1600, the onus is on the graphics card to deliver more speed.

Adding anti-aliasing to the equation tightens things up even more, and now we see parity at 1920x1200, with just a bit of variance at 1680x1050.

Not quite as demanding as Metro, but certainly more intense than Call of Duty, DiRT 2 shows the 2.4 GHz Xeon dipping below the 4 GHz processors at 1280x1024 and to a much lesser extent at 1680x1050. Otherwise, all four configurations are on fairly even footing, more so when we enable anti-aliasing.

Now in version 10, we’d expect iTunes to take better advantage of threaded processors, but it still doesn’t. The three 4 GHz chips turn in nearly identical scores, and the 2.4 GHz stock Xeon shows what a 1.6 GHz deficit does in a single-threaded piece of software.

Now this is more like it. Understandably, the quad-core 2.4 GHz Xeon gets smoked in this freely available, well-threaded transcoding app. But when it’s overclocked to 4 GHz, it’s able to edge out Intel’s Core i7-930.

As a point of comparison, the six-core Core i7-970 is 25% faster than the i7-930, but you have to pay roughly three times as much for that extra performance. Unless you’re transcoding professionally and simply cannot get enough compute muscle, it’d be hard to justify such a steep premium.

DivX capitalizes on available core count, and easily shows how much faster the overclocked Xeon E5620 is versus the same CPU in stock form. A higher memory frequency and more L3 cache help nudge that overclocked Xeon in front of Intel’s Core i7-930 running at 4 GHz, but the six-core Core i7-970 takes first place (at a significant cost).

Xvid isn’t as nice to the Gulftown-based chip, crashing before the job can finish. This is an issue we’ve seen before, and it looks like it still hasn’t been fixed. Instead, the overclocked Xeon takes first place here.

A 50%-higher core count and an extra 4 MB of L3 cache buy 34% additional performance when you compare the Core i7-970 to the i7-930. That’s not perfect scaling, but it’s reasonable. The overclocked Xeon E5620 is 2% quicker than the i7-930 baseline at 4 GHz.
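
One way to put a number on "not perfect scaling" is to compare the measured speed-up against the ideal of performance growing in step with core count. The Python sketch below does that. It is an illustrative calculation, not something from the original test suite, and it ignores the extra L3 cache, which also contributes to the measured gain.

```python
# Scaling efficiency: measured speed-up divided by the ideal speed-up that
# would result if performance grew linearly with core count.
# Assumption: the extra 4 MB of L3 is ignored, so this flatters the cores.

def scaling_efficiency(perf_gain: float, core_gain: float) -> float:
    """Both arguments are fractional gains, e.g. 0.34 for +34%."""
    return (1 + perf_gain) / (1 + core_gain)

# Core i7-970 (six cores) versus Core i7-930 (four cores), both at 4 GHz
efficiency = scaling_efficiency(perf_gain=0.34, core_gain=0.50)

# Prints: 89% of ideal linear scaling
print(f"{efficiency:.0%} of ideal linear scaling")
```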

Clearly, threading rules this test—the question is: are you ready to pay hundreds of dollars more for the speed boost?

To be honest, I didn’t tackle all of those benchmarks thinking that Intel’s Xeon E5620 was going to somehow magically outperform a six-core desktop chip. I also didn’t think 4 MB of additional L3 cache was going to open up a huge lead over the 45 nm Bloomfield design, with its 8 MB repository.

Rather, I was hoping to see higher frequencies at lower operating temperatures, perhaps with a little power savings sprinkled on top.

I took ambient temperature readings in between each result using an Extech TM200 thermometer. You can disregard those orange and red bars—the GPU remains fairly consistent at idle and load, regardless of the processor behind it. More interesting are the blue and green bars.

It comes as little surprise that the stock Xeon E5620 is an example of low thermal output thanks to conservative clocks and low operating voltage.

The overclocked Xeon runs significantly warmer due to a 1.6 GHz frequency increase and a higher fixed voltage.

Two additional cores mean that the Core i7-970 gets hotter still—about 10% warmer than the quad-core Xeon.

And an older manufacturing process translates to significantly hotter idle and load temperatures for the overclocked Core i7-930. And when you consider the ambient temp hovered around 32 degrees in my lab, adding 57 to that sticks the loaded Bloomfield core up around 90 degrees. That’s uncomfortably warm, long-term. In fact, I’d probably recommend dialing back to 3.73 GHz or so and dialing back voltage a bit in order to hopefully get a little more useful life out of the chip.

At idle, the two overclocked 32 nm processors use the same amount of power. The stock Xeon is quite a bit more conservative with its consumption. And the Core i7-930 is only moderately higher than the other 4 GHz CPUs.

Load the CPUs down, though, and you get another story entirely. The stock Xeon E5620 is still fairly power-friendly. Overclocked and overvolted, consumption rises by nearly 100 W. Yet, the Xeon E5620 still uses 50 W less than the overclocked Core i7-970. And the Xeon uses roughly 60 W less than the other overclocked quad-core chip in this comparison, Intel’s Core i7-930.

Those results translate over to CPU+GPU power measurements, too. The overclocked Xeon E5620 uses 60 W less than the 4 GHz Core i7-930 setup. So, you’re getting roughly the same performance, significant power savings, and less heat output for a $100 price premium.

First, let’s take a look at the most underwhelming chart of this little experiment:

When it comes to gaming, even a 2.4 GHz Xeon E5620 delivers 96% of the performance made available by a Core i7-930 overclocked to 4 GHz. That’s pretty darn damning.

As you shift over to A/V- and productivity-oriented apps, the overclocked Xeon manages to establish a slim victory at the same frequency as Intel’s Core i7-930. But it’s the Core i7-970 that pulls the largest lead, thanks to its six cores.

Factor power consumption into the picture and the focus gets a little sharper. The stock Xeon E5620 is about 75% as fast as the Core i7-930 overclocked to 4 GHz, but it also uses 75% of the power. And as a result, it’s as efficient. Overclocked, the same CPU sports 102% of the i7’s performance, yet it only uses 92% of its power, thanks to 32 nm manufacturing. Thus, it gains a more significant efficiency advantage. Finally, the Gulftown-based Core i7-970 is much faster in our thread-optimized benchmark suite, it uses less power than the quad-core i7-930, and so it’s the most efficient CPU being tested.
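
The efficiency comparison boils down to performance divided by power, with both expressed relative to the 4 GHz Core i7-930. A minimal Python sketch using the percentages quoted above (the function name is made up for illustration):

```python
# Relative efficiency = relative performance / relative power.
# The Core i7-930 at 4 GHz is the baseline, so its efficiency is 1.0.

def relative_efficiency(rel_performance: float, rel_power: float) -> float:
    """Efficiency relative to a baseline with performance 1.0, power 1.0."""
    return rel_performance / rel_power

stock_xeon = relative_efficiency(0.75, 0.75)  # 75% of the speed, 75% of the power
oc_xeon = relative_efficiency(1.02, 0.92)     # 102% of the speed, 92% of the power

# Prints: stock 1.00x, overclocked 1.11x
print(f"stock {stock_xeon:.2f}x, overclocked {oc_xeon:.2f}x")
```

So the overclocked Xeon comes out roughly 11% more efficient than the Bloomfield baseline on these figures, while the stock Xeon merely breaks even.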

This chart knocks Intel’s second-best desktop chip off of its high horse in a big way, though. A price tag more than three times that of the Core i7-930 means you only get 38% of the Bloomfield chip’s value, as measured by average performance over price.

Because it’s sold at such an attractive price, the Core i7-930 at 4 GHz actually gives you the best value of the group. Remember, though, that it takes a bit of coaxing to get this retail chip stable at 4 GHz—including one of the highest-end air coolers available. Remember also that the Xeon E5620 can actually be coaxed up to 4.2 GHz or so if you’re willing to pull out the stops and break the 220 MHz BCLK ceiling. So, this chart doesn't tell the whole tale. A Xeon E5620 will likely buy you more performance than the Core i7-930, which we'd scale back to 3.73 GHz anyway.
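
The same performance-over-price measure can be applied to the overclocked Xeon as a rough check. The sketch below uses the $389 Xeon price and the $100 premium over the Core i7-930 mentioned in this article, which implies about $289 for the i7. The 102% performance figure is borrowed from the efficiency discussion and is an assumption here, since it may not be the exact average behind the value chart.

```python
# Value = performance / price, normalized to the Core i7-930 at 4 GHz.
# Assumptions: i7-930 price is inferred as $389 - $100; the 1.02 relative
# performance figure comes from the efficiency comparison, not this chart.

XEON_E5620_PRICE = 389
CORE_I7_930_PRICE = XEON_E5620_PRICE - 100

def relative_value(rel_performance: float, price: float,
                   baseline_price: float) -> float:
    """Performance per dollar relative to a baseline with performance 1.0."""
    return rel_performance / (price / baseline_price)

xeon_value = relative_value(1.02, XEON_E5620_PRICE, CORE_I7_930_PRICE)

# Prints: 0.76x the value of the Core i7-930
print(f"{xeon_value:.2f}x the value of the Core i7-930")
```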

I can only lament the fact that Intel isn’t selling the Xeon E5620 on the desktop as a different K-series SKU with an unlocked multiplier. I’d even take a $10 or $20 premium on a part like that. Nevertheless, with a bit of fancy footwork, it’s possible to overclock this thing to 4 GHz and beyond.

Is Intel’s Xeon E5620 a solid enough value to displace CPUs like the Core i7-930 and -950 in the minds of ambitious overclockers? Not definitively, no. It’s on par. But because its multiplier is set so low, you really need 1) a good motherboard and 2) a solid approach to overcoming BCLK frequency limitations.

Fortunately, Asus’ Rampage III Formula is good enough. And with relatively little effort, 4 GHz is a piece of cake. It takes a little more ambition to get 4.2 or 4.3 GHz running stable. But even still, the 32 nm chip doesn’t run hot at all; I never saw the thing crest 80 degrees, even with 1.425 V applied to it. Higher IOH voltage, an elevated PCIe clock, and lower QPI data rates are all viable strategies for breaking past that pesky BCLK wall so many enthusiasts run into.

Don’t expect boatloads of additional performance due to the extra 4 MB of L3 cache—in our testing, there were only a couple of instances where the overclocked Westmere-EP-based Xeon E5620 outperformed the Bloomfield-based Core i7 at 4 GHz (and that was with dissimilar memory clocks, so a couple of variables likely came into play).

What you can expect, though, is a processor that requires less voltage at 4 GHz and beyond than those based on the 45 nm process. Consequently, it runs cooler and consumes less power. For my money, I’d be far more comfortable with it in my own workstation, long-term, than a CPU running at 90+ degrees under load. For many overclockers, those attributes alone won’t be worth the $100 price premium Intel’s Xeon E5620 holds over its Core i7-930. For others, that $100 is far more palatable than an extra $600+ for the Core i7-970, the only enthusiast-class 32 nm CPU in the company’s desktop lineup.

With Sandy Bridge still months away, aimed at the mainstream, and largely inaccessible to the overclocking community (save a couple of K-series SKUs that’ll be marked up), the Xeon E5620 may very well be a viable choice for power users who just can’t wait for Intel to refresh its enthusiast platform late next year with LGA 2011.
