Monday, March 28, 2011
Lucid's Virtu Enables Simultaneous Integrated/Discrete GPU on Sandy Bridge Platforms
We first met LucidLogix (now just Lucid) 2.5 years ago at IDF. The promise was vendor-agnostic multi-GPU setups with perfect performance scaling. The announcement came at a very important time: Intel and NVIDIA were fighting over SLI support on Nehalem motherboards.
NVIDIA didn't want SLI enabled on any non-NVIDIA chipsets, and Intel wasn't about to let NVIDIA build any chipsets for Nehalem. Lucid's Hydra technology seemed to be exactly what we needed to get around the legal holdup that kept Nehalem users from enjoying SLI.
Three things made Lucid's technology less interesting as time went on: Hydra took two years to come to market, NVIDIA enabled SLI on Intel platforms, and single-GPU performance got really, really good.
What made Lucid's Hydra tech possible was a software layer that intercepted OpenGL and DirectX calls from the CPU and directed them to a GPU of Lucid's choosing. While Hydra saw limited success, parts of the technology had another application.
Sandy Bridge's Platform Issues
Although we came away impressed by Intel's Sandy Bridge CPU and GPU, it was the platform that really let us down. SATA controller errata aside, Intel's 6-series chipset lineup had a huge problem. At launch, P67 was the only chipset that supported CPU overclocking, but P67 doesn't support SNB's on-die GPU. Enter the H67 chipset, which does support processor graphics but doesn't support overclocking. It gets worse.
One of the biggest features Sandy Bridge has to offer is the support for hardware assisted video transcoding (Quick Sync). In our review we found Intel's Quick Sync to be the absolute best way to transcode video for use on portable devices. There's just one issue: Quick Sync only works when the on-die GPU is active.
If you pair Sandy Bridge with a discrete GPU on the desktop, you lose the ability to use one of the CPU's biggest features.
Intel will address the overclocking/processor graphics exclusion through the upcoming Z68 chipset, but that doesn't solve the problem of not being able to use Quick Sync if you have a discrete GPU installed. Intel originally suggested using multiple monitors, with one hooked up to the motherboard's video out and the other hooked up to your discrete GPU, to maintain Quick Sync support. That's hardly elegant. At CES this year we were shown a better alternative from none other than Lucid.
Remember the basis of how Hydra worked: intercept API calls and dynamically load balance them across multiple GPUs. In the case of Sandy Bridge, we don't need load balancing - we just need to send games to a discrete GPU and video decoding/encoding to the processor's GPU. This is what Lucid's latest technology, Virtu, does.
The name Virtu is short for GPU Virtualization and the setup is pretty simple at a high level.
Start with a platform that supports Sandy Bridge's processor graphics (H6x or Z68) and connect your display to the motherboard's video out. Add a supported discrete GPU and supply power to it, but don't connect your monitor to it.
Virtu behaves a lot like Hydra. It intercepts API calls and passes them along to a GPU of its choosing. Unlike Hydra, however, the goal here isn't to spread the load across multiple GPUs. Instead, Virtu aims to match each task with the GPU best suited to it.
Video output is handled by SNB's GPU; data is simply copied from the dGPU's frame buffer to the iGPU's frame buffer for output. There should be some overhead in this process, although Lucid claims it's minimal.
What we end up with is a system that should run all 3D games on your discrete GPU, and run all video decoding and encoding on SNB's GPU. Since this isn't switchable graphics but rather a form of GPU virtualization, you can actually run iGPU and dGPU applications at the same time (e.g. you can watch a movie in one window on the iGPU and play a game in another on the dGPU).
Virtu relies on profiles and hard coded GPU support. Currently there are around 100 games/benchmarks that are supported by Virtu. Eventually you'll be able to manually add your own titles but for now we have to rely on what Lucid has validated and enabled. GPU support is broad but limited to anything from the AMD 4xxx, 5xxx and 6xxx series as well as the NVIDIA 2xx, 4xx and 5xx series. Lucid pledges to always ensure the top games are tested/supported as well as the previous two generations of AMD and NVIDIA GPUs.
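To make the routing idea concrete, here is a minimal sketch of a profile-driven dispatcher. It is not Lucid's implementation: the executable names, the `Workload` type and the `route` function are all hypothetical, and the only details taken from the description above are the supported GPU series and the rule that profiled 3D titles go to the dGPU while everything else stays on the processor graphics.

```python
from dataclasses import dataclass
from enum import Enum


class Gpu(Enum):
    IGPU = "Sandy Bridge processor graphics"
    DGPU = "discrete GPU"


# Hypothetical profile entries; the real release ships with roughly 100
# validated games/benchmarks whose identifiers are not public here.
GAME_PROFILES = {"metro2033.exe", "dirt2.exe", "wow.exe"}

# GPU families listed as supported in the text above.
SUPPORTED_DGPU_SERIES = {
    "AMD": {"4xxx", "5xxx", "6xxx"},
    "NVIDIA": {"2xx", "4xx", "5xx"},
}


@dataclass(frozen=True)
class Workload:
    executable: str
    uses_3d_api: bool  # True if the app issues DirectX/OpenGL calls


def dgpu_supported(vendor: str, series: str) -> bool:
    """Return True if the installed dGPU belongs to a supported family."""
    return series in SUPPORTED_DGPU_SERIES.get(vendor, set())


def route(workload: Workload, vendor: str, series: str) -> Gpu:
    """Send profiled 3D titles to a supported dGPU, everything else to the iGPU."""
    has_profile = workload.executable.lower() in GAME_PROFILES
    if workload.uses_3d_api and has_profile and dgpu_supported(vendor, series):
        return Gpu.DGPU
    # Video decode/encode (Quick Sync), desktop apps and unprofiled titles
    # stay on the processor graphics, which also drives the display.
    return Gpu.IGPU


print(route(Workload("Metro2033.exe", True), "AMD", "6xxx").value)
print(route(Workload("transcoder.exe", False), "AMD", "6xxx").value)
```

Under these assumptions, a title without a profile simply falls back to the iGPU, which matches the current limitation that only validated games are handled.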
The Virtu software will be bundled with motherboards. The business arrangements will take place between the motherboard manufacturers and Lucid itself; the end user shouldn't have to worry about licensing the software.
Lucid gave us a copy of the software it shared with motherboard manufacturers: a Virtu release candidate. The software is not yet a mass-production release and there are some limits (e.g. we can't define our own game profiles, and there's a Virtu logo plastered randomly on the screen when you're gaming), but it's enough to give us a brief look at the technology.
Installing Virtu was very simple. Just go through the installer application, reboot and you're good to go. The only requirements are that you're using a compatible video card and that your display is connected to the SNB video out and not the discrete GPU.
Once Virtu was loaded, the first thing I noticed was that AMD's Catalyst Control Center and NVIDIA's control panel refused to load. As far as they were concerned, I was running an Intel HD 3000 GPU and they weren't needed. The appropriate AMD and NVIDIA drivers did load, however.
Other than the irate control panels, the rest of the experience was completely seamless. I ran games, browsed the web and even transcoded a video - each application behaved as if the only GPU available was the one best suited for the task. Quick Sync even came up as an option under Arcsoft's Media Converter 7.
I measured performance in four games, both with Virtu and natively off of the dGPU itself, to see how much overhead the frame buffer copying and Virtu interception imposed:
AMD Lucid Virtu Performance Impact - 1920 x 1200, 4X AA, High Quality

| GPU | Civilization V | DiRT 2 | Metro 2033 | World of Warcraft |
| --- | --- | --- | --- | --- |
| AMD Radeon HD 6970 | 39.6 fps | 76.4 fps | 34.7 fps | 111.5 fps |
| AMD Radeon HD 6970 (Virtu) | 36.5 fps | 74.4 fps | 32.3 fps | 102.8 fps |

NVIDIA Lucid Virtu Performance Impact - 1920 x 1200, 4X AA, High Quality

| GPU | Civilization V | DiRT 2 | Metro 2033 | World of Warcraft |
| --- | --- | --- | --- | --- |
| NVIDIA GeForce GTX 460 | 38.8 fps | 69.4 fps | 18.7 fps | 85.4 fps |
| NVIDIA GeForce GTX 460 (Virtu) | 35.8 fps | 48.0 fps | 18.0 fps | 79.7 fps |

I generally saw a 2 - 8% drop in performance compared to a standalone discrete GPU without Virtu. The only exception was a big 30% drop on the GeForce GTX 460 running the DiRT 2 benchmark. Given the relatively consistent performance everywhere else, I'm guessing this is an artifact of the early software rather than a normal occurrence.
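For anyone who wants to check the percentages, the short script below recomputes the relative drop for each game from the frame rates in the two tables above. The function and variable names are mine, chosen only for illustration.

```python
def overhead_pct(native_fps: float, virtu_fps: float) -> float:
    """Percentage of frame rate lost when running through Virtu."""
    return (native_fps - virtu_fps) / native_fps * 100.0


# (native fps, Virtu fps) pairs copied from the tables above.
RESULTS = {
    "Radeon HD 6970": {
        "Civilization V": (39.6, 36.5),
        "DiRT 2": (76.4, 74.4),
        "Metro 2033": (34.7, 32.3),
        "World of Warcraft": (111.5, 102.8),
    },
    "GeForce GTX 460": {
        "Civilization V": (38.8, 35.8),
        "DiRT 2": (69.4, 48.0),
        "Metro 2033": (18.7, 18.0),
        "World of Warcraft": (85.4, 79.7),
    },
}

# Compute the drop for every GPU/game combination.
OVERHEADS = {
    gpu: {game: overhead_pct(native, virtu) for game, (native, virtu) in games.items()}
    for gpu, games in RESULTS.items()
}

for gpu, games in OVERHEADS.items():
    for game, pct in games.items():
        print(f"{gpu:16} {game:18} {pct:5.1f}%")
```

Seven of the eight results land between roughly 2.6% and 7.8%, with DiRT 2 on the GTX 460 as the lone outlier at about 30.8%.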
I also ran a Quick Sync test both with and without a discrete GPU attached - performance remained unchanged:
Lucid Virtu Performance Impact

| Configuration | Quick Sync Nikon D7000 (1080p24) to iPhone 4 |
| --- | --- |
| AMD Radeon HD 6970 + Intel HD Graphics 3000 (Virtu) | 199.3 fps |
| Intel HD Graphics 3000 | 199.3 fps |

Finally I decided to run a Quick Sync test while I ran our Metro 2033 benchmark to see how running two tasks, each on an independent GPU, impacted each other:
Lucid Virtu Performance Impact (Metro 2033 + Quick Sync)

| Configuration | Quick Sync Nikon D7000 (1080p24) to iPhone 4 | Metro 2033 Benchmark |
| --- | --- | --- |
| Peak Theoretical Performance | 199.3 fps | 36.5 fps |
| AMD Radeon HD 6970 + Intel HD Graphics 3000 (Virtu) | 72.0 fps | 32.1 fps |

While Metro didn't lose much performance, the Quick Sync task ran considerably slower. Remember that the Quick Sync engine shares resources with the Sandy Bridge CPU cores (mainly the ring bus and L3 cache). Having the CPU working on feeding the dGPU vertex data definitely impacts Quick Sync performance.
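The size of the hit is easier to see as a percentage. The sketch below compares the concurrent results against each task running alone. One assumption to flag: for Metro 2033 it uses the 32.3 fps Radeon HD 6970 (Virtu) result from the earlier game table as the standalone baseline, not the 36.5 fps listed in the table above, since 32.3 fps is what that configuration measured on its own. The names are illustrative only.

```python
def slowdown_pct(alone_fps: float, concurrent_fps: float) -> float:
    """Percentage of throughput lost when the task shares the system."""
    return (alone_fps - concurrent_fps) / alone_fps * 100.0


# Quick Sync transcode: standalone vs. running alongside Metro 2033.
quick_sync_loss = slowdown_pct(199.3, 72.0)

# Metro 2033 on the HD 6970 through Virtu: standalone (assumed baseline,
# taken from the earlier game table) vs. running alongside the transcode.
metro_loss = slowdown_pct(32.3, 32.1)

print(f"Quick Sync: {quick_sync_loss:.1f}% slower")
print(f"Metro 2033: {metro_loss:.1f}% slower")
```

On those numbers the transcode loses close to two thirds of its throughput while the game gives up less than 1%.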
Finally I measured power consumption:
Lucid Virtu Power Consumption

| Configuration | Idle | Load (Metro 2033) |
| --- | --- | --- |
| Intel HD Graphics 3000 | 34.7W | N/A |
| AMD Radeon HD 6970 (Virtu) | 126W | 265W |
| NVIDIA GeForce GTX 460 (Virtu) | 52.0W | 191W |

Here we see that there are still some kinks that need to be worked out. With the Radeon HD 6970, idle power is still quite high even though the dGPU isn't doing any work. The GeForce GTX 460 paints a different picture, as Lucid manages to mostly power down the NVIDIA GPU when it's not in use. Note that even in this case there's a power penalty over a purely integrated setup - the dGPU is still active to a certain extent.
Intel is slowly correcting the issues with the Sandy Bridge platform situation. The first B3 stepping 6-series chipsets are now in the hands of OEMs and motherboard manufacturers, and Z68 boards are coming in the next quarter. Lucid's Virtu is a key part of the strategy, however, at least on the desktop. In mobile it's a non-issue as everyone supports some form of switchable graphics there, but for desktops we need a universal solution. While the Virtu release candidate still needs some work, it's far more polished than I expected it to be.
Once set up, there's no user intervention necessary - the software just works. Fire up a game and it'll run on your discrete GPU. Visit YouTube or transcode a video and your discrete GPU powers down, leaving Sandy Bridge's on-die graphics to handle the workload.
There is definite overhead to Virtu - I measured 2 - 8% on average, although I did see a 30% figure pop up in DiRT 2 on NVIDIA hardware. I'd expect the performance hit to be less than 10% in most cases.
Board makers and OEMs should have their hands on the RC of Virtu now, meaning we should see it show up in motherboard boxes in the not too distant future. Of course this still doesn't take care of those users who wish to overclock their CPU, pair it with a discrete GPU and use Quick Sync as well. We'll have to wait until Z68 for that to happen. Even then, Lucid's Virtu will still likely play a role in those systems.