Original Link: https://www.anandtech.com/show/1138
As we look toward the introductions on September 23rd, some things are starting to become clearer about Athlon64. Announcements from AMD and word-of-mouth all point to the performance of Athlon64 and Opteron being very close if not identical. We also are hearing rumors from the Inquirer and elsewhere that the 754-pin Athlon64 will likely be introduced initially at 2.0GHz, with a revised (and more realistic) Performance Rating that will place it somewhere around 3200+, which is the current highest PR of the top Barton. However, no one has really done much in answering how the Athlon64 will perform compared to current Athlon and Pentium 4 CPUs. While the delay of Microsoft’s 64-bit Windows XP still will not allow us to test 64-bit Athlon64 performance, we do have the tools at hand to give a good idea of what to expect from 32-bit Athlon64 performance when it is introduced in the next few weeks.
When Anand Shimpi first tested Opteron in April, there were only server-based boards available for testing. The single-CPU nVidia nForce3, which has real AGP 8X and Enthusiast-level overclocking options, would not be released for a couple of months. With nForce3 for Opteron now available in the market, and the expectation of a 2.0GHz Athlon64 introduction, we went back to our nVidia nForce3 reference board with an Opteron capable of running 200FSB to see where Athlon64 might land.
With the AGP/PCI lock and FSB overclocking of the nForce3, we were able to reach a stable 2.0GHz (222x9) at default voltage with our 1.8GHz Opteron, even though we were running 2GB (512MBx4) of Dual-Channel ECC memory. With full support of AGP 8X, we were also able to use our standard ATI Radeon 9800 PRO for benchmarking.
With the nForce3 running Opteron at 2.0GHz with a Radeon 9800 PRO, we had the platform to give our readers a decent preview of Athlon64 performance. So how will Athlon64 likely compare to the best Pentium 4 CPUs and current Barton processors?
nVidia nForce3 Chipset
Anand Shimpi will be talking much more about chipsets at launch, but since this is our first real look at the nForce3 chipset, we should talk a bit about key features. One of the most exciting, and also controversial, features of the Opteron/Athlon64 is AMD’s decision to include the memory controller on the CPU. There are tremendous potential speed advantages to this solution, but the complexity of manufacturing also increases significantly. This can dramatically lower yields, which are becoming increasingly important in a competitive CPU environment.
nVidia’s nForce3 PRO is the only one of the Opteron/Athlon64 chipsets in a single-chip package. Both VIA K8T800 and SiS755 will use the more familiar Northbridge/Southbridge arrangement.
We also see nVidia including some more familiar options, like the much-talked-about AMD HyperTransport, first used in their nForce chipset, 6 USB 2.0 ports, and AC’97 with an SPDIF interface. Undoubtedly, the consumer versions of nForce3 will have even more features, but as a base chipset, nForce3 is certainly competitive with Intel’s latest 875/865 chipsets.
Key features for nForce3 PRO are:
- Single-Chip Solution — Revolutionary single-chip solution designed for the AMD Opteron CPU enables higher-quality, full-featured motherboards and delivers maximum performance with the lowest latency. The single-chip design also means less power consumption and less heat dissipation.
- Dual-Channel DDR400 Memory — Our reference board includes full support for Dual-Channel DDR ECC memory, and the Athlon64 version will also support non-ECC memory
- Integrated SATA/IDE Raid — Provides support for RAID 0, RAID 1, and RAID 0+1, enabling the highest disk data transfer rates for highest system and application performance, and the highest performance fault tolerant solution for maximum data integrity. NVIDIA RAID supports both SATA and ATA-133 disk controller standards.
- Enterprise-Class Networking — Delivers the manageability features required by IT professionals while maintaining the highest level of reliability, quality, and performance. Also delivers the highest throughput for network transfers and lower CPU utilization, resulting in lower total cost of ownership.
- 64-Bit Architecture — NVIDIA nForce3 Pro provides advanced processing capabilities and system innovations for the new 64-bit AMD processor architecture.
More information on nForce3 PRO is available at www.nvidia.com/page/nforce3.html.
If you are interested in learning more about the features and architecture of Opteron and Athlon 64, you can access Anand’s excellent 3-part article at www.anandtech.com/cpu/showdoc.html?i=1815.
HyperTransport and Opteron/Athlon64 Overclocking
The first question many will have about our efforts to look at how Athlon64 will perform is how we can possibly compare an overclocked Opteron to a chip that is not overclocked. In the case of the Opteron, the comparison is more accurate than you might first think.
In normal setups (e.g. Athlon/P4), the CPU gets its clock from the FSB clock and multiplies it by the “clock multiplier” to determine how fast its internal clock should be. With a 16x multiplier, when the external clock ticks once, the CPU ticks 16 times. However, with the Athlon 64/Opteron, there is no FSB, so the CPU must get its clock from somewhere. It doesn't produce it internally; instead, it derives it from the native HT (HyperTransport) frequency, which is 200MHz, although because of the nature of the bus, it runs at an effective 800MHz.
So, for our 1.8GHz Opteron 144, the multiplier is 9x, which is why raising the HT frequency to 222MHz increases the clock speed to around 2GHz. But we are increasing the HyperTransport clock in our overclocking, and not an FSB clock, which does not exist on Opteron/Athlon64. In real terms, this means our CPU overclocking has a significant impact on performance, but it is unlikely that our increase in memory speed will have nearly as much impact. Since we are nowhere near saturating the HyperTransport bus at 200 (effective 800), increasing HyperTransport to 222 (888) will not likely have much, if any, impact on overall performance. Our performance improvements with Opteron/Athlon64 are mainly coming from the increase in CPU clock, much more so than on the Pentium 4 or Athlon architectures.
Obviously, the PCI bus, the HT bus, and the CPU all operate at different frequencies, but they all operate based on multiples of each other, and are all derived from the HyperTransport clock.
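The clock arithmetic in this section can be sketched in a few lines of Python. This is only an illustration, not anything published by AMD or nVidia: the function names are hypothetical, and the 4x factor for the "effective" HyperTransport rate is simply inferred from the 200/800 and 222/888 figures quoted above.

```python
# Illustrative sketch of the clock arithmetic discussed in this section.
# Function names are hypothetical; the 4x "effective" factor is an
# assumption inferred from the 200 -> 800 and 222 -> 888 figures in the text.

HT_EFFECTIVE_FACTOR = 4  # assumption: effective rate = 4 x base HT clock


def cpu_clock_mhz(ht_clock_mhz, multiplier):
    """CPU core clock = HyperTransport reference clock x CPU multiplier."""
    return ht_clock_mhz * multiplier


def effective_ht_mhz(ht_clock_mhz):
    """'Effective' HyperTransport rate as quoted in the text."""
    return ht_clock_mhz * HT_EFFECTIVE_FACTOR


stock = cpu_clock_mhz(200, 9)        # Opteron 144 at default settings
overclocked = cpu_clock_mhz(222, 9)  # same 9x multiplier, HT raised to 222MHz

print(stock, overclocked)                         # 1800 1998
print(effective_ht_mhz(222))                      # 888
print(round((overclocked / stock - 1) * 100, 1))  # 11.0
```

Because the multiplier stays fixed at 9x, the CPU clock and the HyperTransport clock rise by the same 11% in this setup.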
Performance Test Configuration
Processor(s): | AMD Opteron Socket 940 at 2.0GHz (9x222) 444FSB; AMD Athlon XP 3200+ (2.2GHz, 400MHz FSB); Intel Dual Xeon 3.06 (1MB Cache)* 533FSB; Intel Pentium 4 at 3.0GHz (800FSB) |
RAM: | 4 x 512MB Legacy ECC at 2.5-3-4-5; 2 x 512MB Mushkin PC3500 Level II; 2 x 256MB Corsair PC3200 TwinX LL (v1.1 or 1.2) Modules (SPD rated) |
Hard Drive(s): | Maxtor 120GB 7200 RPM (8MB Buffer); Western Digital 120GB 7200 RPM Special Edition (8MB Buffer) |
Video AGP & IDE Bus Master Drivers: | NVIDIA nForce version 2.45 (7/29/2003); NVIDIA nForce version 2.03 (1/30/03); VIA 4in1 Hyperion 4.47 (May 20, 2003) |
Video Card(s): | ATI Radeon 9800 PRO 128MB (AGP 8X) |
Video Drivers: | ATI Catalyst 3.6 |
Operating System(s): | Windows XP Professional SP1 |
Motherboards: | nVidia Reference nForce3 @ 222.0 MHz FSB; DFI NFII Ultra LANParty (nForce2 Ultra 400) @ 201.35 MHz FSB; Gigabyte 7NNXP (nForce2 Ultra 400) @ 202.77 MHz FSB; Soltek KT600-R (KT600) @ 200.01 MHz FSB; Asus PC-DL Dual 3.06 Xeon* @ 200.0 MHz; Asus P4C800-E @ 200.5 MHz; ABIT IS7-G (865PE); ABIT IC7-G (875P); Gigabyte 8KNXP (875P) |
*IMPORTANT: While the Dual Xeon 3.06 Asus PC-DL was included for comparison, please keep in mind that our standard benchmarks are not multi-threading enabled. Results should not be considered a comparison of multi-threading to a single processor. Since the Dual 875 is being targeted at the Gaming and Enthusiast markets, we believe it is fair to include the Dual Xeon 875 in comparisons to other solutions that also target the gaming and enthusiast market.
Recent performance tests on Intel 875/865 boards used 2x512MB Mushkin PC3500 Level II Double-bank memory. Previous tests of Intel motherboards used 2x256MB Corsair 3200LL Ver. 1.1.
All performance tests run on nForce2 Ultra 400 motherboards utilized two 256MB Corsair TwinX LL PC3200 (v1.1 or 1.2) modules set to SPD timings in Dual-Channel DDR400 mode.
All performance tests that ran on the KT600-based motherboard used two 256MB Corsair TwinX LL PC3200 (v1.1 or 1.2) modules in DDR400 mode. 4-bank interleave and the highest available timing option (Turbo or Ultra) were used.
All performance tests were run with the ATI 9800 PRO 128MB video card with AGP Aperture set to 128MB with Fast Write enabled. Resolution in all benchmarks is 1024x768x32.
Additions to Performance Tests
We have standardized on ZD Labs Internet Content Creation Winstone 2003 and ZD Labs Business Winstone 2002 for system benchmarking.
Game Benchmarks
We have added Gun Metal DirectX Benchmark 2 from Yeti Labs as a standard game benchmark. We are also evaluating the new X2 Benchmark, which includes Transform and Lighting effects as part of the standard benchmark. Results are reported here for reference.
Jedi Knight II has been dropped from our standard Benchmark Suite. We were forced to use different patches for operation on Athlon and Intel Pentium 4, which made cross-platform comparison difficult, if not impossible. In addition, Opteron/Athlon64 requires a third patching variation for benchmarking. JK2 uses a Quake engine, and we are continuing Quake3 as a standard benchmark for the time being.
New Hardware
With the release of DirectX 9 late in 2002, the availability of Benchmarks to test DX9, and the availability of DX9-supporting video cards from both nVidia and ATI, we are now using the ATI Radeon 9800 PRO for all hardware reviews.
Media Encoding and Gaming Performance Benchmarks
Media Encoding and Gaming Performance Commentary
Gamers have always been an important and loyal market for AMD, but recently, Athlon has lost quite a bit of ground to Intel in this area. The gaming benchmarks were a very pleasant surprise on our Athlon64-level Opteron. The 2.0GHz Opteron with an ATI Radeon 9800 PRO video card significantly out-performed the same setup with Pentium 4. As you can see in our benchmarks, the older Quake 3 is about 10% faster on the 2.0GHz Opteron than it is on the fastest P4 that we have tested.
Even more surprising is the performance of the A64-level Opteron on Gun Metal 2. This DX9 benchmark is an up-to-date gaming benchmark that shows the Opteron out-performing P4 and Athlon 3200+ by a whopping 42% to 54%. As we continue through Unreal Tournament 2003, our Opteron running at A64 speed is the clear gaming champion at 12% to 19% faster than number 2. We also are experimenting with the new X2 Benchmark as an addition to our gaming suite. X2 is heavy on Transform and Lighting effects, and therefore, adds another dimension to game benchmarks. The 2.0GHz Opteron was also the best performer in X2, but not by the margins we see in other game benchmarks.
This gaming performance is very good news for AMD, as Athlon64 appears capable of mopping the floor with the competition when it comes to gaming. The on-chip memory controller has had the promise of making this kind of difference in gaming performance. Inasmuch as our Opteron at 2.0GHz is representative of Athlon64 gaming performance, the Athlon64 will be a must-have for dedicated gamers. Keep in mind that this is a comparison of 32-bit gaming performance. As effective as the Athlon64/Opteron appear to be in this area, we can’t wait to see 64-bit gaming results.
XMpeg conversion benchmarks show nForce3/Opteron significantly faster than the 3200+ Barton, with a performance improvement of about 20%. This is still not enough to bring it to the best Pentium 4 performance levels, but it does make the 2.0 Opteron competitive with the best encoding performance.
High End Workstation Performance Benchmarks
High End Workstation Performance Commentary
Workstation performance, as measured by SPECviewperf, has been an area where Pentium 4 has dominated recently. The 2.0GHz Opteron certainly brings this area into parity at the very least, and Opteron becomes the leader in some areas. Comparing the ATI 9800 PRO performance, we find the A64-level Opteron the top performer in DRV-08 and UGS-01 benchmarks. In other SPECviewperf tests, performance is very competitive and much faster than we have tested with the Athlon 3200+ on nForce2.
To satisfy curiosity, we also compared performance of the Workstation nVidia Quadro FX2000 video card on both the dual Xeon Intel 875 platform and the single-CPU Opteron platform. You would expect that 2 Xeon 3.06 CPUs with 1MB of cache would be the clear winner of this comparison. The results, however, are quite surprising:
SPECviewperf 7.0 Performance: nVidia Quadro FX2000 Workstation Video
Benchmark | Asus PC-DL Dual Xeon 3.06 | nForce3 Single Opteron 2.0 |
3DSMax | 22.17 | 21.88 |
DRV | 59.80 | 59.80 |
DX | 59.83 | 59.87 |
Light | 29.88 | 25.75 |
PROE | 27.55 | 29.18 |
UGS | 31.39 | 28.48 |
The results are basically even, which is amazing considering we are comparing a single 2.0GHz Opteron to a Dual 3.06 Xeon with 1MB cache. Certainly, Opteron/Athlon64 is much improved in Workstation performance compared to Athlon.
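For readers who want to put numbers on the gaps, the short Python sketch below computes the percentage difference for each row of the table above. The scores are copied directly from the table; the helper name `percent_difference` is just an illustrative choice, and a positive value means the single Opteron scored higher than the Dual Xeon.

```python
# Scores copied from the SPECviewperf 7.0 table above (Quadro FX2000).
# Each entry: benchmark -> (Dual Xeon 3.06 score, single Opteron 2.0 score)
scores = {
    "3DSMax": (22.17, 21.88),
    "DRV": (59.80, 59.80),
    "DX": (59.83, 59.87),
    "Light": (29.88, 25.75),
    "PROE": (27.55, 29.18),
    "UGS": (31.39, 28.48),
}


def percent_difference(xeon, opteron):
    """Opteron score relative to the Dual Xeon score, in percent."""
    return (opteron - xeon) / xeon * 100.0


# Positive values: Opteron ahead; negative values: Dual Xeon ahead.
pct = {name: percent_difference(x, o) for name, (x, o) in scores.items()}

for name, value in pct.items():
    print(f"{name:8s} {value:+6.2f}%")
```

The same helper can be reused for any pair of columns in the other benchmark charts, as long as higher scores mean better performance.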
Content Creation and General Usage Performance Benchmarks
Content Creation and General Usage Performance Commentary
This area is the only mixed bag in evaluating our Opteron running at 2.0GHz. While the new AMD chips perform much better than the previous generation in Multimedia Content Creation, they continue to lag behind the best Pentium 4 solutions. However, performance is now competitive and final chipset and processor tweaks may indeed make this a stronger area for Athlon64. General Usage/Business Winstone has always been a stronger area for Athlon. Performance is again competitive, but since nForce2/Athlon is the leader in this performance area, we fully expect the release versions to perform much better than we are currently seeing.
Final Words
Speculation has churned for months over whether AMD could reach the release speeds necessary for Athlon64 to compete effectively with Pentium 4 and the upcoming Prescott processor. The other concern was whether 32-bit performance would be good enough to make the Athlon64 the winner that AMD needs right now. If Athlon64 is released as a 2.0GHz chip, as rumors have reported, then it looks like Athlon64 will be a processor that is competitive with the best Pentium 4 in all areas, with compelling performance in several areas.
The impact of Dual-Channel memory is a little harder to estimate in our tests. Athlon64 has been widely reported to be single-channel, whereas Opteron is Dual-Channel. Again, we expect our results reported here to be in the ballpark, particularly since reports from the web now indicate there will also be an AthlonFX introduced on the 23rd that is targeted at the Enthusiast, runs even faster, and is based on the Opteron with Dual-Channel memory.
Gaming is one area where our tests show Opteron at 2.0GHz to be an amazing performer. When you find game benchmarks 10% to 20% higher, you are genuinely impressed. However, in some of the very latest DX9 benchmarks, Athlon64/Opteron was 40% to 50% faster. This will get the attention of the gaming community, which seems to have a genuine affection for anything AMD already. It is the kind of trend-setting performance that Athlon64 needed to get the attention of an influential market segment.
Workstation Graphics was expected to be a good performer for Athlon64/Opteron, and across the board, the 2.0GHz Opteron did very well against the best from Intel. One particularly noteworthy area was the performance of the A64-level Opteron compared to an 875 Dual Xeon 3.06 system. We really expected the Xeon dually to trounce our single Opteron, but instead, found a virtual dead-heat. Multiple Opteron systems have been setting records in many areas, and we are certainly looking forward to testing multiple 200 series Opterons after seeing what our single 144 can do.
The Content Creation and General Usage performance, while competitive, did not stand out like the other performance areas for the 2.0GHz Opteron. We were not really surprised in the Content Creation area, which has always been a challenge for AMD. But, we were a little surprised in the General Usage/Business area, which has always been an AMD strong suit. Since the top performers in this area are nForce2/Athlon combos, we expect that final release products will fare much better in this area. Remember that our Reference board is now a couple of months old, and much has been done in tweaking the nForce3 chipset already. We would be surprised if the Athlon64/nForce3 combo does not perform better in almost every area at launch.
As excited as we are with the performance we found in our Opteron tweaked to Athlon64, keep in mind that this is all 32-bit performance. To quote AMD:
“AMD64 processors like the AMD Opteron and upcoming AMD Athlon 64 processors are compatible with today’s hardware and software and smooth the transition to the next crucial step in the evolution of the personal computer, workstation, server, and supercomputing cluster.”
While delayed, Microsoft’s 64-bit Operating System will carry Athlon64/Opteron to even higher performance levels. There are also other 64-bit alternatives like Linux, which are not delayed, and which now have a platform opportunity to really grow as the 64-bit alternative. Time will tell if these other players will have any real impact on the 64-bit market. To make launch even more intriguing, we are also seeing many reports that another Athlon64, geared to the Enthusiast community, clocked higher, and an even better performer, may also emerge on September 23rd.
These all look like good omens for AMD, and after a very long wait, it’s about time!!