Hi,

AMD released the Opteron processor family today leaving people with the
budget to buy new hardware wondering what exactly to purchase next. Here
are some data to underpin your decision-making and convincing (whomever:
head of department, parents, wife).

I've let the suite of benchmarks from GiNaC-1.1.3 run on various
machines, all clocked at 1.4 GHz, with one exception: the Itanium1 in
the list was clocked at a mere 930 MHz. Its timings were adjusted to
account for this difference in clock rates as far as possible. You'll
notice anyway that this early silicon is hardly worth listing; it's just
way slower than all the other machines. This has changed with the
Itanium2. But apart from the two monsters M2 and N (with inconclusive
results), it seems like the AMD chips generally perform faster than the
Intel silicon.

Naturally, all functions (except, perhaps, A, B and C) exercise some
fairly jerky code with many branches. This is generally the case with
CAS. Note that with the machines tested, the P-IV's handicap (its
ridiculously long pipeline) was compensated by its comparatively fast
DDR333 memory, as opposed to the P-III, which had less memory bandwidth.
For such reasons the numbers below should be taken with a grain of salt.
But still, my next personal machine won't be from Intel, and it won't be
32-bit either.

So, without further ado, here are the numbers:

                                                          Opteron Itanium2 Itanium1    P-III     P-IV   Athlon
--------------------------------------------------------------------------------------------------------------
commutative expansion and substitution, size 200.           0.45s    0.55s    1.04s    0.63s    0.56s    0.53s
commutative expansion and substitution, size 500.           3.43s    4.17s    7.78s    4.46s    4.25s    4.22s
Laurent series expansion of Gamma function, order 20.       0.34s    0.43s    0.84s    0.57s     0.5s    0.36s
Laurent series expansion of Gamma function, order 25.       1.37s    1.65s    3.27s    2.29s    2.01s    1.59s
determinant of symbolic 10x10 Vandermonde matrix.           0.32s    0.37s    0.65s    0.45s    0.38s    0.27s
determinant of symbolic 12x12 Vandermonde matrix.           2.99s    3.56s    6.07s    4.14s    3.48s    2.62s
determinant of symbolic 8x8 Toeplitz matrix.                0.31s    0.37s    0.66s    0.43s    0.45s    0.27s
determinant of symbolic 9x9 Toeplitz matrix.                1.25s    0.89s    2.59s    1.64s    1.88s    1.29s
hash map size 500000 insert.                                1.02s    1.16s     1.8s    1.01s    1.35s    1.15s
hash map size 500000 find.                                  0.56s    0.69s    1.15s    0.63s    0.93s    0.62s
hash map size 500000 erase.                                 0.43s     0.5s    0.85s    0.44s    0.66s     0.5s
Lewis-Wester test A (divide factorials).                    0.34s   0.357s    0.61s    0.38s     0.4s    0.43s
Lewis-Wester test B (sum of rational numbers).             0.005s   0.008s    0.01s   0.006s   0.005s   0.004s
Lewis-Wester test C (gcd of big integers).                 0.059s   0.129s    0.16s   0.093s   0.116s   0.059s
Lewis-Wester test D (normalized sum of rational fcns).     0.095s   0.136s    0.21s   0.135s    0.13s   0.083s
Lewis-Wester test E (normalized sum of rational fcns).      0.07s   0.093s    0.16s    0.10s   0.098s   0.062s
Lewis-Wester test F (gcd of 2-var polys).                  0.009s   0.011s   0.021s   0.015s   0.011s   0.008s
Lewis-Wester test G (gcd of 3-var polys).                   0.27s    0.42s    0.65s    0.38s    0.43s    0.27s
Lewis-Wester test H (det of 80x80 Hilbert).                 1.37s    1.63s    2.45s    2.32s    2.08s    1.18s
Lewis-Wester test I (invert rank 40 Hilbert).               0.38s    0.47s    0.73s    0.64s    0.56s    0.33s
Lewis-Wester test J (check rank 40 Hilbert).                0.21s    0.28s    0.43s    0.34s    0.33s    0.18s
Lewis-Wester test K (invert rank 70 Hilbert).               2.62s    3.24s    4.93s    4.28s    3.92s    2.31s
Lewis-Wester test L (check rank 70 Hilbert).                1.28s     1.7s    2.69s    1.96s     1.9s    1.14s
Lewis-Wester test M1 (26x26 sparse, det).                  0.055s   0.064s   0.105s   0.093s    0.07s   0.049s
Lewis-Wester test M2 (101x101 sparse, det).               264.99s  247.25s     ----  219.45s  304.25s  325.45s
Lewis-Wester test N (poly at rational fcns).              205.08s  219.29s     ----   186.6s  242.13s  236.26s
Lewis-Wester test O1 (three 15x15 dets)... (average)        7.95s    9.32s   16.14s    9.23s    10.0s    9.54s
Lewis-Wester test P (det of sparse rank 101).              0.289s    0.24s    0.35s    0.38s    0.38s     0.3s
Lewis-Wester test P' (det of less sparse rank 101).         0.78s    0.96s    1.46s    1.35s    1.16s    0.68s
Lewis-Wester test Q (charpoly(P)).                         16.54s   19.18s   31.04s   25.22s   24.01s   14.29s
Lewis-Wester test Q' (charpoly(P')).                       33.75s   41.05s   64.28s   49.95s   51.62s   27.46s
computation of antipodes in Yukawa theory...... (total)     8.27s    10.4s   18.52s   12.54s   11.51s    7.15s
Fateman's polynomial expand benchmark.                     28.84s   28.29s   38.86s   31.72s    41.8s   41.17s

Cheers
    -richy.
-- 
Richard B. Kreckel
<Richard.Kreckel@GiNaC.DE>
<http://www.ginac.de/~kreckel/>

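One way to check the "AMD is generally faster" impression against the table above is to look up the fastest machine for each row. The following is only a minimal sketch in Python: it uses a handful of rows copied from the table, and the names `MACHINES`, `ROWS`, `fastest` and `winners` are illustrative assumptions, not part of the GiNaC benchmark suite.

```python
# Column order as in the table above.
MACHINES = ["Opteron", "Itanium2", "Itanium1", "P-III", "P-IV", "Athlon"]

# A few rows copied from the table (seconds).
# None stands for a "----" entry, i.e. no timing was reported.
ROWS = {
    "expansion/substitution, size 500": [3.43, 4.17, 7.78, 4.46, 4.25, 4.22],
    "Toeplitz 9x9 determinant": [1.25, 0.89, 2.59, 1.64, 1.88, 1.29],
    "Lewis-Wester M2": [264.99, 247.25, None, 219.45, 304.25, 325.45],
    "Lewis-Wester Q'": [33.75, 41.05, 64.28, 49.95, 51.62, 27.46],
    "Fateman expand": [28.84, 28.29, 38.86, 31.72, 41.8, 41.17],
}


def fastest(times):
    """Return the machine with the smallest time, skipping missing entries."""
    candidates = [(t, m) for t, m in zip(times, MACHINES) if t is not None]
    return min(candidates)[1]


def winners(rows):
    """Map each benchmark name to the name of its fastest machine."""
    return {name: fastest(times) for name, times in rows.items()}


if __name__ == "__main__":
    for name, machine in winners(ROWS).items():
        print(f"{name:35s} {machine}")
```

Extending `ROWS` to the full table would give a per-machine win count; the five rows here were picked only to show that the outliers (M2, the 9x9 Toeplitz determinant, Fateman's benchmark) are handled the same way as the rest.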
Hi!

On Wed, Sep 24, 2003 at 12:07:40AM +0200, Richard B. Kreckel wrote:
> AMD released the Opteron processor family today leaving people with the
> budget to buy new hardware wondering what exactly to purchase next.

Well, for those among us who don't have the budget to always buy the
latest kick-ass machines (with their "SDRAM memory" and "hardware
accelerated 3D" and other crazy stuff), the GiNaC Retro Hardware Testing
Labs are proud to present what you've all been waiting for:

  The ultimate CAS shootout at 2x200 MHz - No rules, no mercy.
  Two CPUs enter, one CPU leaves.
  (then, after a while, the other CPU leaves, as soon as I manage to get
  the heat sink off the f*cking thing...)

The contestants:

System 1 - ppc:
  Umax Pulsar, Dual PowerPC 604e ("Extreme"?) at 200 MHz
  L1 cache: 32KB I, 32KB D per CPU
  Apple Tsunami board (also used in PowerMac 9500)
  L2 cache: 512KB for both CPUs, at 50 MHz
  50 MHz system bus
  144MB EDO RAM, 60ns
  Yellow Dog Linux 2.3 (based on Red Hat 7.2)
  Kernel 2.4.19-4asmp
  GCC 2.95.4

System 2 - x86:
  Dual Pentium Pro 512K at 200 MHz
  L1 cache: 8KB I, 8KB D per CPU
  L2 cache: 512KB per CPU, at 200 MHz
  Intel Providence (PR440FX) board
  66 MHz system bus
  256MB registered EDO RAM, 60ns
  Red Hat Linux 7.3
  Kernel 2.4.20-20.7smp
  GCC 2.96

Both machines were equipped with Matrox Millennium graphics cards and
SCSI hard disks (ppc: 4GB IBM Fast Narrow; x86: 2GB Conner Fast Wide).

The Umax Pulsar features a fan that appears to be optimized for maximum
noise output. Jet pilots should feel right at home with this computer.
The Intel machine, on the other hand, sports a hard disk that I could
still hear while standing under the shower. Ear protection should be
worn at all times when running both systems in the same room.

But on to the benchmarks...

The tests consisted of compiling GiNaC 1.0.15 (GiNaC >=1.1 would have
required GCC 3), and running its standard benchmark suite. The compiler
options used were

  ppc: -g -O2 -mcpu=604e
  x86: -g -O2 -march=pentiumpro

and GiNaC was configured with the --disable-static option (the shared
library will be the one used most by applications, anyway).

For the compilation test, only the time required for compiling the
library and tools (ginsh/viewgar) was measured, not the time for
compiling the benchmark suite. The library was built with "make -j 2"
("make -j 3" was slower by about 30s on both machines).

                                                             ppc       x86
--------------------------------------------------------------------------
compile GiNaC 1.0.15                                     25m 34s   16m 42s

The Pentium Pro really shines here, which may be due to its faster and
larger (combined) L2 cache. But this comparison isn't quite fair really,
as the compilers are of course using different backends on both systems
and producing different output.

So, without further ado, on to the real tests:

                                                             ppc       x86
--------------------------------------------------------------------------
commutative expansion and substitution, size 100           1.43s     1.62s
commutative expansion and substitution, size 200           7.32s     7.14s
ratio                                                     [5.12]    [4.41]
Laurent series expansion of Gamma function, order 20       9.91s    7.429s
Laurent series expansion of Gamma function, order 25      38.74s   28.339s
ratio                                                     [3.91]    [3.81]
determinant of symbolic 10x10 Vandermonde matrix           6.55s     6.86s
determinant of symbolic 12x12 Vandermonde matrix          56.57s    63.28s
ratio                                                     [8.64]    [9.22]
determinant of symbolic 8x8 Toeplitz matrix                4.82s     5.65s
determinant of symbolic 9x9 Toeplitz matrix               18.98s    21.12s
ratio                                                     [3.94]    [3.74]
Lewis-Wester test A (divide factorials)                    0.38s     0.56s
Lewis-Wester test B (sum of rational numbers)              0.04s    0.059s
Lewis-Wester test C (gcd of big integers)                   0.4s    0.619s
Lewis-Wester test D (normalized sum of rational fcns)       1.5s    1.689s
Lewis-Wester test E (normalized sum of rational fcns)      1.28s    1.489s
Lewis-Wester test F (gcd of 2-var polys)                   0.17s     0.19s
Lewis-Wester test G (gcd of 3-var polys)                   3.91s    4.459s
Lewis-Wester test H (det of 80x80 Hilbert)                23.12s    27.66s
Lewis-Wester test I (invert rank 40 Hilbert)               7.37s      8.6s
Lewis-Wester test K (invert rank 70 Hilbert)              47.17s    54.45s
ratio                                                     [6.40]    [6.33]
Lewis-Wester test J (check rank 40 Hilbert)                3.95s     5.05s
Lewis-Wester test L (check rank 70 Hilbert)               22.25s    28.36s
ratio                                                     [5.63]    [5.62]
Lewis-Wester test M1 (26x26 sparse, det)                   0.88s    1.189s
Lewis-Wester test O1 (three 15x15 dets) (average)       109.783s   90.246s
Lewis-Wester test P (det of sparse rank 101)               2.86s     4.19s
Lewis-Wester test P' (det of less sparse rank 101)        14.66s    17.51s
computation of antipodes in Yukawa theory (total)        192.64s   172.27s
timing Fateman's polynomial expand benchmark             362.21s  293.579s

Now, this comes as a bit of a surprise. After reading the MuPAD
benchmarks published at

  http://www.heise.de/ct/english/96/11/270/

running on machines very similar to mine, I really expected the Pentium
Pro to wipe the floor with the PowerPC here, but it's actually the other
way round. The 604e wins almost all categories, with some notable
exceptions: the Gamma series expansion, O1, the Yukawa thing, and the
expand benchmark.

On the other hand, judging from the "ratio" lines above, the performance
of the Pentium Pro appears to scale better with larger data sets (again
with one exception: the Vandermonde determinants). This, no doubt, is
due to the faster cache and generally better memory interface of the
Intel machine.

But still, my next personal machine won't be a Pentium Pro, and it won't
be a "G2" PowerMac, either. The VCS 2600 is going cheap on eBay,
though...

Bye,
Christian

-- 
 / Physics is an algorithm
\/ http://www.uni-mainz.de/~bauec002/
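
The "ratio" lines in the table above appear to be the time for the larger problem divided by the time for the smaller one, per machine; a smaller ratio means the machine slows down less as the problem grows. A minimal sketch in Python that recomputes four of them from the timings in the table (the names `PAIRS`, `scaling_ratio` and `scales_better` are illustrative assumptions, not anything from GiNaC):

```python
# (smaller problem, larger problem) timings in seconds, copied from the table.
PAIRS = {
    "expansion/substitution 100 -> 200": {"ppc": (1.43, 7.32), "x86": (1.62, 7.14)},
    "Gamma series order 20 -> 25": {"ppc": (9.91, 38.74), "x86": (7.429, 28.339)},
    "Vandermonde 10x10 -> 12x12": {"ppc": (6.55, 56.57), "x86": (6.86, 63.28)},
    "Toeplitz 8x8 -> 9x9": {"ppc": (4.82, 18.98), "x86": (5.65, 21.12)},
}


def scaling_ratio(small, large):
    """Time for the larger problem divided by time for the smaller one."""
    return large / small


def scales_better(pair):
    """Name of the machine whose run time grows by the smaller factor."""
    return min(pair, key=lambda machine: scaling_ratio(*pair[machine]))


if __name__ == "__main__":
    for name, pair in PAIRS.items():
        ratios = ", ".join(
            f"{machine} [{scaling_ratio(*times):.2f}]" for machine, times in pair.items()
        )
        print(f"{name:36s} {ratios}  -> {scales_better(pair)}")
```

Rounded to two decimals, these reproduce the bracketed values in the table, including the Vandermonde case where the ppc ratio is the smaller one.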