Intel Xeon 6154 (DL380) vs AMD EPYC 7551 (DL385) on HPE ProLiant Gen10

Our partners at HPE have generously provided us with a couple of ProLiant Gen10 servers, and I’m going to do some testing using Red Hat Enterprise Linux 7.4 with KVM. (Update: I later had to switch to CentOS 7.4 with RDO OpenStack.)

I’m really excited to see HPE being a front runner with AMD and the DL385, and to see AMD finally releasing a CPU that will hopefully provide some much-needed competition for Intel in the datacentre space.

Here’s the gear I’ll be working with for the next couple of days:

1. HPE ProLiant DL380 Gen10

This one has two Intel Xeon Gold 6154 CPUs and 1.5 TB of RAM (768 GB per CPU socket) in 64GB LRDIMMs. BIOS updated to U30 v1.32 (02/01/2018).

Processor Name Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
Processor Status  OK
Processor Speed 3000 MHz
Execution Technology 18/18 cores; 36 threads
Memory Technology 64-bit Capable
Internal L1 cache 1152 KB
Internal L2 cache 18432 KB
Internal L3 cache 25344 KB
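
A quick sanity check from the OS side: lscpu should agree with the iLO figures above (the expected values for this box are in the comments):

lscpu | egrep 'Model name|Socket|Core|Thread|NUMA node\(s\)'
# Expected on the DL380: 2 sockets x 18 cores x 2 threads = 72 logical CPUs,
# with 2 NUMA nodes (one per socket).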

Networking is HPE Ethernet 10Gb 2-port 562FLR-SFP+ (Intel X710) and HPE Ethernet 10Gb 2-port 562SFP+ (Intel X710) for a total of 4×10 GbE ports, and HPE Ethernet 1Gb 4-port 331i (Broadcom BCM5719).

Storage is pretty basic as we have SAN infrastructure – HPE Smart Array E208i-a SR Gen10 with 2xVK000240GWJPD SSDs (240 GB) in a RAID1 configuration.
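
If you want to verify the array from the OS, HPE’s ssacli utility will show it (assuming it is installed; the output sketched in the comments is illustrative):

ssacli ctrl all show config
# Smart Array E208i-a SR Gen10 in Slot 0 (Embedded)
#   Array A: logicaldrive 1 (RAID 1, OK) on the two 240 GB SSDs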

2. HPE ProLiant DL385 Gen10

This one has two AMD EPYC 7551 CPUs and 1.5 TB of RAM (768 GB per CPU socket) in 64GB LRDIMMs. BIOS updated to A40 v1.04 (12/12/2017).

Processor Name AMD EPYC 7551 32-Core Processor
Processor Status  OK
Processor Speed 2000 MHz
Execution Technology 32/32 cores; 64 threads
Memory Technology 64-bit Capable
Internal L1 cache 3072 KB
Internal L2 cache 16384 KB
Internal L3 cache 65536 KB
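
One topology difference worth calling out: a dual EPYC 7551 (Naples) system exposes eight NUMA nodes (four dies per socket) versus two on the Xeon box, which matters for VM and memory placement. It is easy to confirm, assuming numactl is installed:

numactl --hardware | head -1
# available: 8 nodes (0-7), i.e. four Zeppelin dies per socket, two sockets
lscpu | egrep 'Socket|NUMA node\(s\)'
# 2 sockets, 8 NUMA nodes; 2 x 32 cores x 2 threads = 128 logical CPUs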

Networking is HPE Ethernet 10Gb 2-port 562FLR-SFP+ (Intel X710) and HPE Ethernet 10Gb 2-port 562SFP+ (Intel X710) for a total of 4×10 GbE ports, and HPE Ethernet 1Gb 4-port 331i (Broadcom BCM5719).

Storage is similar to the one above – HPE Smart Array E208i-a SR Gen10 with 2xVK000240GWEZB SSDs (240 GB) in a RAID1 configuration.

Phoronix Test Suite

Phoronix Test Suite results are available here (PDF): AMD_EPYC-vs-Intel_Xeon

Geekbench 4 testing

Geekbench is not as thorough as the Phoronix Test Suite; however, it still gives a good indication of the performance differences.
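
For anyone wanting to reproduce the numbers: the rows below are KVM guests sized per the vCPU/RAM columns, and each run just uses Geekbench’s Linux CLI tarball (the version and URL below are the ones current at the time of writing and should be treated as illustrative):

wget https://cdn.geekbench.com/Geekbench-4.2.0-Linux.tar.gz
tar xf Geekbench-4.2.0-Linux.tar.gz && cd Geekbench-4.2.0-Linux
./geekbench4    # runs the full suite and uploads the result to browser.geekbench.com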

AMD EPYC 7551 – Geekbench 4 Scores
vCPU RAM (GB) Single Core Multi Core Reference
2 64 2960 5046 https://browser.geekbench.com/v4/cpu/7653460
4 64 2949 9000 https://browser.geekbench.com/v4/cpu/7653373
8 64 2772 14025 https://browser.geekbench.com/v4/cpu/7653374
16 64 2995 22812 https://browser.geekbench.com/v4/cpu/7653290
32 64 2432 29930 https://browser.geekbench.com/v4/cpu/7653215
64 64 2860 42876 https://browser.geekbench.com/v4/cpu/7652905
Intel Xeon 6154 – Geekbench 4 Scores
vCPU RAM (GB) Single Core Multi Core Reference
2 64 4454 8171 https://browser.geekbench.com/v4/cpu/7653507
4 64 4351 15035 https://browser.geekbench.com/v4/cpu/7653442
8 64 4476 27073 https://browser.geekbench.com/v4/cpu/7653431
16 64 4396 40824 https://browser.geekbench.com/v4/cpu/7653325
32 64 4364 54354 https://browser.geekbench.com/v4/cpu/7653259
64 64 4361 68964 https://browser.geekbench.com/v4/cpu/7653182

And some combined, side-by-side results; the AMD/Intel column shows the AMD score as a percentage of the Intel score:

Single Core Scores
vCPU RAM (GB) AMD EPYC 7551 Intel Xeon 6154 AMD/Intel
2 64 2960 4454 66.46%
4 64 2949 4351 67.78%
8 64 2772 4476 61.93%
16 64 2995 4396 68.13%
32 64 2432 4364 55.73%
64 64 2860 4361 65.58%

Multi Core Scores
vCPU RAM (GB) AMD EPYC 7551 Intel Xeon 6154 AMD/Intel
2 64 5046 8171 61.75%
4 64 9000 15035 59.86%
8 64 14025 27073 51.80%
16 64 22812 40824 55.88%
32 64 29930 54354 55.06%
64 64 42876 68964 62.17%
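
To be clear about the arithmetic, each percentage is simply the AMD score divided by the Intel score, e.g. for the 2 vCPU single-core row:

awk 'BEGIN { printf "%.2f%%\n", 100 * 2960 / 4454 }'
# -> 66.46%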

Differences in POST duration

The first noticeable difference is server boot-up. I didn’t do any precise timing measurements, but the AMD-based server feels approximately 50% slower during POST, to the extent that the Intel-based server completes POST and finishes booting the OS before the AMD-based server even clears POST. To me, that’s an indication that the AMD platform is still very much a work in progress.

AMD – the bad and the ugly

No issues whatsoever with the Intel-based server: both RHEL and CentOS 7.4 boot up nicely, even with the stock 7.4 kernel and no updates applied at all.

The AMD system is a completely different story, though. The first thing I noticed was CPU soft lockups during OS boot and, later, during normal server operation. Like this:

Message from syslogd@hpe-demo-amd7551 at Mar 8 11:38:50 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#64 stuck for 22s! [kworker/u257:1:1274]

Message from syslogd@hpe-demo-amd7551 at Mar 8 11:39:58 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#66 stuck for 22s! [kworker/u257:1:1274]

Message from syslogd@hpe-demo-amd7551 at Mar 8 11:40:26 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#66 stuck for 22s! [kworker/u257:1:1274]
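
The lockups are easy to pick out of the kernel ring buffer. The "stuck for 22s" figure is roughly twice kernel.watchdog_thresh (default 10 s), which can be raised temporarily while investigating:

dmesg -T | grep -i 'soft lockup'       # list all soft lockup events so far
sysctl -w kernel.watchdog_thresh=30    # optional: raise the detection threshold while debugging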

I suspect this has something to do with the server being equipped with two dual-port Intel X710-based 10 GbE NICs, because the kernel pushes out stack traces specific to the i40e driver:

[ 57.567895] tg3 0000:05:00.0 eth1: Tigon3 [partno(N/A) rev 5719001] (PCI Express) MAC address d0:67:26:cc:08:54
[ 57.567899] tg3 0000:05:00.0 eth1: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 57.567901] tg3 0000:05:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[ 57.567902] tg3 0000:05:00.0 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[ 58.494137] i40e 0000:04:00.1: fw 6.70.48768 api 1.7 nvm 10.2.5
[ 59.985611] perf: interrupt took too long (7365 > 6721), lowering kernel.perf_event_max_sample_rate to 27000
[ 76.098346] perf: interrupt took too long (9757 > 9206), lowering kernel.perf_event_max_sample_rate to 20000
[ 80.121630] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:1015]
[ 80.121831] Modules linked in: sd_mod uas usb_storage qla2xxx nvme_fc nvme_fabrics nvme_core smartpqi crc32c_intel scsi_transport_fc scsi_transport_sas mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm i40e(O+) drm tg3(+) ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 80.121854] CPU: 0 PID: 1015 Comm: kworker/0:2 Tainted: G O L 4.15.9 #1
[ 80.121855] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 12/12/2017
[ 80.121861] Workqueue: events work_for_cpu_fn
[ 80.121876] RIP: 0010:i40e_poll_sr_srctl_done_bit+0x2a/0x70 [i40e]
[ 80.121877] RSP: 0018:ffffa37b5d92fa98 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff11
[ 80.121879] RAX: 000000008ae84000 RBX: 00000000000186a0 RCX: ffffa37b5fc00000
[ 80.121879] RDX: ffff89f1fd524742 RSI: 0000000000002ba1 RDI: ffff89f1fb258008
[ 80.121880] RBP: ffff89f1fb258008 R08: 00000000000271a0 R09: 0000000000008000
[ 80.121881] R10: 0000000000000000 R11: ffff89f1fb25a1c0 R12: ffff89f1fd524000
[ 80.121882] R13: 00000000000003a1 R14: ffff89f1fd524742 R15: ffffa37b5bb87b88
[ 80.121883] FS: 0000000000000000(0000) GS:ffff89f1ff000000(0000) knlGS:0000000000000000
[ 80.121884] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.121885] CR2: 00007f760f289000 CR3: 0000003e85a0a000 CR4: 00000000003406f0
[ 80.121886] Call Trace:
[ 80.121900] i40e_read_nvm_word_srctl+0x99/0xf0 [i40e]
[ 80.121908] i40e_calc_nvm_checksum+0x15a/0x250 [i40e]
[ 80.121915] i40e_validate_nvm_checksum+0x65/0xc0 [i40e]
[ 80.121922] i40e_diag_eeprom_test+0x4c/0x70 [i40e]
[ 80.121929] i40e_verify_eeprom+0x16/0x80 [i40e]
[ 80.121938] i40e_probe.part.78+0x6cc/0x1450 [i40e]
[ 80.121945] ? _cond_resched+0x15/0x30
[ 80.121949] ? kmem_cache_alloc+0x18d/0x1a0
[ 80.121952] ? __radix_tree_lookup+0x80/0xf0
[ 80.121953] ? __radix_tree_lookup+0x80/0xf0
[ 80.121958] ? irq_get_irq_data+0xa/0x20
[ 80.121961] ? mp_map_pin_to_irq+0xc0/0x330
[ 80.121963] ? kmem_cache_alloc_trace+0x18e/0x1a0
[ 80.121965] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 80.121969] ? pci_conf1_read+0xb2/0xf0
[ 80.121973] ? pci_read_config_word.part.9+0x6a/0x80
[ 80.121976] ? do_pci_enable_device+0xcd/0x100
[ 80.121978] local_pci_probe+0x44/0xa0
[ 80.121980] ? irq_affinity_notify+0xde/0x100
[ 80.121982] work_for_cpu_fn+0x16/0x20
[ 80.121985] process_one_work+0x158/0x370
[ 80.121987] worker_thread+0x1cf/0x3e0
[ 80.121989] kthread+0xf8/0x130
[ 80.121991] ? rescuer_thread+0x360/0x360
[ 80.121992] ? kthread_bind+0x10/0x10
[ 80.121994] ret_from_fork+0x22/0x40
[ 80.121996] Code: 00 0f 1f 44 00 00 55 48 89 fd 53 bb a0 86 01 00 eb 0f bf e3 53 00 00 e8 d5 b0 a0 fb 83 eb 01 74 13 48 8b 45 00 8b 80 10 61 0b 00 <85> c0 79 e3 31 c0 5b 5d c3 f6 85 c0 06 00 00 80 75 08 5b b8 db
[ 99.663108] perf: interrupt took too long (13010 > 12196), lowering kernel.perf_event_max_sample_rate to 15000
[ 108.121613] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:1015]
[ 108.121804] Modules linked in: sd_mod uas usb_storage qla2xxx nvme_fc nvme_fabrics nvme_core smartpqi crc32c_intel scsi_transport_fc scsi_transport_sas mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm i40e(O+) drm tg3(+) ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 108.121821] CPU: 0 PID: 1015 Comm: kworker/0:2 Tainted: G O L 4.15.9 #1
[ 108.121822] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 12/12/2017
[ 108.121863] Workqueue: events work_for_cpu_fn
[ 108.121872] RIP: 0010:i40e_poll_sr_srctl_done_bit+0x2a/0x70 [i40e]
[ 108.121873] RSP: 0018:ffffa37b5d92fa98 EFLAGS: 00000a12 ORIG_RAX: ffffffffffffff11
[ 108.121874] RAX: 00000000994d0000 RBX: 00000000000186a0 RCX: ffffa37b5fc00000
[ 108.121875] RDX: ffff89f1fd524a6a RSI: 0000000000006535 RDI: ffff89f1fb258008
[ 108.121876] RBP: ffff89f1fb258008 R08: 00000000000271a0 R09: 0000000000008000
[ 108.121877] R10: 0000000000000000 R11: ffff89f1fb25a1c0 R12: ffff89f1fd524000
[ 108.121878] R13: 0000000000000535 R14: ffff89f1fd524a6a R15: ffffa37b5bb87b88
[ 108.121879] FS: 0000000000000000(0000) GS:ffff89f1ff000000(0000) knlGS:0000000000000000
[ 108.121880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 108.121881] CR2: 00007f760f289000 CR3: 0000003e85a0a000 CR4: 00000000003406f0
[ 108.121881] Call Trace:
[ 108.121890] i40e_read_nvm_word_srctl+0x45/0xf0 [i40e]
[ 108.121896] i40e_calc_nvm_checksum+0x15a/0x250 [i40e]
[ 108.121902] i40e_validate_nvm_checksum+0x65/0xc0 [i40e]
[ 108.121908] i40e_diag_eeprom_test+0x4c/0x70 [i40e]
[ 108.121915] i40e_verify_eeprom+0x16/0x80 [i40e]
[ 108.121923] i40e_probe.part.78+0x6cc/0x1450 [i40e]
[ 108.121925] ? _cond_resched+0x15/0x30
[ 108.121926] ? kmem_cache_alloc+0x18d/0x1a0
[ 108.121928] ? __radix_tree_lookup+0x80/0xf0
[ 108.121930] ? __radix_tree_lookup+0x80/0xf0
[ 108.121932] ? irq_get_irq_data+0xa/0x20
[ 108.121933] ? mp_map_pin_to_irq+0xc0/0x330
[ 108.121935] ? kmem_cache_alloc_trace+0x18e/0x1a0
[ 108.121937] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 108.121939] ? pci_conf1_read+0xb2/0xf0
[ 108.121940] ? pci_read_config_word.part.9+0x6a/0x80
[ 108.121942] ? do_pci_enable_device+0xcd/0x100
[ 108.121944] local_pci_probe+0x44/0xa0
[ 108.121945] ? irq_affinity_notify+0xde/0x100
[ 108.121947] work_for_cpu_fn+0x16/0x20
[ 108.121949] process_one_work+0x158/0x370
[ 108.121950] worker_thread+0x1cf/0x3e0
[ 108.121952] kthread+0xf8/0x130
[ 108.121954] ? rescuer_thread+0x360/0x360
[ 108.121955] ? kthread_bind+0x10/0x10
[ 108.121956] ret_from_fork+0x22/0x40
[ 108.121958] Code: 00 0f 1f 44 00 00 55 48 89 fd 53 bb a0 86 01 00 eb 0f bf e3 53 00 00 e8 d5 b0 a0 fb 83 eb 01 74 13 48 8b 45 00 8b 80 10 61 0b 00 <85> c0 79 e3 31 c0 5b 5d c3 f6 85 c0 06 00 00 80 75 08 5b b8 db
[ 117.033575] random: crng init done
[ 118.434740] INFO: rcu_sched self-detected stall on CPU
[ 118.434745] 0-....: (59982 ticks this GP) idle=312/140000000000001/0 softirq=923/923 fqs=15000
[ 118.434746] (t=60000 jiffies g=111 c=110 q=397)
[ 118.434766] NMI backtrace for cpu 0
[ 118.434768] CPU: 0 PID: 1015 Comm: kworker/0:2 Tainted: G O L 4.15.9 #1
[ 118.434769] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 12/12/2017
[ 118.434772] Workqueue: events work_for_cpu_fn
[ 118.434773] Call Trace:
[ 118.434775] <IRQ>
[ 118.434778] dump_stack+0x5c/0x83
[ 118.434780] nmi_cpu_backtrace+0xc6/0xd0
[ 118.434783] ? lapic_can_unplug_cpu+0xa0/0xa0
[ 118.434784] nmi_trigger_cpumask_backtrace+0xd5/0x110
[ 118.434788] rcu_dump_cpu_stacks+0x89/0xb9
[ 118.434790] rcu_check_callbacks+0x726/0x870
[ 118.434794] ? tick_sched_do_timer+0x60/0x60
[ 118.434795] update_process_times+0x28/0x50
[ 118.434797] tick_sched_handle+0x26/0x60
[ 118.434798] tick_sched_timer+0x34/0x70
[ 118.434801] __hrtimer_run_queues+0xde/0x230
[ 118.434803] hrtimer_interrupt+0x99/0x190
[ 118.434805] smp_apic_timer_interrupt+0x62/0x130
[ 118.434807] apic_timer_interrupt+0x87/0x90
[ 118.434809] </IRQ>
[ 118.434816] RIP: 0010:i40e_poll_sr_srctl_done_bit+0x2a/0x70 [i40e]
[ 118.434817] RSP: 0018:ffffa37b5d92fa98 EFLAGS: 00000a12 ORIG_RAX: ffffffffffffff11
[ 118.434819] RAX: 000000009ea88000 RBX: 00000000000186a0 RCX: ffffa37b5fc00000
[ 118.434819] RDX: ffff89f1fd524546 RSI: 0000000000007aa3 RDI: ffff89f1fb258008
[ 118.434820] RBP: ffff89f1fb258008 R08: 00000000000271a0 R09: 0000000000008000
[ 118.434821] R10: 0000000000000000 R11: ffff89f1fb25a1c0 R12: ffff89f1fd524000
[ 118.434822] R13: 00000000000002a3 R14: ffff89f1fd524546 R15: ffffa37b5bb87b88
[ 118.434830] i40e_read_nvm_word_srctl+0x45/0xf0 [i40e]
[ 118.434836] i40e_calc_nvm_checksum+0x15a/0x250 [i40e]
[ 118.434842] i40e_validate_nvm_checksum+0x65/0xc0 [i40e]
[ 118.434848] i40e_diag_eeprom_test+0x4c/0x70 [i40e]
[ 118.434855] i40e_verify_eeprom+0x16/0x80 [i40e]
[ 118.434862] i40e_probe.part.78+0x6cc/0x1450 [i40e]
[ 118.434989] ? _cond_resched+0x15/0x30
[ 118.434992] ? kmem_cache_alloc+0x18d/0x1a0
[ 118.434994] ? __radix_tree_lookup+0x80/0xf0
[ 118.434995] ? __radix_tree_lookup+0x80/0xf0
[ 118.434997] ? irq_get_irq_data+0xa/0x20
[ 118.434999] ? mp_map_pin_to_irq+0xc0/0x330
[ 118.435000] ? kmem_cache_alloc_trace+0x18e/0x1a0
[ 118.435003] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 118.435004] ? pci_conf1_read+0xb2/0xf0
[ 118.435005] ? pci_read_config_word.part.9+0x6a/0x80
[ 118.435007] ? do_pci_enable_device+0xcd/0x100
[ 118.435009] local_pci_probe+0x44/0xa0
[ 118.435010] ? irq_affinity_notify+0xde/0x100
[ 118.435012] work_for_cpu_fn+0x16/0x20
[ 118.435014] process_one_work+0x158/0x370
[ 118.435015] worker_thread+0x1cf/0x3e0
[ 118.435017] kthread+0xf8/0x130
[ 118.435018] ? rescuer_thread+0x360/0x360
[ 118.435020] ? kthread_bind+0x10/0x10
[ 118.435021] ret_from_fork+0x22/0x40

These issues were present both with the stock updated RHEL 7.4 kernel (3.10.0-693.21.1.el7.x86_64) plus the HPE-provided i40e driver (2.3.6, kmod-hp-i40e-2.3.6-1.rhel7u4.x86_64.rpm), and with a custom-compiled 4.15.9 kernel plus the latest open-source i40e driver (2.4.6, https://sourceforge.net/projects/e1000/files/i40e%20stable/), with all the latest NIC NVM updates applied.
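
For reference, the SourceForge i40e driver builds out of tree in the usual way (kernel headers for the running kernel are required; the interface name below is illustrative):

tar xf i40e-2.4.6.tar.gz
cd i40e-2.4.6/src
make && sudo make install     # builds and installs i40e.ko for the running kernel
sudo rmmod i40e; sudo modprobe i40e
ethtool -i ens1f0             # confirm the driver version and NIC firmware/NVM level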

With the stock kernel, things were even more serious, with rcu_sched self-detected CPU stalls (this was the case even on RHEL 7.5 Beta).

Perhaps this is an issue specific to this particular AMD EPYC and Intel X710 combination, but to me it’s a serious concern when considering whether or not to deploy such a server to a production environment.

Closing thoughts

Yes, it’s nice to see 128 CPU threads in htop, but looking at the synthetic benchmark results, and considering the kernel issues and CPU soft lockups, I have to admit that the AMD EPYC platform is still pretty immature for the enterprise IT world. The 1 GHz difference in base clock is an obvious factor: Intel’s 6154 destroys AMD’s 7551 in both single-core and multi-core tests.

I think it can be concluded that, for a general-purpose virtualization environment, Intel’s Xeon 6154 is a much better choice than AMD’s EPYC 7551, despite the fact that a dual EPYC 7551 has more cores (64 vs 36) and can house more memory (2 TB AMD vs 1.5 TB Intel). Stability and enterprise readiness are still on Intel’s side, but I’m happy to see AMD making steps in the right direction.
