cp3033 unreachable since 2018-07-15 11:47:31
Closed, Duplicate (Public)

Description

cp3033 has been unreachable via the production interface since 2018-07-15 11:47:31. The mgmt interface is reachable and the console doesn't show anything out of the ordinary. After logging in, the dmesg log shows NIC issues.

Event Timeline

root@cp3033:/var/log# ethtool -i eth0
driver: bnx2x
version: 1.712.30-0
firmware-version: FFV7.10.17 bc 7.10.11
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
root@cp3033:/var/log# ethtool eth0
Settings for eth0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseT/Full
	                        10000baseT/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Advertised link modes:  10000baseT/Full
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Speed: Unknown!
	Duplex: Unknown! (255)
	Port: FIBRE
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: off
	Supports Wake-on: g
	Wake-on: d
	Current message level: 0x00000000 (0)

	Link detected: no
[10415964.660782] ------------[ cut here ]------------
[10415964.660790] WARNING: CPU: 13 PID: 34222 at /srv/kernel/linux/net/sched/sch_generic.c:316 dev_watchdog+0x226/0x230
[10415964.660793] NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 6 timed out
[10415964.660793] Modules linked in: cdc_ether usbnet mii joydev hid_generic usbhid hid cpuid binfmt_misc esp6 xfrm6_mode_transport drbg ansi_cprng seqiv xfrm4_mode_transport cpufreq_conservative cpufreq_powersave cpufreq_userspace xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo 8021q garp mrp stp llc tcp_bbr sch_fq intel_rapl sb_edac ipmi_watchdog edac_core x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 ttm drm_kms_helper kvm dcdbas irqbypass crct10dif_pclmul iTCO_wdt crc32_pclmul iTCO_vendor_support evdev drm ghash_clmulni_intel pcspkr i2c_algo_bit mei_me lpc_ich mei shpchp mfd_core wmi button ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 jbd2 fscrypto mbcache raid1 md_mod sg sd_mod ahci libahci aesni_intel aes_x86_64 glue_helper lrw ehci_pci
[10415964.660847]  gf128mul bnx2x ablk_helper ptp ehci_hcd cryptd libata pps_core mdio libcrc32c usbcore crc32c_generic scsi_mod usb_common crc32c_intel
[10415964.660860] CPU: 13 PID: 34222 Comm: cache-worker Not tainted 4.9.0-0.bpo.6-amd64 #1 Debian 4.9.82-1~wmf1
[10415964.660861] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[10415964.660863]  0000000000000000 ffffffffa67305e5 ffff8fe9bf183e38 0000000000000000
[10415964.660865]  ffffffffa6479184 0000000000000006 ffff8fe9bf183e90 ffff8fc9b136c000
[10415964.660868]  000000000000000d ffff8fc9b1377100 000000000000005b ffffffffa64791ff
[10415964.660871] Call Trace:
[10415964.660872]  <IRQ>
[10415964.660878]  [<ffffffffa67305e5>] ? dump_stack+0x5c/0x77
[10415964.660882]  [<ffffffffa6479184>] ? __warn+0xc4/0xe0
[10415964.660884]  [<ffffffffa64791ff>] ? warn_slowpath_fmt+0x5f/0x80
[10415964.660888]  [<ffffffffa696e476>] ? tcp_retransmit_timer+0x286/0x890
[10415964.660891]  [<ffffffffa69369a6>] ? dev_watchdog+0x226/0x230
[10415964.660893]  [<ffffffffa6936780>] ? dev_deactivate_queue.constprop.27+0x60/0x60
[10415964.660898]  [<ffffffffa64e85b2>] ? call_timer_fn+0x32/0x130
[10415964.660899]  [<ffffffffa64e9385>] ? run_timer_softirq+0x1e5/0x440
[10415964.660902]  [<ffffffffa67398a4>] ? timerqueue_add+0x54/0xa0
[10415964.660904]  [<ffffffffa64ea808>] ? enqueue_hrtimer+0x38/0x90
[10415964.660909]  [<ffffffffa6a1617c>] ? __do_softirq+0x10c/0x2a2
[10415964.660911]  [<ffffffffa647f4b8>] ? irq_exit+0x98/0xa0
[10415964.660913]  [<ffffffffa6a15c14>] ? smp_apic_timer_interrupt+0x44/0x50
[10415964.660915]  [<ffffffffa6a14496>] ? apic_timer_interrupt+0x96/0xa0
[10415964.660916]  <EOI>
[10415964.660920]  [<ffffffffa64c5bb3>] ? native_queued_spin_lock_slowpath+0x113/0x190
[10415964.660922]  [<ffffffffa6a1245d>] ? _raw_spin_lock+0x1d/0x20
[10415964.660924]  [<ffffffffa64fb018>] ? futex_wake+0xc8/0x170
[10415964.660926]  [<ffffffffa64fd149>] ? do_futex+0x2d9/0xb40
[10415964.660930]  [<ffffffffa64257d9>] ? __switch_to+0x2c9/0x730
[10415964.660932]  [<ffffffffa64fda33>] ? SyS_futex+0x83/0x180
[10415964.660936]  [<ffffffffa6a0dd52>] ? schedule+0x32/0x80
[10415964.660939]  [<ffffffffa6403bd3>] ? do_syscall_64+0x93/0x1a0
[10415964.660941]  [<ffffffffa6a126b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[10415964.660942] ---[ end trace 17a2f2dfd85d5ced ]---
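The key line in the trace above is the NETDEV WATCHDOG message. As a minimal sketch, assuming kernel log lines follow the format shown in this paste, something like the following could pull transmit-timeout events out of dmesg output. The function name `find_tx_timeouts` and the returned dict keys are hypothetical, chosen only for illustration.

```python
import re

# Matches kernel lines such as:
# [10415964.660793] NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 6 timed out
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG: "
    r"(?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(dmesg_text):
    """Return one dict per NETDEV WATCHDOG transmit-timeout line (hypothetical helper)."""
    events = []
    for line in dmesg_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            events.append({
                "uptime_s": float(match.group("ts")),  # seconds since boot
                "iface": match.group("iface"),
                "driver": match.group("driver"),
                "queue": int(match.group("queue")),
            })
    return events


# Sample line copied from the dmesg output in this task.
sample = "[10415964.660793] NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 6 timed out"
print(find_tx_timeouts(sample))
```

The timestamp is seconds since boot, so it has to be combined with the boot time to get a wall-clock time for the event.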

After a power cycle the server is behaving properly. Since it was already depooled, I'm not repooling it.

Vgutierrez triaged this task as Medium priority. Jul 16 2018, 1:56 PM

That sounds like a hang in the NIC, but I doubt we have any useful hardware diagnostics/logging on that level.

The host also shows that its power supplies are not redundant, which had a comment linking to T177403 -> T177228.

And support has expired (https://netbox.wikimedia.org/dcim/devices/831/)

Should we rather create a decom ticket for it?
