Broadcom NICs dropping out on Solaris 10
June 14th, 2010
Update 2010-07-01: Sun got back to one of the blog commenters regarding the issue with Broadcom NICs dropping out on HP servers and stated the issue relates to the HP-supplied Broadcom drivers, and Sun recommended using these. So HP people may be seeing a different issue. Please see this blog comment for details. Many thanks for passing this information on, Daniel!
BREAKING NEWS - 2010-06-25 11:30 BST (GMT+1): I’ve just spoken with a chap called mui on #opensolaris on irc.freenode.net who reports that this issue relates to “C States”. Disabling “C States” in the BIOS (it’s in “Processor Settings” on Dell boxes) will supposedly work around the issue. C States support was added in Solaris 10 update 8, so this is probably why our Solaris 10 update 7 boxes are unaffected.
Supposedly Sun/Oracle have a patch internally that they can supply to you for Solaris 10 if you have a support contract. If you’re on OpenSolaris, mui has made this package available that works with snv_134. DISCLAIMER: Please test this prior to putting it into production, as it’s provided with no warranty. Alternatively, you might be able to grab the latest 6.0.1 BNX driver from the on-closed-bins.i386.tar.bz2 package on the OpenSolaris website.
Here’s the rest of the (now somewhat out of date) post…
We’ve encountered this bug quite a few times and up until I found these bug reports, we weren’t sure what was causing the issue:
S10 bnx NICs randomly hang/drop out of the network
The symptoms are basically that the server loses network connectivity - traffic just stalls. Because this keeps happening on production boxes, we have to reboot pretty damn quickly, so we haven’t had an opportunity to diagnose the issue in detail. We tried a number of fixes to no avail, and I was at my wits’ end until I encountered the above bug report.
Our servers are Dell R410 machines and we’ve seen this happening on Dell R710 machines as well, with Solaris 10 update 8. We’re running with the latest Solaris 10 patches and the latest Broadcom drivers from the Broadcom website (5.2.2). I believe we’ve seen this issue with the stock drivers shipped with Solaris 10 update 8 as well.
From the bug reports, the issue seems related to the firmware running on the cards - version 5* is affected, version 4* isn’t. I believe the Firmware is tied to the Dell BIOS running on the machine. Here’s the output from one of our affected boxes:
# prtdiag | head -n 2
System Configuration: Dell Inc. PowerEdge R410
BIOS Configuration: Dell Inc. 1.3.9 04/07/2010
# grep -i BCM /var/adm/mes*
/var/adm/messages:Jun 12 03:21:38 bnx: [ID 995108 kern.info] NOTICE: bnx0: BCM5709 device with F/W Ver500000b is initialized.
/var/adm/messages:Jun 12 03:21:38 bnx: [ID 995108 kern.info] NOTICE: bnx1: BCM5709 device with F/W Ver500000b is initialized.
Here is the output from a machine that’s not affected:
# prtdiag | head -n 2
System Configuration: Dell Inc. PowerEdge R410
BIOS Configuration: Dell Inc. 1.1.5 07/29/2009
# grep BCM /var/adm/messages*
/var/adm/messages.2:May 27 15:11:43 bnx: [ID 995108 kern.info] NOTICE: bnx1: BCM5709 device with F/W Ver4060004 is initialized.
/var/adm/messages.2:May 27 15:11:43 bnx: [ID 995108 kern.info] NOTICE: bnx0: BCM5709 device with F/W Ver4060004 is initialized.
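If you have a lot of boxes to check, the grep above can be wrapped in a small filter. This is only a rough sketch: the function name bnx_fw_versions is made up for illustration, and it assumes the bnx driver logs its firmware in exactly the "F/W Ver..." form shown in the two outputs above. It only reports what the driver logged; it doesn't tell you whether a given box is actually affected.

```shell
#!/bin/sh
# Sketch only: summarise bnx firmware versions from syslog lines on stdin.
# Assumes the message format quoted in this post:
#   NOTICE: bnx0: BCM5709 device with F/W Ver500000b is initialized.
# The function name is hypothetical, not part of any Solaris tool.
bnx_fw_versions() {
    # Keep only the interface name, chip name and firmware string.
    sed -n 's/.*NOTICE: \(bnx[0-9][0-9]*\): \(BCM[0-9][0-9]*\) device with F\/W Ver\([0-9A-Za-z]*\) is initialized.*/\1 \2 \3/p' |
        sort -u |
        # The first character of the firmware string is the 4* / 5* series.
        awk '{ printf "%s %s fw=%s major=%s\n", $1, $2, $3, substr($3, 1, 1) }'
}

# Typical use on a host (log path taken from the post):
#   cat /var/adm/messages* | bnx_fw_versions
```

Each interface comes out once per distinct firmware string, so a box that was re-flashed between reboots will show both versions.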
My understanding is that the fix is to downgrade the BIOS of the machine to a previous release that uses a 4* Broadcom Firmware release. We haven’t yet tested this but should be able to later this week. So far it doesn’t look like Sun/Oracle have released a publicly available patch to address the issue.
Update: 2010-06-25 - Upgrading/Downgrading the system BIOS makes no difference to the Broadcom FW (duh! silly me). I’ve written an updated post with more information here: http://blogs.everycity.co.uk/alasdair/2010/06/update-to-broadcom-nic-dropping-out-on-solaris-issue/
Entry Filed under: Solaris

13 Comments
1. Tim | June 15th, 2010 at 3:10 pm
Hi, experiencing the same issue as described. Solaris 10/09, latest patch cluster, Dell R610. Upgrading to latest Broadcom driver did not solve the issue.
Did the firmware downgrade resolve the issue?
Any suggestions appreciated.
Thank you.
-Tim
2. Alasdair | June 16th, 2010 at 7:29 am
Hi Tim,
I haven’t yet had a chance to downgrade the BIOS on the affected machines, but we use identical builds of Solaris across all physical hosts and only the ones with the latest Dell BIOS are doing it.
I’ll let you know how we get on.
Cheers,
Alasdair
3. James | June 16th, 2010 at 11:04 am
I suspect we are seeing the same issue on HP ProLiant DL380 G6. We have a case open with HP; they do not believe that the issue we are seeing is related, but I have yet to be convinced. If anybody is interested, some background is here: http://forums.sun.com/thread.jspa?messageID=11005595
4. Alasdair | June 16th, 2010 at 11:11 am
Hi James,
What does “grep -i BCM /var/adm/mes*” show for the Firmware revision?
Cheers,
Alasdair
5. Daniel | June 16th, 2010 at 11:22 am
Hi,
I, along with James, have also seen this issue on HP DL380 G6 servers with Broadcom 5709 NICs.
I’ve seen it on firmware Ver4060004 and Ver5020002, and on drivers 4.6.2 and 5.2.2.
Great (in a way) that it’s not just James and I having this issue.
6. James | June 17th, 2010 at 12:00 pm
Like Daniel, I have seen it on version 4 and version 5. If you look at bug 6938878, which is on the related list of bug 6926051, they mention that Broadcom’s code has been updated to 6.0.1 and integrated into OpenSolaris. I’m pushing our vendor for a fix from Broadcom if possible.
7. Teddie | June 21st, 2010 at 6:01 pm
We’re experiencing the same thing with a Dell R410. We will be trying out a BIOS downgrade soon.
8. Alasdair | June 22nd, 2010 at 8:12 am
For those with Dell servers wishing to downgrade to a previous BIOS release, Dell have an FTP site which has these:
http://ftp.dell.com/bios/
For example, here are the Linux BIOS update binaries for a Dell R410:
PER410_BIOS_LX_1.0.5.BIN
PER410_BIOS_LX_1.1.5.BIN
PER410_BIOS_LX_1.2.4.BIN
PER410_BIOS_LX_1.3.8.BIN
PER410_BIOS_LX_1.3.9.BIN
Cheers,
Alasdair
9. Teddie | June 22nd, 2010 at 3:47 pm
Anyone tried yet on the earlier R410 BIOS’s? The release notes don’t contain anything about a change in Broadcom firmware.
10. Alasdair | June 25th, 2010 at 9:37 am
Hi All,
I’ve done a new post here which has the latest info in:
http://blogs.everycity.co.uk/alasdair/2010/06/update-to-broadcom-nic-dropping-out-on-solaris-issue/
Cheers,
Alasdair
11. Alasdair | June 25th, 2010 at 10:33 am
Hi All,
I’ve just spoken with a chap called mui on #opensolaris on irc.freenode.net who reports that this issue relates to “C States”. Disabling “C States” in the BIOS (it’s in “Processor Settings” on Dell boxes) will supposedly work around the issue. C States support was added in Solaris 10 update 8, so this is probably why our Solaris 10 update 7 boxes are unaffected.
Cheers,
Alasdair
12. Teddie | June 28th, 2010 at 3:47 pm
Nice, we are trying this out currently. The problem is a bit unfortunate as it only comes out under some load.
13. Attila | August 17th, 2010 at 8:33 pm
All,
We were experiencing exactly the same problem on a Solaris 10u8 installation running on a vSphere cluster. The initially configured virtual NIC was an Intel e1000g. Switching to VMware’s own vmxnet3s virtual NIC seems to have solved the problem, but since this is load related one can’t be sure. The underlying hardware consists of Dell R710 servers with Broadcom NICs. Very strange.
Attila