More Solaris Broadcom Driver Information
June 26th, 2010
Update 2010-07-01: Sun got back to one of the blog commenters about Broadcom NICs dropping out on HP servers. Sun stated that the issue relates to the HP-supplied Broadcom drivers and recommended removing them and using the Sun BRCMbnx driver instead. So HP people may be seeing a different issue. Please see Daniel's comment (no. 3) below for details. Many thanks for passing this information on, Daniel!
As previously mentioned, we’ve been having a nightmare with Broadcom NICs suddenly dropping out / hanging / freezing. All network traffic ceases / halts, despite the interfaces being up and showing no signs of any issues. This issue started affecting us after rolling out an upgrade to Solaris 10 update 8, but it also affects recent OpenSolaris builds. This has been on Dell R410 servers and R710 servers, and we’ve heard about people on HP servers having the same issue.
We thankfully found a workaround for it, which basically consists of disabling C-States in the BIOS. This is a power saving feature and support for it was added into Solaris 10 update 8, which is where we’re seeing the issue.
However, prior to finding this workaround, I contacted Broadcom via the “Submit a support request” feature on their website. Nobody got back to me, and we were getting rather desperate, so I was rather naughty and dropped one of their kernel driver engineers a direct email. I won’t say who, as he probably doesn’t want others mailing him directly.
The chap replied promptly, which was very impressive. He was very polite and explained that he couldn’t really help customers directly, as the OEM suppliers get upset, but he did offer some hints/tips. He mentioned that MSI-X was causing issues on Linux and suggested disabling it if we’re using v5.2.3 drivers or later. We’re not, we’re on 5.2.2 and 5.2.2 is the newest release available on the Broadcom website, so that was quite interesting.
He attached the release notes for the 6.0.1 driver which isn’t publicly available yet. Here is a snippet of the contents:
Broadcom NetXtreme II Gigabit Ethernet Driver
For Solaris 10 for i386 platform
Copyright (c) 2000-2010 Broadcom Corporation
All rights reserved.
Version 6.0.1 (21 May, 2010)
============================
Fixes
-----
1) Problem : default MTU now set to 1500, fixed jumboframe
and vlan issues.
Cause : buffer sizes weren't being allocated properly
to account for MAC header overhead w/ vlan tags
Change : allocations are now correct
2) Problem : when MSIX interrupt allocation failed driver
fails to attach
Cause : code didn't exist to revert down to Fixed
Change : driver now reverts to Fixed when MSIX interrupt
allocation fails
Version 5.2.3 (23 March, 2010)
==============================
Enhancements
------------
1) Change : Reworked interrupt code to no longer use deprecated
Solaris interrupt APIs.
2) Change : Added support for MSI-X interrupts. MSI-X is now used
by default and can be turned off via "disable_msix"
inside bnx.conf. When MSI-X is disabled then Fixed
level interrupts are used.
2) Change : Added a new "statistics" group to kstat which contains
driver version and interrupt information.
Version 5.2.2 (14 December, 2009)
=================================
Fixes
-----
1) Problem : Kernel Panic in the send routine:
assertion failed: umpacket->mp == NULL,
Cause : The umpacket->mp was not scrubbed properly because
the umpacket never went through the
bnx_xmit_ring_reclaim() function.
Change : After recycling the packet in the TX routine,
the packet is now reclaimed before it is being used.
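
Going only by the release notes quoted above, what a given bnx driver does with MSI-X depends on its version. Here is a minimal Python sketch of that reading. The function names and the wording of the returned strings are mine, not Broadcom's, and the thresholds are simply the 5.2.3 and 6.0.1 entries from the notes.

```python
def parse_version(text):
    """Turn a dotted version such as '5.2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


# Thresholds taken from the release notes quoted above.
MSIX_INTRODUCED = parse_version("5.2.3")      # MSI-X added, on by default
MSIX_FALLBACK_ADDED = parse_version("6.0.1")  # reverts to Fixed on failure


def msix_status(driver_version):
    """Summarise MSI-X behaviour for a bnx driver version (hypothetical helper)."""
    version = parse_version(driver_version)
    if version < MSIX_INTRODUCED:
        return "no MSI-X support"
    if version < MSIX_FALLBACK_ADDED:
        return "MSI-X on by default; attach fails if MSI-X allocation fails"
    return "MSI-X on by default; reverts to Fixed if MSI-X allocation fails"
```

So `msix_status("5.2.2")` reports no MSI-X support, which is why the advice to disable MSI-X did not apply to the 5.2.2 driver we were running.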
The 6.x driver for Solaris 10 should hopefully be available later this year. The one that’s in OpenSolaris unfortunately can’t be used with Solaris 10 due to network stack differences.
But the interesting thing is that there *is* a newer 5.2.3 driver out there that came out in March this year. So I had a google, and it looks like this driver has been supplied to OEMs but still isn’t available from Broadcom directly. So I downloaded an IBM Driver ISO Image that contains this newer driver, and it installs fine. We’re going to be using this in conjunction with disabling C-States and I’ll report back on how that combination is going.
After discovering the C-States workaround for the NIC dropouts I mailed the Broadcom guy again to let him know, and stated we’d be disabling C-States to see if it fixes the issue. He replied with:
Please let me know if this works for you so that I can pass it on to our Solaris developers. I checked with them to see if this was a known issue and they replied that they had been trying to duplicate the problem but had not been successful to date. When performance testing we often disable certain CPU features in order to maximize Ethernet throughput so it may be that the system BIOS settings are the key difference here.
So this is very encouraging – hopefully this tip will enable the Broadcom Solaris engineers to reproduce the issue and fix it.
One final thing – to keep all our servers identical, in addition to flashing the system BIOS, DRAC firmware and LSI/SAS6i firmware, we’ve now started upgrading the firmware on all the Broadcom NICs too.
This is easier said than done. My method involved producing a 2.88MB DOS boot image with the appropriate files, taken from various places. I nabbed the latest Dell Broadcom NIC Firmware Linux package to get the firmware files. I then pinched the DOS uxdiag.exe tool from the Broadcom diagnostics ISO to do the upgrades. I then produced a .bat file which runs:
uxdiag -c 1 -t abcd -F -fbc bc09x50b.bin
uxdiag -c 2 -t abcd -F -fbc bc09x50b.bin
uxdiag -c 1 -t abcd -F -fncsi ncsifw_x.205
uxdiag -c 2 -t abcd -F -fncsi ncsifw_x.205
uxdiag -c 1 -t abcd -F -fib_ipv4n6 ib6btv41.06
uxdiag -c 2 -t abcd -F -fib_ipv4n6 ib6btv41.06
uxdiag -c 1 -t abcd -F -fmba bxmba508.nic
uxdiag -c 2 -t abcd -F -fmba bxmba508.nic
uxdiag -c 1 -t abcd -mfw 0
uxdiag -c 2 -t abcd -mfw 0
What a lot of faffing about. You’d think Dell would make this stuff easier to do. Anyway, if you’re interested, please feel free to download my Broadcom DOS Firmware update disk image.
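
If you have more NICs per server or different firmware files, the .bat file above is repetitive enough to generate. Here is a rough Python sketch that reproduces exactly the ten uxdiag lines from my batch file. The function and variable names are made up for illustration, the flags and file names are copied verbatim from the batch file, and I am assuming the number after -c selects which NIC gets flashed.

```python
# Values passed to "-c" in the batch file above (assumed to select the NIC).
DEVICES = (1, 2)

# (flag, firmware file) pairs, in the order used in the batch file above.
FIRMWARE_STEPS = (
    ("-fbc", "bc09x50b.bin"),
    ("-fncsi", "ncsifw_x.205"),
    ("-fib_ipv4n6", "ib6btv41.06"),
    ("-fmba", "bxmba508.nic"),
)


def build_batch_lines(devices=DEVICES, steps=FIRMWARE_STEPS):
    """Return the uxdiag command lines, one per device per firmware step."""
    lines = []
    for flag, image in steps:
        for device in devices:
            lines.append(f"uxdiag -c {device} -t abcd -F {flag} {image}")
    # The batch file finishes with one "-mfw 0" call per device.
    for device in devices:
        lines.append(f"uxdiag -c {device} -t abcd -mfw 0")
    return lines
```

Joining the result with "\r\n" and writing it to a .bat file gives DOS-style line endings. Check the file names against the firmware package you actually downloaded before flashing anything.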

14 Comments
1. Daniel | June 28th, 2010 at 12:02 am
hi alasdair,
for my understanding is the current proposed solution:
5.2.3 with MSI-X disabled
OR
5.2.3 with C-States disabled
OR
5.2.3 no other changes
or a combination of above?
cheers
2. Alasdair | June 28th, 2010 at 8:06 am
Hi Daniel,
I believe “C-States Disabled” is necessary under any driver/firmware version to work around the issue.
The latest drivers/firmware won’t fix the issue alone. But there are a few bug fixes in 5.2.3 that look interesting.
MSI-X support was introduced in the 5.2.3 drivers – I’ve left it enabled and haven’t seen any issues. I think the MSI-X problems the Broadcom guy spoke about are mostly a Linux thing.
Good luck!
3. Daniel | July 1st, 2010 at 4:04 am
Sun got back to me and said the issue:
- is nothing to do with c-states
- nothing to do with the previously opensolaris bug
- nothing to do with driver revisions
Unrelated!
They said that the fix was to remove the HP/Broadcom driver and use the Sun BRCMbnx driver (which is a v4.6.3). They also listed various patches to put on (basically the latest patch cluster).
So I’ll re-enable C-states and put on the Sun driver and see how it goes.
Are you using Sun or vendor driver packs?
The HP docs are ambiguous on the topic.
Cheers
4. Alasdair | July 1st, 2010 at 5:51 am
Hi Daniel,
That’s very strange indeed.
No, we were using both the stock Solaris 10u8 BRCMbnx package and the Broadcom supplied BRCMbnx 5.2.2 package from their website, with the latest patch cluster.
We’re reasonably sure disabling C-States has fixed the issue for us, as we haven’t seen a re-occurrence (yet).
So perhaps the issue on HP servers is different to the issue we were having on Dell servers. I’ll put a note at the top of this blog post for HP people.
Cheers,
Alasdair
5. Alasdair | July 1st, 2010 at 5:51 am
Oh and thanks for commenting to let others know – much appreciated! :)
6. Teddie | July 1st, 2010 at 8:01 am
We have also been running a network load test on an R410 with the C-states disabled and haven’t had a reoccurrence yet. This has been running for around 40 hours now; some 12h more and it will go to production.
Interesting part with the problem was that it only occurred under network or some other load. So when production load was taken off from the server, the network card did work for an extended period.
7. nada | July 6th, 2010 at 12:51 pm
The link to the IBM Driver ISO Image above is broken. Here is a working one (tested today):
http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=MIGR-5081963
8. Jack Howard | July 7th, 2010 at 8:26 am
FYI Guys –
Some decent thoughts about this here;
http://forums.sun.com/thread.jspa?threadID=5440714&start=0&tstart=0
As I mention in the post there, we’ve been running the same hardware with U7 with no problems for 9 months. As soon as I patched to U8 + patch cluster we got the failures. Strongly suspect it is in fact related to C-states. It only affects my servers [HP DL360 G6's] that have the newer Xeon E5440's that support C6 mode.
9. James | July 7th, 2010 at 4:55 pm
Hi Alasdair,
So how long ago did you change the C-state on your boxes and are they still up with no re-occurrence of the issue ?
We have the same issue with the Dell 710s; sometimes the issue takes weeks to occur, and other times it happens twice a day.
Just curious on the length of time.
Thanks.
10. Alasdair | July 8th, 2010 at 10:30 am
Hi James,
We disabled C-States over two weeks ago on a collection of servers that were exhibiting the issue. It hasn’t happened since. So we’re satisfied this fixes the issue.
11. fred | July 30th, 2010 at 12:55 am
Sun have supplied a S10U8 bnx 6.0.1 driver under an IDR patch 144627-02 – I have it on 1 system so far (which hasn’t failed, but neither have the other ones with the old driver)
It’s hard to get specifics but they aren’t 100% certain that this is the fix.
I’m told it will never be available publicly under S10u8 and that it’s targeted for S10u10 (and no timeframe on s10u10)
12. Alasdair | July 30th, 2010 at 8:13 am
Hi Fred,
Interesting – thanks for the info! :)
That’s a bit of a shame that they won’t be providing it for S10u8, but not too much of a surprise.
I have reason to believe Broadcom might be making it available later this year on their driver download page. So hopefully it’ll fix the issue.
Cheers,
Alasdair
13. evg | November 7th, 2010 at 4:55 pm
Hi,
I can confirm that upgrading our Dell R710 servers to the latest BIOS 2.1.15 and latest Broadcom firmware (5.0.13) solves this issue. We are using latest Broadcom 6.0.3 driver.
cheers,
evg
14. Martin Matuska | November 12th, 2010 at 1:28 am
Broadcom has released the Solaris 10 driver version 6.0.3 on their website (Nov 5th, 2010).