Alasdair on Everything

Posts filed under 'Sun Hardware'

Flashing LSI SAS HBA Raid Cards on Sun Fire servers

We have a fair number of Sun Fire x2100, x2200 and x2250 servers, and we fit all of them with LSI SAS HBA cards (LSISAS3041E-R; Sun part code SG-XPCIE4SAS3-Z, or "4 port SAS PCIE Internal HBA B3"). They’re not the best RAID cards out there (the management software is exceptionally poor), but they’re fast and well supported by Solaris, Linux and Windows. Periodically LSI releases firmware/BIOS updates for the cards, and believe me: if you care about your data, install them.

Unfortunately it’s not made easy for you. You need to use something called a "DOS Boot Disk". After doing some research, I learned that DOS was the operating system used on early PCs based on the 8088 chip back in 1981. It was distributed on something called a "floppy disk", a piece of hardware I can only assume computers of the era shipped with. Unfortunately, in this day and age, computers no longer have them, which makes updating the LSI card quite tricky to say the least.

Fret not however! There are methods of getting by without the required floppy disk drive.

DrDos, UltraISO and Nero Burning Rom

Download this humble DrDos image. Unzip. Obtain and install the trial version of UltraISO. Open UltraISO. Select "Open" from the "File" menu, and open the unzipped drdosmin.img file. Select "Change Image Format" from the "Actions" menu. Choose 2.88MB, select a location to save the file, and save.

Download the latest firmware files for your LSI raid card. Unzip. Drag the files into the UltraISO window, which should add the files to the image. Save.

Using your favourite CD burning software, create a new "Bootable CD". In Nero, select "New Compilation" from the "File" menu, then select "CD-ROM (Boot)". From the "Boot" tab, select the image file you created above with the firmware files on it. Tick "Enable Expert Settings", and choose "Floppy Emulation 2.88MB". Burn to an ISO image by choosing the "Image Recorder" via "Choose Recorder" from the "Recorder" menu. (Note that older Nero versions can only write Nero’s native image format, which is no good here; with those you’ll probably have to burn an actual CD.)

iLOM, Beautiful iLOM

Head into your iLOM, mount the ISO you created, reboot, set the boot order if needed, and boot up your lovely ISO. You should be met with this:

Type "hbaflash" and hit return. Say no when asked whether you want to save a copy of your BIOS: saving to a read-only CD-ROM won’t work. It is a good idea to keep the old version, but sadly that’s not possible with this method.

Answer the on-screen questions. Be very very careful - giving the wrong answers may lead to your RAID card ceasing to function and/or bursting into flames. The main one to get right is the "Which Chip Version?" question. The answer is actually above under the "Ctrl" column - mine reads 1064E(B3), therefore it’s the B3 chip.

Once you’ve answered the questions, congratulations: your RAID card will now be slightly less (or more, depending on how buggy the new BIOS release is) likely to frag your data.

December 8th, 2008

SNMP Monitoring of LSI MegaRaid Cards

We use LSI 3041E RAID cards (which use the SAS1064ET chipset) in a bunch of our Sun x2100 and x2200 servers, and naturally we wanted a simple and straightforward method of monitoring the RAID status.

Checking the Raid Status on Linux

On Linux, we opted for the simple and easy to use mpt-status utility, which you can script easily. You can install it straight from the Debian repositories with apt-get, although it doesn’t seem to be in the normal CentOS Yum repositories. It’s pretty easy to use, as this demonstrates:

# mpt-status
open /dev/mptctl: No such file or directory
  Try: mknod /dev/mptctl c 10 220
Make sure mptctl is loaded into the kernel

# modprobe mptctl
# mpt-status

You seem to have no SCSI disks attached to your HBA or you have
them on a different scsi_id. To get your SCSI id, run:

    mpt-status -p

# mpt-status -p
Checking for SCSI ID:0
Checking for SCSI ID:1
Checking for SCSI ID:2
Found SCSI id=2, use ''mpt-status -i 2`` to get more information.

# mpt-status -i 2
ioc0 vol_id 2 type IM, 2 phy, 135 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 4 SEAGATE  ST314654SSUN146G 022D, 136 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 3 SEAGATE  ST3146855SS      0002, 136 GB, state ONLINE, flags NONE

You can then write a simple bash script to check that the status is “OPTIMAL”, and set up some kind of remote monitoring to access it via SNMP or Nagios NRPE.
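Here’s a minimal sketch of such a check, assuming the volume lines always look like the "vol_id" line above. The function name check_mpt_state and the Nagios-style exit codes (0 OK, 2 critical, 3 unknown) are illustrative choices, not anything mpt-status itself provides:

```shell
#!/bin/sh
# Sketch only: check_mpt_state is a made-up name.
# Reads `mpt-status -i <id>` output on stdin and looks at every volume line.
# Exit status follows the Nagios convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_mpt_state() {
    awk '
        / vol_id / {
            vols++
            if ($0 !~ /state OPTIMAL,/) bad++
        }
        END {
            if (vols == 0) { print "UNKNOWN: no volumes found"; exit 3 }
            if (bad > 0)   { print "CRITICAL: " bad " of " vols " volume(s) not OPTIMAL"; exit 2 }
            print "OK: " vols " volume(s) OPTIMAL"
        }
    '
}

# Typical use, with the SCSI id that `mpt-status -p` reported:
#   mpt-status -i 2 | check_mpt_state
```

It only looks at the volume state, as described above; checking that each "phy" line says ONLINE would be an easy extension along the same lines.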

Checking the Raid Status on Windows

On Windows Server 2003/2008, for remote monitoring your best (only?) option is to install Windows SNMP, and install LSI MegaRaid Storage Manager with the SNMP plugin. You can download the LSI MegaRaid Storage Manager from LSI’s website. Once SNMP and the MegaRaid SNMP plugin are installed, you should be able to snmpwalk your Windows server:

root mibs (mon01): snmpwalk -v1 -c public w01.someserver.everycity.co.uk | head
SNMPv2-MIB::sysDescr.0 = STRING: Hardware: x86 Family 15 Model 67 Stepping 3 AT/AT COMPATIBLE - Software: Windows Version 5.2 (Build 3790 Multiprocessor Free)
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.311.1.1.3.1.2
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (819029) 2:16:30.29
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: W01-SOMESERVER
...

Great! Now you need the LSI MIB files. Technically you don’t "need" them to check the relevant SNMP OIDs, but it’s helpful to know what you’re querying. I obtained them by downloading and digging through the Linux version of LSI MegaRaid Storage Manager. At the time of writing this was MSM_Linux_28800.zip. Inside this is a tar.gz file called MSM_linux_installer-2.88-00.tar.gz. Inside this are 4 RPM files. This is starting to remind me of Russian dolls. The two you want are sas_ir_snmp-3.16-1002.i386.rpm and sas_snmp-3.16-1002.i386.rpm, which you can extract one at a time with, for example, "rpm2cpio sas_snmp-3.16-1002.i386.rpm | cpio -idmv". Finally you can get your two MIB files:

./etc/lsi_mrdsnmp/sas/LSI-AdapterSAS.mib
./etc/lsi_mrdsnmp/sas-ir/LSI-AdapterSASIR.mib

If you don’t want to arse around, and let’s face it, who enjoys arsing around, please enjoy LSI-AdapterSAS.mib and LSI-AdapterSASIR.mib.

On a typical UCD-SNMP/Net-SNMP install, you’d place these in /usr/share/snmp/mibs. There’s a good guide on ensuring the MIBs get loaded when you call tools such as snmpwalk, which means that instead of getting:

# snmpwalk -v1 -c public w01.someserver.everycity.co.uk .1.3.6.1.4.1.3582 | head -n 50
SNMPv2-SMI::enterprises.3582.4.1.1.1 = STRING: "W01-SOMESERVER"
SNMPv2-SMI::enterprises.3582.4.1.2.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
SNMPv2-SMI::enterprises.3582.4.1.3.1.1 = STRING: "1.23-02"
SNMPv2-SMI::enterprises.3582.4.1.3.2.1 = STRING: "lsi_mrdsnmpagent.dll"
SNMPv2-SMI::enterprises.3582.4.1.3.3.1 = STRING: "3.16.0.1"
SNMPv2-SMI::enterprises.3582.4.1.3.4.1 = STRING: "28th May 2008"
SNMPv2-SMI::enterprises.3582.4.1.9.1.1 = STRING: "LSI Corporation"
SNMPv2-SMI::enterprises.3582.5.1.1.1 = STRING: "W01-SOMESERVER"
SNMPv2-SMI::enterprises.3582.5.1.2.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
SNMPv2-SMI::enterprises.3582.5.1.3.1.1 = STRING: "1.14-01"
SNMPv2-SMI::enterprises.3582.5.1.3.2.1 = STRING: "lsi_mrdsnmpagent.dll"
SNMPv2-SMI::enterprises.3582.5.1.3.3.1 = STRING: "3.16.0.1"
SNMPv2-SMI::enterprises.3582.5.1.3.4.1 = STRING: "28th May 2008"

You get:

LSI-MegaRAID-SAS-MIB::hostName.1 = STRING: "W01-SOMESERVER"
LSI-MegaRAID-SAS-MIB::hostOSInfo.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
LSI-MegaRAID-SAS-MIB::mibVersion.1 = STRING: "1.23-02"
LSI-MegaRAID-SAS-MIB::agentModuleName.1 = STRING: "lsi_mrdsnmpagent.dll"
LSI-MegaRAID-SAS-MIB::agentModuleVersion.1 = STRING: "3.16.0.1"
LSI-MegaRAID-SAS-MIB::releaseDate.1 = STRING: "28th May 2008"
LSI-MegaRAID-SAS-MIB::copyright.1 = STRING: "LSI Corporation"
LSI-megaRAID-SAS-IR-MIB::hostName.1 = STRING: "W01-SOMESERVER"
LSI-megaRAID-SAS-IR-MIB::hostOSInfo.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
LSI-megaRAID-SAS-IR-MIB::mibVersion.1 = STRING: "1.14-01"
LSI-megaRAID-SAS-IR-MIB::agentModuleName.1 = STRING: "lsi_mrdsnmpagent.dll"
LSI-megaRAID-SAS-IR-MIB::agentModuleVersion.1 = STRING: "3.16.0.1"
LSI-megaRAID-SAS-IR-MIB::releaseDate.1 = STRING: "28th May 2008"

This is obviously much more readable and understandable. You can also view the comments in the MIB file, for example:

pdDiskPredFailureCount                OBJECT-TYPE
    SYNTAX                      INTEGER
    ACCESS                      read-only
    STATUS                      optional
    DESCRIPTION                 "Number of disk devices in this adapter those are critical"

alarmStatus                OBJECT-TYPE
    SYNTAX                      INTEGER{
                                status-ok(1),
                                status-critical(2),
                                status-nonCritical(3),
                                status-unrecoverable(4),
                                status-not-installed(5),
                                status-unknown(6),
                                status-not-available(7)
                                }
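For the MIB-loading step mentioned earlier, one common approach with Net-SNMP is to tell the client tools to load every MIB they can find. This is a sketch, assuming the two LSI MIB files are already in /usr/share/snmp/mibs and that your tools read a per-user snmp.conf from the usual default location:

```text
# ~/.snmp/snmp.conf  (per-user Net-SNMP client configuration)
mibs ALL
```

The same thing for a single command is the "-m ALL" option, or setting MIBS=ALL in the environment. Without one of these, name lookups against the LSI MIBs won’t resolve.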

Depending on the model of your RAID card, the most useful OIDs to monitor are either these, from LSI-MegaRAID-SAS-MIB (the enterprises.3582.4 subtree):

# snmptranslate -IR -On vdDegradedCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.19

# snmptranslate -IR -On vdOfflineCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.20

# snmptranslate -IR -On pdDiskFailedCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.24

# snmptranslate -IR -On pdDiskPredFailureCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.23

Or these, from LSI-megaRAID-SAS-IR-MIB (the enterprises.3582.5 subtree):

# snmptranslate -IR -On vdDegradedCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.20

# snmptranslate -IR -On vdOfflineCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.21

# snmptranslate -IR -On pdDiskFailedCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.25

# snmptranslate -IR -On pdDiskPredFailureCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.24

All of which should be zero. You can script snmpget or use Nagios’s SNMP plugin directly to monitor these values.
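As a rough sketch of the scripted route: the snippet below walks the four columns from the 3582.4 subtree and complains if any value is non-zero. The function names, the argument order and the Nagios-style exit codes are assumptions made for illustration; it uses snmpwalk rather than snmpget so that no instance index has to be guessed, and "-Oqv" is the Net-SNMP option for printing bare values.

```shell
#!/bin/sh
# Sketch only: counts_all_zero and check_lsi_counters are made-up names.

# Read one counter value per line on stdin.
# Exit status follows the Nagios convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.
counts_all_zero() {
    awk '
        NF == 0          { next }           # skip blank lines
        $1 !~ /^[0-9]+$/ { junk++; next }   # error text instead of a number
                         { seen++; if ($1 != 0) bad++ }
        END {
            if (junk > 0 || seen == 0) { print "UNKNOWN: no usable counter values"; exit 3 }
            if (bad > 0) { print "CRITICAL: " bad " non-zero counter(s)"; exit 2 }
            print "OK: " seen " counter(s), all zero"
        }
    '
}

# Walk the degraded/offline/failed/predicted-failure columns listed above
# (3582.4 subtree) on the given host and check every value returned.
check_lsi_counters() {
    host=$1
    community=$2
    for oid in \
        .1.3.6.1.4.1.3582.4.1.4.1.2.1.19 \
        .1.3.6.1.4.1.3582.4.1.4.1.2.1.20 \
        .1.3.6.1.4.1.3582.4.1.4.1.2.1.24 \
        .1.3.6.1.4.1.3582.4.1.4.1.2.1.23
    do
        snmpwalk -v1 -c "$community" -Oqv "$host" "$oid"
    done | counts_all_zero
}

# Typical use:
#   check_lsi_counters w01.someserver.everycity.co.uk public
```

For a card that answers under the 3582.5 subtree, swap in the second set of OIDs.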

Last but not least, checking on Solaris

Solaris is the easiest of all:

# raidctl -l
Controller: 1
        Volume:c1t0d0
        Disk: 0.1.0
        Disk: 0.2.0

# raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t0d0                  135.9G  N/A     OPTIMAL  OFF    RAID1
                0.1.0   135.9G          GOOD
                0.2.0   135.9G          GOOD
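If you want to script this too, here is a small sketch that reads the per-volume output and flags anything that isn’t OPTIMAL or GOOD. The function name check_raidctl and the exit codes are illustrative, and the column positions are simply read off the sample output above, so treat them as an assumption for other raidctl versions:

```shell
#!/bin/sh
# Sketch only: check_raidctl is a made-up name.
# Reads `raidctl -l <volume>` output on stdin.
# Exit status follows the Nagios convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_raidctl() {
    awk '
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/  { vols++;  if ($4 != "OPTIMAL") bad++ }
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+$/ { disks++; if ($3 != "GOOD") bad++ }
        END {
            if (vols == 0) { print "UNKNOWN: no volume found"; exit 3 }
            if (bad > 0)   { print "CRITICAL: " bad " volume/disk line(s) unhealthy"; exit 2 }
            print "OK: " vols " volume(s), " disks " disk(s) healthy"
        }
    '
}

# Typical use:
#   raidctl -l c1t0d0 | check_raidctl
```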

Enjoy!

November 18th, 2008

Sun x4500 Thumper: Mapping logical drives to physical

The Sun x4500 has 48 disk slots, numbered 0 to 47. On Solaris, however, drives are named according to their controller/target location, and I was wondering how to get from that logical name to the physical slot.

Well, the answer lies on the x4500 Tools & Drivers CD. On it is a nifty package named "SUNWhd-1.07.pkg", which plonks a utility called "hd" at "/opt/SUNWhd/hd/bin/hd". Running it spits out the serial numbers of the disks and their temperatures, and at the end it prints some ASCII art depicting the layout:

---------------------SunFireX4500------Rear----------------------------

36:   37:   38:   39:   40:   41:   42:   43:   44:   45:   46:   47:
c4t3  c4t7  c3t3  c3t7  c6t3  c6t7  c5t3  c5t7  c1t3  c1t7  c0t3  c0t7
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
24:   25:   26:   27:   28:   29:   30:   31:   32:   33:   34:   35:
c4t2  c4t6  c3t2  c3t6  c6t2  c6t6  c5t2  c5t6  c1t2  c1t6  c0t2  c0t6
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
12:   13:   14:   15:   16:   17:   18:   19:   20:   21:   22:   23:
c4t1  c4t5  c3t1  c3t5  c6t1  c6t5  c5t1  c5t5  c1t1  c1t5  c0t1  c0t5
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
 0:    1:    2:    3:    4:    5:    6:    7:    8:    9:   10:   11:
c4t0  c4t4  c3t0  c3t4  c6t0  c6t4  c5t0  c5t4  c1t0  c1t4  c0t0  c0t4
^b+   ^b+   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------

Rather funky, and useful!
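The layout in that diagram is regular enough to compute: each controller owns two adjacent columns, targets 0 to 3 climb the left column and 4 to 7 the right, and slots count left to right, 12 per row, from the front. Here is a sketch of the mapping in both directions. The function names are made up, and the controller order (c4, c3, c6, c5, c1, c0) is read straight off this one machine’s output, so check your own hd output before trusting it elsewhere:

```shell
#!/bin/sh
# Sketch only: ctd_to_slot and slot_to_ctd are made-up names, and the
# controller order below is copied from the diagram above.

# c4t5 (or c4t5d0) -> physical slot number
ctd_to_slot() {
    name=${1%%d*}                       # drop a trailing dN if present
    case $name in
        c[0-9]t[0-7]) ;;
        *) echo "expected something like c4t5" >&2; return 1 ;;
    esac
    ctrl=${name#c}; ctrl=${ctrl%t*}
    target=${name#c?t}
    # position of the controller, left to right, in the diagram
    case $ctrl in
        4) pair=0 ;; 3) pair=1 ;; 6) pair=2 ;;
        5) pair=3 ;; 1) pair=4 ;; 0) pair=5 ;;
        *) echo "controller c$ctrl is not in the diagram" >&2; return 1 ;;
    esac
    row=$(( target % 4 ))               # rows count up from the front
    col=$(( pair * 2 + target / 4 ))    # targets 4-7 sit in the right column
    echo $(( row * 12 + col ))
}

# physical slot number (0-47, plain decimal) -> cXtY
slot_to_ctd() {
    slot=$1
    case $slot in
        ''|*[!0-9]*|0?*) echo "slot must be a number from 0 to 47" >&2; return 1 ;;
    esac
    if [ "$slot" -gt 47 ]; then
        echo "slot must be a number from 0 to 47" >&2
        return 1
    fi
    row=$(( slot / 12 ))
    col=$(( slot % 12 ))
    set -- 4 3 6 5 1 0                  # controller order from the diagram
    shift $(( col / 2 ))
    echo "c${1}t$(( row + 4 * (col % 2) ))"
}
```

For example, "ctd_to_slot c4t5d0" gives 13 and "slot_to_ctd 47" gives c0t7, matching the diagram.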

November 16th, 2008