SNMP Monitoring of LSI MegaRaid Cards
We use LSI 3041E RAID cards (which use the SAS1064ET chipset) in a bunch of our Sun x2100 and x2200 servers, and naturally we want a simple, straightforward way to monitor the RAID status.
Checking the Raid Status on Linux
On Linux, we opted for the simple and easy to use mpt-status utility, which you can script easily. On Debian you can install it straight from apt-get, although it doesn’t seem to be in the standard CentOS yum repositories. It’s pretty easy to use, as this demonstrates:
# mpt-status
open /dev/mptctl: No such file or directory
Try: mknod /dev/mptctl c 10 220
Make sure mptctl is loaded into the kernel
# modprobe mptctl
# mpt-status
You seem to have no SCSI disks attached to your HBA or you have
them on a different scsi_id. To get your SCSI id, run:
mpt-status -p
# mpt-status -p
Checking for SCSI ID:0
Checking for SCSI ID:1
Checking for SCSI ID:2
Found SCSI id=2, use ''mpt-status -i 2`` to get more information.
# mpt-status -i 2
ioc0 vol_id 2 type IM, 2 phy, 135 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 4 SEAGATE ST314654SSUN146G 022D, 136 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 3 SEAGATE ST3146855SS 0002, 136 GB, state ONLINE, flags NONE
You can then write a simple bash script to check that the status is “OPTIMAL”, and set up some kind of remote monitoring to access it via SNMP or Nagios NRPE.
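As a rough idea of what such a script could look like, here is a minimal sketch. It assumes the volume line always contains "vol_id" and reads "state OPTIMAL" when healthy, as in the sample output above; the function name check_mpt_output is made up for this example, and the exit codes follow the usual Nagios convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# Sketch only: classify mpt-status output with Nagios-style exit codes.
# Assumption: volume lines contain "vol_id" and a healthy volume
# reports "state OPTIMAL", as in the sample output above.

check_mpt_output() {
    # Reads the output of `mpt-status -i <id>` on stdin.
    local vol_lines bad
    vol_lines=$(grep 'vol_id' || true)

    if [ -z "$vol_lines" ]; then
        echo "UNKNOWN: no RAID volume found in mpt-status output"
        return 3
    fi

    # Any volume line that does not say "state OPTIMAL" is a problem.
    bad=$(printf '%s\n' "$vol_lines" | grep -v 'state OPTIMAL' || true)
    if [ -n "$bad" ]; then
        echo "CRITICAL: $(printf '%s\n' "$bad" | head -n 1)"
        return 2
    fi

    echo "OK: all RAID volumes OPTIMAL"
    return 0
}

# Usage (SCSI id 2 is the one reported by `mpt-status -p` above):
#   mpt-status -i 2 | check_mpt_output
```

Because the function only reads stdin, you can try it against saved output before wiring it into NRPE or an SNMP extend script.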
Checking the Raid Status on Windows
On Windows Server 2003/2008, for remote monitoring your best (only?) option is to install Windows SNMP, and install LSI MegaRaid Storage Manager with the SNMP plugin. You can download the LSI MegaRaid Storage Manager from LSI’s website. Once SNMP and the MegaRaid SNMP plugin are installed, you should be able to snmpwalk your Windows server:
root mibs (mon01): snmpwalk -v1 -c public w01.someserver.everycity.co.uk | head
SNMPv2-MIB::sysDescr.0 = STRING: Hardware: x86 Family 15 Model 67 Stepping 3 AT/AT COMPATIBLE - Software: Windows Version 5.2 (Build 3790 Multiprocessor Free)
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.311.1.1.3.1.2
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (819029) 2:16:30.29
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: W01-SOMESERVER
...
Great! Now you need the LSI MIB files. Technically you don’t "need" them to check the relevant SNMP OIDs, but it’s helpful to know what you’re querying. I obtained them by downloading and digging through the Linux version of LSI MegaRaid Storage Manager. At the time of writing this was MSM_Linux_28800.zip. Inside this is a tar.gz file called MSM_linux_installer-2.88-00.tar.gz, and inside that are 4 RPM files. This is starting to remind me of Russian dolls. The two you want are sas_ir_snmp-3.16-1002.i386.rpm and sas_snmp-3.16-1002.i386.rpm, each of which you can extract with "rpm2cpio file.rpm | cpio -idmv" (rpm2cpio takes one package at a time, so run it once per RPM). Finally you can get your two MIB files:
./etc/lsi_mrdsnmp/sas/LSI-AdapterSAS.mib
./etc/lsi_mrdsnmp/sas-ir/LSI-AdapterSASIR.mib
If you don’t want to arse around, and let’s face it, who enjoys arsing around, please enjoy LSI-AdapterSAS.mib and LSI-AdapterSASIR.mib.
On a typical UCD/Net-SNMP install, you’d place these in /usr/share/snmp/mibs. There’s a good guide on ensuring the MIBs get loaded when you call tools such as snmpwalk, which means instead of getting:
# snmpwalk -v1 -c public w01.someserver.everycity.co.uk .1.3.6.1.4.1.3582 | head -n 50
SNMPv2-SMI::enterprises.3582.4.1.1.1 = STRING: "W01-SOMESERVER"
SNMPv2-SMI::enterprises.3582.4.1.2.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
SNMPv2-SMI::enterprises.3582.4.1.3.1.1 = STRING: "1.23-02"
SNMPv2-SMI::enterprises.3582.4.1.3.2.1 = STRING: "lsi_mrdsnmpagent.dll"
SNMPv2-SMI::enterprises.3582.4.1.3.3.1 = STRING: "3.16.0.1"
SNMPv2-SMI::enterprises.3582.4.1.3.4.1 = STRING: "28th May 2008"
SNMPv2-SMI::enterprises.3582.4.1.9.1.1 = STRING: "LSI Corporation"
SNMPv2-SMI::enterprises.3582.5.1.1.1 = STRING: "W01-SOMESERVER"
SNMPv2-SMI::enterprises.3582.5.1.2.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
SNMPv2-SMI::enterprises.3582.5.1.3.1.1 = STRING: "1.14-01"
SNMPv2-SMI::enterprises.3582.5.1.3.2.1 = STRING: "lsi_mrdsnmpagent.dll"
SNMPv2-SMI::enterprises.3582.5.1.3.3.1 = STRING: "3.16.0.1"
SNMPv2-SMI::enterprises.3582.5.1.3.4.1 = STRING: "28th May 2008"
You get:
LSI-MegaRAID-SAS-MIB::hostName.1 = STRING: "W01-SOMESERVER"
LSI-MegaRAID-SAS-MIB::hostOSInfo.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
LSI-MegaRAID-SAS-MIB::mibVersion.1 = STRING: "1.23-02"
LSI-MegaRAID-SAS-MIB::agentModuleName.1 = STRING: "lsi_mrdsnmpagent.dll"
LSI-MegaRAID-SAS-MIB::agentModuleVersion.1 = STRING: "3.16.0.1"
LSI-MegaRAID-SAS-MIB::releaseDate.1 = STRING: "28th May 2008"
LSI-MegaRAID-SAS-MIB::copyright.1 = STRING: "LSI Corporation"
LSI-megaRAID-SAS-IR-MIB::hostName.1 = STRING: "W01-SOMESERVER"
LSI-megaRAID-SAS-IR-MIB::hostOSInfo.1 = STRING: "Microsoft Windows 2003 Service Pack 2.0"
LSI-megaRAID-SAS-IR-MIB::mibVersion.1 = STRING: "1.14-01"
LSI-megaRAID-SAS-IR-MIB::agentModuleName.1 = STRING: "lsi_mrdsnmpagent.dll"
LSI-megaRAID-SAS-IR-MIB::agentModuleVersion.1 = STRING: "3.16.0.1"
LSI-megaRAID-SAS-IR-MIB::releaseDate.1 = STRING: "28th May 2008"
This is obviously much more readable and understandable. You can also view the comments in the MIB file, for example:
pdDiskPredFailureCount OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS optional
DESCRIPTION "Number of disk devices in this adapter those are critical"
alarmStatus OBJECT-TYPE
SYNTAX INTEGER{
status-ok(1),
status-critical(2),
status-nonCritical(3),
status-unrecoverable(4),
status-not-installed(5),
status-unknown(6),
status-not-available(7)
}
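Going back to the MIB-loading step for a moment: one way to get the translated names, sketched here rather than taken from the guide mentioned earlier, is to set Net-SNMP's MIBS environment variable for the call. This assumes the two files are already in /usr/share/snmp/mibs; the module names come from the translated output above, and the wrapper name lsi_snmpwalk is made up for this example.

```shell
#!/bin/bash
# Sketch only. Assumption: both LSI MIB files are already in a directory
# Net-SNMP searches (e.g. /usr/share/snmp/mibs).
# The leading "+" adds these modules to the default list instead of replacing it.
LSI_MIBS="+LSI-MegaRAID-SAS-MIB:LSI-megaRAID-SAS-IR-MIB"

lsi_snmpwalk() {
    # Usage: lsi_snmpwalk <host> [community]
    # Walks the LSI enterprise subtree with the LSI MIBs loaded for this call only.
    MIBS="$LSI_MIBS" snmpwalk -v1 -c "${2:-public}" "$1" .1.3.6.1.4.1.3582
}

# Example:
#   lsi_snmpwalk w01.someserver.everycity.co.uk public | head -n 50
```

If you would rather not wrap every call, a "mibs +LSI-MegaRAID-SAS-MIB:LSI-megaRAID-SAS-IR-MIB" line in snmp.conf should have the same effect.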
Depending on the model of your RAID card, the most useful OIDs to monitor are either these, from the SAS MIB (enterprises.3582.4):
# snmptranslate -IR -On vdDegradedCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.19
# snmptranslate -IR -On vdOfflineCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.20
# snmptranslate -IR -On pdDiskFailedCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.24
# snmptranslate -IR -On pdDiskPredFailureCount
.1.3.6.1.4.1.3582.4.1.4.1.2.1.23
Or, for cards that report under the SAS-IR MIB (enterprises.3582.5):
# snmptranslate -IR -On vdDegradedCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.20
# snmptranslate -IR -On vdOfflineCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.21
# snmptranslate -IR -On pdDiskFailedCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.25
# snmptranslate -IR -On pdDiskPredFailureCount
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.24
All of these should be zero. You can script snmpget or use Nagios’s SNMP plugin directly to monitor these values.
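Here is a minimal sketch of scripting that check. Two things in it are my own choices rather than anything LSI documents: it uses snmpwalk on each column OID instead of snmpget, so I don't have to guess the adapter instance index, and the function name check_lsi_snmp is invented for the example. The OIDs are the SAS MIB ones listed above; swap in the SAS-IR list if that is what your card reports under.

```shell
#!/bin/bash
# Sketch only: every counter below should be zero on a healthy array.
# Column OIDs copied from the snmptranslate output above (SAS MIB).
LSI_OIDS="vdDegradedCount=.1.3.6.1.4.1.3582.4.1.4.1.2.1.19
vdOfflineCount=.1.3.6.1.4.1.3582.4.1.4.1.2.1.20
pdDiskPredFailureCount=.1.3.6.1.4.1.3582.4.1.4.1.2.1.23
pdDiskFailedCount=.1.3.6.1.4.1.3582.4.1.4.1.2.1.24"

check_lsi_snmp() {
    # Usage: check_lsi_snmp <host> [community]
    local host=$1 community=${2:-public}
    local name oid values value status=0 problems=""

    while IFS='=' read -r name oid; do
        # Walk the column so every adapter instance is returned;
        # -Oqv makes snmpwalk print the values only.
        values=$(snmpwalk -v1 -c "$community" -Oqv "$host" "$oid" 2>/dev/null </dev/null) || values=""
        if [ -z "$values" ]; then
            echo "UNKNOWN: no answer for $name from $host"
            return 3
        fi
        for value in $values; do
            case $value in
                0) ;;                                   # healthy
                *[!0-9]*)                               # not a number at all
                    echo "UNKNOWN: unexpected value '$value' for $name"
                    return 3 ;;
                *)  problems="$problems $name=$value"   # non-zero counter
                    status=2 ;;
            esac
        done
    done <<EOF
$LSI_OIDS
EOF

    if [ "$status" -ne 0 ]; then
        echo "CRITICAL:$problems"
    else
        echo "OK: all LSI RAID counters are zero"
    fi
    return "$status"
}

# Example:
#   check_lsi_snmp w01.someserver.everycity.co.uk public
```

The exit codes follow the Nagios convention (0 OK, 2 CRITICAL, 3 UNKNOWN), so the function can sit behind NRPE or be run from cron.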
Last but not least, checking on Solaris
Solaris is the easiest of all:
# raidctl -l
Controller: 1
Volume:c1t0d0
Disk: 0.1.0
Disk: 0.2.0
# raidctl -l c1t0d0
Volume Size Stripe Status Cache RAID
Sub Size Level
Disk
----------------------------------------------------------------
c1t0d0 135.9G N/A OPTIMAL OFF RAID1
0.1.0 135.9G GOOD
0.2.0 135.9G GOOD
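If you want to automate this as well, a minimal sketch follows. It assumes the layout shown above (a dashed separator line, then one volume row reporting OPTIMAL and one row per disk reporting GOOD) and that bash is available; the function name check_raidctl_output is made up for the example. It sticks to plain grep and sed options because the stock Solaris versions are fairly minimal.

```shell
#!/bin/bash
# Sketch only: classify `raidctl -l <volume>` output with Nagios-style exit codes.
# Assumption: data rows follow a dashed separator line, the volume row
# contains OPTIMAL and each disk row contains GOOD when healthy.

check_raidctl_output() {
    # Reads raidctl output on stdin.
    local rows bad

    # Drop everything up to and including the separator, then blank lines.
    rows=$(sed -e '1,/^---/d' | grep -v '^ *$' || true)
    if [ -z "$rows" ]; then
        echo "UNKNOWN: no volume rows found in raidctl output"
        return 3
    fi

    # Any row that reports neither OPTIMAL nor GOOD is treated as a problem.
    bad=$(printf '%s\n' "$rows" | grep -v OPTIMAL | grep -v GOOD || true)
    if [ -n "$bad" ]; then
        echo "CRITICAL: $(printf '%s\n' "$bad" | sed -n 1p)"
        return 2
    fi

    echo "OK: volume OPTIMAL, all disks GOOD"
    return 0
}

# Example:
#   raidctl -l c1t0d0 | check_raidctl_output
```

Note that this sketch flags any state other than OPTIMAL or GOOD as CRITICAL, which would include a volume that is still resyncing; loosen that if it is too noisy for you.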
Enjoy!
November 18th, 2008
