Archive for March, 2010
We use Solaris Zones, with each zone stored on its own zpool. Each zpool lives on a SAN and is accessed via iSCSI. We’ve been doing this since Solaris 10 update 6, but Solaris 10 update 8 introduced an interesting issue.
When we asked an S10u8 box to reboot, it sat there for 10 minutes shutting down. Why? Because it was trying to stop the iSCSI initiator whilst there were live iSCSI filesystems in use. Duh! Stupid Solaris.
So I compared the iSCSI manifest from S10u7 with the one from S10u8, and it has changed in a few places. It used to depend on svc:/network/physical and svc:/system/metainit, and now it depends on svc:/network/service and svc:/network/loopback. However, the biggest change was the timeout value: it was upped from 5 seconds to 600 seconds. Yes, 10 minutes.
This highlighted an interesting problem: the shutdown ordering had been wrong all along. When rebooting boxes previously, Solaris would also try to stop the iSCSI initiator with live filesystems on it; it just gave up after 5 seconds and the box came down anyway.
Rather than hack the timeout value back to 5 seconds, I decided to investigate and see if I could add a dependency to fix this properly. I decided to make the svc:/system/filesystem/local service depend on the iSCSI initiator service. The theory here was that filesystem/local mounts and unmounts the ZFS filesystems, so if it depends on the initiator, the initiator won’t be stopped until filesystem/local has unmounted them.
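Before adding a dependency like this, it can help to look at what each service already depends on and what depends on it. The following is a minimal sketch, not what was actually run here: the initiator FMRI is an assumption (check the real one with `svcs -a | grep -i iscsi`), and `run` and `show_deps` are hypothetical helper names. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. The initiator FMRI is an assumption; confirm it on your box.
FS_LOCAL="svc:/system/filesystem/local:default"
INITIATOR="${INITIATOR:-svc:/network/iscsi/initiator:default}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the services this one depends on (-d) and the services that
# depend on it (-D).
show_deps() {
    run svcs -d "$1"
    run svcs -D "$1"
}

show_deps "$FS_LOCAL"
show_deps "$INITIATOR"
```

As far as I know these listings only cover direct dependencies, so tracing a possible cycle means repeating the lookup along the chain. If the initiator turns out to depend, directly or indirectly, on something that itself needs filesystem/local, the new dependency would close a loop.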
Unfortunately this didn’t work. Somewhere in the enormous SMF dependency tree, I ended up with a cycle, and upon boot services wouldn’t come up. At this point, I gave up and set the timeout back to 5 seconds.
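For reference, here is a rough sketch of one way to put the stop timeout back to 5 seconds without editing the manifest by hand. It is an illustration under assumptions, not a record of what was done here: the FMRI is a guess (check with `svcs -a | grep -i iscsi`), `run` and `set_stop_timeout` are hypothetical helper names, and depending on whether the stop method is defined on the service or on the instance, svccfg may need to be pointed at one or the other. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. The FMRI is an assumption; confirm it on your box.
FMRI="${FMRI:-svc:/network/iscsi/initiator:default}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the stop method timeout (in seconds), refresh the service so the
# running snapshot picks it up, then read the value back.
set_stop_timeout() {
    fmri="$1"
    seconds="$2"
    run svccfg -s "$fmri" setprop stop/timeout_seconds = count: "$seconds"
    run svcadm refresh "$fmri"
    run svcprop -p stop/timeout_seconds "$fmri"
}

set_stop_timeout "$FMRI" 5
```

A change made this way may be overwritten when a patch or update re-imports the manifest, so it is worth re-checking the value after upgrading.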
If I can find the time, I’ll try to reproduce this issue on OpenSolaris, then file it on defects.opensolaris.org. After it’s been accepted, I’ll escalate it against our Solaris 10 premium support contract, and see if Sun will actually fix something for us.
3 comments March 23rd, 2010
Just a quick post. If you’re upgrading an OpenSolaris host on the dev branch and get this error:
    # beadm create snv134
    # beadm mount snv134 /mnt
    # pkg -R /mnt install email@example.com
    Creating Plan -
    pkg: Cannot remove 'pkg://opensolaris.org/SUNWgnomefirstname.lastname@example.org,5.11-0.127:20091111T055042Z' due to the following packages that depend on it:
    pkg://opensolaris.org/SUNWgnomeemail@example.com,5.11-0.127:20091111T055202Z
Then do this to resolve:
    # beadm umount snv134
    # beadm destroy snv134
    # pkg uninstall SUNWgnome-a11y-reader
    PHASE ACTIONS
    Removal Phase 346/346
    # beadm create snv134
    # beadm mount snv134 /mnt
You might then get this new error:
    # pkg -R /mnt install firstname.lastname@example.org
    Creating Plan \
    pkg: Cannot remove 'pkg://opensolaris.org/SUNWipkgemail@example.com,5.11-0.127:20091111T075414Z' due to the following packages that depend on it:
    pkg://opensolaris.org/SUNWipkgfirstname.lastname@example.org,5.11-0.127:20091111T075333Z
Which is easily fixed with:
    # beadm umount snv134
    # beadm destroy snv134
    # pkg uninstall SUNWipkg-gui
    PHASE ACTIONS
    Removal Phase 251/251
    # beadm create snv134
    # beadm mount snv134 /mnt
Then it should all work nicely:
    # pkg -R /mnt install email@example.com
    DOWNLOAD PKGS FILES XFER (MB)
    Completed 1523/1523 120934/120934 1152.0/1152.0
    PHASE ACTIONS
    Removal Phase 144368/144368
    Install Phase 174332/174332
    Update Phase 1592/1592
(You can also ignore errors like this:)
    Removal Phase 1156/144368
    driver (tl) clone permission update failed with return code 252
    command run was: /usr/sbin/update_drv -b /mnt -d -m ticlts 0666 root sys clone
    command output was:
    ------------------------------------------------------------
    No entry found for driver (clone) in file (/mnt/etc/minor_perm).
    ------------------------------------------------------------
Afterwards, though, you might get errors related to /dev/ptmx when logging in via SSH. If so, log in via the console or ILOM and run “chmod 777 /dev/ptmx” to fix it.
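The unmount, destroy, uninstall, recreate and remount cycle above is the same each time a package blocks the upgrade, so it could be wrapped in a small function. This is only a sketch: `run` and `rebuild_be` are hypothetical helpers, not part of beadm or pkg, and the commands are printed rather than executed unless DRY_RUN=0 is set. Note that beadm destroy normally asks for confirmation, so the real thing is not fully unattended.

```shell
#!/bin/sh
# Sketch only: rebuild_be is a hypothetical helper, not part of beadm or pkg.

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Throw away the half-prepared boot environment, remove the blocking
# package from the running system, then recreate and remount the BE.
# Stops at the first command that fails.
rebuild_be() {
    be="$1"
    blocker="$2"
    mnt="${3:-/mnt}"
    run beadm umount "$be" &&
    run beadm destroy "$be" &&
    run pkg uninstall "$blocker" &&
    run beadm create "$be" &&
    run beadm mount "$be" "$mnt"
}

rebuild_be snv134 SUNWgnome-a11y-reader
rebuild_be snv134 SUNWipkg-gui
```

After each call, retry the pkg -R /mnt install step against the freshly mounted boot environment.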
Add comment March 22nd, 2010
Gosh this one was quite hard. I was getting errors such as:
    rlibmemcached_wrap.c:2074: error: syntax error before ‘bool’
    rlibmemcached_wrap.c: In function ‘SWIG_AsVal_bool’:
    rlibmemcached_wrap.c:2076: error: ‘obj’ undeclared (first use in this function)
    rlibmemcached_wrap.c:2076: error: (Each undeclared identifier is reported only once
    rlibmemcached_wrap.c:2076: error: for each function it appears in.)
    rlibmemcached_wrap.c:2077: error: ‘val’ undeclared (first use in this function)
    rlibmemcached_wrap.c:2077: error: ‘true’ undeclared (first use in this function)
    rlibmemcached_wrap.c:2080: error: ‘false’ undeclared (first use in this function)
So to solve this I basically followed these helpful instructions from Nick Sellen:
Nick Sellen says (January 27, 2010):
I had trouble installing it on my Solaris 10 with 32bit / gcc compiled ruby but managed it with a few modifications to extconf.rb:
1. added "--disable-64bit" to the libmemcached configure arguments
2. added "-std=gnu99" to CFLAGS (the rlibmemcached_wrap.c compilation was failing without that)
3. added an extra -R path for ext/lib - not sure if this was needed actually
4. recreated the rlibmemcached_wrap.c with swig (it removed a bunch of methods, not sure if this will bite me later)
5. added three extra libraries "-lnsl -lsocket -lposix4" to resolve a "symbol getaddrinfo: referenced symbol not found" relocation error with rlibmemcached.so (might only need libsocket)
You might also want to view the extconf.rb modifications directly.
The swig step basically involves downloading, compiling and installing swig to somewhere like /opt/swig, then doing “export SWIG=true” in your shell.
3 comments March 3rd, 2010