Alasdair on Everything

Archive for April, 2010

OpenSolaris 2010.03 (2010.04)

Like many people, I’ve been waiting eagerly and anxiously for OpenSolaris 2010.03 to come out. Over the past 9 months, OpenSolaris has come on in leaps and bounds, with a never-ending procession of new features and enhancements such as ZFS Deduplication and COMSTAR. While the OpenSolaris dev builds provide access to these new features, and are generally stable enough for production use if you pick the right build, it’s obviously best to stick to stable releases for production environments.

However, March came and went, we’re now a third of the way through April, and there’s still no sign of 2010.03. This has made a lot of people quite anxious about OpenSolaris’ future.

Oracle have stated they’re going to invest more in Solaris than Sun did, and that they intend to keep OpenSolaris going. However, they have also recently revoked the free version of Solaris 10, turning it into a 90-day trial. My understanding is that now, to run Solaris 10, you need an entitlement to do so, which comes from having a valid support contract for Solaris on Sun^H^H^HOracle hardware.

This has serious implications for many businesses, including EveryCity. Our current platform is Solaris 10 based, and while we started off using Sun servers, their Intel Nehalem range is simply too expensive, so we’ve been purchasing Dell R410 and R610 machines.

Thankfully, we had already been planning to move to OpenSolaris for some time, and our forthcoming platform will be OpenSolaris based. So this is no real issue for us: it just means Solaris 10 update 8 will be the last update to our Solaris 10 platform, and at some point down the line all our Solaris 10 zones will become branded zones under OpenSolaris.

The future is uncertain, but thankfully OpenSolaris shows no sign of going away. Oracle’s culture is quite different from the one at Sun; it’s clear they’re very corporate, and rather secretive. They have also stated that they take a very “hands off” approach to their User Groups, so for example the LoSUG group now has to be managed by non-Oracle employees, and there’s no longer any free food/drink for attending. I was tempted to volunteer to help organise LoSUG, but unfortunately I just don’t have the time at present. Hopefully that will change in the future.

Oracle are hell-bent on making their investment profitable, and as long as OpenSolaris continues to develop at the pace it has so far, and continues to be free and open, I’m happy. If anything, Oracle’s behaviour may bring the OpenSolaris community outside Sun/Oracle closer together and foster more community involvement and development, which can only be a GoodThing[tm].

And while there’s been a lot of suggestion on the grapevine that OpenSolaris 2010.03 may never materialise, this is clearly not the case, and it seems it may be just round the corner:

Re: osol-discuss - So who has been able to update to the b136 image?
by Alan Coopersmith alan.coopersmith@oracle... 2010-04-07T13:58:54+00:00.

Chad Welsh wrote:
> I have run update from IPS package manager and from the
> pkg image-update from the command line and nothing. are
> the packages only in the ON Gate? If so when will they be
> released into the wild for us to use?

Packages for the later builds have been built for all the gates, not just ON,
but not published to pkg.opensolaris.org while the 2010.03 release is being
finished.

Hi Sarah,

Due to some security concerns and other issues, I'd flipped a coin
and await the forthcoming snv_b138 kernel release.

As for any independent distro releases between now and before snv_b138, my opinion is to wait on doing any major system upgrades, 'production' related migrations, or journalistic reviews until snv_138 is officially released. 

This is for mainly commented for current users using OpenSolaris for 'very' high-end production-grade audio/video workstations or high-availability servers with several TBs of in-flux data. 

If you are having ANY major issues with a prior OpenSolaris release, just give OSOL 2010.03 until April 16th or await the snv_138 kernel release.

You'll be 'very' glad you did.

~ Ken Mays

April 10th, 2010

The case for RAIDZ2

We have an old x4500 knocking around which is getting on for 3 years old now. At the beginning of last month, we did a scrub, and to our horror discovered checksum errors on almost all the drives:

  pool: pool01
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 23h0m with 0 errors on Wed Mar  3 12:55:36 2010
config:

        NAME         STATE     READ WRITE CKSUM
        pool01       DEGRADED     0     0     0
          raidz1-0   ONLINE       0     0     0
            c11t3d0  ONLINE       0     0     4  2.50K repaired
            c10t3d0  ONLINE       0     0     0
            c13t3d0  ONLINE       0     0     4  1.50K repaired
            c7t1d0   ONLINE       0     0     0
            c8t3d0   ONLINE       0     0     5  1K repaired
            c7t3d0   ONLINE       0     0     4  2K repaired
            c10t2d0  ONLINE       0     0     3  1K repaired
            c13t2d0  ONLINE       0     0     2  1K repaired
            c11t6d0  ONLINE       0     0     3  1K repaired
            c8t2d0   ONLINE       0     0    16  7K repaired
            c7t2d0   ONLINE       0     0     4  2.50K repaired
          raidz1-1   DEGRADED     0     0     0
            c11t7d0  ONLINE       0     0     6  64K repaired
            c10t7d0  DEGRADED     0     0    58  too many errors
            c13t7d0  ONLINE       0     0     4  3.50K repaired
            c12t7d0  ONLINE       0     0     3  7K repaired
            c8t7d0   ONLINE       0     0     2  4.50K repaired
            c7t7d0   ONLINE       0     0     4  11.5K repaired
            c10t6d0  ONLINE       0     0     4  11K repaired
            c13t6d0  ONLINE       0     0     8  86K repaired
            c12t6d0  ONLINE       0     0     0
            c8t6d0   ONLINE       0     0     2  1K repaired
            c7t6d0   ONLINE       0     0     2  2.50K repaired
          raidz1-2   DEGRADED     0     0     0
            c11t5d0  ONLINE       0     0     1  9K repaired
            c10t5d0  ONLINE       0     0     1  13K repaired
            c13t5d0  ONLINE       0     0     2  1.50K repaired
            c12t5d0  ONLINE       0     0     1  1K repaired
            c8t5d0   DEGRADED     0     0   135  too many errors
            c7t5d0   ONLINE       0     0     2  1.50K repaired
            c10t4d0  ONLINE       0     0     8  44K repaired
            c13t4d0  ONLINE       0     0     3  5K repaired
            c12t4d0  ONLINE       0     0     3  2K repaired
            c8t4d0   ONLINE       0     0     2  6.50K repaired
            c7t4d0   ONLINE       0     0     2  13.5K repaired

errors: No known data errors
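To pick out the worst offenders from output like the above, the per-device counters can be scraped from the config section. This is a minimal Python sketch, not an existing tool: the helper name checksum_errors is hypothetical, and it assumes the column layout shown above (NAME, STATE, READ, WRITE, CKSUM, then an optional note) and Solaris-style cXtYdZ disk names. Lines whose counters are not plain integers are simply skipped.

```python
import re
from typing import Dict

# Assumed layout of a device line in the 'zpool status' config section:
# NAME  STATE  READ  WRITE  CKSUM  [optional note, e.g. "2.50K repaired"]
# Only plain integer counters are matched; abbreviated counts are skipped.
DEVICE_LINE = re.compile(
    r"^\s+(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+(?P<read>\d+)\s+"
    r"(?P<write>\d+)\s+(?P<cksum>\d+)(?:\s+(?P<note>.*))?$"
)
# Hypothetical filter: keep leaf disks only (cXtYdZ), not the pool or raidz vdevs.
DISK_NAME = re.compile(r"c\d+t\d+d\d+")


def checksum_errors(status_text: str) -> Dict[str, int]:
    """Return {disk: CKSUM count} for every disk with a non-zero checksum count."""
    errors: Dict[str, int] = {}
    for line in status_text.splitlines():
        m = DEVICE_LINE.match(line)
        if not m or not DISK_NAME.fullmatch(m.group("name")):
            continue
        count = int(m.group("cksum"))
        if count:
            errors[m.group("name")] = count
    return errors
```

Fed the first status output above and sorted by count, the two DEGRADED disks, c8t5d0 (135) and c10t7d0 (58), come out on top, which is a reasonable order in which to replace disks.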

Thankfully it’s not used for production, so this didn’t bother us a huge amount. ZFS repaired the data errors without issue (hurrah for ZFS!), and we have been replacing the worst affected disks. We’re now doing weekly scrubs to keep the data “fresh” and stop it rotting away.

However, one interesting issue cropped up. We’re using RAIDZ1, which only stores enough parity for one disk to be out of service. Since ZFS uses the parity data to reconstruct blocks with checksum errors, if you’re one disk down and hit a block with a checksum error, you’re in trouble: it can’t repair it, and your data is corrupted.

So when you replace a failed disk in a RAIDZ1 set, you had better hope you don’t encounter any checksum errors on the other disks during the resilver. Because ZFS has to read all the data from the other disks to resilver the new disk, you’re at high risk of encountering checksum errors, especially in our situation where the disks are wearing out.
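The parity argument can be made concrete with a toy model. The sketch below is not ZFS’s on-disk RAIDZ layout; it assumes single parity behaves like a plain XOR across the chunks of a stripe, and that a chunk which fails its checksum is treated as missing, just like a chunk on the disk being replaced. One parity chunk gives one equation, so at most one unknown chunk can be recovered.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: List[bytes]) -> bytes:
    """Toy single (RAIDZ1-style) parity: XOR of all data chunks in a stripe."""
    return reduce(xor_bytes, chunks)


def reconstruct(chunks: List[Optional[bytes]], p: bytes) -> List[bytes]:
    """Rebuild a stripe in which lost or checksum-failed chunks are None.

    One parity chunk is one equation, so only one unknown can be solved for.
    """
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} chunks lost; single parity rebuilds only 1")
    out = list(chunks)
    if missing:
        # parity XOR every surviving chunk leaves exactly the missing chunk
        out[missing[0]] = reduce(xor_bytes, (c for c in chunks if c is not None), p)
    return out
```

With one disk out for replacement, any further bad chunk in the same stripe is a second None, and reconstruct() has nothing left to solve with.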

And this is precisely what happened next. We replaced a failed disk, and during the resilver, ZFS encountered checksum errors on the other disks it couldn’t repair, and we started to lose data:

  pool: pool01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 15h47m with 219 errors on Sat Apr 10 16:14:59 2010
config:

        NAME         STATE     READ WRITE CKSUM
        pool01       DEGRADED     0     0   331
          raidz1-0   ONLINE       0     0     0
            c11t3d0  ONLINE       0     0     0
            c10t3d0  ONLINE       0     0     0
            c13t3d0  ONLINE       0     0     0
            c8t5d0   ONLINE       0     0     0
            c8t3d0   ONLINE       0     0     0
            c7t3d0   ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
            c13t2d0  ONLINE       0     0     0
            c11t6d0  ONLINE       0     0     0
            c8t2d0   ONLINE       0     0     0
            c7t2d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c11t7d0  ONLINE       0     0     0
            c11t2d0  ONLINE       0     0     0
            c13t7d0  ONLINE       0     0     0
            c12t7d0  ONLINE       0     0     0
            c8t7d0   ONLINE       0     0     1
            c7t7d0   ONLINE       0     0     0
            c10t6d0  ONLINE       0     0     0
            c13t6d0  ONLINE       0     0     0
            c12t6d0  ONLINE       0     0     0
            c8t6d0   ONLINE       0     0     0
            c7t6d0   ONLINE       0     0     0
          raidz1-2   DEGRADED     0     0   888
            c11t5d0  DEGRADED     0     0     0  too many errors
            c10t5d0  DEGRADED     0     0     0  too many errors
            c13t5d0  DEGRADED     0     0     0  too many errors
            c12t5d0  ONLINE       0     0     0  401G resilvered
            c12t3d0  DEGRADED     0     0     0  too many errors
            c7t5d0   DEGRADED     0     0     0  too many errors
            c10t4d0  DEGRADED     0     0     0  too many errors
            c13t4d0  DEGRADED     0     0     0  too many errors
            c12t4d0  DEGRADED     0     0     0  too many errors
            c8t4d0   DEGRADED     0     0     0  too many errors
            c7t4d0   DEGRADED     0     0     0  too many errors

errors: 219 data errors, use '-v' for a list

Ouch! 219 data errors.

Thankfully, ZFS knows precisely which files are affected ('zpool status -v' lists them), so you can just delete, replace or restore the affected files and snapshots, and the pool keeps on running.

However after this, I’m sold on RAIDZ2. I don’t think I’ll be using RAIDZ1 again - the risk of losing data when you’re replacing a failed disk is just too high.
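A rough way to see why the second parity disk matters so much: model a resilver in which, for each stripe, every surviving chunk is independently bad with some probability q. With one disk already out, RAIDZ1 has no parity left to spare, so a single bad chunk loses the stripe; RAIDZ2 still has one parity chunk spare, so it takes two. This is a back-of-the-envelope sketch with a purely hypothetical q, and independence is an optimistic assumption when disks are wearing out together, as ours were.

```python
from math import comb


def p_stripe_lost(surviving: int, q: float, spare_parity: int) -> float:
    """Probability that more than `spare_parity` of the `surviving` chunks in a
    stripe are bad (each bad independently with probability q), i.e. the
    stripe can no longer be rebuilt. Summed directly to keep tiny values exact."""
    return sum(
        comb(surviving, k) * q**k * (1 - q) ** (surviving - k)
        for k in range(spare_parity + 1, surviving + 1)
    )


def expected_lost_stripes(stripes: int, surviving: int, q: float, spare_parity: int) -> float:
    """Expected number of unrecoverable stripes over a whole resilver."""
    return stripes * p_stripe_lost(surviving, q, spare_parity)


# Assumed: 11-disk vdev with one disk being replaced, leaving 10 survivors.
SURVIVING = 10
Q = 1e-6  # illustrative per-chunk error probability, not a measured value

# With one disk out, RAIDZ1 has 0 parity chunks to spare; RAIDZ2 still has 1.
raidz1 = p_stripe_lost(SURVIVING, Q, spare_parity=0)
raidz2 = p_stripe_lost(SURVIVING, Q, spare_parity=1)
```

For the same 11-disk width, this gives roughly 1e-5 per stripe for RAIDZ1 against about 4.5e-11 for RAIDZ2: the second parity chunk turns a linear dependence on q into a quadratic one.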

April 10th, 2010