Archive for August, 2007

Aug 24 2007

Michigan Weather

Published by Brian under Travel, Weather

I’m in Michigan for work, and was quickly reminded just how bad things can get here.

This afternoon, I drove through what I can easily say was the most terrifying weather I’ve ever experienced. The National Weather Service has already confirmed two tornadoes, with current estimates at four total. Straight-line winds were measured at 70 MPH. It started right after I left work at 5:30pm on the West side of the city, headed to Grosse Pointe on the East side.

Right before I was getting on I-696 west, I saw the funnel cloud pictured spinning wildly–it almost looked liquid. It was heading right towards me, and I (stupidly) decided to try and outrun it on the freeway.

Bad move. I caught three or four red lights before I reached the on-ramp, and that’s all the time the storm needed to bear down on me. At the last red light, a massive white blast of lightning hit close enough that the boom was simultaneous, and it blew up a transformer in a brilliant green mushroom cloud. It temporarily blinded me.

Then the rain hit. I’ve never seen it rain this hard in my life. Weather reporters are now claiming there was 3″ of rain in 30 minutes. Winds were blowing it so hard at the car that water was making it inside while the windows were rolled up. I could not see out the windshield at all, even with the wipers at full blast.

I-696 sits below street level, at some parts by more than 15 feet. Water was starting to build on the freeway, and pressure in the storm sewers was causing air to blast water four feet high out of them as I drove by. At first I was excited by the sudden weather, but now I was honestly scared for my life, not knowing whether it was safer to stop and risk getting flooded out, or continue and try to push through it. It was that bad.

Luckily I made it to Grosse Pointe okay. Apparently right after I exited the freeway, the water was getting to four feet deep in spots.

At this time Fenton, which is further out on the West side of Detroit, is in full lockdown with a complete power outage and building damage. There is a curfew in effect there.

I snapped a couple pictures on the iPhone in between clutching the steering wheel.

[Photos: Funnel1, Boiling, Rain1, Rain2]

One response so far

Aug 09 2007

Luminous Vegetables

Published by Brian under Cool, Funny

Every once in a while you come across something equal parts weird and cool. Like making glowing tomatoes.

2 responses so far

Aug 08 2007

VMWare Server on NFS & RedHat Cluster Suite

Published by Brian under Computers, Linux, Tlf, vmware

Over the past few weeks I’ve managed to get a pretty darn stable NFS / VMWare Server setup running.

The basic specs are as follows:

  • VMWare Host: Dell PowerEdge 1950 (dual quad core, 8 gigs RAM)
  • NFS Cluster: Two Dell PowerEdge 860s (single quad core, 4 gigs RAM each)
  • Networking: Dell PowerConnect 5324
  • Centralized storage: EonStor A24F-R2224
  • Internal DRAC on each node for cluster fencing (not ideal, read below).
  • All the VMDKs are stored on an NFS mount from the cluster.

Through quite a bit of experimentation and trial and error, I have it running pretty solid. Some of the key points:

  1. Used RedHat EL 5 all the way around, with the related GFS/RHCS packages
  2. Mounted GFS on the NFS nodes with noatime,noquota for some minor speed improvements.
  3. NFS4 and TCP for everything. Makes failover to the other node more reliable.
  4. Use ‘hard’ mounts on the VMWare server, with timeo=600,retrans=2 in your options. This lets TCP, rather than NFS, handle transmission delays during a failover.
  5. On the export side, craft your /etc/exports so that every export has an explicit ‘fsid=’, and so that the values match on both nodes. This gets around stale handles.
  6. Use the GFS/shared storage/floating IP technique as documented in the Cluster NFS Cookbook versus managed NFS (read why below).
  7. Bonded the NICs on the NFS nodes for higher throughput (in our case they will be exporting to more than one server when in production, so this was necessary).
  8. Spanning tree delays on the PowerConnect can get you in trouble with a fencing loop in a two-node setup. When one of the nodes reboots, its NICs come up quicker during Linux sysinit than the corresponding switch ports do. Thus, Linux thinks the interface should be reachable (when it’s not), and when fenced attempts to initialize, it cannot reach the other node and consequently fences that one. The solution is to either add “LINKDELAY” to /etc/sysconfig/network or just disable spanning tree on the switch.
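
Points 4, 5 and 8 above might translate into configuration along these lines. This is a sketch, not the exact files from this setup: the host name nfs-vip, the paths, the client subnet, the fsid number and the 30-second delay are all placeholders made up for illustration, and the path an NFSv4 client mounts depends on how your v4 export root is laid out. The snippet writes the example lines to a scratch directory so you can look them over before copying anything into /etc.

```shell
#!/bin/sh
# Example config lines only; nothing under /etc is touched.
example_dir="$(mktemp -d)"

# NFS nodes, /etc/exports: give the export an explicit fsid and use the
# same number on both cluster nodes (path, subnet and fsid are placeholders).
cat > "$example_dir/exports.example" <<'EOF'
/mnt/gfs/vmware  192.168.10.0/24(rw,sync,fsid=10)
EOF

# VMWare host, /etc/fstab: hard mount over TCP with the timeouts from point 4
# (nfs-vip stands in for the cluster's floating IP; paths are placeholders).
cat > "$example_dir/fstab.example" <<'EOF'
nfs-vip:/mnt/gfs/vmware  /vmware  nfs4  hard,proto=tcp,timeo=600,retrans=2  0 0
EOF

# NFS nodes, /etc/sysconfig/network: pause after link-up so spanning tree
# can settle before fenced starts (30 seconds is a guess; tune for your switch).
cat > "$example_dir/network.example" <<'EOF'
LINKDELAY=30
EOF

# Print everything for review.
cat "$example_dir"/*.example
```

Keeping the examples out of /etc is deliberate: a typo in exports or fstab on a live cluster node is exactly the kind of thing that triggers a failover test you did not plan.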

I initially tried the managed NFS setup in Cluster Suite (check the cookbook); however, there are two major problems. At this time, managed NFS appears to be set up to use NFSv2 and v3 only, as there is no opportunity to modify the export options via Cluster Suite. Also, there are timing delays with how Cluster Suite manages the NFS daemons…

Of course, during a failover speed is of the essence. So, when I had this rig configured for managed NFS failover, I was experiencing 12+ second delays in failover. Why?

Well, it turns out RedHat has a sleep command in /usr/share/cluster/ip.sh (the virtual IP management script) that adds 10 seconds to the failover so NFSD can clear its cache (!?). Pretty hackish, and it results in an unacceptable delay during a failover. Unfortunately, if you’re running managed NFS, there’s no real way around this unless you want to risk corruption from NFSD going down without flushing its cache to disk.

I found this in the Cluster Project FAQ. With the ‘sleep 10’ command gone, failover is much, much quicker. As long as you’re doing the GFS thing rather than the managed NFS setup, this works quite nicely, and fast enough that VMWare doesn’t seem to notice anything is going on.
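
If you want to see where the delay lives before touching anything, something like the following will list the sleep calls in the script with their line numbers. It is a minimal sketch: the helper name find_sleeps is mine, and it only reports what it finds, so any edit (and a backup copy first) is still up to you.

```shell
#!/bin/sh
# List every "sleep <seconds>" call in a script, with line numbers,
# so each one can be reviewed by hand. find_sleeps is a made-up helper name.
find_sleeps() {
    grep -nE '^[[:space:]]*sleep[[:space:]]+[0-9]+' "$1"
}

ip_script=/usr/share/cluster/ip.sh   # path as given in the post

# Only look if the script is actually installed on this machine.
if [ -f "$ip_script" ]; then
    find_sleeps "$ip_script" || echo "no sleep calls found in $ip_script"
fi
```

Remember the caveat above: removing the delay is only reasonable with the GFS / floating IP approach, not with managed NFS.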

Performance-wise, it’s pretty darn good. I have a dedicated PowerConnect 5324 for use as an “ethernet SAN” to interconnect the NFS nodes, and VMWare Server. That being said, 20 concurrent lightly loaded VMs results in nothing abnormal in terms of performance or reliability. In fact, it’s hard to tell the difference from local disk–even during a failover. The NIC being used for “front-end” access to the VMWare Server Console even seems to experience more traffic than the NFS one according to the PowerConnect’s interface reports, though that leaves me a bit skeptical.

It would have been interesting to see if the TOE (TCP Offload Engine) on the 1950’s NetXtreme II NICs would have made a performance improvement, but it works in Windows 2003 only. Bummer.

Another “gotcha” to watch for is using RAC cards on the servers for fencing purposes. In most cases it works fine; however, when power is lost to the entire server, the DRAC goes down with it and becomes unreachable. This leaves the surviving node stuck trying to fence the dead one, and failover never occurs. A better option would be to use a manageable PDU (which we’ll do ultimately).

Bottom line: it seems to work very well in almost all failure situations I’ve tested it in. The only time I was able to make it fail (badly) was by yanking the power cord out of one of the cluster nodes, which brought the entire cluster to a halt due to the DRAC fencing problem I mentioned above. I did this while installing X Windows on 50% of the VMs to simulate a lot of NFS write activity.

After bringing the entire cluster back up manually, the only damage was a corrupt RPM DB on one of the VMs. The others came back up fine after a fsck on boot. Not bad!

After my testing, I’m confident this setup will work in a production environment. If you wish to try the same, ensure your testing plan includes every possible outage situation you can fathom. Weird/odd stuff can come up (for example the spanning tree thing), and of course it is far better to nail those down in R&D than in production!

3 responses so far

Aug 06 2007

Resizing mounted EXT3 filesystems

Published by Brian under Computers, Linux

In case you were not aware, RedHat (and I’m assuming most other Linuxes) can resize and extend EXT3 filesystems while they’re mounted and being used. You need to be using LVM to do this (and if you aren’t, why not?)

Behold:
[root@tlfvm ~]# lvextend -L8G /dev/vg00/lvusr
Extending logical volume lvusr to 8.00 GB
Logical volume lvusr successfully resized

[root@tlfvm ~]# resize2fs -p /dev/vg00/lvusr
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/vg00/lvusr is mounted on /usr; on-line resizing required
Performing an on-line resize of /dev/vg00/lvusr to 2097152 (4k) blocks.
The filesystem on /dev/vg00/lvusr is now 2097152 blocks long.

Try that on your NTFS partition, Windows fanboy!
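
For repeat use, the two steps above can be wrapped in a small script. This is a sketch under my own assumptions: the function names run and grow_ext3 are made up, and it defaults to a dry run that only prints the commands, so nothing is resized unless you set DRY_RUN=0 and run it as root.

```shell
#!/bin/sh
# Grow an LVM logical volume and the ext3 filesystem on it while it stays mounted.
# DRY_RUN defaults to 1: commands are printed, not executed.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: grow_ext3 <logical-volume-device> <new-size>
grow_ext3() {
    run vgs                      # check the volume group has free space first
    run lvextend -L "$2" "$1"    # grow the logical volume to the new size
    run resize2fs -p "$1"        # grow the filesystem to fill the volume
    run df -h                    # confirm the new size
}

# Same volume and size as the transcript above.
grow_ext3 /dev/vg00/lvusr 8G
```

The vgs check is the one step not shown in the transcript; lvextend will refuse to grow a volume if the group has no free extents, so it is worth looking first.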

4 responses so far

Aug 03 2007

ssh-copy-id

Published by Brian under Computers, Linux

Linux/Unix is great. You learn something new just about every day.

I found a script called ssh-copy-id that when run with the format of:

ssh-copy-id remotelogin@remotehost.com

will take the RSA public key (~/.ssh/id_rsa.pub) of the ID on the current host, and set it up under authorized_keys for “remotelogin@remotehost.com”. This allows for passwordless, key-based authentication–all set up in one command!

It’s normally a pretty trivial task anyway, but using SCP to ship it over and moving or concatenating the file in place gets old after a few dozen times.
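
For comparison, here is roughly what the manual job amounts to once the key has reached the other side. This is a sketch of the idea, not the source of ssh-copy-id itself: the function name install_key and the duplicate check are my own additions, and it works on local files so it can be tried without a remote host.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file, creating the file and its
# directory with strict permissions if needed. install_key is a made-up name.
# usage: install_key <public-key-file> <authorized_keys-file>
install_key() {
    mkdir -p "$(dirname "$2")"
    chmod 700 "$(dirname "$2")"
    touch "$2"
    # Skip the append if this exact key line is already present.
    grep -qxF "$(cat "$1")" "$2" || cat "$1" >> "$2"
    chmod 600 "$2"
}

# The by-hand remote version this replaces looks something like (not run here):
#   cat ~/.ssh/id_rsa.pub | ssh remotelogin@remotehost.com \
#       'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
```

The permission steps matter because sshd, with its default StrictModes setting, ignores an authorized_keys file that other users can write to.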

Small and useful utilities like these make my day easier!

One response so far
