SHUT UP.

I dragged my family to this little town for peace and quiet. But noooaaah.

When there is peace, there is no quiet. During daylight hours, visiting birds deafen us with their happy chirps, swoons, tweets, and croons. They drown out the sounds of local traffic and even the occasional small airplane.

If they keep this up, I may have to stop feeding them.

Posted Wed May 2 23:40:00 2007 Tags:

Here is a little Fedora-oriented shell script that reports how long it’s been since any binary or library file included in a given RPM was last accessed, i.e., used. (It relies on file access times, so it assumes the filesystem is not mounted noatime.) Packages that show no usage in many days might be considered for deletion.

#! /bin/sh
# Report packages whose most recently accessed binary or library
# file is at least $MINDAYS days old (by atime).
MINDAYS=${MINDAYS-7}
rpm -q -a "$@" | while read -r rpm; do
    # Find the most recently accessed file shipped by this package.
    rpm -ql "$rpm" | egrep '/bin|/sbin|/lib' | while read -r file; do
        atime=$(stat -c '%X' "$file" 2>/dev/null) || continue
        echo "$atime $file"
    done | sort -n | tail -1 |
    awk -v mindays="$MINDAYS" -v rpm="$rpm" '
        { days = int((systime() - $1) / 86400)
          if (days >= mindays)
              print rpm, days " days (" $2 ")" }'
done

It runs thusly:

% MINDAYS=14 rpm-usage gnome\*
gnome-python2-desktop-2.16.0-1.fc6.x86_64 244 days (/usr/lib64/pkgconfig/gnome-python-desktop-2.0.pc)
gnome-python2-extras-2.14.2-9.fc6.x86_64 69 days (/usr/lib64/pkgconfig/gnome-python-extras-2.0.pc)
gnome-mime-data-2.4.2-3.1.x86_64 299 days (/usr/lib64/pkgconfig/gnome-mime-data-2.0.pc)
gnome-themes-2.16.3-1.fc6.noarch 83 days (/usr/share/icons/HighContrast-SVG/scalable/mimetypes/binary.svg)
Posted Mon May 7 13:13:00 2007 Tags:

Don’t be a car in France.

On election night, scattered violence was reported across France. Police reported that 270 people were taken in for questioning and that 367 [actually, 730] parked vehicles had been torched. On a typical night in France, about 100 cars are burned.

Posted Mon May 7 15:59:00 2007 Tags:

I can’t quite believe what I’m reading on BoW Today.

What’s a Six-Letter Word for ‘Humidor’?
“Bill Clinton Pens NY Times’ Crossword Puzzle”—headline, Reuters, May 7

Posted Thu May 10 09:56:00 2007 Tags:

Here’s an excellent way of losing hours of work for an officeful of people.

First, have your main fileserver lose a disk. Since it’s configured optimistically, let there be no hot spares, and let the SCSI backplane drag six other drives off to neverland.

Second, figure out which is the dead drive. Do this largely under remote control, since remote sysadmins are in charge of the local recovery operation (?!).
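For reference, the boring version of this step fits in a few commands. A sketch only, assuming Linux software RAID (md) on a hypothetical /dev/md0 with invented device names; the post never says what this server actually ran:

# Identify and evict the dead member -- a sketch, assuming Linux
# software RAID; the device names are invented.
cat /proc/mdstat                    # failed members are flagged (F)
mdadm --detail /dev/md0             # shows which device is faulty
smartctl -H /dev/sdc                # sanity-check the suspect disk
mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
# The array keeps running degraded; nobody loses their afternoon.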

Third, pull the dead drive. Since it was part of a RAID5 set, the machine should have been happy to resume working in a non-redundant but functional mode. But something mysterious happens, and instead of doing this and letting others in the office resume their work ….

Fourth, initiate a RAID5 resync over the remaining drives, pretending that the dead drive was never in the array. For those following along at home, this multi-hour operation overwrites the remaining redundant data, rendering the whole array useless.

Fifth, while this resync is going on, avoid checking the result by mounting the filesystem or even running fsck on it. (This is entirely possible, and would let one see the results of #4.) Instead, wait out the whole resync period, then notice … oh, golly, the data is lost!
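The skipped check is cheap. A sketch, again assuming an md array at /dev/md0 holding an ordinary ext3 filesystem (both guesses):

# Non-destructive ways to see whether the data survived, without
# waiting out the whole resync:
fsck -n /dev/md0                    # read-only check; changes nothing
mkdir -p /mnt/check
mount -o ro /dev/md0 /mnt/check     # read-only mount, eyeball the files
ls /mnt/check
umount /mnt/check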

Sixth, decide to start over, from tape backups. Add some spare or whatnot drive into the array. Resync it again. Look for tape backups, which might be at a remote site too — I don’t know. The people whose work got disrupted several hours ago might as well go home for the day.

Seventh, under no circumstances use this forced multi-hour outage as an opportunity to improve the redundancy or capacity of the server.

Eighth, let the tape backup robot die during recovery – which is only partly surprising, as a full restore of this system has never been attempted. Oh well, what’s another few hours of downtime. Oops, discover that the tape drive won’t work without cleaning, and we’re out of cleaning cartridges. Another day gone.

Ninth, find out that the sole tape drive that was formerly blocked on cleaning is now blocked on breaking. It’s deader than a doornail, and needs a replacement. Another day gone.

Tenth, after a test restore, find that the new tape drive actually … works. Restore the directories. Lose a bunch of the symbolic links that were also on that filesystem. Restore them individually, as people miss them. Then, after three hours of service (that’s three total in the last six days), the server dies again.

Eleventh, and the hits keep on coming: the SCSI problem that takes half the drives offline is back. RAID recovery is botched again, and the full restore starts over. Anyone who optimistically moved valuable files over to the server during the three-hour uptime is rewarded with BOHICA.

Finally, from all this, collect the lesson that a well-funded enterprise-level tape backup system would have been appropriate – instead of the lesson that low-cost high-redundancy DIY disk-to-disk backups are a good thing. The latter is not “enterprise level” and requires less budget, so it can’t be right.
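For the record, the unfashionable low-budget option fits in a few lines. A sketch of rsync hard-link snapshots; the host and paths here are invented:

#! /bin/sh
# Nightly disk-to-disk snapshots: unchanged files are hard-linked
# against the previous snapshot, so each one costs only the changes.
SRC=fileserver:/export/home/
DST=/backup/home
today=$(date +%Y-%m-%d)
rsync -a --delete --link-dest="$DST/latest" "$SRC" "$DST/$today" \
    && ln -sfn "$today" "$DST/latest"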

Note: this story is in no way representative of an actual event, real or imaginary. Facts may be missing or inaccurate, and assumed not in evidence. No actual sysadmins would have ever done something like this. No electrons were created or destroyed during the events depicted in this entirely fictional story.

PostScript (2008-11): Subsequently, discover that the enterprise-level tape backup system™ has been configured in such a way that it cannot survive the reboot of an NFS server, and thus fails to actually perform backups … for months … without anyone noticing. That is, until someone needed files restored.
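The postscript suggests its own countermeasure: monitor the backups, not just the backup system. A cron-able sketch, with an invented path:

# Complain if nothing new has appeared under /backup in the last day.
find /backup -type f -mtime -1 | grep -q . \
    || echo "no new backups in 24 hours" | mail -s 'backup check' root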

Posted Tue May 22 13:32:00 2007 Tags: