An incident while flying last night reminded me why a partial failure can be more complex than a total failure.

Our home airport, CYFD, is open day & night. In poor visibility conditions (fog or darkness), a lighting system assists pilots in locating and descending to the runways for landing. Part of this is the PAPI (precision approach path indicator), a color-coded indicator near the runway threshold that tells pilots whether they are above, on, or below a normal 3-degree glide slope.

Following a normal glide slope for the last ~30 seconds of the flight makes a safe landing more likely. Following the PAPI (or its predecessor, the VASI) at night is important when one cannot see ground obstructions, or when the runway edge lights alone could confuse a pilot because of unusual runway width, length, or flatness. The rule is to line up the airplane on an inclined path where half of the PAPI lights are steady red and half are steady white. (Too white: too high; too red: too low.)
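The red/white rule is simple enough to write down as a tiny function. This is only a sketch of the rule as stated above; the function name and the idea of passing in light counts are mine, not any avionics API.

```python
def papi_indication(white: int, red: int) -> str:
    """Interpret a PAPI from the number of white and red lights seen.

    Follows the rule of thumb in the text: equal red and white means
    on the glide slope, more white means too high, more red means too low.
    """
    if white < 0 or red < 0 or white + red == 0:
        raise ValueError("need a non-negative count and at least one light")
    if white == red:
        return "on glide slope"
    return "too high" if white > red else "too low"


if __name__ == "__main__":
    # Try every combination for a hypothetical four-light installation.
    for white in range(5):
        print(white, "white,", 4 - white, "red:", papi_indication(white, 4 - white))
```

Note that the function has no way to tell whether the lights themselves are telling the truth, which is exactly the trouble described below.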

What if the PAPI is broken? Of course one can still land without it, but with a smaller safety margin. The airport would usually issue a NOTAM - a "notice to airmen" - a widely distributed electronic notice that reads something like this:

140054 CYFD BRANTFORD
  CYFD PAPI 05 U/S
1404130010 TIL APRX 1404132000

Even a layman can decode this compressed goo with some hints. The numbers on the bottom line are YYMMDDHHMM timestamps in UTC; U/S means unserviceable. In theory, pilots operating in the area are supposed to read all such announcements before a flight, and air traffic controllers may need to relay new/important announcements right on the air.
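For the curious, here is a minimal sketch of decoding that validity line in Python. It assumes only what is stated above (ten-digit YYMMDDHHMM stamps in UTC); the function names are made up for illustration, and real NOTAM formats have more variations than this handles.

```python
import re
from datetime import datetime, timezone


def parse_notam_time(stamp: str) -> datetime:
    """Turn a ten-digit YYMMDDHHMM stamp into a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, "%y%m%d%H%M").replace(tzinfo=timezone.utc)


def parse_validity(line: str):
    """Pull the start and end stamps out of a 'START TIL APRX END' line."""
    stamps = re.findall(r"\b(\d{10})\b", line)
    if len(stamps) != 2:
        raise ValueError("expected exactly two ten-digit timestamps")
    return parse_notam_time(stamps[0]), parse_notam_time(stamps[1])


if __name__ == "__main__":
    start, end = parse_validity("1404130010 TIL APRX 1404132000")
    print("from ", start.isoformat())
    print("until", end.isoformat(), "(approximately)")
    print("duration", end - start)
```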

But this particular NOTAM is misleading. Last night, the PAPI light was not merely unserviceable. It was broken, but still lit. In particular, it stayed lit in the worst possible way. Take thirty seconds to think about how that could be before you read on.

Time's up. The PAPI lights were ... all white, regardless of glide slope angle. The red filters must have gotten broken off, or the lights were hit and misaligned, or something. The white told pilots that they were too high, implying a suggestion to descend faster. This false "too high" indication continued even if an airplane was well below the glide slope (I tested that part carefully). An unwary pilot could trust the lights and descend right into the ground. It got me worried enough to mention the problem to Toronto area ATC by radio.

Now it's clear why I was talking about a partial failure above. In this case, the broken indicator could be worse than no indicator at all. The power should be shut off, or the lights completely blocked, if such a problem occurs.

It's a good reminder that indications of all sorts - whether on board the airplane, on the ground, or in one's head - can be erroneous. Cross-checking and scepticism are essential in critical phases of flight.
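One way to picture such a cross-check is as plain trigonometry: on a 3-degree path, height above the aiming point is distance times tan(3°), or roughly 318 feet per nautical mile. The sketch below compares that against what the lights claim. The function names and the 100-foot tolerance are arbitrary assumptions of mine, and it ignores everything a real approach involves (where the PAPI actually sits, altimeter error, terrain), so treat it as an illustration of the idea only.

```python
import math

FEET_PER_NM = 6076.12  # feet in one nautical mile


def expected_height_ft(distance_nm: float, slope_deg: float = 3.0) -> float:
    """Height above the aiming point for a given distance on the glide slope."""
    return distance_nm * FEET_PER_NM * math.tan(math.radians(slope_deg))


def papi_is_suspect(indication: str, height_ft: float, distance_nm: float,
                    tolerance_ft: float = 100.0) -> bool:
    """Return True when the lights disagree with the geometry.

    indication is one of "too high", "too low", "on glide slope".
    tolerance_ft is a made-up allowance, not an official figure.
    """
    expected = expected_height_ft(distance_nm)
    if indication == "too high":
        return height_ft < expected - tolerance_ft
    if indication == "too low":
        return height_ft > expected + tolerance_ft
    return abs(height_ft - expected) > tolerance_ft


if __name__ == "__main__":
    # Lights say "too high" while the airplane is well below the path.
    print(round(expected_height_ft(1.0)), "ft expected at 1 NM")
    print("suspect:", papi_is_suspect("too high", 150.0, 1.0))
```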

Posted Sun Apr 13 16:16:00 2014 Tags:

One of my day jobs is working on Performance Co-Pilot (PCP), a free-software package for monitoring computers/networks. It has some neat capabilities of its own, but one of the best parts is the ease of covering up the capabilities it lacks. In particular, when another package has something we need, we can usually interface to it and offer users the best of both worlds.

On today's menu is interfacing to the Graphite package, which has a certain cachet for nice interactive web-based charting and a clever data storage scheme, all done in Python. PCP has native charting tools too, but nothing as good & pretty for the web as Graphite.

Can the two talk together? As of today, heck yes. At least, in a git branch, there is a little Python script pcp-graphite.py which feeds data from the PCP system into Graphite. It allows the large numeric subset of PCP data to be easily browsed/manipulated with the Graphite toolset.

How does it look?

You're looking at a screenshot of the graphite web front-end on a Fedora 19 workstation. The picture is too small to contain the hundreds of PCP metrics being injected there (sorry).

How to try it? That's a bit complicated, since this is just an early development branch, but chances are that after installing the next PCP release (3.9.2, due this week), you'll be able to grab just the pcp-graphite.py script out of the repository and run it on top. Until then, if you're very keen, you could build a whole 3.9.2 pre-release snapshot out of the branch.

Oh, and you'll need Graphite etc. too. On Fedora, here are all the steps:

  1. Avail yourself of PCP 3.9.2 of some form.
    # wait till release or build your own
    # yum install pcp
    # /sbin/service pmcd start
  2. Avail yourself of the pcp-graphite script.
  3. Avail yourself of graphite-web (season to taste):
    # yum install graphite-web
    # cat >> /etc/httpd/conf.d/graphite-web.conf
    <Directory /usr/share/graphite>
        Options All
        AllowOverride All
        Require all granted
    </Directory>
    ^D
    # /sbin/service httpd start
  4. Avail yourself of python-carbon:
    # yum install python-carbon
    # /sbin/service carbon-cache start
  5. Run pcp-graphite.py in the background to start feeding data.
    % python pcp-graphite.py kernel mem network disk filesys
    Relaying 451 metric(s) in pickled mode to localhost:2004 every 60.000 s
  6. Run a few more copies of pcp-graphite.py if you'd like to relay different data from other PCP servers (-h HOST), at different intervals (-t SECONDS), to remote graphite/carbon servers (-g HOST), with a unique graphite metric prefix (-m FOO.HOST.). More options coming soon.
  7. Sic a web browser on the bad boy, and click around the metric tree.
    % firefox //web.elastic.org/
  8. Find out more about the PCP metrics in question. Graphite's data model is too poor to tell you exactly what the numbers mean, but PCP will. (pcp-graphite will soon learn to rescale on demand.)
    % pminfo -d -t kernel.all.runnable network.interface.speed mem.numa.util.dirty
    
    kernel.all.runnable [total number of processes in the (per-CPU) run queues]
        Data Type: 32-bit unsigned int  InDom: PM_INDOM_NULL 0xffffffff
        Semantics: instant  Units: none
    
    network.interface.speed [interface speed in megabytes per second]
        Data Type: float  InDom: 60.3 0xf000003
        Semantics: discrete  Units: Mbyte / sec
    
    mem.numa.util.dirty [per-node dirty memory]
        Data Type: 64-bit unsigned int  InDom: 60.19 0xf000013
        Semantics: instant  Units: Kbyte

Note that this is all for passing data to Graphite; fetching it from its lovely embrace (so PCP tools can directly process it) is also possible, but would be a separate prototype. Please let us know if that capability would be of interest/use to you.
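If you wonder what "pickled mode to localhost:2004" in step 5 means on the wire, here is a rough sketch based on carbon's documented pickle receiver, not on the pcp-graphite.py source. Carbon expects a pickled list of (path, (timestamp, value)) tuples, preceded by a 4-byte big-endian length. The function names and the sample metric path (built from the FOO.HOST. prefix example above) are hypothetical.

```python
import pickle
import socket
import struct
import time


def build_carbon_pickle_message(samples):
    """Frame samples for carbon's pickle listener.

    samples is a list of (metric_path, (unix_timestamp, value)) tuples.
    The result is a 4-byte big-endian length header followed by the pickle.
    """
    payload = pickle.dumps(samples, protocol=2)
    header = struct.pack("!L", len(payload))
    return header + payload


def send_to_carbon(samples, host="localhost", port=2004):
    """Open a TCP connection to carbon-cache and send one framed batch."""
    message = build_carbon_pickle_message(samples)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(message)


if __name__ == "__main__":
    # Hypothetical sample; a real relay would fetch the value from PCP.
    now = int(time.time())
    batch = [("FOO.HOST.kernel.all.runnable", (now, 3.0))]
    print(len(build_carbon_pickle_message(batch)), "bytes ready to send")
    # send_to_carbon(batch)  # uncomment with carbon-cache running locally
```

Batching many samples into one message is the point of the pickle format; the plain-text carbon listener takes one metric per line instead.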

Update: pcp2graphite is included in PCP version 3.10.2.

Posted Sun Apr 13 21:22:00 2014 Tags: