[Macpartners] Re: Macpartners Digest, Vol 17, Issue 4

John C. Welch jwelch at MIT.EDU
Sun Oct 10 16:37:13 EDT 2004


On 10/10/04 3:10 PM, "Kerem B Limon" <k_limon at MIT.EDU> wrote:

> Aha...now I understand. This is the cause of the problem. Since you haven't
> done this in more than several years, I can see how you are missing my
> point. As someone who's done just the opposite--using PCs and server
> hardware with HDD LEDs standard for over 10 - 12 years, and having diagnosed
> and troubleshot many such failures, I can immediately derive the
> bits/pieces of information I need from the LED feedback, subject to the
> constraints I gave above. And having used such machines routinely for so
> long, I am also intimately familiar with their so-called 'normal
> behaviour'. Now, a tech or support person could be in the same boat that I
> am or could have been taught this information, and could further assist
> customers by telling them to describe what an LED is doing or what to
> expect.

Oh, I've worked on just as wide a range of hardware as you have in that
time; in fact, I'll bet a far wider range. I just stopped using the LED,
because it didn't give me context, and by the time I got the context, I had
enough information that the LED was useless anyway.

> 
>> 
>> Quite literally, that's why the Xserve and the Xserve RAID have them. They
>> make people feel good. They're useless diagnostically.
> 
> Untrue. At least in the case of anyone but Apple, it appears. I can bring
> forth many sysadmins who will require drive LEDs, in most cases per-drive
> LEDs in RAIDs, as proper RAIDs do have. This is one of the best ways to
> diagnose a problematic drive--aside from the internal diagnostics or
> software reporting--in an array of sometimes tens of drives on a rack. And
> often so without even opening up anything or hooking into the control
> environment. I can see in a RAID if the LED is flickering in concert (or
> sequence, depending on the array type) with the other drives when I do an
> array wide access? Is it stuck on while everything is off, etc.?

Um...on any RAID worth buying, the diags that ship with the RAID have
already told you there's a problem, have already told you what drive, downed
the drive, and brought up the hot spare, and all you're doing is swapping
out the drive that the RAID told you is dead. It told you this via email, or
a page.  On a RAID with 40-50 drives, you don't have time to play pattern
matching. The RAID software tells you the drive is bad, you yank it, slap in
another spare, and make sure that the drive is erased thoroughly, then stick
it in some other machine if you want to play with it.
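
To spell out that sequence, here's a rough Python sketch of what the RAID
software is doing on your behalf. Everything in it (the Drive and Array
classes, the notify hook) is made up for illustration and is not any
vendor's actual API; a real controller would also kick off a rebuild.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Drive:
    slot: int
    healthy: bool = True
    role: str = "member"  # "member", "spare", or "failed"


@dataclass
class Array:
    drives: List[Drive]
    notify: Callable[[str], None]  # hypothetical hook: email, pager, etc.

    def check(self) -> Optional[int]:
        """Fail out the first unhealthy member and promote a hot spare.

        Returns the slot to physically swap, or None if all members are fine.
        """
        for drive in self.drives:
            if drive.role == "member" and not drive.healthy:
                drive.role = "failed"  # take the drive out of the array
                spare = next(
                    (d for d in self.drives if d.role == "spare" and d.healthy),
                    None,
                )
                if spare is not None:
                    spare.role = "member"  # a real controller rebuilds here
                    self.notify(
                        f"slot {drive.slot} failed; "
                        f"spare in slot {spare.slot} activated"
                    )
                else:
                    self.notify(f"slot {drive.slot} failed; no hot spare available")
                return drive.slot
        return None


messages: List[str] = []
array = Array(
    drives=[Drive(0), Drive(1), Drive(2), Drive(3, role="spare")],
    notify=messages.append,  # stand-in for the email or page
)
array.drives[1].healthy = False  # simulate a drive going bad
slot_to_swap = array.check()
print(slot_to_swap, messages)
```

By the time a human is standing in front of the rack, the slot number is
already in the message. Nobody had to watch a light to get it.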

And how do you specify an "array" wide access? Do you bypass the LUN setup
and do a series of raw reads and writes to each drive in the array? Do you
get the mapping of data across the drives?
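
To show why the mapping matters, here's a small Python sketch. It assumes
plain striping with no parity and a fixed stripe size, which is a
simplification; real controllers use their own layouts, and the function
name and numbers are invented for the example.

```python
def drives_touched(start_block, block_count, stripe_blocks, n_drives):
    """Return the drive indexes a logical read touches under plain striping."""
    touched = set()
    for block in range(start_block, start_block + block_count):
        stripe_index = block // stripe_blocks  # which stripe unit holds the block
        touched.add(stripe_index % n_drives)   # stripe units rotate across drives
    return touched


# Assumed layout: 128-block stripe units across 8 drives.
small_read = drives_touched(0, 64, 128, 8)
full_stripe_read = drives_touched(0, 128 * 8, 128, 8)
print(sorted(small_read), sorted(full_stripe_read))
```

Under those assumptions, a read through the LUN lights up only the drives
its blocks happen to land on. Without the mapping, you don't know which
lights should be flickering in the first place.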

The lights are far less useful than the RAID software. It's like that floppy
sound. That grind they make? It's unnecessary. The reason it's there is that
if the floppy drive didn't make grindy sounds people thought it wasn't
working. So the manufacturers made it make noise.

Oh, and if you look at an Xserve RAID, they have lights. Because there are
still people who think that rather than trust the RAID error detection
systems, they'll get the "true" story from a bit of plastic flickering every
time the drive is accessed.

> 
> Sure, they're perhaps not absolutely necessary, as in it will work without
> them, but then, most devices will work without any indicators, as well.
> They are useful, because they tell me information I can use in diagnosing
> the problem.

Which has no context by itself, and is useless without looking at the whole
problem. 


>> But that can happen while having nothing to do with the hard drive. If you
>> have a 1999 PowerBook G3 with a first Rev memory controller and >64MB of
>> RAM, I can make it happen on command. The memory controller had a bug. A
>> hard drive light was useless in that case.
> 
> Errr, context once again. If you read back a couple of messages, I
> indicated that most manufacturers, especially Apple, make ultra-quiet
> systems these days, including setting the hard drive acoustic mode to
> 'quiet', it is sometimes difficult to gauge if there is drive activity just
> listening to the computer or trying to feel the drive spin. In those cases,
> I reasoned, the presence of an HDD LED would help you if your screen froze
> or the application you were using hung. Yes, by itself, a screen freeze can
> happen having nothing to do with an HDD, but since we have narrowed down
> our issue to something to do with the filesystem/HDD, it is nice to have a
> reliable activity indicator when all else fails.

How? If there's no drive activity happening, then the light is going to be
dark, as it should be. Because there's nothing happening. The system is
down, the mechanics aren't. This whole LED thing reminds me of when we used
to hard-wire the turbo LEDs on PCs to the "On" position, and hard-wire the
LED speed displays so people would feel good that their computer was
"running fast". 

A dark LED simply means no activity. Is the drive even powered on? You
don't know, because a drive shows no activity whether it's powered on or
off. So it's ambiguous at best. Is solidly on bad? Maybe. Maybe not. By the
time you have the other context to tell you that there's something wrong,
the LED isn't telling you anything new.
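
Put as a table, the ambiguity looks like this. The state list below is my
own simplified model, not measured behaviour of any particular drive; the
point is only that every LED reading maps back to more than one state.

```python
from collections import defaultdict

# Assumed, simplified model: drive/system state -> what the activity LED shows.
LED_FOR_STATE = {
    "powered off": "dark",
    "powered on, idle": "dark",
    "system hung, no I/O issued": "dark",
    "normal I/O": "flickering",
    "backup running": "flickering",
    "sustained heavy I/O": "solid",
    "drive or bus wedged mid-transfer": "solid",
}

# Invert the table: for each LED reading, which states could have produced it?
states_for_led = defaultdict(list)
for state, led in LED_FOR_STATE.items():
    states_for_led[led].append(state)

# A reading is ambiguous if more than one state produces it.
ambiguous = {led: states for led, states in states_for_led.items() if len(states) > 1}

for led, states in ambiguous.items():
    print(f"{led}: {', '.join(states)}")
```

In this model no reading picks out a single state, so you need other
information to tell them apart, and once you have that, you no longer need
the light.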

> 
> Again, combine the various bits of information/symptoms you have, and you
> can narrow things down better. No single indicator, including the HDD LED
> by itself will give you conclusive feedback; but an HDD LED is a
> particularly useful indicator, that when combined with other existing
> indicators/symptomology, can help you diagnose HDD/IDE/SCSI related issues
> faster and better.

How is an LED going to diagnose a SCSI bus error? It cannot. It can only
tell you one thing...is drive activity happening. Yes or no. It can kind of
tell you if a lot of activity is happening, but with the way a modern OS and
modern programs work, a lot of activity is happening constantly.

For example...I got a call once from a user in a panic. They "weren't doing
anything" and their drive light was flickering like mad. Virus, hackers,
mechanical failure, the wrath of a deity?

No.

It was the backup. 

That LED is just not needed anymore.


-- 
 "Klingon multitasking systems do not support "time-sharing".  When a
Klingon program wants to run, it challenges the  scheduler in hand-to-hand
combat and owns the machine."



