[Macpartners] Re: Macpartners Digest, Vol 17, Issue 4

Kerem B Limon k_limon at MIT.EDU
Sun Oct 10 16:10:54 EDT 2004


Quoting "John C. Welch" <jwelch at MIT.EDU>:

> On 10/8/04 1:43 PM, "Kerem B Limon" <k_limon at MIT.EDU> wrote:
> 
> >> If it's a dead partition on the same drive, but you can get to your boot
> >> drive, the LED says "Stuff is happening", but since you can use your boot
> >> drive, you kind of expect that, since you're using the drive.
> > 
> > Again, more useful information available with an HDD LED. Obviously I have
> > booted. I am having trouble accessing stuff on this partition--hey, the HDD
> > LED is getting stuck solid on! Maybe it's not my software, there's
> > something on the partition there.
> 
> Or maybe not. It isn't telling you anything but that there's activity. You
> don't know why. You don't even know what kind. Could be meat, could be cake.
> A light is telling you...MeatCake.

You haven't read my description carefully, obviously. It is not *just the state
of the LED* that is the information. It is the combination of *what I am
doing/attempting to do* and the *state of the LED over time* in response to
that that gives me clues. Repeat this enough times to establish a pattern, and
I've got the hints I need.

> 
> > 
> >> 
> >> If it's a dead secondary drive, that's simple too, it won't mount. The light
> >> is ONLY telling you that *something* is happening. Not if it's good or bad.
> >> However, a bit of observation will tell you that. If the drive head motor
> >> has completely died, the light will be useless since a) you won't be able to
> >> access the drive *anyway* and the light will only tell you that nothing is
> >> happening, since they don't tell you that the spindle motor is okay, but the
> >> head motor is dead and gone.
> > 
> > Again, I've booted, I presume, and we can't access the secondary drive. The
> > secondary hasn't mounted, and when I try to use a disk mounting tool or
> > partitioning utility, I cannot see the drive and/or the LED gets stuck if I
> > see it and am having trouble accessing it. I know what/how it's failing.
> 
> No you don't at all. You know exactly what you know without one. That you
> can't get to the drive. You don't know if it head crashed, because even with
> one of those, assuming you can't hear it, you get activity. You don't know
> if the VBM is hosed, your b-trees are corrupt, or if the IDE/SATA cable is
> shorting/worked its way loose. All you know is that there is some form of
> activity, but you have no idea what it is.

I know conclusively that my OS is trying to mount the drive, that the
motherboard/controller is attempting to access it, that the drive is actually
the one failing. Of course the LED isn't going to tell me what specifically
internally is wrong with the HDD--it doesn't have the 'bandwidth' to give me
that info. But, I do know what and how it's failing, as in component vs. the
rest of the system. As for hardware vs. software failures, the LED will behave
differently: a hardware failure on the HDD is likely to leave the LED stuck
solid on, whereas a software failure is more likely to show a repeated, rapid
flashing access pattern followed by a failure to mount. Again, it's not just the LED that
is telling me things, it is what I do and how it responds to it, coupled with
other clues.

BTW, a crashed head can almost certainly be heard. It's a typical failure mode
with a distinct acoustic signature. The only time you hear nothing from a
crashed head is in a crash so severe that the head has gotten the platter
completely wedged and the spindle motor cannot even break it free. Even then you
will hear a momentary attempt by the motor to get it going.

> 
> 
> If the only diag you have is a binary light, it's really not going to tell
> you much.
> 
> > 
> >> 
> >> In Janet's case...she can tell the drive is running, since she booted off of
> >> it. What is the light going to do in this case...."Yes, just in case you
> >> didn't trust the booting bit, your hard drive arm is swinging back and forth
> >> like Britney Spears' hindquarters".
> > 
> > She can tell if it is something to do with the hard drive or software/OS (i.e.
> > media vs. software). She opens the folder, the drive starts accessing
> > vigorously (perhaps not as much as Mrs. Spears) and then gets stuck on. It's
> > probably a corrupt sector coinciding with a bit of a file that contains
> > directory/index info or some metadata stream that gets accessed when the
> > directory listing is done.
> 
> Or, it's a corrupt file in /Library/Preferences, which would affect all
> logins. Something hosed in the DirectoryServices folder will cause ALLLLllll
> kinds of grief. 
> 
> Or, there's a corruption in nidb, so the Finder can't tell what any user is
> allowed to do.
> 
> On and on, and on. These are all problems I've seen by the way. None of the
> disks had bad structures. Light would have done no good.

Again, read my comment more carefully (continued below). All of the problems
you describe are typical examples of file-level corruption (flipped bits,
incorrect data that got flushed to disk during some erroneous operation, etc.).
The LED will behave differently if the source of corruption is actually a
physical defect; i.e., it will act differently if the drive cannot actually read
the bits vs. if it can read them but they do not make sense.

> 
> > On the other hand, the drive accesses, the freeze
> > does happen, but HDD activity ceases--it's a hung process or thread, most
> > likely, due to perhaps some corrupted binary.
> 
> Or the spindle motor just died. Or the arm motor. Or maybe the power cable
> to the drive is bad. Or there's a process that is just using a lot of CPU at
> the moment. QuickTime Player will do this. So will Snapz Pro on startup. If
> the authentication frameworks are having a bad day, you're going to see this
> a LOT. Safari will cause this kind of behavior too with the right web page
> problems.

Again, you are omitting context. What did you just do to cause this symptom? Can
you repeat it and establish a pattern? What are the sounds coming from the hard
drive as this happens? You can certainly hear and feel the spindle motor dying.
You can certainly hear the arm crashing or the motor stopping mid-seek. You can
hear the hard drive lose power. Combine that with what the LED is doing and
what the computer seems to think is going on, and you have a far better idea.

> 
> > Or perhaps the LED goes into overdrive, flickering wildly and
> > thrashing--perhaps the software is trying to do the 'wrong thing'? Is it
> > trying to access something unnecessarily? Is there something else running?
> 
> Perhaps you're recording audio. Perhaps you're RAM-bound and you're in the
> swap a lot. Perhaps you have a compile/build/install going. Perhaps there's
> some damned prebinding thing going on.
> 
> Again, the light is only telling you that *something* is happening. It's not
> telling you what or why. Only that certain mechanical processes are going
> on. Nothing more, nothing less.

Ah, context. Janet is not recording audio. She is just trying to boot up her
machine, log in, and access a folder. What you're describing is not the context
of her problem. Let's diagnose the specifics...what I describe above applies
equally in a case when you are fairly (if not darn) sure that something else
with similar symptomology is not happening. And you again seem to miss my point
that it is not the light alone that is telling me something; it's the pattern
of the state of the light over time, and the consistency of that behaviour in
response to similar input over time that tells me what I am looking for, in the
context of the problem I am addressing at the time.

> 
> > 
> >> 
> >> When hard drives weren't a requirement for function, the light made sense.
> >> It makes sense in a headless situation, since you may want to just visually
> >> check without logging into the machine remotely.
> >> But when you're booting off the drive, it's redundant and silly. If you have
> >> your CPU unit out of immediate visual range, it's not immediately useful
> >> anyway.
> > 
> > You've narrowed down the case to a desktop or workstation, obviously not
> > headless. With cable limitations the way they are, how far from visual range
> > can the computer be, anyway? Certainly not far in the case of an iMac! And
> > since when is it more troubling to take a peek at the box those rare (ahem)
> > times you run into trouble, as compared to opening up your box and trying to
> > make sense of diagnostic LEDs?
> 
> Again...you see the LED..it tells you something's happening.
> 
> But you can't even use the flicker rate as an indication, unless you're
> intimately familiar with normal behavior in various situations.
> 
> I'm not saying they're totally useless, but I haven't used them regularly in
> over 7 years, and I haven't missed them much. They're more of a
> "oooooh...LIGHTS"

Aha...now I understand. This is the cause of the problem. Since you haven't done
this in more than seven years, I can see how you are missing my point. As
someone who's done just the opposite--using PCs and server hardware with HDD
LEDs as standard for over 10-12 years, and having diagnosed and troubleshot many
such failures, I can immediately derive the bits/pieces of information I need
from the LED feedback, subject to the constraints I gave above. And having used
such machines routinely for so long, I am also intimately familiar with their
so-called 'normal behaviour'. Now, a tech or support person could be in the
same boat that I am or could have been taught this information, and could
further assist customers by telling them to describe what an LED is doing or
what to expect.

> 
> Quite literally, that's why the Xserve and the Xserve RAID have them. They
> make people feel good. They're useless diagnostically.

Untrue. At least in the case of anyone but Apple, it appears. I can bring forth
many sysadmins who will require drive LEDs, in most cases per-drive LEDs in
RAIDs, as proper RAIDs do have. This is one of the best ways to diagnose a
problematic drive--aside from the internal diagnostics or software
reporting--in an array of sometimes tens of drives on a rack. And often so
without even opening up anything or hooking into the control environment. I can
see in a RAID if the LED is flickering in concert (or sequence, depending on
the array type) with the other drives when I do an array wide access? Is it
stuck on while everything is off, etc.?

Sure, they're perhaps not absolutely necessary, as in it will work without them,
but then, most devices will work without any indicators, as well. They are
useful, because they tell me information I can use in diagnosing the problem.

> 
> 
> >> If your display or screen is on the fritz, you're probably not doing much
> >> with the machine at that point.
> > 
> > "display/screen" here in context meaning the GUI/UI is stuck or not responding
> > and not the hardware itself failing per se.
> 
> But that can happen while having nothing to do with the hard drive. If you
> have a 1999 PowerBook G3 with a first Rev memory controller and >64MB of
> RAM, I can make it happen on command. The memory controller had a bug. A
> hard drive light was useless in that case.

Errr, context once again. If you read back a couple of messages, I indicated
that since most manufacturers, especially Apple, make ultra-quiet systems these
days (including setting the hard drive acoustic mode to 'quiet'), it is
sometimes difficult to gauge whether there is drive activity just by listening
to the computer or trying to feel the drive spin. In those cases, I reasoned,
the presence of an HDD LED would help you if your screen froze or the
application you were using hung. Yes, by itself, a screen freeze can happen
having nothing to do with an HDD, but since we have narrowed down our issue to
something to do with the filesystem/HDD, it is nice to have a reliable activity
indicator when all else fails.

Again, combine the various bits of information/symptoms you have, and you can
narrow things down better. No single indicator, including the HDD LED by itself,
will give you conclusive feedback; but an HDD LED is a particularly useful
indicator that, when combined with other existing indicators/symptomology, can
help you diagnose HDD/IDE/SCSI-related issues faster and better.

> 
> john
> 
> -- 
> "Cluster bombing from B-52s is very, very accurate. The bombs are guaranteed
> to always hit the ground."
> USAF Ammo Troop.
> 
> 
> _______________________________________________
> Macpartners mailing list
> Macpartners at mit.edu
> http://mailman.mit.edu/mailman/listinfo/macpartners
> 

Kerem
Kerem B. Limon
kerem.limon at mit.edu /e-mail


