bad_drive_problem_perhaps

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: Hal Burgiss <hburgiss@bellsouth.net>
Date: Wed, 7 Jun 2000 16:16:53 -0400


On Wed, Jun 07, 2000 at 03:19:25PM -0400, Jerry Human wrote:
> I sure hope you can help me with this one. Last week I installed a
> WD Caviar 102AA 10 gig hard drive and made four partitions: hdc1
> four gig, hdc2 100 meg, hdc3 three gig, and hdc4 three gig. I
> installed RH 6.2 on hdc1 and used hdc2 for the swap file. The drives
> hda and hdb are used for DOS and Win95.
> 
> After two days, on boot up I get a prompt to enter the root password
> for maintenance to run fsck manually. When I do, I get a series of
> errors for inodes 2 through 283692:
> 
>     Inode xxxxx has imagic flag set. Clear <y>?
>     Inode xxxxx is in use, but has dtime set. Fix <y>?
>     Inode xxxxx has illegal block(s). Clear <y>?
>     Inode xxxxx ref count is x, should be x. Fix <y>?
>     i_fsize for inode xxxxx (...) is xxx, should be xxx. Clear <y>?
>     Inode xxxxx (...) has a bad mode (xxxxx). Clear <y>?
>     i_frag for inode xxxxx (...) is xxx, should be xxx. Clear <y>?
>     i_faddr for inode xxxxx (...) is xxx, should be xxx. Clear <y>?
> 
> After ~40 minutes of holding the 'Y' key down, fsck reports the fs
> is corrupted and is automatically restarting, and I spend another ~40
> minutes holding the 'Y' key down. Upon completion, I shut down and
> reboot. RH does start and prompts for login. After login, RH seems
> to work sluggishly and some things don't work anymore. Checking the
> /lost+found dir reveals ~16 screens of numbers in various colors,
> dark blue, yellow, brown, etc. I assume these were put here by fsck.
> 
> The first time this happened I just blew off RH and formatted the
> partition and installed again from CD. Now it's happened again.
> 
> Can anyone tell me what is happening and perhaps why and possibly a
> path/method to fixing it permanently?

Don't want to be the bearer of bad news, but I recently had some
similar errors on a newish WD Caviar, and now RH is running on a
Maxtor instead. Maybe mine weren't so bad. I could run several days,
maybe a week or two, before errors. I did several partial re-installs.
Blamed it on SMP for a while. IOW, I would suspect the drive. Things
you might try: try a different cable. Make sure cable is seated
properly. Use a memory testing program like memtest86 to check RAM.
Also, cpu and mboard stress testers like burnCPU and burnBX (not exact
names, I forget). If using hdparm to tweak, turn it all off. I did all
this, and got it down to the drive. Zero errors on the Maxtor now (up
21 days). I also have another WD which is not doing this. The WD in
question replaced a WD that died abruptly well before its time. I am
a little soured on WD at this point. I hope this is not it for you.

===

Subject: RE: RH 6.2 goes bonkers in 2 days!!!
From: "Burke, Thomas G." <thomas_g_burke@md.northgrum.com>
Date: Wed, 7 Jun 2000 16:50:07 -0400 

Jerry Human [SMTP:jerrbare@worldspy.net] wrote:
> 
> I sure hope you can help me with this one. Last week I installed a WD
> Caviar 102AA 10 gig hard drive and made four partitions: hdc1 four gig,
> hdc2 100 meg, hdc3 three gig, and hdc4 three gig. I installed RH 6.2 on
> hdc1 and used hdc2 for the swap file. The drives hda and hdb are used
> for DOS and Win95.
> 
> After two days, on boot up I get a prompt to enter the root password for
> maintenance to run fsck manually. When I do, I get a series of errors
> for inodes 2 through 283692:
> 
>     Inode xxxxx has imagic flag set. Clear <y>?
>     Inode xxxxx is in use, but has dtime set. Fix <y>?
>     Inode xxxxx has illegal block(s). Clear <y>?
>     Inode xxxxx ref count is x, should be x. Fix <y>?
>     i_fsize for inode xxxxx (...) is xxx, should be xxx. Clear <y>?
>     Inode xxxxx (...) has a bad mode (xxxxx). Clear <y>?
>     i_frag for inode xxxxx (...) is xxx, should be xxx. Clear <y>?
>     i_faddr for inode xxxxx (...) is xxx, should be xxx. Clear <y>?
> 
> After ~40 minutes of holding the 'Y' key down, fsck reports the fs is
> corrupted and is automatically restarting, and I spend another ~40
> minutes holding the 'Y' key down. Upon completion, I shut down and reboot.
> RH does start and prompts for login. After login, RH seems to work
> sluggishly and some things don't work anymore. Checking the /lost+found
> dir reveals ~16 screens of numbers in various colors, dark blue, yellow,
> brown, etc. I assume these were put here by fsck.
> 
> The first time this happened I just blew off RH and formatted the
> partition and installed again from CD. Now it's happened again.
> 
> Can anyone tell me what is happening and perhaps why and possibly a
> path/method to fixing it permanently?

Yous gots a bad drive... (or possibly controller)

Those messages are all inode errors, which means that the FATs don't jibe
with what's on the drive...  lost+found is some extra space that unices set
aside for when the drive gets full.  There should be nothing in there unless
the drive is filled.  That something _does_ show up in there seems
indicative that you've got real problems.


===

Subject: RE:RH 6.2 goes bonkers in 2 days!!!
From: Frank Carreiro <fcarreiro@keylabs.com>
Date: Wed, 07 Jun 2000 23:04:34 +0000


I wonder if you're running into the same thing I was...

Check your /etc/fstab file and make sure the sixth field has been set up
correctly.  It should be 1 for your root partition and 2 for all other
partitions you want checked by fsck at boot.

I noticed everything (almost) was a 1.  This is bad :-).  Check it out.
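
For anyone who would rather not check the file by eye, here is a minimal
sketch in Python. It assumes the usual six whitespace-separated fstab
fields (a missing sixth field is treated as 0) and applies only the rule
described above: 1 for the root partition, and not 1 for anything else.
The function name check_fstab_passno and the sample entries are made up
for illustration; the /home mount point in particular is hypothetical.

```python
def check_fstab_passno(text):
    """Return a list of warnings about the sixth (fsck pass) field."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) < 4:
            warnings.append(f"line {lineno}: too few fields")
            continue
        mount_point = fields[1]
        # Assumption: an absent sixth field is treated as 0.
        passno = fields[5] if len(fields) > 5 else "0"
        if mount_point == "/" and passno != "1":
            warnings.append(
                f"line {lineno}: root should have pass 1, found {passno}")
        elif mount_point != "/" and passno == "1":
            warnings.append(
                f"line {lineno}: {mount_point} has pass 1; "
                "expected 2 (or 0 to skip the check)")
    return warnings


# Hypothetical sample; the mount points are invented for the example.
SAMPLE = """\
/dev/hdc1  /      ext2  defaults  1 1
/dev/hdc3  /home  ext2  defaults  1 1
/dev/hdc2  swap   swap  defaults  0 0
"""

for warning in check_fstab_passno(SAMPLE):
    print(warning)
```

To check a real system, pass the contents of /etc/fstab to the function
instead of SAMPLE.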

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: Michael George <george@mintcity.com>
Date: Wed, 7 Jun 2000 21:14:24 -0400

On Jun 07, Burke, Thomas G. wrote:
> 
> lost+found is some extra space that unices set
> aside for when the drive gets full.  There should be nothing in there unless
> the drive is filled.  That something _does_ show up in there seems
> indicative that you've got real problems.

I'm not sure where you heard this from, but it's incorrect.  lost+found is for
files which become "detached" during fsck.  Rather than toss out all the data,
they are attached to the lost+found directory with their i-node number as
their filename.

Hence the name "lost & found"...
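
e2fsck names the entries it reconnects after their inode numbers
(typically with a leading '#'), which is why the directory looks like
screens of numbers. The different colors are most likely just a
color-enabled ls marking different file types. A rough Python sketch for
seeing what kind of thing each entry is follows; summarize_lost_found and
classify are illustrative names, and it assumes the directory is readable
(which usually means running as root).

```python
import os
import stat


def classify(mode):
    """Map a st_mode value to a short description of the file type."""
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISBLK(mode) or stat.S_ISCHR(mode):
        return "device"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "unknown"


def summarize_lost_found(path):
    """Return (name, kind, size_in_bytes) for every entry under path."""
    entries = []
    for name in sorted(os.listdir(path)):
        # lstat so that a symlink is reported as itself, not its target
        info = os.lstat(os.path.join(path, name))
        entries.append((name, classify(info.st_mode), info.st_size))
    return entries
```

Calling summarize_lost_found("/lost+found") and printing the result gives
a quick picture of how many recovered entries are ordinary files and how
many are directories or something stranger.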

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: "CH" <krikofer@cwnet.com>
Date: Wed, 7 Jun 2000 22:21:41 -0700


> After ~40 minutes of holding the 'Y' key down, fsck reports the fs is
> corrupted and is automatically restarting, and I spend another ~40
> minutes holding the 'Y' key down. Upon completion, I shut down and reboot.
> RH does start and prompts for login. After login, RH seems to work
> sluggishly and some things don't work anymore. Checking the /lost+found
> dir reveals ~16 screens of numbers in various colors, dark blue, yellow,
> brown, etc. I assume these were put here by fsck.

Did you actually keep your finger on the 'Y' key for 40 mins,
then another 40 mins?  Either I misunderstood you, or I am
understanding it correctly.  It would have bruised my finger. ;-D

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: Michael George <george@mintcity.com>
Date: Thu, 8 Jun 2000 09:54:03 -0400


On Jun 07, Hal Burgiss wrote:
> 
> Don't want to be the bearer of bad news, but I recently had some
> similar errors on a newish WD Caviar, and now RH is running on a
> Maxtor instead. Maybe mine weren't so bad. I could run several days,
> maybe a week or two, before errors. I did several partial re-installs.
> Blamed it on SMP for a while. IOW, I would suspect the drive. Things
> you might try: try a different cable. Make sure cable is seated
> properly. Use a memory testing program like memtest86 to check RAM.
> Also, cpu and mboard stress testers like burnCPU and burnBX (not exact
> names, I forget). If using hdparm to tweak, turn it all off. I did all
> this, and got it down to the drive. Zero errors on the Maxtor now (up
> 21 days). I also have another WD which is not doing this. The WD in
> question replaced a WD that died abruptly well before its time. I am
> a little soured on WD at this point. I hope this is not it for you.

I have an old system with WDs in it.  The first two have worked flawlessly
since they were installed.  I just put in a third one, though, and I got
similar problems as described.  However, it's only happened once and it was
after a test install of another system.  That's no excuse, but it might shed
some light.

I don't have a lot of money to spend on my system, so I'm not about to go out
and buy a new drive unless I need one...  as long as this latest WD keeps its
wits about it, I'm going to leave it alone...

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: Hal Burgiss <hburgiss@bellsouth.net>
Date: Thu, 8 Jun 2000 11:45:37 -0400


On Thu, Jun 08, 2000 at 09:54:03AM -0400, Michael George wrote:
> On Jun 07, Hal Burgiss wrote:
> I have an old system with WDs in it.  The first two have worked flawlessly
> since they were installed.  I just put in a third one, though, and I got
> similar problems as described.  However, it's only happened once and it was
> after a test install of another system.  That's no excuse, but it might shed
> some light.
> 
> I don't have a lot of money to spend on my system, so I'm not about to go out
> and buy a new drive unless I need one...  as long as this latest WD keeps its
> wits about it, I'm going to leave it alone...

I would suggest every week or so, run 'e2fsck -c' to see if errors
keep occurring. Just for peace of mind. If so, they likely will get
worse, based on my recent experience.  

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: Jerry Human <jerrbare@worldspy.net>
Date: Thu, 08 Jun 2000 12:55:02 -0400



Hal Burgiss wrote:

> On Thu, Jun 08, 2000 at 09:54:03AM -0400, Michael George wrote:

> > On Jun 07, Hal Burgiss wrote:

> > I have an old system with WDs in it.  The first two have
> > worked flawlessly since they were installed.  I just put
> > in a third one, though, and I got similar problems as
> > described.  However, it's only happened once and it was
> > after a test install of another system.  That's no
> > excuse, but it might shed some light.

> > I don't have a lot of money to spend on my system, so
> > I'm not about to go out and buy a new drive unless I
> > need one...  as long as this latest WD keeps its wits
> > about it, I'm going to leave it alone...

> I would suggest every week or so, run 'e2fsck -c' to see
> if errors keep occurring. Just for peace of mind. If so,
> they likely will get worse, based on my recent experience.

Ok, Good People:

I'm going to attempt to answer all the many replies in this email.

Most of you seem to think I have a bad drive. This is a very
definite possibility.  Therefore I surfed to the WD support
site, downloaded their install/diags software and tested
the drive. It didn't find any real errors, but UMD was turned
off, which I corrected, and then I reset the drive.

With that done, I wiped the drive and installed RH 6.2 a
fourth time. Time will tell if that is the end of the
problems.

Unfortunately, my bios only supports one IDE channel, so
previously I installed a controller that handles two EIDE
channels, giving me room for a total of four drives.  That has
become: hda - one gig Maxtor, hdb - 340 meg WD, hdc - 10.2
gig WD and hdd - 40X CD.

As I have mentioned before, as money permits, I will build
another box and hdc will become hda in the new box. In the
meantime, it is in this box to get Linux on it, to give me
something to learn Linux on, to give me something to learn
C++ on and possibly something to basically replace
Windoze. Once the hard drive problem is overcome and I get a
stable RH install and manage to get on the web with RH, I'll
be able to settle into learning all I can about it and the
637 packages that seem to get installed each time.

I mentioned before that I didn't install everything. To
clarify, I have installed everything except the
servers. That is to say, I did not install the software for
a web server, news server, ftp server, nfs server, smb
server, etc. At this point, when I learn how to do it I will
have a dialup connection to the web and won't be able to
serve anything so the servers were left out. Everything
else, including utilities, dev software, desktops, web
client, publishing software, etc. was installed. If I've
left out something that I need, please tell me. I am still a
struggling newbie trying to learn as much as I can.

I really appreciate all the replies I've received so far. I
hope I'm not becoming a pest on this list. I've always felt
that asking someone who knows is much better than making
endless mistakes and ending up with something that is an
embarrassment and non-functioning.

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: Michael George <george@mintcity.com>
Date: Thu, 8 Jun 2000 13:24:05 -0400


On Jun 08, Hal Burgiss wrote:
> 
> I would suggest every week or so, run 'e2fsck -c' to see if errors
> keep occurring. Just for peace of mind. If so, they likely will get
> worse, based on my recent experience.  

Thanks for the suggestion, I may just do that.  Too bad there isn't a way to
run it on a mounted (but idle) filesystem.

After checking the man page, what would happen if I ran "e2fsck -c -n" on my
/usr/local partition (the one in question and the one that will be idle during
the night)?  My wife will be less-than-impressed if I have to log her out
every week or so to run e2fsck...  She'll think of that as like rebooting
Windoze...
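
The thread does not settle what "e2fsck -c -n" does on a mounted
filesystem, so what follows is not an answer to that question. It is a
different and much cruder technique: a read-only pass over a device or
file that notes which blocks fail to read. This is a sketch in Python
under several assumptions: scan_readable is a made-up name, the 4096-byte
block size is arbitrary, the path must be readable (a raw partition
normally requires root), and reads satisfied from the kernel's cache may
hide problems. It repairs nothing and does not replace badblocks or
e2fsck.

```python
import os


def scan_readable(path, block_size=4096):
    """Read path front to back without writing anything.

    Returns (blocks_attempted, bad_block_indexes).
    """
    bad = []
    index = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of a file; on Linux this
        # also works for block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        while index * block_size < size:
            # Seek explicitly each time so a failed read cannot leave
            # the offset in an unknown position.
            os.lseek(fd, index * block_size, os.SEEK_SET)
            try:
                os.read(fd, block_size)
            except OSError:
                bad.append(index)  # remember the block and carry on
            index += 1
    finally:
        os.close(fd)
    return index, bad
```

Because it only ever opens the path read-only, it can be pointed at the
partition that sits idle overnight. A non-empty list of bad block indexes
would be a reason to unmount and run the real tools.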

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: "John J. Donohue" <jdonohue@mcallen.lib.tx.us>
Date: Sun, 11 Jun 2000 14:25:57 -0500 (CDT)


On Wed, 7 Jun 2000, Jerry Human wrote:

> I sure hope you can help me with this one. Last week I installed a WD
> Caviar 102AA 10 gig hard drive and made four partitions: hdc1 four gig,
> hdc2 100 meg, hdc3 three gig, and hdc4 three gig. I installed RH 6.2 on
> hdc1 and used hdc2 for the swap file. The drives hda and hdb are used
> for DOS and Win95.
> 
> After two days, on boot up I get a prompt to enter the root password for
> maintenance to run fsck manually. When I do, I get a series of errors
> for inodes 2 through 283692:

There's a recall on a certain serial number range of those drives. Go
to www.westerndigital.com to see if yours is one of them.

===

Subject: Re: RH 6.2 goes bonkers in 2 days!!!
From: "Steven Pierce" <steven_pierce@powerinter.net>
Date: Sun, 11 Jun 2000 15:56:35 -0700




On 6/11/2000 at 2:25 PM John J. Donohue wrote:

>On Wed, 7 Jun 2000, Jerry Human wrote:
>
>> I sure hope you can help me with this one. Last week I installed a WD
>> Caviar 102AA 10 gig hard drive and made four partitions: hdc1 four gig,
>> hdc2 100 meg, hdc3 three gig, and hdc4 three gig. I installed RH 6.2 on
>> hdc1 and used hdc2 for the swap file. The drives hda and hdb are used
>> for DOS and Win95.
>> 
>> After two days, on boot up I get a prompt to enter the root password for
>> maintenance to run fsck manually. When I do, I get a series of errors
>> for inodes 2 through 283692:
>> 
>There's a recall on a certain serial number range of those drives. Go
>to www.westerndigital.com to see if yours is one of them.

John,

First off, it could be the way your BIOS is looking at the
drive.  I used to work for WD in tech support.  If I remember
correctly, this was not one of the drives that was recalled.
But again, it has been a while, so I could be wrong.  You can
find the serial number on the drive by using something
called diag.  Go to this link and you will be able to download
the util:  http://www.wdc.com/service/ftp/drives.html  It is
called Data Lifeguard Tools.  Run that on the drive; there
are two options that you can run.  One is going to clean off
the drive, and the other is just going to check it.  I would
run the check first, then maybe the one that will destroy
the data.  Also, there should be a check to see if the serial
number has been recalled.  If so, call the tech support
number on the form and they can help.  Tell them you want to
do an advance replacement; it will REQUIRE a credit card.

If you still need help, write me off the list.
steven_pierce@powerinter.net

===



doom@kzsu.stanford.edu