weird_scsi_harddisk_problem

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Subject: Problem with SCSI hard disc
From: Nikolaos Margaritis <nmargar@mail.demokritos.gr>
Date: Mon, 20 Dec 1999 11:32:35 +0200

I have a problem with my hard disc when I try to upgrade or
install an RPM whose files go in /usr. For many RPMs I can
hear a very intense noise coming from the hard disc. After the
installation, the partition where /usr lies (sda5) has
some inconsistencies. This is exactly what happened
recently, when I downloaded and installed glibc-2.1.2-17.i386.rpm. My SCSI
controller is an AIC-7890, on an ASUS P2B-DS motherboard. The
disc model is a Seagate ST34520W.

So when I type:

[root@iokasti][glibc]# rpm -ihv  glibc-2.1.2-17.i386.rpm
glibc                       ##################################################
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448321, block=1785882
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448322, block=1785882
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448323, block=1785882
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448324, block=1785882
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448325, block=1785882
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448326, block=1785882
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448327, block=1785882
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block - inode=448328, block=1785882

   --------------There is more ------------------------

Afterwards, I switched to single-user mode (telinit 1), unmounted /usr and typed:

[root@iokasti][/]# fsck /usr
Parallelizing fsck version 1.18 (11-Nov-1999)
e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
/dev/sda5 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 34 00 00 1c 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
Error reading block 1785882 (Attempt to read block from filesystem resulted in short read) while doing inode scan.  Ignore error<y>? yes

Pass 2: Checking directory structure
Entry 'LC_CTYPE' in /share/locale/et_EE (448403) has deleted/unused inode 448321.  Clear<y>? yes

Entry 'LC_MONETARY' in /share/locale/et_EE (448403) has deleted/unused inode 448322.  Clear<y>? yes

Entry 'LC_NUMERIC' in /share/locale/et_EE (448403) has deleted/unused inode 448323.  Clear<y>? yes

Entry 'LC_TIME' in /share/locale/et_EE (448403) has deleted/unused inode 448324.  Clear<y>? yes

Entry 'SYS_LC_MESSAGES' in /share/locale/fr_CA/LC_MESSAGES (448409) has deleted/unused inode 448325.  Clear<y>? yes

Entry 'LC_COLLATE' in /share/locale/fr_CH (448411) has deleted/unused inode 448326.  Clear<y>? yes

Entry 'LC_CTYPE' in /share/locale/fr_CH (448411) has deleted/unused inode 448327.  Clear<y>? yes

Entry 'LC_MONETARY' in /share/locale/fr_CH (448411) has deleted/unused inode 448328.  Clear<y>? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
Info fld=0x39b041, Current sd08:05: sense key Medium Error
Additional sense indicates Unrecovered read error
scsidisk I/O error: dev 08:05, sector 3571764
Block bitmap differences:  -1786644 -1786645 -1786646 -1786647 -1786648 -1786649 -1786650 -1786651 -1786652 -1786653 -1786654 -1786660 -1786661 -1786662 -1786663 -1786664 -1786665 -1786666 -1786667 -1786668 -1786669 -1786670 -1786671 -1786672 -1786673 -1786674 -1786675 -1786676 -1786677 -1786678 -1786679 -1786680 -1786681 -1786682 -1786683 -1786684 -1786685 -1786686 -1786687 -1786688 -1786689 -1786690 -1786691 -1786692 -1786734 -1786735 -1786736 -1786737 -1786738 -1786739 -1786740 -1786741 -1786742 -1786743 -1786744 -1786745 -1786746 -1787280
Fix<y>?
...................
etc etc etc


Eventually the problem seems to disappear and sda5 seems fixed. But when I
reinstall the package in order to replace the lost files, the same problem
arises...
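A small worked check of the numbers in the logs above, offered as a hedged sketch. It assumes the /usr filesystem uses 1 KiB blocks and 128-byte inodes (neither is stated in the thread, though 2 x 1785882 = 3571764 matches the sector in the kernel log, which is consistent with 1 KiB blocks) and that the inodes-per-group count is a multiple of 8. Under those assumptions the one unreadable block 1785882 covers partition sectors 3571764 and 3571765, and the eight inodes e2fsck cleared (448321 to 448328) all live in that same inode-table block. Since ext2 inode tables sit at fixed locations and writing an inode means reading its whole table block (the "ext2_write_inode: unable to read inode block" messages), reusing those inode numbers on reinstall would hit the same bad sector again. The helper names are hypothetical.

```python
# Hedged sketch: assumes 1 KiB ext2 blocks, 128-byte inodes, and an
# inodes-per-group count that is a multiple of 8 (not stated in the thread).
SECTOR_SIZE = 512   # bytes per disk sector
BLOCK_SIZE = 1024   # assumed ext2 block size
INODE_SIZE = 128    # assumed on-disk inode size


def block_to_sectors(block):
    """Partition-relative sectors covered by one ext2 block."""
    per_block = BLOCK_SIZE // SECTOR_SIZE
    first = block * per_block
    return list(range(first, first + per_block))


def inode_block_index(inode):
    """Global index of the 8-inode chunk holding `inode`.

    Inode numbers start at 1, so inode n is table entry n - 1. With the
    multiple-of-8 assumption, each chunk is exactly one inode-table block.
    """
    inodes_per_block = BLOCK_SIZE // INODE_SIZE
    return (inode - 1) // inodes_per_block


if __name__ == "__main__":
    print(block_to_sectors(1785882))  # [3571764, 3571765]
    cleared = range(448321, 448329)   # inodes e2fsck cleared in pass 2
    print({inode_block_index(i) for i in cleared})  # {56040}
```

All eight cleared inodes map to a single chunk, while inode 448329 would fall in the next one, which fits the pattern of exactly eight "deleted/unused inode" entries in pass 2.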

So, is there a problem with the surface of the hard disc, or with the
SCSI controller, the kernel, the ext2 filesystem, or e2fsck?

Any help will be greatly appreciated,

===

Subject: Re: Problem with SCSI hard disc
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Mon, 20 Dec 1999 12:50:51 +0000 (GMT)


> downloaded  glibc-2.1.2-17.i386.rpm. My SCSI controller is AIC-7890, on a ASUS
> P2B-DS motherboard. The disc model is a Seagate ST34520W.

The disk seems to have a problem.

> scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (10) 00 00 39 b0 40 00 00 02 00
> Info fld=0x39b041, Current sd08:05: sense key Medium Error
> Additional sense indicates Unrecovered read error

This is a failure from the disk itself. A medium error indicates that the media
(the disk) is the problem; in your case it's an unrecovered read error. If the
disk is under guarantee, you might want to return it if possible. If not, a
SCSI verify from the BIOS should remap the bad blocks. Often, however, you
just keep getting more bad blocks once a disk starts to fail.
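To see which sector the medium error points at, the CDB in the log can be decoded by hand. This is a hedged sketch, assuming (as the log format appears to show) that the kernel prints the opcode name "Read (10)" in place of opcode byte 0x28 and then the remaining nine CDB bytes in hex. In a READ(10) CDB, bytes 2 to 5 hold the big-endian starting LBA and bytes 7 and 8 the transfer length in blocks; the sense data's information field ("Info fld") holds the LBA that failed. The function name decode_read10 is hypothetical.

```python
def decode_read10(log_text):
    """Return (start_lba, length) from a 'CDB: Read (10) ...' log fragment.

    Assumes the opcode byte was replaced by its name, so nine hex bytes
    follow: CDB bytes 1 through 9.
    """
    tail = log_text.split("Read (10)", 1)[1]
    cdb = [int(tok, 16) for tok in tail.split()]
    if len(cdb) != 9:
        raise ValueError("expected nine CDB bytes after the opcode name")
    # cdb[0] is CDB byte 1 (flags); cdb[1:5] are bytes 2-5 (LBA);
    # cdb[5] is byte 6 (group); cdb[6:8] are bytes 7-8 (length);
    # cdb[8] is byte 9 (control).
    lba = int.from_bytes(bytes(cdb[1:5]), "big")
    length = int.from_bytes(bytes(cdb[6:8]), "big")
    return lba, length


if __name__ == "__main__":
    info_fld = 0x39B041  # failing LBA reported in the sense data
    for line in ("CDB: Read (10) 00 00 39 b0 40 00 00 02 00",   # during rpm
                 "CDB: Read (10) 00 00 39 b0 34 00 00 1c 00"):  # during fsck
        lba, length = decode_read10(line)
        # start LBA, blocks requested, offset of the failing LBA in the request
        print(lba, length, info_fld - lba)
```

This prints "3780672 2 1" and "3780660 28 13": the 2-block read issued while installing and the 28-block read issued by e2fsck both contain LBA 0x39b041 (3780673), and the drive reports that same LBA as unreadable each time, which is consistent with a single bad spot on the media rather than a controller or filesystem fault.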


===

Subject: Re: Problem with SCSI hard disc
From: Uncle George <gatgul@voicenet.com>
Date: Mon, 20 Dec 1999 09:35:56 -0500

Who would be responsible for the sd.c driver?

gat

Uncle George wrote:
>
> During CPU (i386)/card hardware initialization via the SCSI BIOS, I do not
> allow the SCSI card to spin up my SCSI drives.
> With 2.0.36 (as shipped with RH 5.2), the drives are spun up by the
> initialization of the kernel/SCSI drivers.
>
> With 2.2.12pre12 the SCSI drives do spin up, but apparently they get
> hosed: they give device timeouts. Hitting the reset
> button does nothing. I have to turn off the machine and load the 2.0.36
> system.
>
> If I load up 2.0.26 first (to spin up the drives), then I can reboot
> (without shutting down the system) with the 2.2.12 kernel without
> incident. This happens fairly consistently.
>
> gat
> Are you the person looking into the Adaptec stuff?

This sounds like a problem in the 2.2.12 sd.c driver code, not the aic7xxx
driver code (at least it shouldn't be in the aic7xxx driver code because it
doesn't spin the drives up).


===

Subject: Re: Problem with SCSI hard disc
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Mon, 20 Dec 1999 17:55:16 +0000 (GMT)


> This sounds like a problem in the 2.2.12 sd.c driver code, not the aic7xxx
> driver code (at least it shouldn't be in the aic7xxx driver code because it
> doesn't spin the drives up).

There are no known spin-up problems in the 2.2.12 SCSI code. There is one
in a few 2.2.14pre releases, but that is all.

===

Subject: Re: Problem with SCSI hard disc
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Mon, 20 Dec 1999 20:12:14 +0000 (GMT)


>     So how do I make this unknown SCSI spin-up problem known, and possibly
> resolved.
> gat

Post a report to linux-scsi@vger.rutgers.edu. If the kernel is hanging,
preferably rebuild it with ikd so you get a deadlock trace, and try some
of the SCSI debugging options.

[If you are thinking "help, what is ikd, what's a SCSI debug option?", just
 post the hardware info and errors.]

===





doom@kzsu.stanford.edu