Hi, I am having a hell of a time with a new AIT3 drive we have. I have a folder with 21GB of files, each around 12MB, that I want to write to tape. I am running Debian 3.0 with the 2.4.18 kernel. The machine is a dual Athlon 1500 on a Tyan TigerMP motherboard with 1.5GB of RAM (dmesg is below). The tape drive is a Sony SDX-700C connected with a very high quality Granite Digital U320-certified SCSI cable and terminator. It is the only device on the controller. The machine also has a 3Ware IDE RAID controller with eight 80GB Maxtor drives in a RAID 5 array.
It starts writing out fine, then it dies after a random number of files. The exact command I am using is:
pinky:/# tar -b 1024 -cvf /dev/nst0 /Rogue/Renders/CorePost/CO-202/ > tape0001_CO-202.log
tar: Removing leading `/' from member names
tar: /dev/nst0: Wrote only 65536 of 524288 bytes
tar: Error is not recoverable: exiting now
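As a sanity check on the numbers in that tar message, here is a small sketch (Python, not part of the original session) of how the -b blocking factor maps to bytes. It assumes GNU tar's 512-byte block unit; the function names are made up for illustration.

```python
# tar's -b option is a blocking factor counted in 512-byte blocks.
TAR_BLOCK_BYTES = 512


def record_bytes(blocking_factor: int) -> int:
    """Bytes in one tar record for a given -b value."""
    return blocking_factor * TAR_BLOCK_BYTES


def blocking_factor_for(record_size: int) -> int:
    """The -b value that would produce records of record_size bytes."""
    if record_size % TAR_BLOCK_BYTES:
        raise ValueError("record size must be a multiple of 512 bytes")
    return record_size // TAR_BLOCK_BYTES


if __name__ == "__main__":
    # -b 1024 asks for the 524288-byte writes named in the error message.
    print(record_bytes(1024))
    # The 65536 bytes that were actually written correspond to -b 128.
    print(blocking_factor_for(65536))
```

So the failed write was a single 512KB record of which only the first 64KB got through; the sketch says nothing about why.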
The tapes are 100GB tapes and I am only trying to write 21GB, so it's not that.
pinky:/# du -hs /Rogue/Renders/CorePost/CO-202/
21G /ntfs/Rogue/Renders/CorePost/CO-202
In syslog I get:
Dec 13 12:05:24 pinky kernel: (scsi0:A:6:0): Unexpected busfree in Data-out phase
Dec 13 12:05:24 pinky kernel: SEQADDR == 0x8a
Dec 13 12:05:24 pinky kernel: st0: Error 70000 (sugg. bt 0x0, driver bt 0x0, host bt 0x7).
Dec 13 12:05:24 pinky kernel: st0: Error 8 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
Dec 13 12:05:24 pinky kernel: st0: Error 8 (sugg. bt 0x0, driver bt 0x0, host bt 0x0).
Dec 13 12:05:24 pinky kernel: st0: Error on write filemark.
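For anyone trying to read the st0 error numbers, here is a rough sketch of how they can be split into fields. It assumes the number after "Error" is the hexadecimal SCSI result word, laid out as in the Linux SCSI mid-layer headers (status byte, message byte, host byte, driver byte, lowest byte first), and that the status byte is the raw SCSI status. The function name and lookup tables are illustrative, and the status table is deliberately partial.

```python
# Host-byte (DID_*) codes as defined in the Linux SCSI mid-layer headers.
HOST_BYTE = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}

# A few raw SCSI status byte values (partial list).
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}


def decode_result(hex_text: str) -> dict:
    """Split a hex result word such as '70000' into its four bytes."""
    result = int(hex_text, 16)
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


if __name__ == "__main__":
    for text in ("70000", "8"):
        fields = decode_result(text)
        print(text,
              HOST_BYTE.get(fields["host"], "unknown host code"),
              SCSI_STATUS.get(fields["status"], "unknown status"))
```

Under those assumptions, "Error 70000" carries host byte 0x07 (DID_ERROR), which agrees with the "host bt 0x7" printed on the same line, and "Error 8" is a status byte of 0x08 (BUSY) with a clean host byte.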
At one point I got a much nastier message, but it's pretty huge, so it's at the bottom of the email.
I did some googling and didn't find much (well, I did find lots of posts with the "Error on write filemark" message, but they all had a Medium Sense error first). I have tried all kinds of different block sizes for tar (1024 is the highest I have tried), and I have tried different SCSI cards, cables, and terminators. Pretty much everything I can think of.
After talking with Tier 1 support at Sony a week ago, they thought the drive was dead, so they sent out another one, which does the exact same thing. I spent an hour on the phone with Tier 1 again today (he didn't even know what tar was). I think I have finally been bumped up to Tier 2, but I am waiting to hear back from them.
Does anyone have any ideas? I really need to get this thing working (I have 3.5TB to archive, ideally before I leave next weekend).
Thanks,
Shawn
shawn@pinky:~$ dmesg Linux version 2.4.18-200211301 (root@pinky) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 SMP Sun Dec 1 13:37:47 CST 2002 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f000 (usable) BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e4800 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fff0000 (usable) BIOS-e820: 000000003fff0000 - 000000003ffffc00 (ACPI data) BIOS-e820: 000000003ffffc00 - 0000000040000000 (ACPI NVS) BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved) 127MB HIGHMEM available. found SMP MP-table at 000f7510 hm, page 000f7000 reserved twice. hm, page 000f8000 reserved twice. hm, page 0009f000 reserved twice. hm, page 000a0000 reserved twice. On node 0 totalpages: 262128 zone(0): 4096 pages. zone(1): 225280 pages. zone(2): 32752 pages. Intel MultiProcessor Specification v1.4 Virtual Wire compatibility mode. OEM ID: TYAN Product ID: GUINNESS APIC at: 0xFEE00000 Processor #1 Pentium(tm) Pro APIC version 16 Processor #0 Pentium(tm) Pro APIC version 16 I/O APIC #2 Version 17 at 0xFEC00000. Processors: 2 Kernel command line: BOOT_IMAGE=Linux ro root=302 Initializing CPU#0 Detected 1533.419 MHz processor. Console: colour VGA+ 80x25 Calibrating delay loop... 
3060.53 BogoMIPS Memory: 1028468k/1048512k available (2068k kernel code, 19656k reserved, 504k da ta, 232k init, 131008k highmem) Dentry-cache hash table entries: 131072 (order: 8, 1048576 bytes) Inode-cache hash table entries: 65536 (order: 7, 524288 bytes) Mount-cache hash table entries: 16384 (order: 5, 131072 bytes) Buffer-cache hash table entries: 65536 (order: 6, 262144 bytes) Page-cache hash table entries: 262144 (order: 8, 1048576 bytes) CPU: Before vendor init, caps: 0383fbff c1cbfbff 00000000, vendor = 2 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 256K (64 bytes/line) CPU: After vendor init, caps: 0383fbff c1cbfbff 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU: After generic, caps: 0383fbff c1cbfbff 00000000 00000000 CPU: Common caps: 0383fbff c1cbfbff 00000000 00000000 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. POSIX conformance testing by UNIFIX CPU: Before vendor init, caps: 0383fbff c1cbfbff 00000000, vendor = 2 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 256K (64 bytes/line) CPU: After vendor init, caps: 0383fbff c1cbfbff 00000000 00000000 Intel machine check reporting enabled on CPU#0. CPU: After generic, caps: 0383fbff c1cbfbff 00000000 00000000 CPU: Common caps: 0383fbff c1cbfbff 00000000 00000000 CPU0: AMD Athlon(tm) MP Processor 1800+ stepping 02 per-CPU timeslice cutoff: 731.39 usecs. enabled ExtINT on CPU#0 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 Booting processor 1/0 eip 2000 Initializing CPU#1 masked ExtINT on CPU#1 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 Calibrating delay loop... 
3060.53 BogoMIPS CPU: Before vendor init, caps: 0383fbff c1cbfbff 00000000, vendor = 2 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 256K (64 bytes/line) CPU: After vendor init, caps: 0383fbff c1cbfbff 00000000 00000000 Intel machine check reporting enabled on CPU#1. CPU: After generic, caps: 0383fbff c1cbfbff 00000000 00000000 CPU: Common caps: 0383fbff c1cbfbff 00000000 00000000 CPU1: AMD Athlon(tm) Processor stepping 02 Total of 2 processors activated (6121.06 BogoMIPS). ENABLING IO-APIC IRQs Setting 2 in the phys_id_present_map ...changing IO-APIC physical APIC ID to 2 ... ok. init IO_APIC IRQs IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not co nnected. ..TIMER: vector=0x31 pin1=2 pin2=0 number of MP IRQ sources: 16. number of IO-APIC #2 registers: 24. testing the IO APIC.......................
IO APIC #2...... .... register #00: 02000000 ....... : physical APIC id: 02 .... register #01: 00170011 ....... : max redirection entries: 0017 ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 000 00 1 0 0 0 0 0 0 00 01 003 03 0 0 0 0 0 1 1 39 02 003 03 0 0 0 0 0 1 1 31 03 003 03 0 0 0 0 0 1 1 41 04 003 03 0 0 0 0 0 1 1 49 05 003 03 1 1 0 1 0 1 1 51 06 003 03 0 0 0 0 0 1 1 59 07 003 03 0 0 0 0 0 1 1 61 08 003 03 0 0 0 0 0 1 1 69 09 003 03 0 0 0 0 0 1 1 71 0a 003 03 1 1 0 1 0 1 1 79 0b 003 03 1 1 0 1 0 1 1 81 0c 003 03 0 0 0 0 0 1 1 89 0d 003 03 0 0 0 0 0 1 1 91 0e 003 03 0 0 0 0 0 1 1 99 0f 003 03 0 0 0 0 0 1 1 A1 10 000 00 1 0 0 0 0 0 0 00 11 000 00 1 0 0 0 0 0 0 00 12 000 00 1 0 0 0 0 0 0 00 13 000 00 1 0 0 0 0 0 0 00 14 000 00 1 0 0 0 0 0 0 00 15 000 00 1 0 0 0 0 0 0 00 16 000 00 1 0 0 0 0 0 0 00 17 000 00 1 0 0 0 0 0 0 00 IRQ to pin mappings: IRQ0 -> 0:2 IRQ1 -> 0:1 IRQ3 -> 0:3 IRQ4 -> 0:4 IRQ5 -> 0:5 IRQ6 -> 0:6 IRQ7 -> 0:7 IRQ8 -> 0:8 IRQ9 -> 0:9 IRQ10 -> 0:10 IRQ11 -> 0:11 IRQ12 -> 0:12 IRQ13 -> 0:13 IRQ14 -> 0:14 IRQ15 -> 0:15 .................................... done. Using local APIC timer interrupts. calibrating APIC timer ... ..... CPU clock speed is 1533.3676 MHz. ..... host bus clock speed is 266.6726 MHz. cpu: 0, clocks: 2666726, slice: 888908 CPU0T0:2666720,T1:1777808,D:4,S:888908,C:2666726 cpu: 1, clocks: 2666726, slice: 888908 CPU1T0:2666720,T1:888896,D:8,S:888908,C:2666726 checking TSC synchronization across CPUs: passed. 
Waiting on wait_init_idle (map = 0x2) All processors have done init_idle PCI: PCI BIOS revision 2.10 entry at 0xfd7e0, last bus=3 PCI: Using configuration type 1 PCI: Probing PCI hardware Unknown bridge resource 0: assuming transparent Unknown bridge resource 0: assuming transparent Unknown bridge resource 2: assuming transparent BIOS failed to enable PCI standards compliance, fixing this error. I/O APIC: AMD Errata #22 may be present. In the event of instability try : booting with the "noapic" option. Linux NET4.0 for Linux 2.4 Based upon Swansea University Computer Society NET3.039 Initializing RT netlink socket Starting kswapd allocated 32 pages and 32 bhs reserved for the highmem bounces Journalled Block Device driver loaded NTFS driver v1.1.22 [Flags: R/O] SGI XFS with ACLs, quota, no debug enabled pty: 256 Unix98 ptys configured Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI en abled ttyS00 at 0x03f8 (irq = 4) is a 16550A ttyS01 at 0x02f8 (irq = 3) is a 16550A block: 128 slots per queue, batch=32 Uniform Multi-Platform E-IDE driver Revision: 6.31 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx AMD7411: IDE controller on PCI bus 00 dev 39 AMD7411: chipset revision 1 AMD7411: not 100% native mode: will probe irqs later ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio hda: MAXTOR 6L020J1, ATA DISK drive hdc: HL-DT-ST CD-ROM GCR-8520B, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 ide1 at 0x170-0x177,0x376 on irq 15 hda: 40132503 sectors (20548 MB) w/1818KiB Cache, CHS=2498/255/63 hdc: ATAPI 52X CD-ROM drive, 128kB Cache, DMA Uniform CD-ROM driver Revision: 3.12 Partition check: hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 > FDC 0 is a post-1991 82077 eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100. html eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. 
Savochkin <saw@sa w.sw.com.sg> and others eth0: Intel Corp. 82557 [Ethernet Pro 100], 00:03:47:00:49:AF, IRQ 10. Receiver lock-up bug exists -- enabling work-around. Board assembly 711269-005, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0x24c9f043). Receiver lock-up workaround activated. eth1: Intel Corp. 82557 [Ethernet Pro 100] (#2), 00:03:47:00:49:B0, IRQ 11. Receiver lock-up bug exists -- enabling work-around. Board assembly 711269-005, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0x24c9f043). Receiver lock-up workaround activated. Linux agpgart interface v0.99 (c) Jeff Hartmann agpgart: Maximum main memory to use for agp memory: 816M agpgart: Detected AMD AMD 760MP chipset agpgart: AGP aperture is 64M @ 0xf8000000 SCSI subsystem driver Revision: 1.00 scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.4 <Adaptec 29160 Ultra160 SCSI adapter> aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
Vendor: SONY Model: SDX-700C Rev: 0101
Type: Sequential-Access ANSI SCSI revision: 02
(scsi0:A:2): 80.000MB/s transfers (40.000MHz, offset 127, 16bit)
3ware Storage Controller device driver for Linux v1.02.00.016.
scsi1 : Found a 3ware Storage Controller at 0x1450, IRQ: 11, P-chip: 1.3
scsi1 : 3ware Storage Controller
Vendor: 3ware Model: 3w-xxxx Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 00
st: Version 20020205, bufsize 32768, wrt 30720, max init. bufs 4, s/g segs 16
Attached scsi tape st0 at scsi0, channel 0, id 2, lun 0
Attached scsi disk sda at scsi1, channel 0, id 4, lun 0
SCSI device sda: 1120591360 512-byte hdwr sectors (-525768 MB)
sda: sda1
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
uhci.c: USB Universal Host Controller Interface driver v1.1
usb-ohci.c: USB OHCI at membase 0xc00dc000, IRQ 11
usb-ohci.c: usb-00:07.4, Advanced Micro Devices [AMD] AMD-765 [Viper] USB
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 4 ports detected
usb-ohci.c: USB OHCI at membase 0xf882f000, IRQ 5
usb-ohci.c: usb-03:08.0, NEC Corporation USB
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 3 ports detected
usb-ohci.c: USB OHCI at membase 0xf8831000, IRQ 10
usb-ohci.c: usb-03:08.1, NEC Corporation USB (#2)
usb.c: new USB bus registered, assigned bus number 3
hub.c: USB hub found
hub.c: 2 ports detected
Initializing USB Mass Storage driver...
usb.c: registered new driver usb-storage
USB Mass Storage support registered.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 232k freed
Adding Swap: 498004k swap-space (priority -1)
_____________________________________________________
Nasty SCSI Error:
Dec 10 15:00:55 pinky kernel: scsi0:0:6:0: Attempting to queue an ABORT message
Dec 10 15:00:55 pinky kernel: scsi0: Dumping Card State in Command phase, at SEQADDR 0x168
Dec 10 15:00:55 pinky kernel: ACCUM = 0x80, SINDEX = 0xa0, DINDEX = 0xe4, ARG_2 = 0x0
Dec 10 15:00:55 pinky kernel: HCNT = 0x0
Dec 10 15:00:55 pinky kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Dec 10 15:00:55 pinky kernel: DFCNTRL = 0x4, DFSTATUS = 0x89
Dec 10 15:00:55 pinky kernel: LASTPHASE = 0x80, SCSISIGI = 0x84, SXFRCTL0 = 0x88
Dec 10 15:00:55 pinky kernel: SSTAT0 = 0x7, SSTAT1 = 0x0
Dec 10 15:00:55 pinky kernel: SCSIPHASE = 0x0
Dec 10 15:00:55 pinky kernel: STACK == 0x175, 0x160, 0x0, 0x34
Dec 10 15:00:55 pinky kernel: SCB count = 4
Dec 10 15:00:55 pinky kernel: Kernel NEXTQSCB = 3
Dec 10 15:00:55 pinky kernel: Card NEXTQSCB = 3
Dec 10 15:00:55 pinky kernel: QINFIFO entries:
Dec 10 15:00:55 pinky kernel: Waiting Queue entries:
Dec 10 15:00:55 pinky kernel: Disconnected Queue entries:
Dec 10 15:00:55 pinky kernel: QOUTFIFO entries:
Dec 10 15:00:55 pinky kernel: Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Dec 10 15:00:55 pinky kernel: Pending list: 2
Dec 10 15:00:55 pinky kernel: Kernel Free SCB list: 1 0
Dec 10 15:00:55 pinky kernel: Untagged Q(6): 2
Dec 10 15:00:55 pinky kernel: DevQ(0:6:0): 0 waiting
Dec 10 15:00:55 pinky kernel: scsi0:0:6:0: Device is active, asserting ATN
Dec 10 15:00:55 pinky kernel: Recovery code sleeping
Dec 10 15:01:00 pinky kernel: Recovery code awake
Dec 10 15:01:00 pinky kernel: Timer Expired
Dec 10 15:01:00 pinky kernel: aic7xxx_abort returns 0x2003
I got this before I had the new SCSI cable and terminator (I was using the built-in termination on the drive).
Shawn ... some thoughts ... look at your device configuration ... where does nst0 point to? The device/channel configuration and the error message don't seem to coincide, at least on a cursory look (A vs 0) ... are you sure you're specifying the correct device for the channel that it's on? Failing that, after reviewing some of the kernel, there do appear to be some significant changes in the aic7xxx driver family in the past few (and recent) development kernels ...
and there does appear to be some question about the ability to recognize the driver and/or issue it buffers and SCBs ...
My $0.02 worth ... :-)
Dan.
swallbri@mail.synack-hosting.com wrote:
pinky:/# tar -b 1024 -cvf /dev/nst0 /Rogue/Renders/CorePost/CO-202/ > tape0001_CO-202.log
tar: Removing leading `/' from member names
tar: /dev/nst0: Wrote only 65536 of 524288 bytes
tar: Error is not recoverable: exiting now
What are the device specifics for /dev/nst0 (i.e. major/minor)?
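One way to answer that question is sketched below (ls -l /dev/nst0 shows the same two numbers). The decoding assumes the Linux st driver's conventional layout: character major 9, with the low five bits of the minor selecting the drive, the next two bits the mode, and the top bit (128) meaning no-rewind. The helper names are hypothetical.

```python
import os
import stat

ST_MAJOR = 9  # assumption: character major conventionally used by the st driver


def decode_st_minor(minor: int) -> dict:
    """Split an st minor number into drive, mode and no-rewind flag."""
    return {
        "drive": minor & 0x1F,          # low five bits: drive number
        "mode": (minor >> 5) & 0x03,    # next two bits: mode 0-3
        "no_rewind": bool(minor & 0x80),  # top bit: no-rewind device
    }


def describe(path: str) -> dict:
    """Report major/minor for a character device node and decode the minor."""
    info = os.stat(path)
    if not stat.S_ISCHR(info.st_mode):
        raise ValueError(f"{path} is not a character device")
    major, minor = os.major(info.st_rdev), os.minor(info.st_rdev)
    decoded = decode_st_minor(minor)
    decoded.update({"major": major, "minor": minor, "is_st": major == ST_MAJOR})
    return decoded


if __name__ == "__main__":
    node = "/dev/nst0"
    if os.path.exists(node):
        print(describe(node))
    else:
        # No tape node on this machine: just show what minor 128 would mean.
        print(decode_st_minor(128))
```

With that layout, a correctly created /dev/nst0 would be major 9, minor 128, meaning drive 0 in mode 0 with no rewind on close.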
SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.4
<Adaptec 29160 Ultra160 SCSI adapter>
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
Vendor: SONY Model: SDX-700C Rev: 0101
Type: Sequential-Access ANSI SCSI revision: 02
(scsi0:A:2): 80.000MB/s transfers (40.000MHz, offset 127, 16bit)
st: Version 20020205, bufsize 32768, wrt 30720, max init. bufs 4, s/g segs 16
Attached scsi tape st0 at scsi0, channel 0, id 2, lun 0
Nasty SCSI Error:
Dec 10 15:00:55 pinky kernel: scsi0:0:6:0: Attempting to queue an ABORT message
...
Dec 10 15:00:55 pinky kernel: DevQ(0:6:0): 0 waiting
Dec 10 15:00:55 pinky kernel: scsi0:0:6:0: Device is active, asserting ATN
Dec 10 15:00:55 pinky kernel: Recovery code sleeping
Dec 10 15:01:00 pinky kernel: Recovery code awake
Dec 10 15:01:00 pinky kernel: Timer Expired
Dec 10 15:01:00 pinky kernel: aic7xxx_abort returns 0x2003
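To make the comparison between the attach line and the error lines concrete, here is a rough sketch that pulls the host/channel/target/lun address out of both styles of message. It assumes the aic7xxx channel letter A corresponds to channel 0 (B to 1, and so on); the regular expressions and function names are illustrative only.

```python
import re

# aic7xxx-style address, e.g. "(scsi0:A:6:0)" or "scsi0:0:6:0:".
AIC_RE = re.compile(r"scsi(\d+):([A-Za-z0-9]+):(\d+):(\d+)")
# Attach message, e.g. "st0 at scsi0, channel 0, id 2, lun 0".
ATTACH_RE = re.compile(r"at scsi(\d+), channel (\d+), id (\d+), lun (\d+)")


def channel_number(channel: str) -> int:
    """Map a channel given as a digit or a letter to a number (assumes A == 0)."""
    if channel.isdigit():
        return int(channel)
    return ord(channel.upper()) - ord("A")


def parse_aic(line: str):
    """Return (host, channel, target, lun) from an aic7xxx message, or None."""
    match = AIC_RE.search(line)
    if not match:
        return None
    host, channel, target, lun = match.groups()
    return (int(host), channel_number(channel), int(target), int(lun))


def parse_attach(line: str):
    """Return (host, channel, target, lun) from an 'Attached scsi ...' line, or None."""
    match = ATTACH_RE.search(line)
    if not match:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    attached = parse_attach("Attached scsi tape st0 at scsi0, channel 0, id 2, lun 0")
    failed = parse_aic("(scsi0:A:6:0): Unexpected busfree in Data-out phase")
    print("attached:", attached)
    print("failed:  ", failed)
    print("same device address:", attached == failed)
```

Run against the lines quoted in this thread, the host and channel agree if A is read as 0, while the target id is 2 in the dmesg attach line and 6 in both sets of error messages. The sketch only surfaces that difference; it does not explain it.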