Date: Mon, 23 May 2022 00:31:56 -0500 Device: /dev/sdg [SAT], 2 Currently unreadable (pending) sectors Hitachi HDS723020BLA642, S/N:MN1220F317TJTD, WWN:5-000cca-369d1a23a, FW:MN6OA580, 2.00 TB
Date: Mon, 23 May 2022 00:23:32 -0500 Device: /dev/sdh [SAT], 2 Currently unreadable (pending) sectors Hitachi HDS723020BLA642, S/N:MN1220F317TJTD, WWN:5-000cca-369d1a23a, FW:MN6OA580, 2.00 TB
No reboot in between... Note the SNs are identical. That's impossible. The drive models are probably truly the same, but the SNs certainly are not. I think I found a bug in smart, unless someone can think of a reason why I'd be seeing this.
The problem is, I just relied on a smart email like this to decide which drive to pull and replace. Now I'm not so sure the smart reporting is telling me anything correct.
This is on a fairly new Fedora and smartmontools-7.2-11.
From today forward I would recommend double-checking SMART reports before acting on them (smartctl -a, hdparm -i, etc.).
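One way to do that double-check before pulling a drive is to ask smartctl for the serial number behind every /dev/sd? node and flag any serial that shows up on more than one node. This is a minimal sketch that assumes root, smartmontools installed, and disks named /dev/sda to /dev/sdz. list_serials and dup_serials are hypothetical helper names, not part of smartmontools:

```shell
# Print "DEVICE SERIAL" for every /dev/sd? block device, using smartctl -i.
# Assumes root and smartmontools; two-letter names like /dev/sdaa are not covered.
list_serials() {
  for dev in /dev/sd?; do
    [ -b "$dev" ] || continue          # skip if the glob matched nothing
    serial=$(smartctl -i "$dev" 2>/dev/null |
      awk -F': *' '/^Serial [Nn]umber/ { print $2; exit }')
    printf '%s %s\n' "$dev" "${serial:-unknown}"
  done
}

# Read "DEVICE SERIAL" lines on stdin and print every known serial that
# appears on more than one device node, followed by those nodes.
dup_serials() {
  awk '$2 != "unknown" { devs[$2] = devs[$2] " " $1; n[$2]++ }
       END { for (s in n) if (n[s] > 1) print s ":" devs[s] }'
}
```

Usage: list_serials | dup_serials. Empty output means every node reported a distinct serial. hdparm -i /dev/sdX also reports the serial, so it can serve as a second, independent source if smartctl itself is in doubt.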
One scenario where you can see this is on the muug.ca server, where the drives are multipathed, i.e. two physical SAS channels reach each drive. Linux handles this by creating two sdX nodes, and multipathd then creates a single /dev/mapper/XXX device for you to use.
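A quick way to tell whether two sdX nodes are really one physical drive is to compare their WWNs. This is a minimal sketch using lsblk from util-linux. same_wwn is a hypothetical helper name, and an empty WWN is treated as "can't tell" rather than as a match. On a box that really is multipathed, multipath -ll shows the path-to-map grouping directly.

```shell
# Succeed only if both device nodes report the same, non-empty WWN,
# which suggests two paths (or two names) for one physical drive.
same_wwn() {
  a=$(lsblk -dno WWN "$1" 2>/dev/null)
  b=$(lsblk -dno WWN "$2" 2>/dev/null)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example:
#   same_wwn /dev/sdg /dev/sdh && echo "sdg and sdh look like the same drive"
```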
On a non-multipath box, this could happen if the drive went offline and then recovered. I've seen it happen, but I don't know how to reproduce it.
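My guess is it's the same drive, and the kernel decided it needed a new device name for some reason. "dmesg | grep 'sd[gh]'" might show you something useful? (Quote the pattern so the shell doesn't glob-expand sd[gh] against files in the current directory.)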
-Adam
-----Original Message-----
From: Roundtable roundtable-bounces@muug.ca On Behalf Of Trevor Cordes
Sent: Monday, May 23, 2022 1:43 PM
To: MUUG RndTbl roundtable@muug.ca
Subject: [RndTbl] Bug in smart reporting?
_______________________________________________
Roundtable mailing list
Roundtable@muug.ca
https://muug.ca/mailman/listinfo/roundtable
On 2022-05-23 Adam Thompson wrote:
It is the old muug server, but I don't see anything in mapper and I don't think it's multipath'd(?). Each drive gets its own SATA cable direct to the board.
Almost certainly not the case in this instance. The drives are very stable, with just these semi-bad smart errors happening off and on for months. The array never went degraded nor resynced. I get panic phone alerts if that happens. :-)
I'll try that next time it happens, as I've since rebooted and /v/l/messages doesn't seem to be capturing all the kernel logs on this box for some reason (even after trying to defeat all the journald stuff).
I'm sure I won't have to wait long... I'm just miffed I may have replaced the wrong drive in my RAID6 last night... but the resync was 100% ok, so no lasting harm done.
Yeah, systemd really messes up logging. You just have to rely on "journalctl" instead of /var/log/messages, at some point, no matter what you've done to make it look like the old way. ☹
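Since the box has already been rebooted, the kernel messages from the boot where both sdg and sdh showed up may still be in the journal. This is a minimal sketch, assuming the journal is persistent (e.g. /var/log/journal exists, or Storage=persistent is set in journald.conf); otherwise only the current boot is available. grep_disks is a hypothetical helper name:

```shell
# Keep only log lines (from stdin) that mention one of the given disk names
# as a whole word, e.g. "[sdg]" matches but "sdga" does not.
grep_disks() {
  pattern=$(printf '%s|' "$@")   # e.g. "sdg|sdh|"
  pattern=${pattern%|}           # strip the trailing "|"
  grep -wE "($pattern)"
}

# Kernel messages from the current boot (the same data dmesg shows):
#   journalctl -k -b 0 --no-pager | grep_disks sdg sdh
# Kernel messages from the previous boot (needs a persistent journal):
#   journalctl -k -b -1 --no-pager | grep_disks sdg sdh
```

Note that whole-word matching also skips partition names such as sdg1; add them to the argument list if you want those lines too.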
...or run Devuan, I suppose? I think you can also still build Gentoo without systemd, and there's always *BSD. OpenBSD has a partial systemd-compatibility layer now, not sure about the others, but they all still use honest-to-god dmesg & syslog. Actually, since *BSD all implement SMART slightly differently (and all VERY differently from Linux), you could make a bootable OpenBSD USB stick and use its SMART utilities to cross-check what Linux's smartctl says if you wanted?
I think systemd separates kernel stuff into /var/log/dmesg.log, at least on the system I'm looking at right now. Fedora could be different. And you've customized things anyway, so YMMV here.
Good luck, anyway.
-Adam
-----Original Message-----
From: Trevor Cordes trevor@tecnopolis.ca
Sent: Monday, May 23, 2022 2:29 PM
To: Adam Thompson athompso@athompso.net
Cc: Continuation of Round Table discussion roundtable@muug.ca
Subject: Re: [RndTbl] Bug in smart reporting?