[RndTbl] Bug in smart reporting?
Adam Thompson
athompso at athompso.net
Mon May 23 14:43:54 CDT 2022
Yeah, systemd really messes up logging. At some point you just have to rely on "journalctl" instead of /var/log/messages, no matter what you've done to make it look like the old way. ☹
...or run Devuan, I suppose? I think you can also still build Gentoo without systemd, and there's always *BSD. OpenBSD has a partial systemd-compatibility layer now, not sure about the others, but they all still use honest-to-god dmesg & syslog. Actually, since *BSD all implement SMART slightly differently (and all VERY differently from Linux), you could make a bootable OpenBSD USB stick and use its SMART utilities to cross-check what Linux's smartctl says if you wanted?
I think systemd separates kernel stuff into /var/log/dmesg.log, at least on the system I'm looking at right now. Fedora could be different. And you've customized things anyway, so YMMV here.
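For what it's worth, here is a rough sketch of how the kernel messages could be pulled back out of the journal and filtered for the drive names in question. The helper name disk_events is made up for illustration, and the previous-boot example assumes the journal is persistent (e.g. /var/log/journal exists), which may not be true on a customized box.

```shell
#!/bin/sh
# disk_events: hypothetical helper, not part of any package.
# Reads log text on stdin and keeps lines mentioning sdX for the
# given drive letters (e.g. "gh" for sdg and sdh).
# The pattern is quoted so the shell cannot glob-expand sd[gh]
# against files in the current directory.
disk_events() {
    grep -E "sd[${1:-a-z}]"
}

# Usage sketches (run as root or a member of the journal group):
#   journalctl -k --no-pager | disk_events gh         # current boot
#   journalctl -k -b -1 --no-pager | disk_events gh   # previous boot
#   dmesg | disk_events gh                            # kernel ring buffer only
```

The -k flag limits journalctl to kernel messages, and -b -1 selects the boot before the current one, which is the relevant case once the box has been rebooted.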
Good luck, anyway.
-Adam
-----Original Message-----
From: Trevor Cordes <trevor at tecnopolis.ca>
Sent: Monday, May 23, 2022 2:29 PM
To: Adam Thompson <athompso at athompso.net>
Cc: Continuation of Round Table discussion <roundtable at muug.ca>
Subject: Re: [RndTbl] Bug in smart reporting?
On 2022-05-23 Adam Thompson wrote:
> One scenario where you can see this is e.g. on the muug.ca server,
> where the drives are multipathed - i.e. two physical SAS channels
> reaching each drive. Linux handles this by having two sdX nodes,
> then multipathd creates a single /dev/mapper/XXX device for you to
> use.
It is the old muug server, but I don't see anything in mapper and I
don't think it's multipath'd(?). Each drive gets its own sata cable
direct to the board.
> On a non-multipath box, this could happen if the drive went offline
> and then recovered. I've seen it happen, but I don't know how to
> reproduce it.
Almost certainly not the case in this instance. The drives are very
stable, with just these semi-bad smart errors happening off and on for
months. The array never went degraded nor resynced. I get panic phone
alerts if that happens. :-)
> My guess is it's the same drive, and the kernel decided it needed a
> new device name for some reason. "dmesg|grep sd[gh]" might show you
> something useful?
I'll try that next time it happens, as I've since rebooted and
/var/log/messages doesn't seem to be capturing all the kernel logs on
this box for some reason (even with trying to defeat all the journald
stuff).
I'm sure I won't have to wait long... I'm just miffed I may have
replaced the wrong drive in my RAID6 last night... but the resync was
100% ok, so no lasting harm done.