RHEL9.5 box, fully updated.
On a working system with no relevant (systemd/php/apache) updates in 2 weeks, the php-fpm processes suddenly could not write to their systemd PrivateTmp dir (on by default): /tmp/systemd-private-*-php-fpm.service-*
This caused fatal errors in the lowest level of a php site during simple logging, as tempnam was returning false when it really never should.
I try really hard to work within the systemd paradigm on many things, especially when on paper it sounds helpful, like PrivateTmp. And it worked fine for many years. But now it suddenly does this and screws over a live web site.
So it's PrivateTmp=false for me now! I've written all scripts using proper tmp file naming techniques, like php's tempnam (which uses secure code under the hood, or so I'm told). And there's no local users on the box anyhow to start trying to race symlinks, and no user-supplied input is used to create file names. So the whole idea of PrivateTmp to me is "meh".
But I want to try to understand what happened here. There are a few oddities. systemd seems to make 2 privatetmp dirs in /tmp for php-fpm(?), yet when this bug hit only 1 existed(?). I suppose systemd-tmpfiles-clean or something may have screwed up and removed the dir? Everything is set to defaults for clean. And the dir seems to have existed via my in-script tests (see below).
Most weirdly, the bug only hit one of the php-fpm pools we run! The other main one seemed to be able to write just fine!!?? This is especially confusing since there is only one "php-fpm: master process". There does not seem to be a separate master for each pool, so how could one pool work fine and not the other? There are separate pool worker processes, however. I guess php-fpm could somehow "speak" to systemd to tell it it needs 2 privatetmps, one for each pool? But I don't see how that could happen. Maybe the 2 dirs is a red herring. My personal box doesn't have more than 1 dir for php-fpm, and I have 3 pools.
I added debug output to my scripts while the problem was still happening, just to confirm the scripts were seeing the tmp dir properly and that it was writable. And it was: drwx, and writable according to php's is_writable. The subdirs systemd creates are rwxt (sticky, like normal /tmp). However, I couldn't find a way for the script to figure out what its real tmp dir was, as it is tricked by systemd into believing the subdir is just /tmp. Still, the script saw a dir, thought it was writable, and showed me its inode data with ls -l.
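For anyone wanting to repeat this kind of check, here is a minimal sketch of two diagnostics, assuming GNU coreutils (stat -c, mktemp) and a Linux /proc. The function names tmp_backing_dir and probe_tmp are hypothetical, not anything php-fpm or systemd provides. The first reads a mountinfo file and prints the path that is really mounted at /tmp inside the namespace (field 4 of the matching line). The second attempts a real file creation instead of trusting a permission check, and prints the directory's hard-link count; on common Linux filesystems a link count of 0 on a directory usually means it has been removed while something still holds it open or mounted, which is one possible (unconfirmed) explanation for a dir that looks writable but rejects every write.

```shell
#!/bin/sh
# Diagnostic sketch; function names are hypothetical.

# Print the backing path (mountinfo field 4, "root") of the last mount whose
# mount point (field 5) equals $1. $2 is the mountinfo file to read.
# Returns non-zero if no mount matches.
tmp_backing_dir() {
    awk -v mp="${1:-/tmp}" '
        $5 == mp { root = $4; found = 1 }
        END { if (found) print root; else exit 1 }
    ' "${2:-/proc/self/mountinfo}"
}

# Report mode, hard-link count and inode of a directory, then try to
# actually create (and remove) a file in it.
# Returns non-zero if the file could not be created.
probe_tmp() {
    dir="${1:-/tmp}"
    stat -c 'mode=%A links=%h inode=%i' "$dir"
    if f=$(mktemp "$dir/probe.XXXXXX" 2>&1); then
        rm -f -- "$f"
        echo "write=ok"
    else
        # On failure $f holds the error message from mktemp.
        echo "write=FAILED: $f"
        return 1
    fi
}
```

To look at a running worker from outside its namespace, pass that process's mountinfo as the second argument, for example `tmp_backing_dir /tmp /proc/PID/mountinfo` with PID replaced by a php-fpm worker's pid. probe_tmp only tells you something about PrivateTmp when it is run from inside the service (for instance called from the php script itself).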
Since I cannot reproduce the bug easily now that I've restarted php-fpm, and now that I've disabled PrivateTmp because I need 100% trustworthy tools on a production box, I will probably never know what really happened. That is why I plead for MUUGers to rack their brains trying to figure out how a script can see a dir inode and detect it's drwx for the current user, and yet any write to it fails.... I know systemd is doing crazy wacky new-fangled mount or namespace or cgroup or whatever tricks to make PrivateTmp work at all. (And it's slightly evil that there can be a dir sitting right there that I can ls at the command line, where all perms show I can write to it, and yet even root cannot!)
_______________________________________________
Roundtable mailing list -- roundtable@muug.ca
To unsubscribe send an email to roundtable-leave@muug.ca
Followup to this. It happened again, but while I was asleep this time. Had to just restart php-fpm in a panic, so still couldn't debug.
Something happens at 3am once in a blue moon that is triggering this.
I'm convinced it's systemd-tmpfiles-clean.timer now, because before I rebooted I looked at /tmp and there were maybe 2-3 "systemd-private-*" dirs... and after I rebooted there are 20.
So it looks like the default systemd cleaner is mental. It was never a problem before. Some update must have broken it. I had not configured a single thing about it: it was running on full defaults.
From what I gather it's only supposed to clean systemd-private dirs that are stale from dead processes. But clearly it's hosing whole whacks of them. I'm shocked other daemons haven't been blowing up!
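Rather than eyeballing /tmp before and after a reboot, the count can be logged over time. A minimal sketch, assuming GNU find (for -printf); the function name list_private_tmp is hypothetical. It lists the systemd-private-* directories directly under a base directory with their modification times, then prints a total.

```shell
#!/bin/sh
# List systemd-private-* directories directly under $1 (default /tmp) with
# their modification times, followed by a count.
# Function name is hypothetical.
list_private_tmp() {
    base="${1:-/tmp}"
    find "$base" -mindepth 1 -maxdepth 1 -type d -name 'systemd-private-*' \
        -printf '%TY-%Tm-%Td %TH:%TM  %f\n' | sort
    count=$(find "$base" -mindepth 1 -maxdepth 1 -type d \
        -name 'systemd-private-*' | wc -l | tr -d ' ')
    echo "total=$count"
}
```

Appending the output to a log at regular intervals (how it gets scheduled is left open here) would show whether directories really vanish overnight, and which services they belonged to.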
So I did what I did on my Fedora boxes ages ago:

  systemctl mask systemd-tmpfiles-clean.timer
  systemctl mask systemd-tmpfiles-clean.service
Note, you can't use disable, it must be mask. I guess there's a reason I learned to mask these on Fedora...
Have I mentioned I hate systemd? This is a huge, expensive, for-pay legit RHEL9 production box and I'm having to fight with this stuff?
P.S. I'm a bit puzzled why the cleaner dir hose affected php-fpm though since I had done PrivateTmp=false after the last time.