On Sat, 2022-01-08 at 03:34 -0600, Trevor Cordes wrote:
Any PHP coders out there?
Hi, Trevor. PHP coder out here. A subscriber friend of mine pointed me to your message, and was kind enough to send it to me unmodified so as not to break threading (hopefully it works).
So PHP really did it in 8.0: They really made accessing undefined vars *and* array indices a full warning rather than a notice. I had written in past newsletters that I hoped this didn't mean what it sounded like it meant.
But it does.
They really did it.
https://wiki.php.net/rfc/engine_warnings
I finally had to face the music with the upgraded MUUG server which now uses PHP8. It was coming for me in Fedora 25 anyhow (maybe 4 more months to postpone that change). The MUUG server is about 80% code that Allan Pollard wrote (thanks Allan!) and 20% my own code. Neither wrote according to PHP8's magic new requirements.
Anyone here who does PHP and hasn't treated it as a typed / declared language, because it is not one, is in for a great deal of pain when they are forced to upgrade to PHP 8. I'm not sure I've ever come across existing PHP code where the dev treated it as a language where you must pre-declare everything. Of course, such people must exist, as exhibited by the voting record for this PHP RFC. Perhaps they are people who only recently took up PHP and/or are coming from stricter languages.
I sympathize with your plight as you're dealing with an older code- base, but I sympathize only so much. Notices are not and have never been spurious messages which are safe to ignore: they are indications of possible logic errors, and a well-disciplined PHP programmer ought to be using error_reporting(\E_ALL) when testing and addressing each and every notice (and deprecation warning!) they encounter. Moreover, recommended configuration today is to print no error messages whatsoever (they should all go to a log file) when deploying to the public, so the change really shouldn't have any effect on a host which is properly configured; you'll just have more warning to look at in logs instead of notices.
In my view it was a mistake for PHP deployments in the PHP 4 shared- host days to typically suppress notices; you had to go out of your way to even know whether you were writing robust code, and it's given the impression that a lot of coding behaviour which has always been considered suspect is actually perfectly fine.
And you just know that while, at the moment, this change is limited to making it a warning, the tyrants are itching to make this a full explosion error. You can see it in the voting ("Change undefined variable severity to?": vote 36 to 28). Luckily for us PHP requires a 2/3 majority to pass an RFC.
I disagree with your characterization of the 36 who voted in favour of an error exception as tyrants, though I do agree it would have been a step too far. The elimination of simple-majority votes on RFCs is, funnily enough, perhaps one of the best changes to PHP in recent memory: it holds its stewards to a higher standard of feature design, which means we've tended to get better features since.
I should point out (since I don't believe you did) that the RFC had separate votes for "undefined variable" and "undefined array index", and throwing an exception was never a voting option for the latter. Thus I suspect it was recognized that the former would be easier to adapt to, and the latter impractical. It was presumably also a recognition that a missing array index is more likely to be a data quality problem, whereas reading an undefined variable is (almost) always going to be a logic error on the part of the programmer.
To "fix" these errors in code, you can in most cases use a lot of tricks, like massive use of ?? everywhere. But that makes the code look really ugly and I start to question why? And if you use list() to capture results of a function which can return variable type of variable length arrays, you're really in trouble as there's no current syntax to combine ?? or isset() with that.
Your example with list() is a good one. At its core this is a run-of- the-mill input validation concern, though. Example:
[jking@odin ~]$ php -a Interactive shell
php > error_reporting(\E_ALL); php > $v = ["ook", "eek"]; php > [$ook, $eek, $ack] = $v; PHP Warning: Undefined array key 2 in php shell code on line 1 php > [$ook, $eek, $ack] = array_pad($v, 3, null); php >
That notices have historically been hidden does not excuse you from checking your inputs and/or correcting them where needed.
Sure, you can go in and turn your PHP into C by pre-declaring every variable, if you can spot them all in complex functions or global scope code. But that only helps you with the "undefined variables" thing, not the "undefined array index" thing. The latter is far more insidious because your index can be a variable itself! So for that case you must always use isset() or ?? or some other trick.
I could go into more examples of what I had to do to the MUUG code if anyone is interested.
I'll grant you that dealing with undefined array indices used to be kind of awkward, such that you had to pepper your code with a lot of isset() or array_key_exists() (depending on whether null is an expected value for an array member), but that's nevertheless what you had to do to have robust logic. The ?? operator is there to make correct code more concise, not just to make PHP shut up, and I was very happy about its introduction since I could eliminate a lot of wordy branches.
No tyrrany. Freedom and choice. Perl, like PHP has never been strictly typed or declared. Perl will never force you to use strict. Never. So why is PHP?
In my opinion you proceed from a false assumption. As I stated above, accessing an undefined variable has always been an error (at least since PHP 4, if not earlier), albeit a putatively mild one. It is now merely less-mild.
PHP is not strictly typed (though it does have a strict-type mode since PHP 7.0), but it has always preferred that one declares one's variables, and that preference has now been made stronger with PHP 8.0.
I have some projects with over 100k lines of PHP code. The simple fact that "fixing" this will make my code look extremely ugly, harder to read, less maintainable, and probably more prone to bugs is all the reason I need to decide that I will never "fix" my code. Not to mention it will be probably 20k lines of code that need to be "fixed", and it's a pointless pursuit because in the end I'll never find them all. (grep -P '[' ... ya right.)
So it will be log & ignore, or not log at all (though I still will need to see legit warnings somehow); or git clone php and change 2 lines of code and maintain my own version forever. In fact, it may be time to do a fork of PHP: sane-php or something.
Again, I sympathize since you're dealing with an old code-base. I nevertheless believe you've been doing things wrong by treating notices as no big deal, and you're probably setting yourself up for more tears by not adapting now.
Lastly, I'm willing to debate the philosophical points, especially using the RFC's wording as a basis; however, I'm not sure anyone is interested in taking the other side (please, hardcore PHP programmers only).
I present as my bona fides the following:
- A news aggregator which implements four different client-server sync protocols: https://thearsse.com/ - An HTML parser which actually conforms to the specification: https://packagist.org/packages/mensbeam/html-parser - A character decoder which decodes bytes in various encodings into code points or UTF-8 characters: https://packagist.org/packages/mensbeam/intl - A Content-Type header-field parser: https://packagist.org/packages/mensbeam/mimesniff
The latter two support the HTML parser, and the HTML parser will eventually support the news aggregator.
It may interest you to know that the character decoder does blindly access beyond the end of a string (itself a warning since time immemorial), but suppresses the error at the call site (see <https://github.com/mensbeam/intl/blob/07d26e3f45c3a3167eb6389572419d3bda7ff5...
). This is something I should probably change since it can be a
problem if a user of the library implements a custom error handler.
Further, the tokenizer of the HTML parser is one giant loop (see <https://github.com/mensbeam/HTML-Parser/blob/37f0fa8647ead67e5e9f356efe91df8...
), and not-yet-emitted tokens can be created in one iteration of the
loop and manipulated in others. In the tokenizer an access of an uninitialized variable is unquestionably an error: a branch which manipulates a token should never have been encountered before the branch which initializes the token. Not having an indication of this (be it a notice or warning) would have made debugging harder, not easier.
The news aggregator has required me to write tons of code to sanitize input, including a function to give me more reasonable type juggling (see <https://github.com/mensbeam/arsse/blob/b6605080096918c8d6f8378aa6c31db0dbc2b...
) and a function for each sync protocol which makes sure the input is
never missing any expected associative array keys and always has the correct type (see <https://github.com/mensbeam/arsse/blob/b6605080096918c8d6f8378aa6c31db0dbc2b...
for the Tiny Tiny RSS implementation)
All of this is validated by tons and tons of automated tests which cover every single line of code (unless impractical or impossible to cover with PHPUnit). None of it should ever emit a notice (though one or two have slipped through over the years), and this discipline has unquestionably made my code better.