File this under wacky !#%* bug that takes 1 hour to solve and you're #*%@# lucky you solved it...
I have some JavaScript that takes the text from a textarea and turns it into arrays of GSM-7 chars for SMS texting purposes. It's based on the nice (FLOSS) online SMS calculator: https://twiliodeved.github.io/message-segment-calculator/
If I input a space (unicode hex code 0x20, same as ascii) in the above page, it'd show up as one GSM-7 codepoint 0x0020. But if I input it in my version of a similar calculator it would always come out as 2 spaces! 0x0020 0x0020
Was I inputting an NBSP by accident (which "smart" encoding changes to 2 spaces)? No. Was the js library at fault? After an hour of debugging: probably not.
While watching the js debug box in FF I noticed it moaning on every page load about the page being rendered in "quirks mode" because of the DOCTYPE on the first line of all my HTML. With nothing else left to blame, I changed my DOCTYPE (which dates from the first days of my program: 2010ish) to what FF suggests when you click on the quirks warning (basically eliminating the "transitional" stuff).
Boom. Bug goes away. One space is now one space. Doh.
Now I'm curious why quirks mode would need to double all the spaces to maintain compatibility with something somewhere... That's probably a long story by itself. I'm lucky I even solved this one, since the DOCTYPE seemed an unlikely culprit: I've literally never had any issue with my ancient one, and IIABDFI. On the downside, I probably just broke my program for all ancient-browser users. Oh well.
P.S. GSM-7 char encoding is really really @#*%&ed up. Whoever came up with that travesty should be flogged.
I may have spoken too soon. Thinking I won, I removed all my debugging code and the bug came back. Even with the updated DOCTYPE. Doh.
So egg on my face, but for posterity I thought I'd post an update so someone doesn't curse this non-fix 5 years down the road. It appeared the bug was solved because my test input must have had an emoji or something in it. The bug doesn't appear when you have un-smart-able (long story) unicode. At least that's my only guess.
I solved it (again?) by checking the js library code and it appears they are straight up changing space to two spaces. Uh, ok. That code area is supposed to change non-ascii (i.e. non-0x20) unicode spaces to an ascii space (for a short space) or 2 (or more) ascii spaces (for a long space). But somehow it got a 0x20 in the rule. Or at least FF ^F search " " matched the character? I changed the 0x20 in the rule to 0xa0 and now everything is fixed. And why a single 0xa0 should ever be turned to 2 spaces in the first place is beyond me.
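For anyone writing a similar rule, here's a rough sketch of how to keep this kind of thing from biting you: spell out every invisible character as a \u escape so an editor or a copy/paste can't silently swap it. To be clear, this is NOT the library's actual code. The table, the names SPACE_RULES and normalizeSpaces, and which spaces map to one or two ASCII spaces are all made up for illustration.

```javascript
// Hypothetical replacement table, NOT the library's real rules.
// Every invisible character is written as an escape, never as a literal,
// so the source file can't end up with the wrong space in it unnoticed.
const SPACE_RULES = new Map([
  ["\u00a0", "\u0020"],        // no-break space -> one ASCII space (assumed mapping)
  ["\u2002", "\u0020"],        // en space -> one ASCII space (assumed mapping)
  ["\u2003", "\u0020\u0020"],  // em space -> two ASCII spaces (assumed mapping)
]);

// Replace each listed Unicode space; leave everything else alone,
// including a plain 0x20, which has no entry in the table.
function normalizeSpaces(text) {
  return Array.from(text, (ch) =>
    SPACE_RULES.has(ch) ? SPACE_RULES.get(ch) : ch
  ).join("");
}

console.log(JSON.stringify(normalizeSpaces("a b")));       // plain space in, plain space out
console.log(JSON.stringify(normalizeSpaces("a\u2003b")));  // em space becomes two spaces
```

With escapes, a stray 0x20 where a 0xa0 should be shows up as a visible difference in the source instead of one blank looking like another.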
Looks like something messed up the js source and changed a 0xa0 to a 0x20. This may be FF... maybe when I saved the js source? If I cut an NBSP from a unicode sample web page and paste it into my form textarea, it always seems to turn into a 0x20! If I type it in place with CTRL-SHIFT then it properly shows up as a 0xa0. If I paste it in from a nano editor where I know for sure it's 0xa0 then it also works. I found some ancient Bugzilla reports about FF doing bad things with NBSPs when c&p'ing... maybe there are still some bugs in there.
Anyhow, it's not a DOCTYPE problem: it's the wrong unicode char in the source file. And since it's just a bloody empty space character you can't really see it when debugging without spitting out hex codes somehow. Fun!
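If you need to spit out hex codes, something like this minimal sketch does the job. The function name toHexCodePoints is my own invention, not part of any library; it only uses standard string methods.

```javascript
// Hypothetical debugging helper: list every code point in a string as hex,
// so a 0x20 and a 0xa0 stop looking identical.
function toHexCodePoints(text) {
  // Array.from walks the string by code point, so emoji stay in one piece.
  return Array.from(text, (ch) =>
    "0x" + ch.codePointAt(0).toString(16).padStart(4, "0")
  ).join(" ");
}

console.log(toHexCodePoints("a b"));       // 0x0061 0x0020 0x0062
console.log(toHexCodePoints("a\u00a0b"));  // 0x0061 0x00a0 0x0062
```

Run it on the textarea value and on any suspicious string literal pulled out of the js source; the two should tell you which side the wrong space is coming from.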
P.S. Quirks mode being off is screwing up some of my tables cosmetically... so I guess it really does something after all...