DOCTYPE holy smokes - Roundtable

25 Feb 2023


File this under: wacky !#%* bug that takes an hour to solve, and you're #*%@#
lucky you solved it...
I have some JavaScript that takes a textarea's contents and turns them into
arrays of GSM-7 chars for SMS texting purposes.  It's based on the nice (FLOSS)
online SMS calculator:
https://twiliodeved.github.io/message-segment-calculator/
If I input a space (Unicode code point 0x20, same as ASCII) in the above
page, it'd show up as one GSM-7 code point, 0x0020.  But if I input it in my
version of a similar calculator, it would always come out as 2 spaces!
0x0020 0x0020
Was I inputting an NBSP (non-breaking space) by accident (which "smart"
encoding changes to 2 spaces)?  No.  Was the JS library at fault?  After an
hour of debugging: probably not.
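For anyone chasing a similar ghost, here's a minimal sketch of how to rule out a stray non-breaking space: dump the code points of whatever the textarea actually handed you. The helper name toCodePoints is made up for illustration and is not part of the calculator linked above.

```javascript
// Hypothetical helper (not from the linked calculator): list every code
// point in a string as a 4-digit hex value, so a plain space (0x0020)
// can be told apart from a non-breaking space (0x00A0).
function toCodePoints(text) {
  // Array.from iterates by code point, so surrogate pairs stay whole.
  return Array.from(text, (ch) =>
    "0x" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(toCodePoints("a b").join(" "));      // 0x0061 0x0020 0x0062
console.log(toCodePoints("a\u00A0b").join(" ")); // 0x0061 0x00A0 0x0062
```

In a browser you'd feed it the textarea's .value instead of a literal string.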
While watching the JS console in Firefox, I noticed it moaning on every page
load about the page being loaded in "quirks mode" because of the DOCTYPE on
the first line of all my HTML.  With nothing else left to blame, I went and
changed my DOCTYPE (which dates from the first days of my program: 2010ish)
to what Firefox suggests when you click on the quirks warning (basically
eliminating the "transitional" stuff).
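You don't have to wait for the console to complain, either. As a rough sketch, the standard document.compatMode property reports "BackCompat" when a page is in quirks mode and "CSS1Compat" otherwise. The wrapper name renderingMode below is my own invention, and it takes the document as a parameter only so it can be tried outside a browser.

```javascript
// Hypothetical helper: classify a document's rendering mode.
// document.compatMode is "BackCompat" in quirks mode and "CSS1Compat"
// in standards (or almost-standards) mode.
function renderingMode(doc) {
  return doc.compatMode === "BackCompat" ? "quirks" : "standards";
}

// In a browser, pass the real document; elsewhere this part is skipped.
if (typeof document !== "undefined") {
  console.log("Rendering mode:", renderingMode(document));
}
```

For reference, the short HTML5 doctype, <!DOCTYPE html>, as the very first line of the page puts browsers in standards mode.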
Boom.  Bug goes away.  One space is now one space.  Doh.
Now I'm curious why quirks mode needs to double all the spaces to maintain
compatibility with something somewhere... That's probably a long story by
itself.  I'm lucky I even solved this one, as the DOCTYPE seemed an unlikely
culprit: I've literally never had any issue using my ancient DOCTYPE, and
IIABDFI (if it ain't broke, don't fix it).  On the downside, I probably just
broke my program for all ancient-browser users.  Oh well.
P.S. GSM-7 char encoding is really really @#*%&ed up.  Whoever came up
with that travesty should be flogged.