Adam thought someone might find this useful, so here it is. It's a Perl program I wrote to "pretty" HTML for easier reading and debugging by indenting it. The neat thing is, the indenting is done entirely in 1 regular expression substitution; no loops! Well, except for the weird while loop that strips newlines inside tags. This ain't your father's regex!
Yes, there are a zillion HTML pretty programs out there, but none did what I wanted in a few ways:
1. Just fire & forget, no 200 options to worry about.
2. Works on random HTML snippets, not just whole pages: you can view source on any web page, paste a few lines from the middle, and it will pretty them up.
3. The challenge of doing something like this using only regex!
There may be a few tags it doesn't catch yet (just add them to the main tag list), but that won't hurt the output very much.
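If editing the pattern by hand gets unwieldy, one possible alternative (a sketch only, not how the script below does it) is to keep the tags in a Perl list and build the alternation from it. The names @indent_tags, $tag_re and classify_tag are made up for this illustration, and "section" and "nav" are just examples of tags you might add.

```perl
use strict;
use warnings;

# Hypothetical tag list; "section" and "nav" are examples of additions.
my @indent_tags = qw(div span ul ol li table tr td th p section nav);

# Build the alternation once; quotemeta guards against odd characters.
my $tag_alt = join '|', map { quotemeta($_) } @indent_tags;
my $tag_re  = qr{^<(/)?($tag_alt)\b[^>]*>};

# Returns "open NAME", "close NAME", or "other" for a single tag string.
sub classify_tag {
    my ($tag) = @_;
    return 'other' unless $tag =~ $tag_re;
    return ($1 ? 'close ' : 'open ') . $2;
}

# Prints "open section", "close nav", "other" (one per line).
print classify_tag($_), "\n" for '<section class="x">', '</nav>', '<pre>';
```

The \b after the alternation matters: without it, "p" would also match the start of "pre".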
Run like:
  html-pretty < html-file | less
html-pretty
=======================
#!/usr/bin/perl -w
#
# Copyright 2010 Trevor E Cordes, Tecnopolis Enterprises
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.

$l=-1; # indentation level
$s=join('',<>);
$s=~s#>\s+#>#g; # rm ws after tags
$s=~s#\s+<#<#g; # rm ws before tags
while ($s=~s#<([^>]*)\n([^>]*)>#<$1$2>#g) { 1; }; # rm nl in tags
$s=~s#>#>\n#g; # put newlines after all tags
$s=~s!(?:<(/)?(?:(div|span|head|b|a|i|u|ul|ol|li|tr|td|th|form|table|p|style|script|body|html|head|title)\b)?([^>]*>\n)|([^<]+))!
  $4 ? # non-tag text
    ( (' 'x($l+1)).$4."\n" )
  :
    ( ( $2 ? # a recognized triggers-indent tag
        ( $1 ? # tag starts with /, decrease indent
            ( $l--,($l<-1 and $l=-1) , ((' 'x($l+1)).'<'.(defined($1)?$1:'').$2.$3) )
          : # tag is opening tag, increase indent
            ( $l++ , ((' 'x$l).'<'.(defined($1)?$1:'').$2.$3) )
        )
      : # a non-triggers-indent tag
        ( (' 'x($l+1)).'<'.(defined($1)?$1:'').$3 )
    ) )
!gemx; # indent
print $s;
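
For anyone who wants to see the core trick in isolation, here is a stripped-down sketch of the same idea: one s///ge substitution whose replacement is Perl code that carries an indent counter from match to match. It is a simplification, not the script above. It assumes every tag changes the indent (so a void tag like <br> would throw it off), uses two spaces per level, and the names indent_tags and $depth are made up for the illustration.

```perl
use strict;
use warnings;

# Minimal sketch: indent a flat run of tags with one s///ge substitution.
sub indent_tags {
    my ($html) = @_;
    my $depth = 0;                       # current nesting depth
    $html =~ s{(<(/)?[^>]*>)|([^<]+)}{
        my $line;
        if (defined $3) {                # text between tags
            $line = ('  ' x $depth) . $3;
        } elsif ($2) {                   # closing tag: step out, then print
            $depth-- if $depth > 0;
            $line = ('  ' x $depth) . $1;
        } else {                         # opening tag: print, then step in
            $line = ('  ' x $depth) . $1;
            $depth++;
        }
        "$line\n";
    }ge;
    return $html;
}

print indent_tags('<ul><li>one</li><li>two</li></ul>');
```

The /e flag is what makes this work: the replacement is evaluated as code for every match, so the counter can change as the substitution walks the string. The /g flag supplies the "loop".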