Thanks, Trevor, for your useful comments. As a result, I've spent some time in the PCRE regex documentation, and have discovered just how feeble the regex implementation is in my Vedit (no, not vi!) text editor. Even tonight, I've run into more problems.
Other than the lousy regex implementation, though, Vedit has served me well continuously since 1982 (with a large number of upgrades of course).
Hartmut W Sager - Tel +1-204-339-8331
On Sun, 5 Jan 2020 at 04:10, Trevor Cordes trevor@tecnopolis.ca wrote:
On 2020-01-04 Hartmut W Sager wrote:
It turns out, at least in this regex implementation, that a pair of enclosing parentheses can only serve one of two purposes, not both, at the same time. Those two purposes are:
1. Mark a group that can then be referred to by a variable like "\3" in the replacement string.
2. Enclose a group with alternation (regex terminology) containing several alternatives separated by the "or" operator "|".
That's just plain evil. Nasty!
The de facto standard is (obviously) PCRE and your program (you said vi?) is obviously not PCRE. I'd be shocked if vi doesn't offer you some way to replace the regex engine, or at least out-source the regex work to a filter. Not sure, I don't use vi.
In PCRE each () serves both purposes, unless you use (?:) in which case you only get purpose #2 (and save CPU cycles).
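A minimal sketch of the two kinds of parentheses, using Python's re module as a stand-in. This is an assumption for illustration: Python's re is not PCRE, but it shares the (...) and (?:...) syntax. The sample text and variable names are made up.

```python
import re

# Hypothetical sample: a two-digit number, then a space-padded one-digit number.
text = "12,  7"

# Capturing group: the parentheses hold the alternation AND record the match,
# so \1 can be used in the replacement.
capturing = re.sub(r"([0-9][0-9]|\s[0-9])", r"<\1>", text)

# Count how many numbered groups each form creates.
capturing_groups = re.compile(r"([0-9][0-9]|\s[0-9])").groups
# Non-capturing group: alternation only, no group number is assigned.
non_capturing_groups = re.compile(r"(?:[0-9][0-9]|\s[0-9])").groups

print(capturing)
print(capturing_groups, non_capturing_groups)
```

With (?:...) there is no \1 to refer to, which is the "purpose #2 only" case.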
The others are correct: using \s on the right-hand side is not PCRE. In PCRE \s means "(most) any whitespace" in the regex, and will be just "s" in the substitution.
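A rough illustration of \s belonging to the pattern side only, again assuming Python's re as a stand-in rather than PCRE itself. One difference to note: where the text above says the substitution yields a plain "s", Python (3.7 and later) refuses \s in the replacement and raises an error instead. The sample string is made up.

```python
import re

text = "Jan\t5,2020"  # hypothetical input containing a tab

# In the pattern, \s matches a whitespace character (here, the tab).
# The replacement uses a literal space, not \s.
normalized = re.sub(r"\s", " ", text)

# In the replacement, \s is not a whitespace shorthand.
# Python's re rejects it as a bad escape rather than emitting "s".
try:
    re.sub(r"\s", r"\s", text)
    replacement_accepted = True
except re.error:
    replacement_accepted = False

print(normalized)
print(replacement_accepted)
```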
PCRE = One Ring^H^H^H^HRegex to rule them all. Most programs with regex use the PCRE library now, or give the option, and if you always use -P with grep you'll basically never have to touch another substandard regex engine again! :-) All the perl-haters might find it amusing that they use "perl" on a daily basis because of PCRE :-) (Well, sort of.)
I am a bit suspicious of the ([0-9][0-9]|\s[0-9]) group re operator precedence of the "or"
In most (all?) regex engines (especially PCRE; but pretty sure all!) the rule is "first, most". So the order you put your alternates may matter. In the above case, order probably doesn't matter because things surrounding that bit must be space/comma. Order matters in things where surrounding bits can match the same bits, and things like eating escaped chars, like escaped double-quotes in CSVs: /"(\"|[^"])+"/ works, but /"([^"]|\")+"/ doesn't.
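The ordering effect can be sketched in Python's re (an assumption: it is not PCRE, but alternation is tried left to right in the same way). This sketch also assumes the "escaped double-quote" is a backslash followed by a quote, which has to be written \\" in the pattern. The sample line is made up.

```python
import re

# Hypothetical input: a quoted field containing backslash-escaped quotes.
line = r'say "a \"quoted\" word" now'

# Escaped-quote alternative first: \" is consumed as one unit.
escape_first = re.compile(r'"(\\"|[^"])+"')
# Escaped-quote alternative last: [^"] eats the backslash on its own,
# so the following quote is taken as the closing quote.
escape_last = re.compile(r'"([^"]|\\")+"')

good = escape_first.search(line).group(0)
bad = escape_last.search(line).group(0)

print(good)  # the whole quoted field
print(bad)   # stops at the first escaped quote
```

In the second pattern the \\" alternative never gets a chance, because [^"] has already consumed the backslash by the time the quote is reached.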
As always, the O'Reilly regex book is an amazing way to fully understand exactly what is going on and will really open a lot of eyes!!