After tons more experimenting, I figured it out! But I don't know whether it's a bug or a feature in Vedit, or proper regex behaviour (various online regex documentation didn't help at all).
It turns out, at least in this regex implementation, that a pair of enclosing parentheses can only serve one of two purposes, not both, at the same time. Those two purposes are:
1. Mark a group that can then be referred to by a variable like "\3" in the replacement string.
2. Enclose a group with alternation (regex terminology) containing several alternatives separated by the "or" operator "|".
Furthermore, at least in this regex implementation, even the type-2 usage (above) increments the "\nnn" counter for variables that can be used in the replacement string, even though the matching "\nnn" variable cannot actually be used in the replacement string!
The solution I figured out (and tested - it works): Enclose the search segment in double (nested) parentheses "((" and "))", and the outer parentheses are then a type-1 usage which can be referenced in the replacement string. But you have to make sure you use the correct "\nnn" variable by numbering the opening parentheses "(" strictly from left to right (which is normal in regex). This unfortunately exhausts the 9 variables "\1" thru "\9" more rapidly.
E.g.
Search string: abc((def|ghi))jkl\s(mn[0-9])op((qrs|tuv))xy([0-9])z
Replacement string: Can use variables \1, \3, \4, \6, but not \2, \5.
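For comparison, here is a minimal sketch of the same left-to-right numbering in Python's "re" module. It is only an illustration, not a statement about Vedit: in Python the inner alternation groups (\2 and \5) are also usable, whereas in Vedit, per the above, only \1, \3, \4 and \6 are. The helper name numbered_groups and the sample text are made up for this example.

```python
import re

# Same search string as in the example above.
PATTERN = re.compile(r"abc((def|ghi))jkl\s(mn[0-9])op((qrs|tuv))xy([0-9])z")

def numbered_groups(text):
    """Return {group number: matched text}, numbered by opening parenthesis."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return {i: m.group(i) for i in range(1, PATTERN.groups + 1)}

sample = "abcdefjkl mn7opqrsxy3z"   # hypothetical input line
print(numbered_groups(sample))

# Using only the variables that the Vedit workaround leaves available.
print(PATTERN.sub(r"\1-\3-\4-\6", sample))
```

Counting the opening parentheses from the left gives six groups here, which matches the \1 thru \6 numbering described above.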
Hartmut W Sager - Tel +1-204-339-8331
On Sat, 4 Jan 2020 at 05:00, Hartmut W Sager hwsager@marityme.net wrote:
This might be the wrong time of night for doing regex (i.e., my mistake), or my trusty Vedit text editor has a bug in its regex implementation.
Original search string:
^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9][0-9]|\s[0-9])[\s,]+(19[0-9][0-9])[\s,]+([0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$
Replacement string: <Nah, skip it>
The above search string gives a syntax error. I am a bit suspicious of the ([0-9][0-9]|\s[0-9]) group re operator precedence of the "or", and proceeded to stepwise simplification to narrow it down. I finally got down to:
Search string: (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s,]+
Replacement string: \1\s0\2\s
The new search works fine (as did some of the previous stepwise simplified ones), but the replacements are baffling me. The line
From AncientBBS1 Thu Jan 2, 1986 20:50:00
gets changed to
From AncientBBS1 Thu 02 1986 20:50:00
I.e., the variable \1 seems to get lost. In my previous stepwise simplified cases, multiple variables got lost when the search worked at all.
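As a cross-check of what the simplified search and replace would be expected to produce, here is a rough sketch using Python's "re" module. This is an assumption about "standard" behaviour, not about Vedit. Python does not accept \s in a replacement string, so literal spaces are used instead, and \g<1> / \g<2> are Python's spelling of \1 / \2. The function name pad_day is made up for this example.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Same search string as above: month name, whitespace, one digit, separators.
SEARCH = re.compile(rf"({MONTHS})\s+([0-9])[\s,]+")

# Equivalent of the replacement \1\s0\2\s, with literal spaces for \s.
REPLACEMENT = r"\g<1> 0\g<2> "

def pad_day(line):
    """Zero-pad a single-digit day that follows a month name."""
    return SEARCH.sub(REPLACEMENT, line)

line = "From AncientBBS1 Thu Jan 2, 1986 20:50:00"
print(pad_day(line))  # From AncientBBS1 Thu Jan 02 1986 20:50:00
```

Here the month survives in the output, so the alternation group is both matched and referenced, which is the behaviour I was expecting from \1.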
Why am I doing this? I need to massage some old BBS messages into the retarded mbox format, whose date format (on the "From " line) of "Tue Nov 05 19:02:00 1985" is particularly illogical. Be that as it may, the two sources of these messages I am processing had further sloppiness in their dates, done by some ancient BBS bozos. I did successfully fix a lot of that already with regex.
Hartmut W Sager - Tel +1-204-339-8331