Hi Dan,
"\s" is a single space, "0" is just "0", and "\1" and "\2" are variables that reference parts/segments of the search string.
Thanks for the Rubular tip-off. Being a classic hard-core programmer, I'm not used to those kinds of tools, but I might look at Rubular. I did figure out the problem, and in my main reply (to myself), you'll see a detailed explanation.
Hartmut W Sager - Tel +1-204-339-8331
On Sat, 4 Jan 2020 at 10:59, Dan Martin dan@martinmedcorp.com wrote:
Hi Hartmut
I am not familiar with your replacement syntax \1\s0\2\s
Rubular shows the groups as: 1 From AncientBBS1 2 Thu 3 Jan and 3 others
and for the truncated expression: 1 Jan 2 2
I find Rubular a convenient online tool for checking regexes: https://rubular.com/
-Dan
On Sat, Jan 4, 2020 at 10:27 AM Hartmut W Sager hwsager@marityme.net wrote:
This might be the wrong time of night for doing regex (i.e., my mistake), or my trusty Vedit text editor has a bug in its regex implementation.
Original search string:
^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9][0-9]|\s[0-9])[\s,]+(19[0-9][0-9])[\s,]+([0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$
Replacement string: <Nah, skip it>
The above search string gives a syntax error. I am a bit suspicious of the ([0-9][0-9]|\s[0-9]) group re operator precedence of the "or", and proceeded to stepwise simplification to narrow it down. I finally got down to:
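For comparison only, here is a minimal sketch that feeds the same search string to Python's re module. This is an assumption-laden cross-check, not a statement about Vedit's regex dialect: it shows only that Python accepts the expression and that the "or" inside ([0-9][0-9]|\s[0-9]) stays scoped to its own parentheses. The three sample lines are hypothetical variants of the "From " line quoted in this message.

```python
import re

# The original search string, split across lines only for readability.
full = re.compile(
    r"^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"([0-9][0-9]|\s[0-9])[\s,]+(19[0-9][0-9])[\s,]+"
    r"([0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$"
)

# Hypothetical sample lines: two-digit day, space-padded day, bare single digit.
two_digit = full.match("From AncientBBS1 Thu Jan 02, 1986 20:50:00")
padded = full.match("From AncientBBS1 Thu Jan  2, 1986 20:50:00")
single = full.match("From AncientBBS1 Thu Jan 2, 1986 20:50:00")

print(two_digit.group(4))  # day captured by the first alternative
print(padded.group(4))     # day captured by the second alternative, with its leading space
print(single)              # no match: \s+ uses up the only space before the digit
```

In Python, at least, the day group needs either two digits or a space-padded digit, so a single-digit day preceded by only one space is not matched by this expression.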
Search string: (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s,]+
Replacement string: \1\s0\2\s
The new search works fine (as did some of the previous stepwise simplified ones), but the replacements are baffling me. The line
From AncientBBS1 Thu Jan 2, 1986 20:50:00
gets changed to
From AncientBBS1 Thu 02 1986 20:50:00
I.e., the variable \1 seems to get lost. In my previous stepwise simplified cases, multiple variables got lost when the search worked at all.
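As a reference point for what the simplified search and replacement are meant to produce, here is a small sketch in Python's re module. It is an illustration under assumptions, not a reproduction of Vedit: Python does not accept \s in a replacement string, so a literal space stands in for it, and \g<1> is used in place of \1 purely to keep the group references unambiguous.

```python
import re

# Simplified search string from the message, unchanged.
pattern = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s,]+"
)

# Python equivalent of \1\s0\2\s (assumption: \s in the replacement means one space).
replacement = r"\g<1> 0\g<2> "

line = "From AncientBBS1 Thu Jan 2, 1986 20:50:00"
result = pattern.sub(replacement, line)
print(result)  # From AncientBBS1 Thu Jan 02 1986 20:50:00
```

Here the month captured by the first group survives the replacement, which is the behaviour expected from the replacement string above.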
Why am I doing this? I need to massage some old BBS messages into the retarded mbox format, whose date format (on the "From " line) of "Tue Nov 05 19:02:00 1985" is particularly illogical. Be that as it may, the two sources of these messages I am processing had further sloppiness in their dates, done by some ancient BBS bozos. I did successfully fix a lot of that already with regex.
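The overall reshuffle described here (weekday, month, zero-padded day, time, then year) can be sketched in Python as follows. This is a hypothetical illustration, not the Vedit procedure actually used: the function name to_mbox_from_line is made up, and the day group is loosened to one or two digits so that the padding can be done in code.

```python
import re

# Same structure as the original search string, except the day group,
# which here accepts one or two digits (an assumption made for this sketch).
FROM_LINE = re.compile(
    r"^(From AncientBBS[1-2])\s+"
    r"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"([0-9]{1,2})[\s,]+"
    r"(19[0-9][0-9])[\s,]+"
    r"([0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$"
)


def to_mbox_from_line(line):
    """Reorder a BBS 'From ' line into 'Www Mmm DD HH:MM:SS YYYY' order."""
    m = FROM_LINE.match(line)
    if m is None:
        return line  # leave anything that is not a recognised 'From ' line alone
    sender, weekday, month, day, year, time = m.groups()
    return f"{sender} {weekday} {month} {int(day):02d} {time} {year}"


print(to_mbox_from_line("From AncientBBS1 Thu Jan 2, 1986 20:50:00"))
# From AncientBBS1 Thu Jan 02 20:50:00 1986
```

Lines that do not match are returned unchanged, so the function can be run over every line of a message file without touching the bodies.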
Hartmut W Sager - Tel +1-204-339-8331
Roundtable mailing list Roundtable@muug.ca https://muug.ca/mailman/listinfo/roundtable