Hi Mark,
Actually, "\s" is a single space in a replacement string too, just as in a search string. Almost all of the escape codes work fine in the replacement string as well, though far fewer are needed there than in the search string.
Thanks for your other thoughts too. I did figure out the problem, and in my main reply (to myself), you'll see a detailed explanation.
Hartmut W Sager - Tel +1-204-339-8331
On Sat, 4 Jan 2020 at 10:58, Mark Campbell nitrodist@gmail.com wrote:
I don't think you can use \s in the replacement regex as it has no special meaning there. In my local testing with perl, it seems to treat it as a literal escape for the letter s. What tool are you using to run the regex?
Substituting a literal space instead seems to work as expected:
2020-01-04 10:45:30 ~ TOR-M001 %: ccat test | perl -pe 's/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s,]+/\1 0\2 /'
From AncientBBS1 Thu Jan 07 1986 20:50:00
2020-01-04 10:45:35 ~ TOR-M001 %: ccat test
From AncientBBS1 Thu Jan 7, 1986 20:50:00
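For comparison, the same substitution can be sketched in Python's re module. This only illustrates one more engine's behaviour and says nothing about Vedit: Python accepts \s in a search pattern but rejects it in a replacement template, so the replacement uses literal spaces. The helper names pad_day and replacement_is_accepted are made up for this example.

```python
import re

# Same search pattern as in the thread: month name, whitespace, one digit,
# then any run of whitespace and commas.
PATTERN = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s,]+"
)


def pad_day(line: str) -> str:
    """Zero-pad a single-digit day and drop the comma after it (hypothetical helper)."""
    # Literal spaces in the replacement; \g<1> and \g<2> are group references.
    return PATTERN.sub(r"\g<1> 0\g<2> ", line)


def replacement_is_accepted(template: str) -> bool:
    """Report whether Python's re module accepts the given replacement template."""
    try:
        PATTERN.sub(template, "Jan 2, ")
        return True
    except re.error:
        # Python 3.7+ raises "bad escape" for unknown letter escapes such as \s.
        return False


print(pad_day("From AncientBBS1 Thu Jan 7, 1986 20:50:00"))
print(replacement_is_accepted(r"\1 0\2 "))
print(replacement_is_accepted(r"\1\s0\2\s"))
```

So there are at least three different treatments of \s in a replacement across tools: a space (as reported for Vedit), a plain letter s (as observed with perl), and an outright error (Python 3.7 and later).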
What might be easier (and more readable), if every line has the same fixed-width layout up to the day, is to match on position instead, with something like s/^(.{23}) (\d),/\1 0\2/, if I'm understanding what you want to do (prepend 0s to single-digit days and remove the comma).
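The fixed-width idea can be sketched in Python's re module as follows. Counted on the sample line as it appears in this thread, the prefix "From AncientBBS1 Thu Jan" is 24 characters long, so the sketch uses .{24} where the message suggests 23; the count would need adjusting to the real data. The name pad_day_fixed is hypothetical.

```python
import re

# Assumption: every line starts with a 24-character prefix such as
# "From AncientBBS1 Thu Jan", followed by one space and the day.
FIXED = re.compile(r"^(.{24}) (\d),")


def pad_day_fixed(line: str) -> str:
    """Zero-pad a single-digit day found at a fixed column and drop its comma."""
    return FIXED.sub(r"\g<1> 0\g<2>", line)


print(pad_day_fixed("From AncientBBS1 Thu Jan 7, 1986 20:50:00"))
```

Lines with a two-digit day do not match this pattern and pass through unchanged, comma included, so they would still need a separate pass.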
On Sat, Jan 4, 2020 at 10:27 AM Hartmut W Sager hwsager@marityme.net wrote:
This might be the wrong time of night for doing regex (i.e., my mistake), or my trusty Vedit text editor has a bug in its regex implementation.
Original search string: ^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9][0-9]|\s[0-9])[\s,]+(19[0-9][0-9])[\s,]+([0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$
Replacement string: <Nah, skip it>
The above search string gives a syntax error. I am a bit suspicious of the ([0-9][0-9]|\s[0-9]) group re operator precedence of the "or", and proceeded to stepwise simplification to narrow it down. I finally got down to:
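As a cross-check in a different engine, the original search string can be handed to Python's re module. This is only a sketch of how one other dialect reads the pattern, not a statement about Vedit's parser; the sample lines and the helper name day_field are invented for the test.

```python
import re

# The original search string, split across lines for readability only.
SEARCH = re.compile(
    r"^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"([0-9][0-9]|\s[0-9])[\s,]+(19[0-9][0-9])[\s,]+"
    r"([0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$"
)


def day_field(line: str):
    """Return what group 4 (the day) captured, or None if the line does not match."""
    m = SEARCH.match(line)
    return m.group(4) if m else None


print(repr(day_field("From AncientBBS1 Thu Jan 12, 1986 20:50:00")))
print(repr(day_field("From AncientBBS1 Thu Jan  2, 1986 20:50:00")))  # two spaces
print(repr(day_field("From AncientBBS1 Thu Jan 2, 1986 20:50:00")))   # one space
```

In this dialect the pattern compiles and the "or" stays confined to its parentheses. One side effect is worth noting: because the preceding \s+ must consume at least one character, the \s[0-9] branch only matches a single-digit day that has two or more whitespace characters in front of it.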
Search string: (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s,]+
Replacement string: \1\s0\2\s
The new search works fine (as did some of the previous stepwise simplified ones), but the replacements are baffling me. The line
From AncientBBS1 Thu Jan 2, 1986 20:50:00
gets changed to
From AncientBBS1 Thu 02 1986 20:50:00
I.e., the variable \1 seems to get lost. In my previous stepwise simplified cases, multiple variables got lost when the search worked at all.
Why am I doing this? I need to massage some old BBS messages into the retarded mbox format, whose date format (on the "From " line) of "Tue Nov 05 19:02:00 1985" is particularly illogical. Be that as it may, the two sources of these messages I am processing had further sloppiness in their dates, done by some ancient BBS bozos. I did successfully fix a lot of that already with regex.
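The overall conversion could be sketched in Python roughly as below. The target layout is taken from the "Tue Nov 05 19:02:00 1985" example above; the pattern is a loosened variant of the original search string (the day is matched as one or two digits), and the name to_mbox_from is hypothetical. The real replacement string was skipped in this message, so this is a guess at the intent, not a reconstruction of it.

```python
import re

# Loosened variant of the search string from this thread: the day is simply
# one or two digits after any amount of whitespace.
FROM_LINE = re.compile(
    r"^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"([0-9]{1,2})[\s,]+(19[0-9]{2})[\s,]+"
    r"([0-9]{2}:[0-9]{2}:[0-9]{2})\s*$"
)


def to_mbox_from(line: str) -> str:
    """Rewrite a BBS-style "From " line as weekday, month, day, time, year."""
    m = FROM_LINE.match(line)
    if m is None:
        return line  # leave anything unrecognised untouched
    sender, weekday, month, day, year, clock = m.groups()
    # Zero-pad the day and move the year behind the time.
    return f"{sender} {weekday} {month} {int(day):02d} {clock} {year}"


print(to_mbox_from("From AncientBBS1 Thu Jan 2, 1986 20:50:00"))
```

Doing the zero-padding in code rather than in the replacement string sidesteps the question of which escapes a given tool honours there.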
Hartmut W Sager - Tel +1-204-339-8331
Roundtable mailing list Roundtable@muug.ca https://muug.ca/mailman/listinfo/roundtable