This really isn't a Unix question directly so I apologize for that. However, I have a feeling that others here might have done something like this.
I have a client who would like their data converted from Access to mySQL. I'm using PHP to do the import because it plays so nice with mySQL and I'm making some changes to the data while it is being imported. So far so good, we are 90% there.
The problem is, Access does not export to a valid CSV format. Specifically, if there are any text or memo fields in Access that happen to contain CR/LF Access does not strip them or escape them in any way. The results in a CSV file with end of line markers in the middle of data fields.
At this point nothing I've found will import these files. Excel, OpenOffice, even Access itself all treat the CR/LF as an end of record marker even if it is enclosed in quotes.
I'm not doing the data exporting from Access and I'm not that familiar with Access. Is there an option to strip CR/LF during export?
Any other suggestions on how to clean this data?
Thanks
You might want to look at the mdbtools package that I wrote about in MUUGlines a few months ago, it is a Unix application that can read from Access databases directly. You might be better off exporting the data with that rather than Access directly. I used it to import a table with 3/4 million records into mysql (I took the lazy way out and just used some scripts to alter the text file into a bunch of sql commands).
shawn
On 12-Jul-04, at 10:59 PM, John Lange wrote:
This really isn't a Unix question directly so I apologize for that. However, I have a feeling that others here might have done something like this.
I have a client who would like their data converted from Access to mySQL. I'm using PHP to do the import because it plays so nice with mySQL and I'm making some changes to the data while it is being imported. So far so good, we are 90% there.
The problem is, Access does not export to a valid CSV format. Specifically, if there are any text or memo fields in Access that happen to contain CR/LF Access does not strip them or escape them in any way. The results in a CSV file with end of line markers in the middle of data fields.
At this point nothing I've found will import these files. Excel, OpenOffice, even Access itself all treat the CR/LF as an end of record marker even if it is enclosed in quotes.
I'm not doing the data exporting from Access and I'm not that familiar with Access. Is there an option to strip CR/LF during export?
Any other suggestions on how to clean this data?
Thanks
-- John Lange
Roundtable mailing list Roundtable@muug.mb.ca http://www.muug.mb.ca/mailman/listinfo/roundtable
On Monday 12 July 2004 22:59, John Lange wrote:
if there are any text or memo fields in Access that happen to contain CR/LF Access does not strip them or escape them in any way. The results in a CSV file with end of line markers in the middle of data fields.
IIRC, Access always wraps double quotes around text/memo fields, or at least can be told to do so. If so, a CR/LF will give you a line with an odd number of double quotes. A little awk/perl/sed script could look for odd lines and paste the next line onto it...
Yup "a little awk/perl/sed" is a solution.
Unfortunately I'm no regexp-ert so its many hours for me to perfect something like that.
If there is an odd number of quotes, read the next line, paste it to the first and repeat until the line has an even number of quotes and then write it to a new file.
That does sound feasible.
I had been writing some perl but I hadn't thought of looking for an even number of quotes so perhaps that will simplify things. I will try that next.
The Access developer now says that he can write a function to substitute all cr/lf combinations with <br> which will solve the problem. If that fails I will fall back on the perl.
By the way, the mdbtools would have been the perfect solution but it didn't end up working because this is an Access2 database which isn't supported with mdbtools.
Thanks guys.
John
On Tue, 2004-07-13 at 08:56, Glen Ditchfield wrote:
On Monday 12 July 2004 22:59, John Lange wrote:
if there are any text or memo fields in Access that happen to contain CR/LF Access does not strip them or escape them in any way. The results in a CSV file with end of line markers in the middle of data fields.
IIRC, Access always wraps double quotes around text/memo fields, or at least can be told to do so. If so, a CR/LF will give you a line with an odd number of double quotes. A little awk/perl/sed script could look for odd lines and paste the next line onto it... _______________________________________________ Roundtable mailing list Roundtable@muug.mb.ca http://www.muug.mb.ca/mailman/listinfo/roundtable