Recently I had to do my taxes and having an online store, I had to go through 9000 transactions in Paypal that I needed categorize. Paypal lets you download your transaction history as a csv (comma-delimited text) file with double quotes (“) as a text qualifier which is a standard format. Unfortunately, you can still run into problems if you have double quotes and commas in a field’s value as follows:
“Confirmed”,”8″”, 10″”, 12″” AR550 Gong Round Steel Shooting Target Set 1/2″” Thick ( AR500)”,”300977807743″,”0.00″,”0.00″,”0.00″,””,””,””,””,”Ebay”, ….
Notice that the 2nd field contains double quotes and commas as follows:
8″”, 10″”, 12″” AR550 Gong Round Steel Shooting Target Set 1/2″” Thick ( AR500)
The original text for this field is
8″, 10″, 12″ AR550 Gong Round Steel Shooting Target Set 1/2″ Thick ( AR500)
where the double quotes represent inches as a unit of measure and the comma indicates that there are multiple sizes (8 inch, 10 inch, and 12 inch). The double quotes were escaped with double quotes to indicated that the double quotes are part of the field’s value. However, a double quote followed by a comma messes everything up because there’s no way to know whether that comma is part of the field value or a field delimiter.
The fact that this field contains double quotes and commas breaks the csv format because it appears now that this row has more columns that those in other rows. This is one of many things that can break a csv file. To fix this and other issues, I have found that it is most easy to use a tool called CSV Easy. It’s fast and shows you clearly which rows need to be fixed. Here’s an example of the problem I faced earlier.
Notice that columns F16 and F17 in row 2714 contain parts of the field value from column F15. To fix this, you can do many things like
- fixing the field by removing the commas in them
- changing the field delimiter to a character that you know isn’t being used in any field values
I’m going to do option 2 because I want to preserve each field value to be exactly as the original. I’m going to use Sublime Text 2 as my text editor because it’s fast and just very good. I’m going to search and replace all instances of “,” with “^” thereby changing the delimiter from a comma to a caret symbol because I know there are no caret symbols used in any of the field values.
Sublime Text does a nice job of highlighting the matched text so you can make sure you’re only going to replace the text you want to replace. I now save the file with a new name and open it in CSV Easy and choose the caret symbol as the delimiter and no text qualifiers.
This change fixed my csv file so now I can import it into a mySQL database which I am going to do using phpMyAdmin. Log into phpMyAdmin and click the Export button in a database you want to import the data into. Browse to the csv file, select “CSV” in the format dropdown, and enter your delimiters and qualifiers (a caret symbol and double quote, in this example).
After clicking the Go button, the csv file gets uploaded and parsed and a table is created.
Having your data in a database is good because you can run queries against it to quickly update field values, which is exactly what I needed to do. Otherwise, I’d have to manually go through each row and update fields one by one or write a function in Excel to do this which is still too time-consuming for my particular needs.
If you want to export your data from mySQL to csv, that’s easy too. Just click that Export button and specify your delimiters and text qualifiers as follows.
Notice that I entered the caret symbol (^) in the “Columns separated with” field and double quotes (“) in the other fields. After clicking the Go button and saving the csv file to disk, I open the file in Excel but Excel doesn’t recognize it as a csv file and so each field isn’t separated by column.
To fix this, select the first column and click the “Text to Columns” button (circled in red) and tell Excel what characters are used in the csv file and enter values as shown below.
I enter a caret symbol in the delimiter field and select double quotes in the text qualifier dropdown. You can preview how the fields are separated by column to make sure your selections are correct.
I then just accept the default data format …
and after clicking the Finish button I get my data nicely formatted.
And there you have it. fixing csv files and import and exporting data to/from mySQL and Excel.