This page provides an overview of the CSV (Comma Separated Values) format for data.
CSV is a very old, very simple and very common “standard” for (tabular) data. We say “standard” in quotes because there was never a formal standard for CSV, though in 2005 an RFC describing it was published (RFC 4180).
CSV is supported by a huge number of tools, from spreadsheets like Excel, OpenOffice and Google Docs, to complex databases, to almost all programming languages. As such, it is probably the most widely supported structured data format in the world.
Key points are:
If you open up a CSV file in a text editor it would look something like:
A,B,C
1,2,3
4,"5,3",6
Here there are 3 rows, each of 3 columns. Notice how the second column in the last line is “quoted” because the content of that value actually contains a “,” character. Without the quotes this character would be interpreted as a column separator, and the last row would appear to have 4 columns. To avoid this confusion we put quotes around the whole value. (Note that a CSV file does not have to have the same number of columns in each row.)
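To make the quoting rule concrete, here is a minimal sketch using Python's standard `csv` module. It is only an illustration: the example data is held in a string rather than a file, and the variable names are our own.

```python
import csv
import io

# The three-line example from above, held in a string instead of a file.
text = 'A,B,C\n1,2,3\n4,"5,3",6\n'

# Naive splitting on commas ignores the quotes and breaks the last row.
naive_last_row = text.splitlines()[-1].split(",")
print(naive_last_row)  # ['4', '"5', '3"', '6']

# csv.reader understands quoting, so "5,3" stays a single value.
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['A', 'B', 'C'], ['1', '2', '3'], ['4', '5,3', '6']]

# Rows of different lengths are read as-is; nothing forces them to match.
ragged = list(csv.reader(io.StringIO("a,b,c\n1,2\n")))
print(ragged)  # [['a', 'b', 'c'], ['1', '2']]
```

The naive split produces four pieces for the last row, while the CSV-aware reader returns the expected three.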
As mentioned above, CSV files can have quite a bit of variation in structure. Key options are:
You can read more in the CSV Dialect Description Format, which also defines a small JSON-oriented structure for specifying what options a CSV uses.
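As a rough illustration of how such variation is handled in practice, the sketch below reads a file that uses a semicolon as the separator and a single quote as the quote character. These two settings and the sample data are assumptions chosen for the example, not options taken from this page; Python's `csv` module accepts them as the `delimiter` and `quotechar` parameters.

```python
import csv
import io

# Hypothetical sample: semicolon-separated, single-quoted values.
text = "name;note\nalice;'likes; semicolons'\n"

# Tell the reader which dialect options this particular file uses.
reader = csv.reader(io.StringIO(text), delimiter=";", quotechar="'")
rows = list(reader)
print(rows)  # [['name', 'note'], ['alice', 'likes; semicolons']]
```

Reading the same text with the default options would treat the whole line as a single column, which is why the options a file uses need to be known or described somewhere.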
Specifications and overviews:
The great thing about CSV is the huge level of tool support. The following is not intended to be comprehensive but is more at the eclectic end of the spectrum.
All spreadsheet programmes, including Excel, OpenOffice and Google Docs Spreadsheets, support opening, editing and saving CSVs.
You can view a CSV file (saving you the hassle of downloading it and opening it). Options include:
You can use datapipes: http://datapipes.okfnlabs.org/csv/html
Just paste your CSV file and away you go.
Install this Chrome Browser Extension. This can be used both for online files and for files on your local disk (if you open them with your browser!)
This is heavily biased towards Python!
Nothing in the standard library yet, and the best option seems to be:
Get git to handle CSV diffs in a sensible way (very useful if you are using git or another version control system to store data).
Make these changes to config files:
# ~/.config/git/attributes
*.csv diff=csv

# ~/.gitconfig
[diff "csv"]
    wordRegex = [^,\n]+[,\n]|[,]
git diff --word-diff
# make it even nicer
git diff --word-diff --color-words
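To see what the wordRegex setting does, the sketch below applies the same pattern in Python to a small piece of CSV text. This is only an approximation for illustration: git uses its own regex engine, and the helper name `split_words` is hypothetical. The idea is that each cell (together with its trailing comma or newline) becomes one "word", so a word diff highlights changed cells rather than whole lines.

```python
import re

# Same pattern as the wordRegex setting above, written as a Python regex.
WORD_REGEX = re.compile(r"[^,\n]+[,\n]|[,]")


def split_words(text):
    """Split CSV text into the cell-sized tokens the pattern describes."""
    return WORD_REGEX.findall(text)


# Two rows; the second has an empty middle cell.
tokens = split_words("1,2,3\n4,,6\n")
print(tokens)  # ['1,', '2,', '3\n', '4,', ',', '6\n']
```

The second alternative in the pattern (`[,]`) is what picks up the empty cell as a token of its own.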