A CSV-iterator tool for Mac OS X
This is a tool that I use to iterate through the rows of a CSV file and run a shell command once for each row.
Requirements
- Mac OS X (tested on Ventura 13.0.1)
- OCaml (tested on version 4.13.1)
- Apple Numbers
Assumptions
I assume you have a CSV file generated from an Apple Numbers spreadsheet. I mention Apple Numbers because CSV is a surprisingly unstandardised format, and my tool is quite specialised to the dialect of CSV that Apple Numbers exports. Other CSV files may or may not work. In particular, the tool assumes that:
- values are comma-separated (as you would expect),
- values that do (or could) contain commas or newlines are wrapped in double-quotes,
- but empty values are not wrapped in double-quotes, and
- double-quotes that appear inside values are escaped as two consecutive double-quotes (so "this is an ""example"" of a valid value" is itself a valid value).
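To make the dialect above concrete, here is a minimal OCaml sketch of a parser for it. This is a swapped-in technique, not the tool's own code: rather than first replacing every "" as the tool does (see "What the tool does" below), it handles escaped double-quotes directly while scanning. The function name parse_csv is hypothetical, and the sketch assumes the whole file is already in memory as one string with lines ending in \n (CRLF line endings would leave a stray \r at the end of each row).

```ocaml
(* Split [s] into rows of field values, following the dialect above:
   commas separate values, double-quotes wrap values that may contain
   commas or newlines, and a doubled double-quote inside a quoted value
   stands for one literal double-quote.
   Assumption: lines end with a bare newline (no CRLF handling). *)
let parse_csv (s : string) : string list list =
  let rows = ref [] in            (* finished rows, most recent first *)
  let fields = ref [] in          (* finished fields of the current row, most recent first *)
  let buf = Buffer.create 64 in   (* characters of the field being read *)
  let in_quotes = ref false in
  let n = String.length s in
  let end_field () =
    fields := Buffer.contents buf :: !fields;
    Buffer.clear buf
  in
  let end_row () =
    end_field ();
    rows := List.rev !fields :: !rows;
    fields := []
  in
  let i = ref 0 in
  while !i < n do
    let c = s.[!i] in
    if !in_quotes then begin
      if c = '"' then begin
        if !i + 1 < n && s.[!i + 1] = '"' then begin
          (* Doubled double-quote: keep one and skip the second. *)
          Buffer.add_char buf '"';
          incr i
        end else
          (* Single double-quote: the quoted value ends here. *)
          in_quotes := false
      end else
        (* Commas and newlines inside quotes belong to the value. *)
        Buffer.add_char buf c
    end else begin
      match c with
      | '"' -> in_quotes := true
      | ',' -> end_field ()
      | '\n' -> end_row ()
      | _ -> Buffer.add_char buf c
    end;
    incr i
  done;
  (* Flush the last row if the input does not end with a newline. *)
  if Buffer.length buf > 0 || !fields <> [] then end_row ();
  List.rev !rows
```

A single in_quotes flag is enough state here because, in this dialect, a double-quote met outside a quoted value can only be the one that opens it.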
Getting started
Run make.
This repo includes a sample CSV file. To use it to see how the tool works, run the following command:
./csv_iterator -csv database.csv -cmd "echo \$firstname got \$percent%."
You can also run make install to copy the executable into ~/bin. Then, if ~/bin is in your $PATH, you can run csv_iterator from any directory.
What the tool does
- The tool creates a new file called database.csv.tmp in which every "" (two consecutive straight double-quotes) has been globally replaced with ” (a single right curly double-quote). This makes a Numbers-generated CSV file easier to parse (see the assumptions above). It is also why empty values must not be wrapped in double-quotes: an empty quoted value "" would itself be turned into ”.
- For each non-header row of database.csv.tmp, the tool runs field1=v1 ... fieldN=vN eval 'command', where field1, ..., fieldN are the column names of the CSV file and v1, ..., vN are the values taken by those fields in the current row. In other words, command is run in a shell where each of the current row's values has been assigned to a variable named after its column.
- You can set the -dryrun flag so that the commands to be run are printed to the terminal but not actually executed.
- If you set the -onlyfirstrow flag, the tool stops after the first (non-header) row. This can be useful when testing.
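The first two steps above, plus the -dryrun behaviour, can be sketched in OCaml as follows. This is a hypothetical reconstruction rather than the tool's actual source: the names replace_doubled_quotes, build_command and run_row are my own, and quoting every value and the command with the standard library's Filename.quote (which wraps strings in single quotes on Unix) is an assumption. The tool may quote values differently; with single-quoted values, backticks inside a value are not executed when the value is assigned.

```ocaml
(* Replace each pair of consecutive straight double-quotes with one right
   curly double-quote (U+201D, the UTF-8 bytes E2 80 9D), scanning left to right. *)
let replace_doubled_quotes (s : string) : string =
  let curly = "\xE2\x80\x9D" in
  let n = String.length s in
  let buf = Buffer.create n in
  let i = ref 0 in
  while !i < n do
    if s.[!i] = '"' && !i + 1 < n && s.[!i + 1] = '"' then begin
      Buffer.add_string buf curly;
      i := !i + 2
    end else begin
      Buffer.add_char buf s.[!i];
      incr i
    end
  done;
  Buffer.contents buf

(* Build one line of the form field1=v1 ... fieldN=vN eval command,
   single-quoting every value and the command with Filename.quote (Unix
   behaviour). List.map2 raises Invalid_argument if the row has a different
   number of fields from the header. *)
let build_command (header : string list) (row : string list) (cmd : string) : string =
  let assignments =
    List.map2 (fun name value -> name ^ "=" ^ Filename.quote value) header row
  in
  String.concat " " (assignments @ [ "eval"; Filename.quote cmd ])

(* Dry run: print the line. Otherwise: run it with the system shell. *)
let run_row ~(dryrun : bool) (header : string list) (row : string list) (cmd : string) : unit =
  let line = build_command header row cmd in
  if dryrun then print_endline line else ignore (Sys.command line)
```

For example, with a hypothetical header firstname,percent and row Ann,90, build_command applied to the command from "Getting started" yields firstname='Ann' percent='90' eval 'echo $firstname got $percent%.'.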
Notes:
- I wrote \$firstname rather than $firstname above to prevent your shell from expanding the firstname variable when you call csv_iterator (the same goes for \$percent). It should only be expanded when the generated commands are executed.
- It's best to avoid backticks in the CSV file, as Bash might treat them as command substitutions and execute them.
- Column names in the CSV file mustn't begin with a digit, because shell variable names can't begin with a digit. More generally, since each column name becomes a shell variable name, it should contain only letters, digits and underscores.
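The column-name restriction can be checked before any command runs. Here is a hypothetical sketch (the function names are mine, not the tool's) that follows the POSIX rule for shell variable names: letters, digits and underscores only, not starting with a digit. String.for_all requires OCaml 4.13 or later, which matches the version listed under Requirements.

```ocaml
(* Characters allowed at the start of a shell variable name. *)
let is_name_start (c : char) : bool =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

(* Characters allowed anywhere after the first one. *)
let is_name_char (c : char) : bool =
  is_name_start c || (c >= '0' && c <= '9')

(* True if [name] can be used as a shell variable name. *)
let is_valid_shell_name (name : string) : bool =
  String.length name > 0
  && is_name_start name.[0]
  && String.for_all is_name_char name

(* Column names from the header row that would break the generated commands. *)
let invalid_columns (header : string list) : string list =
  List.filter (fun name -> not (is_valid_shell_name name)) header
```

For example, invalid_columns ["firstname"; "1st"; "percent"] returns ["1st"], so a caller could refuse to run and report the offending column instead of letting the shell fail on every row.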