If you have a csv dataset like this:
name, age, gender
john, 20, male
jane, 30, female
bob, 25, male
Can you get to this:
[
I had a little play and came up with this. It may not be the best way, and I'd be interested to see what your attempts were like, because after all, if we both came up with a solution I'm sure it'd be twice as good!
But I would start from something like:
true as $doHeaders
| . / "\n"
| map(. / ", ")
| (if $doHeaders then .[0] else [range(0; (.[0] | length)) | tostring] end) as $headers
| .[if $doHeaders then 1 else 0 end:][]
| . as $values
| keys
| map({($headers[.]): $values[.]})
| add # merge the single-key objects into one object per row
Working Example
The variable $doHeaders controls whether to read the top line as a header line. In your case you want it as true, but I added it for future SO users and because, well, I had an excellent breakfast today and the weather is lovely, so why not?
Little explanation:
1) . / "\n"
Split by line...
2) map(. / ", ")
... and comma (Big gotcha: In your version, you'll want to use a regex based split because like this you'll split on commas inside quotation marks too. I just used this because it's terse, and that makes my solution look cool right?)
3) if $doHeaders then...
Here we create the array of keys: the header strings if the first row is a header row, otherwise the column indices as strings, one per element in the first row
4) .[if $doHeaders then 1 else 0 end:]
Ok, so trim off the top line if it's a header
5) map({($headers[.]): $values[.]})
Above we go over each row of the former CSV, store the row in the variable $values, and send its keys (the column indices) down the pipe. Then we construct your desired object.
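For readers who find the pipeline easier to follow outside jq, the five steps above can be sketched in Python. This is only an illustration and not part of the original answer: the function name `rows_to_objects` and the parameter `do_headers` (mirroring `$doHeaders`) are made up for this sketch, blank lines are skipped as an extra assumption, and like the jq filter it splits naively on ", ".

```python
def rows_to_objects(text, do_headers=True):
    """Mirror the jq steps: split lines, split fields, pick keys, build objects."""
    # Steps 1 and 2: split by line, then by ", " (naive: ignores quoting).
    rows = [line.split(", ") for line in text.split("\n") if line]
    if not rows:
        return []
    # Step 3: header names, or stringified column indices without a header row.
    headers = rows[0] if do_headers else [str(i) for i in range(len(rows[0]))]
    # Step 4: drop the first row only if it was the header row.
    data = rows[1:] if do_headers else rows
    # Step 5: pair each key with the value at the same position.
    return [dict(zip(headers, values)) for values in data]


sample = "name, age, gender\njohn, 20, male\njane, 30, female\nbob, 25, male"
print(rows_to_objects(sample))
```

All values stay strings here, just as in the jq filter, so "20" is not converted to a number.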
Of course you'll want to use a few regexes to fill in the gotchas, but I hope that starts you on the way.
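One of those gotchas, commas inside quoted fields, can be illustrated with a lookahead-based split. The sketch below is in Python rather than jq and rests on several assumptions: fields are quoted with double quotes, quotes are balanced on each line, and there are no escaped quotes inside a field. `split_fields` is a hypothetical helper name.

```python
import re

# Split on a comma (plus following whitespace) only when an even number of
# double quotes remains to the right, i.e. the comma is outside quotes.
FIELD_SEP = re.compile(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_fields(line):
    """Split one CSV line, keeping commas that sit inside double quotes."""
    return [field.strip().strip('"') for field in FIELD_SEP.split(line)]


print(split_fields('john, "Smith, Jr.", 20'))
```

For real-world data a full CSV parser is the safer choice; a regex like this only covers the simple cases listed above.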
As of 2018, a modern no-code solution would be to use the Python tool csvkit, which has csvjson data.csv > data.json.
See their documentation https://csvkit.readthedocs.io/en/1.0.2/
The toolkit is also very handy and complementary to jq if your script has to debug both csv and json formats.
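If installing csvkit is not an option, a rough equivalent can be written with Python's standard library alone. This is a sketch, not csvkit's implementation: it leaves every value as a string, whereas the jq and Miller answers in this thread output the age column as numbers. The function name `csv_text_to_json` is illustrative.

```python
import csv
import io
import json


def csv_text_to_json(text):
    """Parse CSV text with a header row and return a JSON array of objects."""
    # skipinitialspace=True drops the blank after each comma ("john, 20").
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return json.dumps(list(reader), indent=2)


sample = "name, age, gender\njohn, 20, male\njane, 30, female\nbob, 25, male\n"
print(csv_text_to_json(sample))
```

Because it uses the `csv` module, this version also copes with quoted fields that contain commas.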
You might also want to check out a powerful tool called visidata. Here is a screencast case study that is similar to the original poster's. You can also generate a script from visidata.
Here is a solution that assumes you run jq with the -s and -R options.
[
  [
    split("\n")[]                  # transform csv input into array
  | split(", ")                    # where first element has key names
  | select(length==3)              # and other elements have values
  ]
  | {h:.[0], v:.[1:][]}            # {h:[keys], v:[values]}
  | [.h, (.v|map(tonumber?//.))]   # [ [keys], [values] ]
  | [ transpose[]                  # [ [key,value], [key,value], ... ]
      | {key:.[0], value:.[1]}     # [ {"key":key, "value":value}, ... ]
    ]
  | from_entries                   # { key:value, key:value, ... }
]
Sample run:
jq -s -R -f filter.jq data.csv
Sample output
[
  {
    "name": "john",
    "age": 20,
    "gender": "male"
  },
  {
    "name": "jane",
    "age": 30,
    "gender": "female"
  },
  {
    "name": "bob",
    "age": 25,
    "gender": "male"
  }
]
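The two less obvious pieces of this filter are `tonumber? // .` (convert when possible, otherwise keep the string) and `transpose` followed by `from_entries` (pair each key with its value). A loose Python analogue, with the hypothetical helper names `to_number` and `row_to_object`, might look like the following. Python's `int` and `float` accept some spellings that jq's `tonumber` may not, so treat it as an approximation.

```python
def to_number(value):
    """Like jq's `tonumber? // .`: return a number if possible, else the input."""
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            continue
    return value


def row_to_object(headers, values):
    """Like `transpose` + `from_entries`: pair keys with converted values."""
    return {key: to_number(value) for key, value in zip(headers, values)}


print(row_to_object(["name", "age", "gender"], ["john", "20", "male"]))
```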
With Miller (http://johnkerl.org/miller/doc/) this is very simple. Using this input.csv file
name,age,gender
john,20,male
jane,30,female
bob,25,male
and running
mlr --c2j --jlistwrap cat input.csv
You will have
[
{ "name": "john", "age": 20, "gender": "male" }
,{ "name": "jane", "age": 30, "gender": "female" }
,{ "name": "bob", "age": 25, "gender": "male" }
]
In short - yes, except maybe for the one-liner bit.
jq is often well-suited to text wrangling, and this is especially true of versions with regex support. With regex support, for example, the trimming required by the given problem statement is trivial.
Since jq 1.5rc1 includes regex support and has been available since Jan 1, 2015, the following program assumes a version of jq 1.5; if you wish to make it work with jq 1.4, then see the two "For jq 1.4" comments.
Please also note that this program does not handle CSV in all its generality and complexity. (For a similar approach that does handle CSV more generally, see https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-file-with-headers-to-json)
# objectify/1 takes an array of string values as inputs, converts
# numeric values to numbers, and packages the results into an object
# with keys specified by the "headers" array
def objectify(headers):
  # For jq 1.4, replace the following line by: def tonumberq: .;
  def tonumberq: tonumber? // .;
  . as $in
  | reduce range(0; headers|length) as $i ({}; .[headers[$i]] = ($in[$i] | tonumberq) );

def csv2table:
  # For jq 1.4, replace the following line by: def trim: .;
  def trim: sub("^ +";"") | sub(" +$";"");
  split("\n") | map( split(",") | map(trim) );

def csv2json:
  csv2table
  | .[0] as $headers
  | reduce (.[1:][] | select(length > 0) ) as $row
      ( []; . + [ $row|objectify($headers) ]);

csv2json
Example (assuming csv.csv is the given CSV text file):
$ jq -R -s -f csv2json.jq csv.csv
[
  {
    "name": "john",
    "age": 20,
    "gender": "male"
  },
  {
    "name": "jane",
    "age": 30,
    "gender": "female"
  },
  {
    "name": "bob",
    "age": 25,
    "gender": "male"
  }
]
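The regex-based trimming in `csv2table` is the part that depends on jq 1.5. As an illustration only, the same two substitutions can be expressed in Python as below. `trim` and `csv_to_table` are names chosen for this sketch, and like the jq program it does not handle quoted fields.

```python
import re


def trim(field):
    """Strip leading and trailing spaces, like sub("^ +";"") | sub(" +$";"")."""
    return re.sub(r" +$", "", re.sub(r"^ +", "", field))


def csv_to_table(text):
    """Split text into rows of trimmed fields, skipping empty lines."""
    return [[trim(f) for f in line.split(",")] for line in text.split("\n") if line]


print(csv_to_table("name, age, gender\njohn, 20, male\n"))
```

Only spaces at the two ends of a field are removed; spaces inside a field are kept.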