I made a little test case to compare YAML and JSON speed:
import json
import yaml
from datetime import datetime
from random import randint
NB_ROW = 1024
...
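A self-contained comparison of this kind could look like the following sketch. It is not the original test case. The record shape, the use of time.perf_counter instead of datetime, and the helper names (make_rows, time_roundtrip, run_benchmark) are all assumptions. PyYAML is third-party, so the YAML timing is skipped if it is not installed.

```python
import json
import time
from random import randint

try:
    import yaml  # PyYAML, a third-party package (pip install pyyaml)
except ImportError:
    yaml = None

NB_ROW = 1024

def make_rows(n):
    # Hypothetical test data: n small records with random integers.
    return [{"id": i, "value": randint(0, 1000), "name": "row%d" % i}
            for i in range(n)]

def time_roundtrip(dump, load, data):
    # Serialize once, then time only the parsing step.
    text = dump(data)
    start = time.perf_counter()
    parsed = load(text)
    return time.perf_counter() - start, parsed

def run_benchmark(n=NB_ROW):
    data = make_rows(n)
    results = {"json": time_roundtrip(json.dumps, json.loads, data)}
    if yaml is not None:
        results["yaml"] = time_roundtrip(yaml.safe_dump, yaml.safe_load, data)
    return data, results

if __name__ == "__main__":
    data, results = run_benchmark()
    for name, (elapsed, parsed) in results.items():
        assert parsed == data  # both formats must round-trip the same data
        print("%s: %.4f s" % (name, elapsed))
```

Timing only the parse step keeps the comparison focused on loading, which is what the rest of the thread measures.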
Yes, I also noticed that JSON is much faster, so a reasonable approach is to convert YAML to JSON first. If you don't mind using Ruby, you can get a big speedup and skip installing PyYAML altogether:
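import json
import subprocess
def load_yaml_file(fn):
    # Let Ruby parse the YAML file and print it as JSON; passing fn as an
    # argument (ARGV[0]) avoids shell-quoting problems with the file name.
    ruby = "puts YAML.load_file(ARGV[0]).to_json"
    out = subprocess.run(["ruby", "-ryaml", "-rjson", "-e", ruby, fn],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)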
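If the same YAML file is read many times, the "convert to JSON first" idea can also be applied once and the result cached on disk. The following is a minimal sketch. It assumes it is acceptable to write a .json file next to the YAML file. load_yaml_cached is a hypothetical name, and convert can be any function that parses a YAML file, such as load_yaml_file.

```python
import json
import os

def load_yaml_cached(yaml_path, convert):
    """Load a YAML file through a JSON cache stored next to it.

    convert: any callable that takes the YAML path and returns the parsed
    data (hypothetical wiring, e.g. load_yaml_file).
    """
    json_path = os.path.splitext(yaml_path)[0] + ".json"
    # Reuse the cache only if it exists and is at least as new as the YAML.
    if (os.path.exists(json_path)
            and os.path.getmtime(json_path) >= os.path.getmtime(yaml_path)):
        with open(json_path) as f:
            return json.load(f)
    # Cache missing or stale: do the slow YAML parse once, then save as JSON.
    data = convert(yaml_path)
    with open(json_path, "w") as f:
        json.dump(data, f)
    return data
```

Only the first call pays for the YAML parse. Later calls read plain JSON until the YAML file is modified again.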
Here is a comparison for 100K records:
load_yaml_file: 0.95 s
yaml.load: 7.53 s
And for 1M records:
load_yaml_file: 11.55 s
yaml.load: 77.08 s
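To reproduce numbers like these on your own files, a small timing helper is enough. The sketch below is an assumption about how one might measure, not how the figures above were produced. best_time and load_json_file are hypothetical names. Note that timing load_yaml_file also includes starting the Ruby interpreter, which matters more for small files.

```python
import json
import time

def best_time(load, path, repeat=3):
    # Run load(path) several times and keep the fastest run, which is less
    # sensitive to noise (e.g. a cold disk cache) than a single measurement.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        load(path)
        best = min(best, time.perf_counter() - start)
    return best

def load_json_file(path):
    # Baseline loader: plain JSON parsed by the standard library.
    with open(path) as f:
        return json.load(f)
```

For example, best_time(load_yaml_file, "records.yaml") can be compared with best_time(load_json_file, "records.json") for the same data.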
If you insist on using yaml.load anyway, remember to install PyYAML in a virtualenv to avoid conflicts with other software.
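If you stay with PyYAML, it may be worth checking whether it was built against libyaml. In that case it exposes C-accelerated loaders such as yaml.CSafeLoader. The sketch below assumes PyYAML is installed. It falls back to the pure-Python SafeLoader when the C version is missing, and load_yaml_fast is a hypothetical name.

```python
import yaml  # PyYAML (third-party): pip install pyyaml

# Prefer the libyaml-backed loader when PyYAML was built with it; otherwise
# fall back to the pure-Python SafeLoader, which is noticeably slower.
Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)

def load_yaml_fast(path):
    # Parse one YAML file with whichever safe loader is available.
    with open(path) as f:
        return yaml.load(f, Loader=Loader)
```

How much this narrows the gap to JSON depends on your data, so it is worth timing on your own files.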