Merge many JSON strings with Python pandas inputs

Submitted by ぃ、小莉子 on 2019-12-06 08:51:22

In the end, the fastest way was to write a simple string concatenator. Here are the two best solutions (one provided by @Skorp), followed by the timeit setup used to compare them (the original post showed the resulting %timeit times as a graph).

Method 1. String-Merge

import logging

def panel_to_json_string(panel):
    def _merge_stream(key, stream):
        # build one '"key": <json>, ' fragment for a single panel item
        return '"' + key + '"' + ': ' + stream + ', '

    try:
        stream = '{ "__type__": "panel", '
        for item in panel.items:
            stream += _merge_stream(item, panel.loc[item, :, :].to_json())

        # take out the trailing ', ' left by the last fragment
        stream = stream[:-2]

        # add the closing brace
        stream += '}'
        return stream
    except Exception:
        # returns None on failure, like Method 2
        logging.exception('Panel Encoding did not work')

Method 2. Loads-Dumps

import json
import logging

def panel_to_json_loads(panel):
    try:
        d = {'__type__': 'panel'}

        # parse each item's JSON back into Python objects, then dump once
        for item in panel.items:
            d[item] = json.loads(panel.loc[item, :, :].to_json())
        return json.dumps(d)
    except Exception:
        logging.exception('Panel Encoding did not work')
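
The reason string merging is safe is that each to_json() result is already a valid JSON value, so it can be spliced into a larger object without being parsed first. The following is a minimal sketch of that idea using only the standard library; the function names and the sample fragments are made up for illustration, and it joins the fragments with ', ' instead of trimming a trailing comma as Method 1 does.

```python
import json


def merge_json_strings(fragments):
    """Splice already-serialized JSON values into one JSON object string."""
    parts = ['"__type__": "panel"']
    for key, fragment in fragments.items():
        # json.dumps(key) quotes and escapes the key; the fragment is used as-is
        parts.append(json.dumps(key) + ': ' + fragment)
    return '{' + ', '.join(parts) + '}'


def merge_via_loads_dumps(fragments):
    """Parse every fragment, collect the results in a dict, and dump once."""
    d = {'__type__': 'panel'}
    for key, fragment in fragments.items():
        d[key] = json.loads(fragment)
    return json.dumps(d)


# Hypothetical fragments standing in for per-item to_json() output
fragments = {
    'a': '{"0": {"x": 1.5, "y": 2.5}}',
    'b': '{"0": {"x": -1.0, "y": 0.25}}',
}

merged = merge_json_strings(fragments)

# Both approaches describe the same object once parsed
assert json.loads(merged) == json.loads(merge_via_loads_dumps(fragments))
```

The string version skips the parse and re-serialize round trip, which is where the time difference comes from.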

Problem Setup

import timeit
import pandas
import numpy

setup = ("import strat_check.io as sio; import pandas; import numpy;" 
     "panel = pandas.Panel(numpy.random.randn(5, {0}, 4), "
     "items = ['a', 'b', 'c', 'd', 'e'], " 
     "major_axis = pandas.DatetimeIndex(start = '01/01/1990',"
                                        "freq = 's', "
                                        "periods = {0}), "
                                        "minor_axis = numpy.arange(4))")

vals = [10, 100, 1000, 10000, 100000]

d = {'string-merge': [], 
     'loads-dumps': []
     }

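for n in vals:
    number = 10

    # the setup string only imports sio, so both functions are called through it
    # (assumes panel_to_json_string lives in strat_check.io as well)
    d['string-merge'].append(
        timeit.timeit(stmt = 'sio.panel_to_json_string(panel)', 
                      setup = setup.format(n), 
                      number = number)
    )

    d['loads-dumps'].append(
        timeit.timeit(stmt = 'sio.panel_to_json_loads(panel)', 
                      setup = setup.format(n), 
                      number = number)
    )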
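
pandas.Panel was removed in pandas 0.25, so the setup above only runs on older versions. Below is a rough sketch of the same comparison for current pandas, assuming a plain dict of DataFrames can stand in for the panel. The names make_frames, frames_to_json_string and frames_to_json_loads are hypothetical and not part of the original code, and the sizes are kept small so it runs quickly.

```python
import json
import timeit

import numpy as np
import pandas as pd


def make_frames(n):
    """Build five (n x 4) frames keyed 'a'..'e', mirroring the panel's shape."""
    index = pd.date_range(start='1990-01-01', periods=n, freq='s')
    return {
        key: pd.DataFrame(np.random.randn(n, 4), index=index, columns=np.arange(4))
        for key in 'abcde'
    }


def frames_to_json_string(frames):
    """String-merge: splice each frame's JSON text into one object."""
    parts = ['"__type__": "panel"']
    for key, frame in frames.items():
        parts.append(json.dumps(key) + ': ' + frame.to_json())
    return '{' + ', '.join(parts) + '}'


def frames_to_json_loads(frames):
    """Loads-dumps: parse each frame's JSON, then serialize the whole dict."""
    d = {'__type__': 'panel'}
    for key, frame in frames.items():
        d[key] = json.loads(frame.to_json())
    return json.dumps(d)


vals = [10, 100, 1000]
timings = {'string-merge': [], 'loads-dumps': []}

for n in vals:
    frames = make_frames(n)
    timings['string-merge'].append(
        timeit.timeit(lambda: frames_to_json_string(frames), number=5))
    timings['loads-dumps'].append(
        timeit.timeit(lambda: frames_to_json_loads(frames), number=5))

print(timings)
```

Passing callables to timeit avoids the setup string entirely, so no importable module is needed to run the comparison.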

If all you need to do is get rid of "\\", you could use

.str.strip("\\")  # removes backslashes only at the start and end of each string

or

.str.replace("\\", "")  # removes every backslash

You should read up on string methods, vectorized string methods, and regular expressions. Here is the pandas-specific documentation:

http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
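
To make the difference between the two calls concrete, here is a small sketch on a made-up Series. Passing regex=False is an addition not in the original suggestion: it makes the backslash a literal pattern regardless of the pandas version's default.

```python
import pandas as pd

# Sample values are illustrative only
s = pd.Series(['\\abc\\', 'a\\b', 'plain'])

# strip only touches the ends of each string
stripped = s.str.strip('\\')

# replace removes every occurrence; regex=False treats the pattern literally
replaced = s.str.replace('\\', '', regex=False)

print(stripped.tolist())
print(replaced.tolist())
```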

Have you considered merging the dataframes and then calling to_json on the merged frame? You could use pd.merge(masterdf, panel[item], how="outer"). Just a thought: I haven't worked with panels, so I'm not sure the JSON representation would be accurate. You could also try the following in your loop (and consider iterating with the iteritems() method):

masterdf = pd.concat([masterdf, panel[item]], axis = 1, keys =[list(masterdf.columns.values), item])

and then convert that into JSON.

You may even be able to do something more concise, like:

pd.concat([panel[item] for item in panel.items], axis = 1, keys = list(panel.keys())).to_json()
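
As a hedged illustration of the concat idea on current pandas, the sketch below uses a dict of DataFrames in place of the panel (an assumption, since Panel no longer exists in recent releases). The keys argument turns the columns into a two-level MultiIndex; how those column labels appear in the JSON output depends on the orient you choose, so inspect the result before relying on it.

```python
import numpy as np
import pandas as pd

# Three (3 x 4) frames sharing one index, standing in for panel items
index = pd.date_range('1990-01-01', periods=3, freq='s')
frames = {
    key: pd.DataFrame(np.random.randn(3, 4), index=index, columns=np.arange(4))
    for key in 'abc'
}

# Place the frames side by side; keys become the outer column level
wide = pd.concat(list(frames.values()), axis=1, keys=list(frames.keys()))

# Serialize the combined frame in a single call
payload = wide.to_json()

print(wide.columns.nlevels, wide.shape)
```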