Bokeh - Pandas not able to read bytesIO object of excel file from JS

Submitted by 十年热恋 on 2020-01-05 04:19:10

Question


I need your input on some challenges I have been facing for a few days.

My goal is to have an upload button through which I share an .xlsx file with 2 sheets. Once I load this data and read it into a pandas DataFrame, I run some Python calculation/optimisation code and get a few (tabular, summarised) results. Then, based on the number of unique 'levels'/'groups', I create that many tabs and display the summarised results in each tab. I also want a general (common) plot on the main page.
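A minimal pandas-only sketch of the per-level split described above, assuming (as in the code below) that the level labels live in column v2 and the value to total is v1. summarise_by_level is a hypothetical helper, not part of Bokeh or pandas; each returned frame would back one tab's DataTable, and the totals would feed the common bar plot.

```python
import pandas as pd


def summarise_by_level(main_dt, level_col="v2", value_col="v1"):
    """Split main_dt into one frame per unique level and total value_col per level.

    Levels keep their order of first appearance, like Series.unique().
    """
    level_names = main_dt[level_col].unique().tolist()
    # One sub-frame per level: one tab each.
    per_level = {lvl: main_dt[main_dt[level_col] == lvl] for lvl in level_names}
    # Per-level totals in the same order as level_names (plain Python numbers).
    totals = (
        main_dt.groupby(level_col, sort=False)[value_col].sum()
        .reindex(level_names)
        .tolist()
    )
    return level_names, per_level, totals


if __name__ == "__main__":
    demo = pd.DataFrame({"v1": [1, 2, 3, 4], "v2": ["a", "b", "a", "c"]})
    names, frames, totals = summarise_by_level(demo)
    print(names)   # ['a', 'b', 'c']
    print(totals)  # [4, 2, 4]
```

Keeping this step free of Bokeh objects makes it easy to test on its own; the callback then only has to turn each frame into a ColumnDataSource and a Panel.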

Below is my effort (not mine, but the community's :) ):

1. Upload button code (from here):

## Load library
###########################################################################
import pandas as pd
import numpy as np
from xlrd import XLRDError
import io
import base64
import os

from bokeh.layouts import row, column, widgetbox, layout
from bokeh.models import ColumnDataSource, CustomJS, LabelSet
from bokeh.models.widgets import Button, Div, TextInput, DataTable, TableColumn, Panel, Tabs
from bokeh.io import curdoc
from bokeh.plotting import figure
###########################################################################
## Upload Button Widget
file_source = ColumnDataSource({'file_contents':[], 'file_name':[]})
cds_test = ColumnDataSource({'v1':[], 'v2':[]})


def file_callback(attr,old,new):

    global tabs
    t = []  # one Panel per unique level, rebuilt on every upload
    print('filename:', file_source.data['file_name'])
    raw_contents = file_source.data['file_contents'][0]
    prefix, b64_contents = raw_contents.split(",", 1)
    file_contents = base64.b64decode(b64_contents)
    file_io = io.BytesIO(file_contents)

    # Reading the .xlsx from file_io errors out here, although the same approach works for .csv. Any idea?
    #df1 = pd.read_excel(file_io, sheet = 'Sheet1')
    #df2 = pd.read_excel(file_io, sheet = 'Sheet2')

    # call some Python functions for the analysis
    # they return a few results
    # for now, let's assume main_dt holds all the results of the analysis


    # Reading from a local path instead; file_path is not defined in this snippet
    df1 = pd.read_excel(file_path, sheet_name = 'Sheet1')
    df2 = pd.read_excel(file_path, sheet_name = 'Sheet2')

    main_dt = pd.DataFrame({'v1':df1['v1'], 'v2': df2['v2']})
    level_names = main_dt['v2'].unique().tolist()

    sum_v1_level = []
    for i in level_names:
        csd_temp = ColumnDataSource(main_dt[main_dt['v2'] == i])
        columns = [TableColumn(field=j, title=j) for j in main_dt.columns]  # header = column name
        dt = DataTable(source = csd_temp, columns = columns, width=400, height=280)
        temp = Panel(child = dt, title = i)
        t.append(temp)
        sum_v1_level.append(sum(csd_temp.data['v1']))

    tabs = Tabs(tabs = t)
    cds_plot = ColumnDataSource({'x':level_names, 'y':sum_v1_level})

    p_o = figure(x_range = level_names, plot_height=250, title="Plot")
    p_o.vbar(x='x', top = 'y', width=0.9, source = cds_plot)
    p_o.xgrid.grid_line_color = None
    p_o.y_range.start = 0
    p_o.y_range.end = max(sum_v1_level)*1.2
    labels_o = LabelSet(x='x', y = 'y', text='y', level='glyph',
        x_offset=-13.5, y_offset=0, render_mode='canvas', source = cds_plot)
    p_o.add_layout(labels_o)

    curdoc().add_root(p_o)
    curdoc().add_root(tabs)
    print('successful upload')

file_source.on_change('data', file_callback)

button = Button(label="Upload Data", button_type="success")
# When the button is clicked, the CustomJS code below runs in the browser
button.callback = CustomJS(args=dict(file_source=file_source), code = """
function read_file(filename) {
    var reader = new FileReader();
    reader.onload = load_handler;
    reader.onerror = error_handler;
    // readAsDataURL represents the file's data as a base64 encoded string
    reader.readAsDataURL(filename);
}

function load_handler(event) {
    var b64string = event.target.result;
    file_source.data = {'file_contents' : [b64string], 'file_name':[input.files[0].name]};
    file_source.trigger("change");
}

function error_handler(evt) {
    if(evt.target.error.name == "NotReadableError") {
        alert("Can't read file!");
    }
}

var input = document.createElement('input');
input.setAttribute('type', 'file'); 
input.onchange = function(){
    if (window.FileReader) {
        read_file(input.files[0]);
    } else {
        alert('FileReader is not supported in this browser');
    }
}
input.click();
""")

BTW: is there any way to suppress this warning, or am I doing it the wrong way (while inserting the read column into the CDS)?

BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('v1', 19), ('v2', 0)

2. Adding to the layout

curdoc().title = 'Test Joel'
curdoc().add_root(button)

Below is the output:

This was the original data. Note: all data shared here is dummy data; the real case has more sheets and more dimensions.

So, to summarise:

  1. I am not able to read the .xlsx file through the upload button.

  2. Is it correct to do all the steps in the button callback function itself?


Answer 1:


For anyone referring to this thread in the future: here is a solution for an upload button that handles .xlsx files. This was for Python 3.

I am sharing only the main data-handling code; everything else is as above.

import pandas as pd
import io
import base64


def file_callback_dt1(attr, old, new):
    # file_source_dt1 is presumably the question's file_source under another name
    print('filename:', file_source_dt1.data['file_name'])
    raw_contents = file_source_dt1.data['file_contents'][0]
    prefix, b64_contents = raw_contents.split(",", 1)
    file_contents = base64.b64decode(b64_contents)
    file_io = io.BytesIO(file_contents)
    # Note: xlrd 2.0+ no longer reads .xlsx; with current versions use engine='openpyxl'
    excel_object = pd.ExcelFile(file_io, engine='xlrd')
    dt_1 = excel_object.parse(sheet_name = 'Sheet1', index_col = 0)
    # the rest is up to you :)
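A sketch of the same idea extended to both sheets of the question's workbook, assuming a current pandas with openpyxl installed (xlrd 2.0 and later only read legacy .xls files, so engine='xlrd' no longer opens .xlsx). The workbook is opened once with pd.ExcelFile and each sheet is parsed from that object, so the uploaded bytes are not read twice. read_sheets is a hypothetical helper; note also that the pandas keyword is sheet_name, not sheet as in the question's commented-out lines.

```python
import io

import pandas as pd


def read_sheets(file_contents: bytes, sheet_names=("Sheet1", "Sheet2")):
    """Parse the named sheets of an in-memory .xlsx file into DataFrames."""
    # Open the workbook once; parse() can then be called per sheet.
    with pd.ExcelFile(io.BytesIO(file_contents), engine="openpyxl") as book:
        return {name: book.parse(sheet_name=name) for name in sheet_names}


if __name__ == "__main__":
    # Build a small two-sheet workbook in memory to stand in for an upload.
    buf = io.BytesIO()
    with pd.ExcelWriter(buf, engine="openpyxl") as writer:
        pd.DataFrame({"v1": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"v2": ["a", "b"]}).to_excel(writer, sheet_name="Sheet2", index=False)
    sheets = read_sheets(buf.getvalue())
    print(sheets["Sheet1"]["v1"].tolist())  # [1, 2]
```

Inside the Bokeh callback, file_contents would be the base64-decoded bytes from the data URL.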



Answer 2:


The general idea of using a CDS in this way is reasonable for now. In the future, there should be better mechanisms, but I can't speculate when they might be implemented. Regarding the error with read_excel, that is a Pandas issue/question, not a Bokeh one.

As for the warning about column lengths, that almost certainly indicates a usage problem. It is telling you that your v2 column is empty, which doesn't seem to be what you intend, and it also violates the fundamental assumption that all CDS columns always have the same length. Without being able to run the code, though, I'm not sure why you are generating an empty list for v2.

Edit: for instance, adding columns one by one works fine if they are all the same length:

In [4]: s.add([1,2,3], 'foo')
Out[4]: 'foo'

In [5]: s.add([1,2,3], 'bar')
Out[5]: 'bar'

It's only a problem when the column you add is not the right length, which is what the warning states:

In [6]: s.add([], 'baz')
/Users/bryanv/work/bokeh/bokeh/models/sources.py:138: BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('bar', 3), ('baz', 0), ('foo', 3)
  "Current lengths: %s" % ", ".join(sorted(str((k, len(v))) for k, v in data.items())), BokehUserWarning))
Out[6]: 'baz'

If you don't have the data for a column up front, don't put in an empty list as a placeholder; as soon as you put in a real column, the lengths become inconsistent. That is the cause of your problem.
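A minimal sketch of that advice, assuming you assemble the complete column dict first and only then hand it to the ColumnDataSource in one step (for example `source.data = new_data`). column_lengths_match is a hypothetical helper, not a Bokeh API; since Bokeh only emits a BokehUserWarning on a mismatch, checking beforehand lets the callback fail loudly instead.

```python
from typing import Mapping, Sequence


def column_lengths_match(data: Mapping[str, Sequence]) -> bool:
    """Return True if every column in a CDS-style dict has the same length."""
    lengths = {len(col) for col in data.values()}
    # Zero or one distinct length means the columns are consistent.
    return len(lengths) <= 1


if __name__ == "__main__":
    # An empty placeholder next to real data: the situation in the question.
    print(column_lengths_match({"v1": [1, 2, 3], "v2": []}))               # False
    # All columns filled before assignment.
    print(column_lengths_match({"v1": [1, 2, 3], "v2": ["a", "b", "c"]}))  # True
```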



Source: https://stackoverflow.com/questions/48295284/bokeh-pandas-not-able-to-read-bytesio-object-of-excel-file-from-js
