pyspark show dataframe as table with horizontal scroll in ipython notebook

北战南征 submitted on 2019-11-29 06:55:47

Question


A pyspark.sql.DataFrame displays messily with DataFrame.show(): long lines wrap instead of scrolling horizontally.

A pandas DataFrame, by contrast, displays properly as a table when shown with pandas.DataFrame.head().

I tried these options:

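import IPython
# Only sets a plain attribute on the Python module; the output auto-scroll
# threshold is a JavaScript setting of the notebook front end
# (IPython.OutputArea.auto_scroll_threshold in the classic Notebook)
IPython.auto_scroll_threshold = 9999

from IPython.core.interactiveshell import InteractiveShell
# Display the value of every expression in a cell, not just the last one
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display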

but no luck. The scroll does work, however, when the same code runs in the Atom editor with the Jupyter plugin.


Answer 1:


This is a workaround:

spark_df.limit(5).toPandas().head()

Although I do not know the computational cost of this query, I think limit() is not expensive. Corrections welcome.
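To make the cost question concrete, here is a minimal sketch of the same workaround as a reusable function (the name preview_as_pandas is hypothetical). It relies on limit() being a lazy Spark transformation: calling it before toPandas() caps the rows shipped to the driver at n, whereas spark_df.toPandas().head(5) would collect the whole DataFrame first and only then trim it. The query feeding the limit can still be expensive, for example if it sorts or aggregates.

```python
def preview_as_pandas(sdf, n: int = 5):
    """Return the first n rows of a Spark DataFrame as a pandas DataFrame.

    Hypothetical helper wrapping spark_df.limit(n).toPandas().
    """
    # Order matters: limit() is lazy, so nothing runs until toPandas(),
    # and then at most n rows are collected to the driver.
    # sdf.toPandas().head(n) would instead collect every row, then trim.
    return sdf.limit(n).toPandas()
```

In a notebook, ending a cell with preview_as_pandas(spark_df) renders the rows through pandas' own HTML table display.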




Answer 2:


I created the little function below, and it works fine:

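from IPython.display import HTML

def printDf(sprkDF):
    # Collect the Spark DataFrame to the driver as a pandas DataFrame
    # (this pulls every row, so limit() large DataFrames first)
    newdf = sprkDF.toPandas()
    # Return an HTML table; the notebook renders it as the cell's output
    return HTML(newdf.to_html())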

You can use it directly on your Spark SQL queries, or on any Spark DataFrame:

printDf(spark.sql('''
select * from employee
'''))
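A variant along the same lines, sketched under two assumptions: that only the first rows are needed, and that the front end follows Jupyter's rich-display protocol, where an object with a _repr_html_ method is rendered as HTML when it is the last expression in a cell. ScrollableTable is a hypothetical name. Limiting before toPandas() avoids pulling the whole DataFrame to the driver, and the overflow-x: auto wrapper gives a wide table its own horizontal scrollbar.

```python
class ScrollableTable:
    """Render the first n rows of a Spark DataFrame as a scrollable HTML table.

    Hypothetical helper: Jupyter calls _repr_html_ when an instance is the
    last expression in a cell.
    """

    def __init__(self, sdf, n: int = 20):
        # Collect at most n rows to the driver as a pandas DataFrame
        self.pdf = sdf.limit(n).toPandas()

    def _repr_html_(self) -> str:
        # overflow-x: auto adds a horizontal scrollbar when the table is
        # wider than the output area
        return '<div style="overflow-x: auto;">' + self.pdf.to_html() + "</div>"
```

For example, ending a cell with ScrollableTable(spark.sql('select * from employee'), n=50) would render up to 50 rows of the employee table.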



Answer 3:


I'm not sure whether anyone is still facing this issue, but it can be resolved by tweaking the page's styles with the browser's developer tools.

When you run show(), open the developer tools (F12), switch to the element inspector (Ctrl+Shift+C), and click on the output. Then uncheck its white-space property (see the snapshot below).

You only need to change this setting once (unless you refresh the page).

This shows the output of show() natively, exactly as printed, with no need to convert to pandas.
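The same tweak can be applied from code instead of the developer tools. This is a sketch that assumes show() output is rendered inside pre elements, as in the classic Jupyter Notebook: displaying an object whose _repr_html_ returns a style tag overrides white-space for every pre on the page, the programmatic analogue of unchecking the property by hand. NoWrapOutput is a hypothetical name.

```python
class NoWrapOutput:
    """Displaying an instance injects a <style> tag that stops matching
    elements from wrapping long lines.

    Hypothetical helper; assumes show() output is rendered inside <pre>.
    """

    def __init__(self, selector: str = "pre"):
        # CSS selector for the elements whose wrapping should be disabled
        self.selector = selector

    def _repr_html_(self) -> str:
        # Jupyter calls _repr_html_ when this object is a cell's last expression;
        # doubled braces in the f-string emit literal CSS braces
        return f"<style>{self.selector} {{ white-space: pre !important; }}</style>"
```

End a cell with NoWrapOutput(), then call df.show() in a later cell. Like the manual tweak, the rule only lasts while that style tag is on the page.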



Source: https://stackoverflow.com/questions/43427138/pyspark-show-dataframe-as-table-with-horizontal-scroll-in-ipython-notebook
