Question
A pyspark.sql.DataFrame displays messily with DataFrame.show():
long lines wrap instead of scrolling horizontally,
but the same data displays properly with pandas.DataFrame.head.
I tried these options:
import IPython
IPython.auto_scroll_threshold = 9999
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display
but had no luck. The scroll does work, however, when the notebook is used within the Atom editor with the Jupyter plugin.
Answer 1:
This is a workaround:
spark_df.limit(5).toPandas().head()
I do not know the computational cost of this query, though. I believe limit()
is not expensive. Corrections welcome.
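The workaround above can be wrapped in a small helper. This is only a sketch: the name preview and the default n=5 are hypothetical, not from the original answer. Since toPandas() collects rows to the driver, calling limit() first bounds how many rows are collected. How cheap limit() itself is depends on the query plan that produces the DataFrame, so treat it as a likely saving rather than a guarantee.

```python
def preview(spark_df, n=5):
    """Return at most n rows of a Spark DataFrame as a pandas DataFrame.

    `preview` and its default n are illustrative names, not a Spark API.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # limit(n) is applied on the Spark side, so toPandas() collects
    # at most n rows to the driver rather than the whole dataset.
    return spark_df.limit(n).toPandas()
```

In a notebook, leaving preview(spark_df) as the last expression of a cell renders the result as a pandas table.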
Answer 2:
I created the small function below, and it works fine:
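def printDf(sprkDF):
    # Render a Spark DataFrame as an HTML table in the notebook.
    from IPython.display import HTML
    # toPandas() collects every row to the driver, so keep the input small.
    newdf = sprkDF.toPandas()
    return HTML(newdf.to_html())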
You can use it directly on your Spark queries if you like, or on any Spark DataFrame:
printDf(spark.sql('''
select * from employee
'''))
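A possible variation on this function, sketched under assumptions: it caps the number of rows before converting and wraps the table in a div that scrolls horizontally. The name scrollable_html, the default n=20, and the inline CSS are all hypothetical choices, not part of the original answer. It returns a plain HTML string, so it has no IPython dependency of its own.

```python
def scrollable_html(spark_df, n=20):
    """Build an HTML string showing at most n rows in a scrollable div.

    `scrollable_html` is an illustrative helper, not a Spark or pandas API.
    """
    # Cap the rows on the Spark side before collecting to the driver.
    pdf = spark_df.limit(n).toPandas()
    # pandas renders the frame as an HTML <table>; index=False omits the row index.
    table = pdf.to_html(index=False)
    # overflow-x: auto adds a horizontal scrollbar when the table is too wide;
    # white-space: nowrap keeps cell text from wrapping onto new lines.
    return '<div style="overflow-x: auto; white-space: nowrap;">' + table + "</div>"
```

In a notebook cell it could be used as HTML(scrollable_html(spark_df)), with HTML imported from IPython.display.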
Answer 3:
I'm not sure whether anyone is still facing this issue, but it can be resolved by tweaking a page style setting with the browser's developer tools.
When the output is displayed, open the developer tools (F12), choose inspect element (Ctrl+Shift+C), and click on the output. Then uncheck the white-space attribute (see snapshot below).
You only need to apply this setting once (unless you refresh the page).
This shows the exact data natively, as is. There is no need to convert to pandas.
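The same style change can probably be applied from a notebook cell, so it does not have to be redone by hand after every page refresh. The sketch below only builds the style tag. The helper name no_wrap_style and the default pre selector are assumptions, and whether the rule takes effect depends on the notebook front end and its own stylesheet.

```python
def no_wrap_style(selector="pre"):
    """Return a <style> tag that stops text in matching elements from wrapping.

    `no_wrap_style` and the default selector are illustrative assumptions.
    """
    # white-space: pre keeps each output line on a single line;
    # !important is meant to override the front end's own wrapping rule.
    return "<style>%s { white-space: pre !important; }</style>" % selector

# In a notebook cell (requires IPython):
# from IPython.display import display, HTML
# display(HTML(no_wrap_style()))
```

Once the style is injected, DataFrame.show() output should keep its original line layout instead of wrapping.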
Source: https://stackoverflow.com/questions/43427138/pyspark-show-dataframe-as-table-with-horizontal-scroll-in-ipython-notebook