A pyspark.sql.DataFrame displays messily with DataFrame.show(): lines wrap instead of scrolling horizontally.
The same data displays fine with pandas.DataFrame.head().
I tried these options:
import IPython
IPython.auto_scroll_threshold = 9999
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display
But no luck, although the scroll does work when the notebook is used within the Atom editor with the Jupyter plugin.
This is a workaround:
spark_df.limit(5).toPandas().head()
However, I do not know the computational cost of this query. I suspect limit()
is not expensive. Corrections welcome.
I created the small function below, and it works fine:
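On the cost question, a minimal sketch of the same workaround as a reusable helper. The name `preview` and its `n` parameter are made up for illustration. `limit()` is a lazy transformation, and `toPandas()` is the call that actually runs a job and collects the rows to the driver, so at most `n` rows reach the notebook. How cheap the job is still depends on what comes before the `limit()`: a sort or a wide join upstream may have to be computed first.

```python
def preview(spark_df, n=5):
    """Return the first n rows of a Spark DataFrame as a pandas DataFrame.

    `preview` is a hypothetical helper name, not part of PySpark.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # limit() only adds a step to the query plan; nothing runs yet.
    limited = spark_df.limit(n)
    # toPandas() triggers the job and collects at most n rows to the driver.
    return limited.toPandas()
```

In a notebook, leaving `preview(spark_df)` as the last expression in a cell renders the pandas table.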
def printDf(sprkDF):
    from IPython.display import HTML
    # toPandas() collects every row to the driver, so limit() large DataFrames first
    newdf = sprkDF.toPandas()
    return HTML(newdf.to_html())
You can use it directly on your Spark queries if you like, or on any Spark DataFrame:
printDf(spark.sql('''
select * from employee
'''))
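If the front end still wraps the table, one possible variation is to cap the row count and wrap the HTML in a scrollable container. This is only a sketch: `scrollable_table_html`, `show_scrollable` and `max_rows` are hypothetical names, and the inline CSS is an assumption that may be unnecessary in front ends that already scroll wide output.

```python
def scrollable_table_html(spark_df, max_rows=20):
    """Return an HTML string holding the first max_rows rows in a scrollable <div>."""
    # limit() keeps the collected data small; toPandas() brings it to the driver.
    table = spark_df.limit(max_rows).toPandas().to_html()
    # overflow-x: auto shows a horizontal scrollbar when the table is too wide;
    # white-space: nowrap stops cell text from wrapping onto new lines.
    return '<div style="overflow-x: auto; white-space: nowrap;">' + table + "</div>"


def show_scrollable(spark_df, max_rows=20):
    """Wrap the HTML string so a notebook renders it as a table."""
    # Imported lazily so scrollable_table_html() also works outside a notebook.
    from IPython.display import HTML
    return HTML(scrollable_table_html(spark_df, max_rows))
```

In a notebook cell, `show_scrollable(spark.sql('select * from employee'))` as the last expression would render the table.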
Source: https://stackoverflow.com/questions/43427138/pyspark-show-dataframe-as-table-with-horizontal-scroll-in-ipython-notebook