Question
I want to display my results as a histogram in Zeppelin, and I came across plotly. My code is in Scala, and I would like to know the steps for incorporating plotly into Zeppelin using Scala. Alternatively, is there a better library for drawing a histogram in Zeppelin with Scala?
Answer 1:
If you have a DataFrame called plotTemp with columns "id" and "degree", you can do the following:
- In a Scala paragraph, register the DataFrame as a temporary table (on Spark 2.0 and later, registerTempTable is deprecated in favor of createOrReplaceTempView):
plotTemp.registerTempTable("plotTemp")
- Then switch to the SQL interpreter in a new paragraph:
%sql
select degree, count(1) nInBin from plotTemp group by degree order by degree
- Click the bar chart icon in the result toolbar, and you should see the distribution you are looking for.
[Figure: example of a distribution plot done in Zeppelin]
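The query above returns one row per distinct degree value together with how often it occurs. As a rough sketch of that counting logic on a plain Scala collection (no Spark involved; `countPerValue` and the sample values are made up for illustration), something like the following can help sanity-check the bar heights you expect to see:

```scala
// Hypothetical helper, not part of Zeppelin or Spark: counts how often each
// distinct value occurs and sorts by value, mirroring
// "group by degree order by degree" in the SQL paragraph.
def countPerValue(values: Seq[Int]): Seq[(Int, Int)] =
  values
    .groupBy(identity)
    .map { case (value, occurrences) => (value, occurrences.size) }
    .toSeq
    .sortBy(_._1)

// Made-up sample degrees, for illustration only.
val sampleDegrees = Seq(1, 3, 2, 3, 1, 3)
val degreeCounts = countPerValue(sampleDegrees)

// Print one "degree <tab> nInBin" line per distinct value.
degreeCounts.foreach { case (degree, nInBin) => println(s"$degree\t$nInBin") }
```

Because there is one bar per distinct value, this approach suits discrete columns such as a node degree; a continuous column would first need to be grouped into bins.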
Answer 2:
After trying basically every available solution, I eventually settled on vegas-viz. On the project's GitHub page, they claim to be "The Missing MatPlotLib for Scala + Spark". That sounds a little exaggerated to me at the moment, but the library does its job and does it well.
This is the procedure I suggest for drawing a bar chart (which is basically what you need for a histogram) in Zeppelin's Spark interpreter:
1. Import the dependencies (please check the vegas Maven repository for the latest versions):
%dep
z.load("org.vegas-viz:vegas_2.11:0.3.11")
z.load("org.vegas-viz:vegas-spark_2.11:0.3.11")
Note that vegas-spark is needed only if you want to draw directly from a DataFrame (see step 4).
2. Import the packages:
import vegas._
import vegas.render.WindowRenderer._
3. Draw the chart:
val plot = Vegas("Sample Column Chart")
  .withData(
    Seq(
      Map("country" -> "USA", "population" -> 314),
      Map("country" -> "UK", "population" -> 64),
      Map("country" -> "DK", "population" -> 80)
    )
  )
  .encodeX("country", Nom)
  .encodeY("population", Quant)
  .mark(Bar)
plot.show
The result should be a bar chart of population by country.
4. You can even draw a chart directly from a DataFrame if you added vegas-spark to the dependencies (see step 1), but this needs one extra import:
import vegas.sparkExt._
val df = Seq(
  ("USA", 314),
  ("UK", 64),
  ("DK", 80)
).toDF("country", "population")
val plot = Vegas("Sample Column Chart", width=600, height=320)
  .withDataFrame(df)
  .encodeX("country", Nom)
  .encodeY("population", Quant)
  .mark(Bar)
plot.show
The result should be the same as above.
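A bar chart only becomes a histogram once the raw values have been grouped into bins. The following is a minimal sketch of that step in plain Scala, assuming fixed-width bins; `binRows`, the key names "bin" and "count", and the sample ages are all hypothetical and not part of vegas. It returns rows in the same Seq-of-Map shape that the answer passes to withData.

```scala
// Hypothetical helper, not part of vegas: groups values into fixed-width bins
// and returns one row per non-empty bin, keyed by the bin's lower edge.
def binRows(values: Seq[Double], binWidth: Double): Seq[Map[String, Any]] = {
  require(binWidth > 0, "binWidth must be positive")
  values
    .groupBy(v => math.floor(v / binWidth) * binWidth) // lower edge of each bin
    .toSeq
    .sortBy(_._1)                                      // order bins left to right
    .map { case (lowerEdge, members) =>
      Map[String, Any]("bin" -> lowerEdge, "count" -> members.size)
    }
}

// Made-up sample values, for illustration only.
val sampleAges = Seq(21.0, 23.5, 29.0, 31.0, 38.0, 45.0)
val rows = binRows(sampleAges, binWidth = 10.0)

rows.foreach(println)
```

The resulting rows could then presumably be handed to the chart in place of the country data, with "bin" encoded on the X axis and "count" on the Y axis. Empty bins are omitted in this sketch, so gaps in the data will not show up as zero-height bars.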
Answer 3:
I just released spark-highcharts. With the following code, you can create a histogram:
import org.apache.spark.sql.functions.{col, count}
import com.knockdata.spark.highcharts._
import com.knockdata.spark.highcharts.model._
// One column per distinct age, with no gaps between columns
highcharts(bank
  .series("x" -> "age", "y" -> count("*"))
  .orderBy(col("age"))
)
  .chart(Chart.column)
  .plotOptions(new plotOptions.Column().groupPadding(0).pointPadding(0).borderWidth(0))
  .plot()
Source: https://stackoverflow.com/questions/38323164/using-plotly-with-zeppellin-in-scala