data-science-experience

How to add Spark packages to a SparkR notebook on DSX?

橙三吉。 submitted on 2019-12-24 10:56:46
Question: The Spark documentation shows how a Spark package can be added:

    sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")

I believe this can only be used when initialising the session. How can we add Spark packages for SparkR using a notebook on DSX?

Answer 1: Please use the PixieDust package manager to install the Avro package:

    pixiedust.installPackage("com.databricks:spark-avro_2.11:3.0.0")

http://datascience.ibm.com/docs/content/analyze-data/Package-Manager.html

Install it from Python …
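A minimal sketch of that workflow in a Python notebook (the package coordinate comes from the question; the kernel-restart step is an assumption based on the Package Manager docs linked above, not something stated in the truncated answer):

    # Run once in a Python notebook on the same Spark service.
    import pixiedust

    # Downloads the Spark package into the service's local repository.
    pixiedust.installPackage("com.databricks:spark-avro_2.11:3.0.0")

    # Restart the kernel afterwards so the JAR lands on the Spark
    # classpath; the package should then be visible to SparkR
    # notebooks running on the same service.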

Problems installing DSX Desktop

一曲冷凌霜 submitted on 2019-12-24 10:37:56
Question: I have problems installing DSX Desktop on my laptop. I have Docker running (Kitematic); do I have to run the Docker daemon in a certain way?

Answer 1: Your version of DSX Desktop may not support the legacy Docker solution that was bundled with Kitematic, and DSX Desktop is probably looking for Docker for Mac. The latest version of DSX Desktop does, however, support Docker Toolbox, the version bundled with Kitematic. Upgrading to the latest version of DSX should allow you to complete the …

How to access the Spark History Server from DSX?

﹥>﹥吖頭↗ submitted on 2019-12-24 10:08:03
Question: I need to access the Spark History Server so that I can performance-tune a slow Spark job. I looked for a link within DSX but could not find one, so I opened the Spark service in the Bluemix console and navigated to the Spark History Server directly from there (the Job History link). Is there a way to access the Spark History Server directly from DSX?

Answer 1: It seems that you have to access the Spark History Server by logging in to the Bluemix console, as I have been doing. There is a …

How to prevent 'pip install …' running every time I run the whole notebook?

拜拜、爱过 submitted on 2019-12-24 00:38:06
Question: Most of the Python notebooks I run need some setup on the first run, using !pip install ... Executing the setup code every time the notebook is run is inefficient, so I would prefer to avoid that. I also don't want to move the setup code to a different notebook, because usually it is just a few lines of code.

Answer 1: The solution for me was to run a small one-line Python check that only tries to import the module. If the import is successful, the pip install command does not get …
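A sketch of that import guard (nltk here is just an example module, not one named in the truncated answer; the --user flag is an assumption for environments without write access to the system site-packages):

    # Only install if the module is genuinely missing; on every later
    # run the import succeeds and pip is never invoked.
    try:
        import nltk
    except ImportError:
        get_ipython().system('pip install --user nltk')
        import nltk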

How do I read a Parquet file in PySpark that was written from Spark?

我是研究僧i submitted on 2019-12-20 09:06:11
Question: I am using two Jupyter notebooks to do different things in an analysis. In my Scala notebook, I write some of my cleaned data to Parquet:

    partitionedDF.select("noStopWords","lowerText","prediction").write.save("swift2d://xxxx.keystone/commentClusters.parquet")

I then go to my Python notebook to read in the data:

    df = spark.read.load("swift2d://xxxx.keystone/commentClusters.parquet")

and I get the following error:

    AnalysisException: u'Unable to infer schema for ParquetFormat at swift2d:/ …
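One thing worth trying (a sketch, not confirmed as the fix for this truncated thread): be explicit about the Parquet format on the read side rather than relying on the session's default data source.

    # A sketch assuming Spark 2.x; the swift2d URI is the placeholder
    # path copied from the question.
    path = "swift2d://xxxx.keystone/commentClusters.parquet"

    # Naming the format avoids depending on spark.sql.sources.default
    # matching between the writing and reading sessions.
    df = spark.read.parquet(path)
    # equivalently: df = spark.read.format("parquet").load(path)

    df.printSchema()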

sc is not created automatically in the notebook

点点圈 submitted on 2019-12-20 05:25:12
Question: A notebook I created yesterday in DSX has stopped working, with errors saying it can't find the sc object: "NameError: global name 'sc' is not defined". I restarted the kernel but can't get it created, and I have no other kernel running. I created a new notebook (Spark 2.0 with Python 2) containing literally nothing except:

    sc

and that comes back blank; I am expecting details of my SparkContext object. In case I am going mad, I double-checked the docs, and they say it should be automatic: "The SparkContext and …
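If the precreated context really is missing, one workaround to try while the underlying issue is investigated (a sketch, assuming a Spark 2.x Python kernel; on a healthy DSX kernel both spark and sc already exist and this is unnecessary):

    # Build the session by hand and derive sc from it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    print(sc.version)  # should now print the Spark version string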

How do I implement the TensorFrames Spark package on Data Science Experience?

一曲冷凌霜 submitted on 2019-12-13 18:05:22
Question: I've been able to import the package:

    import pixiedust
    pixiedust.installPackage("databricks:tensorframes:0")

But when I try a simple example:

    import tensorflow as tf
    import tensorframes as tfs
    from pyspark.sql import Row

    data = [Row(x=[float(x), float(2 * x)], key=str(x % 2), z=float(x + 1)) for x in range(1, 6)]
    df = spark.createDataFrame(data)
    tfs.print_schema(df)

I get the following error:

    ...
    Py4JJavaError: An error occurred while calling o97.loadClass.
    : java.lang.NoClassDefFoundError …
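A NoClassDefFoundError of this kind usually means the JAR on the Spark classpath doesn't match the Python bindings. One thing to try (a sketch; the exact coordinate below is an assumption — pick a TensorFrames release from spark-packages.org that matches your Spark and Scala versions):

    import pixiedust

    # "databricks:tensorframes:0" pins no concrete release; a full
    # coordinate such as this one (a Scala 2.11 build) is more likely
    # to resolve, but verify it exists for your Spark version.
    pixiedust.installPackage("databricks:tensorframes:0.2.9-s_2.11")

    # Restart the kernel so the JAR is picked up, then re-run the
    # TensorFrames example from the question.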

!pip install nltk -> permission denied

删除回忆录丶 submitted on 2019-12-11 12:30:05
Question: I'm trying to install nltk with the following notebook command:

    !pip install nltk

However, that throws the following error:

    error: could not create '/usr/local/src/bluemix_ipythonspark_141/notebook/lib/python2.7/site-packages/nltk': Permission denied

How can I install nltk from the Jupyter notebook? Note that the Spark environments on Bluemix can only be accessed via the notebook; there isn't shell access to the environment.

Answer 1: As the question is about IPython notebooks on Bluemix, the …
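The error comes from pip trying to write into the system site-packages, which is not writable without root. A common workaround (a sketch; whether this is the fix the truncated answer goes on to describe is not confirmed) is to install into the per-user site directory:

    # Install into the writable per-user site-packages instead.
    !pip install --user nltk

    # Check where it landed (the exact path is environment-specific).
    import site
    print(site.getusersitepackages())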

matplotlib - ImportError: No module named _tkinter

我的未来我决定 submitted on 2019-12-11 08:45:13
Question: I have a simple notebook with the following code:

    %matplotlib inline

However, when running it I get the following error:

    ImportError: No module named _tkinter

I have another notebook in the same project, and that one is able to run the statement without issue. Data Science Experience is a managed service, so you don't have root access to install _tkinter.

Full stacktrace:

    ImportErrorTraceback (most recent call last)
    <ipython-input-43-5f9c00ae8c2d> in <module>()
    ----> 1 get_ipython().magic …
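The _tkinter import is only pulled in when matplotlib falls back to the Tk backend. One workaround to try (a sketch; Agg is a pure-raster backend that needs neither a display nor Tk, and the kernel should be restarted first if pyplot was already imported):

    # Select a Tk-free backend *before* anything imports pyplot.
    import matplotlib
    matplotlib.use("Agg")

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    fig.savefig("plot.png")  # render to a file instead of a Tk window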

Problems when installing R packages arulesViz and h2o in DSX cloud

99封情书 submitted on 2019-12-11 07:17:16
Question: I'm using RStudio in DSX cloud and trying to install the packages via Packages / Install. The errors are below:

    installation of package ‘arulesViz’ had non-zero exit status
    installation of package ‘h2o’ had non-zero exit status

Any solution?

Answer 1: Yes, you're right, there is an issue installing "arulesViz". That's because that package is looking for a version of "arules" that won't work with R 3.3.2. What you'll also run into is that installing the latest version of "arules" also won't work …