I am trying to run SparkSQL:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
But the error I'm getting is:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /metastore_db.
It is difficult to find which process has your Derby metastore_db locked. If you can identify that process, you can kill it using the kill command, as shown below; otherwise, the simplest solution is to restart the system.
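For example, on a Unix-like system, stray spark-shell processes are the usual lock holders; the same commands used in the last answer of this thread will find and kill them (4848 is just a placeholder PID):
ps -ef | grep spark-shell
kill -9 4848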
If you're running in the Spark shell, you shouldn't instantiate a HiveContext; one is created automatically, called sqlContext (the name is misleading: if you compiled Spark with Hive, it will be a HiveContext). See the similar discussion here.

If you're not running in the shell, this exception means you've created more than one HiveContext in the same JVM, which isn't possible: you can only create one.
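A minimal sketch of the intended spark-shell usage (the shell predefines sc and sqlContext; the SHOW TABLES query is just an illustration):
// In spark-shell, don't create another HiveContext; reuse the predefined one.
// With a Hive-enabled build, sqlContext is already a HiveContext.
sqlContext.sql("SHOW TABLES").show()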
If you are facing this issue when bringing up a WAS application on a Windows machine, delete the db.lck file present in WebSphere\AppServer\profiles\AppSrv04\databases\EJBTimers\server1\EJBTimerDB (my DB is EJBTimerDB, which was causing the issue).

I was facing the same issue while creating a table:
sqlContext.sql("CREATE TABLE....
I could see many entries from ps -ef | grep spark-shell, so I killed them all and restarted spark-shell. It worked for me.
This happened when I was using pyspark.ml Word2Vec and trying to load a previously built model. The trick is to create an empty DataFrame (in PySpark or Scala) using sqlContext. Here is the Python syntax:
from pyspark.sql.types import StructType

# Empty schema, then an empty DataFrame built from an empty RDD.
schema = StructType([])
empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
This is a workaround; my problem was fixed after using this block. Note: the issue only occurs when you instantiate sqlContext from HiveContext, not from SQLContext.
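Since the empty DataFrame can be created in PySpark or Scala, here is a rough Scala equivalent (a sketch, assuming the spark-shell-provided sc and sqlContext):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType

// Empty schema and empty RDD[Row], mirroring the Python block above.
val schema = StructType(Nil)
val empty = sqlContext.createDataFrame(sc.emptyRDD[Row], schema)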
I was getting the same error while creating DataFrames in the Spark shell:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /metastore_db.
Cause:
I found that this was happening because multiple other instances of spark-shell were already running and holding the Derby DB, so when I started yet another spark-shell and created a DataFrame in it using RDD.toDF(), it threw the error.
Solution:
I ran the ps command to find the other instances of spark-shell:
ps -ef | grep spark-shell
and killed them all using the kill command:
kill -9 spark-shell-processID (example: kill -9 4848)
After all the spark-shell instances were gone, I started a new spark-shell, reran my DataFrame function, and it ran just fine :)