I am getting errors when connecting to Cassandra from PySpark, apparently because my Cassandra version is too old:
[idf@node1 python]$ nodetool -h localhost version
ReleaseVersion: 2.0.17
[idf@node1 python]$ 
[idf@node1 cassandra]$ java --version
Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
[idf@node1 cassandra]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[idf@node1 cassandra]$ 
I want to upgrade to the latest version. However, I have already collected quite a bit of data and I don't want to lose it. I am using CentOS 7.2 with a single Cassandra node. My questions are:
- Where is the Cassandra data stored on the local file system?
- Is it correct to assume that I can compress this directory and move it?
Then, once I have the data backed up, what is the correct way to upgrade Cassandra? Is it:
- remove old version completely
- install new version
- copy data back
What is the best practice to do this?
I'm guessing you are using the open-source (OSS) version. The default data location is /var/lib/cassandra, and you can back it up if you want. The upgrade procedure is simple:
- run nodetool drain
- stop cassandra
- save your cassandra.yaml
- remove old and install new version
- update new cassandra.yaml with your settings
- start cassandra
- run nodetool upgradesstables
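The steps above, together with the data-directory backup asked about in the question, can be sketched in Python as a reviewable dry-run plan. This is a minimal sketch under stated assumptions, not a tested upgrade script: it assumes a package install with data in /var/lib/cassandra, a config file at /etc/cassandra/conf/cassandra.yaml, and a systemd-managed service named `cassandra`. The helper names `backup_data_dir` and `upgrade_plan` are hypothetical. The plan is only returned as text; nothing is executed.

```python
from __future__ import annotations

import tarfile
from pathlib import Path

# Default data location for a package (OSS) install, per the answer.
DATA_DIR = Path("/var/lib/cassandra")
# Assumption: RPM-style config location; tarball installs keep it under conf/.
CASSANDRA_YAML = Path("/etc/cassandra/conf/cassandra.yaml")


def backup_data_dir(src: Path, archive: Path) -> Path:
    """Write a gzip-compressed tar of `src` to `archive` and return its path.

    Intended to run only after `nodetool drain` and stopping Cassandra,
    so that no files change while they are being archived.
    """
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps entries relative, e.g. "cassandra/data/...".
        tar.add(str(src), arcname=src.name)
    return archive


def upgrade_plan(yaml_path: Path = CASSANDRA_YAML) -> list[str]:
    """Return the upgrade steps in order; lines starting with '#' are manual steps.

    Assumption: the service is managed by systemd under the name "cassandra".
    """
    return [
        "nodetool drain",  # flush memtables and stop accepting writes
        "sudo systemctl stop cassandra",
        f"# back up {DATA_DIR} now, e.g. with backup_data_dir()",
        f"cp {yaml_path} ~/cassandra.yaml.bak",  # keep your settings outside the install
        "# remove the old version and install the new one (install-method specific)",
        f"# merge your settings from ~/cassandra.yaml.bak into the new {yaml_path.name}",
        "sudo systemctl start cassandra",
        "nodetool upgradesstables",  # rewrite SSTables in the new on-disk format
    ]


if __name__ == "__main__":
    for step in upgrade_plan():
        print(step)
```

Keeping the plan as data rather than running it through `subprocess` lets you check each step against your own installation before executing anything by hand.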
This should leave your node running the new version of Cassandra with all of your schema and data intact. Be careful if you are upgrading past 2.1, because 2.2 and later require Java 8. Everything else is the same.
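The Java 8 rule can be checked against the same command output shown in the question (`nodetool version` and `java -version`). This is a minimal sketch: the function names are hypothetical, and it encodes only the rule stated here (targets of 2.2 or later need Java 8 or newer). It does not check whether a particular Cassandra release also supports newer Java versions.

```python
from __future__ import annotations

import re


def cassandra_release(nodetool_output: str) -> tuple[int, ...]:
    """Parse `nodetool version` output such as 'ReleaseVersion: 2.0.17'."""
    m = re.search(r"ReleaseVersion:\s*(\d+)\.(\d+)\.(\d+)", nodetool_output)
    if m is None:
        raise ValueError("no ReleaseVersion found")
    return tuple(int(g) for g in m.groups())


def java_major(java_version_output: str) -> int:
    """Return the Java major version from `java -version` output.

    Legacy strings like "1.8.0_45" mean Java 8; newer ones like "11.0.2" mean 11.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', java_version_output)
    if m is None:
        raise ValueError("no Java version found")
    first, second = int(m.group(1)), m.group(2)
    return int(second) if first == 1 and second is not None else first


def meets_java8_requirement(target: tuple[int, ...], java: int) -> bool:
    """Only the rule from the answer: Cassandra 2.2 and later need Java 8+."""
    if tuple(target[:2]) >= (2, 2):
        return java >= 8
    return True
```

With the output from the question, `cassandra_release("ReleaseVersion: 2.0.17")` gives `(2, 0, 17)` and `java_major('java version "1.8.0_45"')` gives `8`, so the Java 8 requirement for 2.2 and later is already met on that node.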
Source: https://stackoverflow.com/questions/37305158/best-practices-on-upgrading-cassandra