Question
How do you configure Cassandra to run in Azure?
According to the guide linked below, you should create one Cloud Service per Cassandra node, with one VM in each Cloud Service. These VMs should be on the same virtual network. http://blog.metricshub.com/2012/12/27/running-cassandra-on-azure-step-by-step-gotcha-by-gotcha/
Is this still the recommended way?
In this setup, each VM is exposed with a public IP from its Cloud Service, and each VM also has an internal IP for use on the virtual network.
And how should the following values be set in the Cassandra yaml config file? The clients contacting the cluster are not located on the same virtual network as the nodes, and the cluster contains only a single data center.
Should each value be the internal IP, the external IP, 0.0.0.0, localhost, or something else?
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: ??
listen_address: ??
broadcast_address: ??
broadcast_rpc_address: ??
rpc_address: ??
endpoint_snitch: (SimpleSnitch?)
Answer 1:
It all depends on your Cassandra version. The configuration of v2 differs from that of v3. Normally, you shouldn't set up Cassandra using Azure's classic deployment mode, because every VM is then internet-facing, which compromises security. Only VMs that require data from Cassandra should be able to connect to it.
If you are building a Cassandra cluster, the yaml config basically does not need to change, other than allowing Cassandra to listen on the Ethernet interface. Such a configuration would be:
# listen_address:
listen_interface: eth0
If you need more information, this blog post describes everything needed to set up Cassandra on Azure.
Answer 2:
DataStax, in cooperation with Microsoft, has published this guide to setting up Cassandra on Azure: https://academy.datastax.com/content/deploying-datastax-enterprise-microsoft-azure-cloud
This document recommends placing multiple VMs (nodes) in each cloud service. Azure supports up to 50 VMs per cloud service.
All VMs should be in the same availability set so that they are spread across different update domains. Azure automatically assigns new nodes in an availability set to one of 5 update domains in round-robin fashion. Map these update domains to the rack concept in Cassandra by using the GossipingPropertyFileSnitch and specifying the rack (update domain) in the cassandra-rackdc.properties file. You can find the update domain Azure assigned to each instance in the Azure management console, under the instance overview for the cloud service.
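The update-domain-to-rack mapping can be sketched in Python. This is a minimal sketch under several assumptions. Update domains are treated as 0-based integers. Azure's round-robin placement is idealized as node `i` landing in domain `i % 5`. The data center name `DC1` and the rack names `UD0`..`UD4` are hypothetical. The helper names `round_robin_update_domains` and `rackdc_properties` are illustrative only. `dc` and `rack` are the keys GossipingPropertyFileSnitch reads from cassandra-rackdc.properties.

```python
from typing import List

# The 5 update domains described above for an availability set.
DEFAULT_UPDATE_DOMAINS = 5


def round_robin_update_domains(num_nodes: int,
                               num_domains: int = DEFAULT_UPDATE_DOMAINS) -> List[int]:
    """Idealized round-robin placement (assumption): node i lands in domain i % num_domains."""
    return [i % num_domains for i in range(num_nodes)]


def rackdc_properties(update_domain: int, dc: str = "DC1",
                      num_domains: int = DEFAULT_UPDATE_DOMAINS) -> str:
    """Render cassandra-rackdc.properties, using the Azure update domain as the Cassandra rack.

    "DC1" and "UD<n>" are hypothetical names; any consistent naming scheme works.
    """
    if not 0 <= update_domain < num_domains:
        raise ValueError(f"update domain must be in 0..{num_domains - 1}")
    return f"dc={dc}\nrack=UD{update_domain}\n"


if __name__ == "__main__":
    # With six nodes, the sixth wraps around into the same rack as the first.
    for node, ud in enumerate(round_robin_update_domains(6)):
        print(f"node{node}:", rackdc_properties(ud).replace("\n", " ").strip())
```

Each node gets its own cassandra-rackdc.properties with the rack matching the update domain shown in the management console. Nodes sharing a rack are then the ones Azure may update at the same time.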
All VMs should be on the same VNet in Azure so that the nodes can communicate with each other. If your client is not on this VNet, you need to assign instance-level public IPs to each VM. See this guide: https://azure.microsoft.com/documentation/articles/virtual-networks-instance-level-public-ip/ When you assign public IPs to the VMs, you need to add security to Cassandra: user authentication and encryption.
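A minimal pre-deployment check for that advice can be sketched in Python. It assumes the node's cassandra.yaml has already been parsed into a Python dict, for example with a YAML library (not shown). The function name `security_warnings` and the example address are hypothetical. The keys it inspects (`authenticator`, `client_encryption_options.enabled`, `server_encryption_options.internode_encryption`) are standard cassandra.yaml options. `AllowAllAuthenticator`, `false` and `none` are their defaults, which means no authentication and no encryption.

```python
import ipaddress
from typing import Any, Dict, List


def security_warnings(settings: Dict[str, Any]) -> List[str]:
    """Return hardening hints for a node whose broadcast_address is public.

    `settings` is a parsed cassandra.yaml; missing keys fall back to the
    cassandra.yaml defaults (no authentication, no encryption).
    """
    address = ipaddress.ip_address(settings["broadcast_address"])
    if address.is_private:
        return []  # only reachable inside the VNet, so no hints are produced

    warnings: List[str] = []
    authenticator = settings.get("authenticator", "AllowAllAuthenticator")
    # Matches both the short name and org.apache.cassandra.auth.AllowAllAuthenticator.
    if authenticator.endswith("AllowAllAuthenticator"):
        warnings.append("authenticator: use PasswordAuthenticator instead of AllowAllAuthenticator")
    if not settings.get("client_encryption_options", {}).get("enabled", False):
        warnings.append("client_encryption_options: set enabled: true")
    if settings.get("server_encryption_options", {}).get("internode_encryption", "none") == "none":
        warnings.append("server_encryption_options: set internode_encryption (e.g. all)")
    return warnings


if __name__ == "__main__":
    # Hypothetical node that broadcasts a public IP but keeps the default security settings.
    for hint in security_warnings({"broadcast_address": "40.112.10.20"}):
        print(hint)
```

The check only reports problems and does not edit the file. Enabling encryption still requires keystores on each node, which this sketch does not cover.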
At the time of writing, the 2.0 branch is the one recommended for production. The cassandra.yaml settings for 2.0 in the described setup are:
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - {seeds: '<<INTERNAL VNET IPS FOR THE SEED NODES>>'}
listen_address: <<INTERNAL VNET IP FOR THE CURRENT NODE>>
broadcast_address: <<PUBLIC IP FOR THE CURRENT NODE>>
broadcast_rpc_address: <<NOT AVAILABLE IN Cassandra 2.0>>
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
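The per-node values above can be generated with a small Python helper. This is a minimal sketch, assuming each node's VNet IP and instance-level public IP are already known. The helper name `node_yaml` and all addresses in it are hypothetical. The helper renders plain text so it needs only the standard library. It also rejects a listen or seed address that is not private as a sanity check. `broadcast_rpc_address` is left out because it does not exist in Cassandra 2.0.

```python
import ipaddress
from typing import List

SEED_PROVIDER = "org.apache.cassandra.locator.SimpleSeedProvider"
SNITCH = "org.apache.cassandra.locator.GossipingPropertyFileSnitch"


def node_yaml(internal_ip: str, public_ip: str, seed_ips: List[str]) -> str:
    """Render the cassandra.yaml fragment for one node in the 2.0 setup described above.

    internal_ip: this node's VNet address (listen_address)
    public_ip:   this node's instance-level public IP (broadcast_address)
    seed_ips:    VNet addresses of the seed nodes
    """
    # listen_address and the seed list must be VNet-internal (private) addresses.
    for ip in [internal_ip, *seed_ips]:
        if not ipaddress.ip_address(ip).is_private:
            raise ValueError(f"{ip} does not look like a VNet (private) address")
    seeds = ",".join(seed_ips)
    lines = [
        "seed_provider:",
        f"  - class_name: {SEED_PROVIDER}",
        "    parameters:",
        f"      - seeds: '{seeds}'",
        f"listen_address: {internal_ip}",
        f"broadcast_address: {public_ip}",
        # broadcast_rpc_address is not available in Cassandra 2.0, so it is omitted.
        "rpc_address: 0.0.0.0",
        f"endpoint_snitch: {SNITCH}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node 10.0.0.6 with public IP 40.112.10.20; seeds are 10.0.0.4 and 10.0.0.5.
    print(node_yaml("10.0.0.6", "40.112.10.20", ["10.0.0.4", "10.0.0.5"]))
```

Only `listen_address` and `broadcast_address` differ between nodes. The seed list, `rpc_address` and the snitch are the same on every node.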
Source: https://stackoverflow.com/questions/29621268/how-to-configure-cassandra-on-azure