Question
I have an RDBMS database with the following tables:
Airport ( iata PK, airport, city, state, country, lat, long)
cancellation_cause ( cod_cancellation PK, description)
Manufaturer (id_manufacturer PK, manufacturer_name)
Model (id_model PK, model_name, id_manufacturer FK)
Airline( airline_code PK, description)
airplane_type (id_AirplaneType PK, airplane_type)
engine_type (id_engine PK, engine_type)
Aircraft_type (id_aircraft PK, aircraft_type)
Airplane (TailNumber PK, id_model FK, id_aircraft FK, airline_code FK, id_AirplaneType FK, id_engine FK, Issue_date, status, year)
Flight (id_flight PK, cod_cancellation FK, TailNumber FK, iata_origin FK, iata_destin FK, Year, Month, DayofMonth, DayofWeek, DepTime, CRSTime, ArrTime, CRSArrTime, FlightNum, AtualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, distance, TaxiIn, TaiOut, Cancelled, Diverted)
Note: PK - Primary Key; FK - Foreign Key
I'm conducting a comparative study between RDBMS and Cassandra databases. My goal is to migrate this database to Cassandra and run some queries on both in order to compare their performance in a similar case.
Can anyone tell me the best way to do this? How should I model the database in Cassandra?
Answer 1:
Cassandra Query Language (CQL) version 3.3 lets you create an almost exact replica of the relational tables. This is not the most canonical way to model data in Cassandra, but it can certainly help in your situation, which, based on the comments, seems urgent.
So, using CQL you can create:
CREATE TABLE airport( id text PRIMARY KEY, airport text,
city text, state text, country text, lat float, long float);
Then create the other tables in the same way.
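For instance, two of the small lookup tables from the question could be replicated along the same lines. This is only a sketch: the column names come from the question, but the column types are assumptions and should be adjusted to the actual data.

```cql
-- Column types are assumed; the question does not state them.
CREATE TABLE cancellation_cause (
    cod_cancellation text PRIMARY KEY,
    description text
);

CREATE TABLE airline (
    airline_code text PRIMARY KEY,
    description text
);
```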
To load csv into a table, use:
COPY airport (id, airport, city, state, country, lat, long)
FROM 'airport.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
Note that in Cassandra, you should probably use UUIDs and generate these as you are loading the values.
It looks to me like the airplane data can be denormalized, so I would model it all as one entry, where airplane is a super-column and engine and type are columns. With CQL 3.3 you can model it either this way or in the traditional way, creating a table for each entity.
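As a rough sketch of the UUID idea, the flight table could use a uuid primary key filled in by the built-in uuid() function at insert time. Only a subset of the Flight columns is shown, the column types are assumptions, and the inserted values are made-up sample data. As far as I know, COPY loads the values exactly as they appear in the file, so with this approach the rows would be written with INSERT statements (or the UUIDs added to the CSV beforehand).

```cql
-- Subset of the Flight columns; types are assumed.
-- Unquoted names such as TailNumber are stored in lowercase by CQL.
CREATE TABLE flight (
    id_flight uuid PRIMARY KEY,
    TailNumber text,
    iata_origin text,
    iata_destin text,
    FlightNum int,
    DepDelay int,
    ArrDelay int
);

-- uuid() generates a random (version 4) UUID for the new row.
-- The values below are hypothetical sample data.
INSERT INTO flight (id_flight, TailNumber, iata_origin, iata_destin, FlightNum, DepDelay, ArrDelay)
VALUES (uuid(), 'N12345', 'JFK', 'LAX', 101, 5, -3);
```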
See this post for details.
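One possible shape for the denormalized airplane entry is a single flat table that stores the model, manufacturer, type and engine names directly instead of their IDs. This is a sketch under assumptions: the column types are guesses, and a flat table is used here in place of a super-column because it is the simplest form to express in CQL.

```cql
-- One row per airplane; the lookup values are stored inline
-- instead of id_model, id_aircraft, id_AirplaneType and id_engine.
-- Column types are assumed.
CREATE TABLE airplane (
    TailNumber text PRIMARY KEY,
    manufacturer_name text,
    model_name text,
    aircraft_type text,
    airplane_type text,
    engine_type text,
    airline_code text,
    Issue_date text,
    status text,
    year int
);
```

With this layout a single read by TailNumber returns everything about the airplane, at the cost of repeating the manufacturer and model names on every row.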
NOTES:
Regarding the relationships, you will probably need to abandon the concept of relationships as you know them, based on primary keys (PK), foreign keys (FK) and the related hard constraints. After migrating the data to Cassandra you will not be relying on hard links, but on the trust that these entities are linked. The only loss you will experience, just like in moving from a strongly typed to a weakly typed language, will be in the database-assured linkages. You can still have your IDs and you can still maintain ID-based links, but there will be no database-enforced integrity constraints. You may also be able to abandon auto-generated IDs. Here are some other very applicable reading materials:
Data modelling best practices from the CQL 3.3 manual by DataStax.
This article describes the practice for migrating the relational schema design and introducing some general design concepts in Cassandra.
Migration best practices - from RDBMS to NoSQL/Cassandra.
DataStax, a commercial company behind Cassandra, has lots of great educational resources and practical advice here.
Source: https://stackoverflow.com/questions/32123310/migrate-rdbms-to-cassandra
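To illustrate the ID-based links mentioned in the notes: CQL has no JOIN, so following a link becomes two reads issued by the application. The sketch below assumes a flight table keyed by id_flight that stores iata_origin as a plain column, plus the airport table created earlier; the UUID and the airport code are made-up values.

```cql
-- Step 1: read the airport code stored on the flight row.
-- (Hypothetical UUID; assumes a flight table keyed by id_flight.)
SELECT iata_origin FROM flight
WHERE id_flight = 123e4567-e89b-12d3-a456-426655440000;

-- Step 2: look up the airport using the code returned by step 1.
-- Nothing in the database guarantees that this row exists.
SELECT airport, city, state FROM airport
WHERE id = 'JFK';
```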