MongoDB:
MongoDB is a document database, unlike a relational database. Each document stores semi-structured data, like a JSON object (schema-free).
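To make "schema-free" concrete, here is a minimal sketch that models a collection as a plain Python list of dicts. It does not use a MongoDB driver; the `users` collection and the `find` and `field_names` helpers are hypothetical names for illustration only. It shows two documents with different fields living side by side.

```python
import json

# Hypothetical "users" collection modeled as a plain Python list;
# a real deployment would go through a MongoDB driver instead.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    # A later version of the application adds nested and array fields
    # without migrating the older document above.
    {"_id": 2, "name": "Lin", "address": {"city": "Pune"}, "tags": ["admin", "beta"]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]


def field_names(collection):
    """Collect the set of top-level field names used across all documents."""
    names = set()
    for doc in collection:
        names.update(doc)
    return names


if __name__ == "__main__":
    print(json.dumps(find(users, name="Lin"), indent=2))
    print(sorted(field_names(users)))
```

Neither document had to declare its fields up front, which is the property the description above refers to.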
Key features:
- The schema can change as the application evolves
- Full indexing
- Load balancing & Data sharding
- Data replication
- Consistency & partition tolerance (CP) in the CAP theorem (Consistency, Availability, Partition tolerance)
When to use:
- Real time analytics
- High speed logging
- Semi structured data management
When not to use:
- Highly transactional applications that need strong ACID properties (Atomicity, Consistency, Isolation & Durability). An RDBMS is preferred in this use case.
- Operating on data sets that involve relations (foreign keys etc.)
HBase:
HBase is an open-source, non-relational, distributed column-family database.
Key features:
- It provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection)
- Supports variable schema where each row is different
- Can serve as the input and output for MapReduce jobs
- Compression, in-memory operation, and Bloom filters on a per-column basis (a Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set)
- Achieves CP in the CAP theorem
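The Bloom filter mentioned in the list above can be sketched in a few lines. This is a toy illustration of the general technique, not HBase's implementation; the class name, the sizes, and the salted SHA-256 hashing are all assumptions made for the example. The key property is that a lookup can return a false positive but never a false negative.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add("row-0001")
    print(bf.might_contain("row-0001"))
```

A store can consult such a filter before reading a file from disk and skip the read entirely when the answer is "definitely absent".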
When to use HBase:
- If you’re loading data by key, searching data by key (or range), serving data by key, querying data by key
- Storing data by row that doesn’t conform well to a schema (variable schema)
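The access pattern described above (lookups by key, scans over a key range, rows with differing columns) can be sketched with an in-memory store that keeps row keys sorted. This is only an illustration of the idea, not the HBase client API; `SortedRowStore`, its methods, and the sample row keys are hypothetical.

```python
import bisect


class SortedRowStore:
    """Rows kept in row-key order; each row holds only the columns it has."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup by row key; returns an empty dict if absent."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Yield (row_key, columns) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


if __name__ == "__main__":
    store = SortedRowStore()
    store.put("user#001", "info:name", "Ada")
    store.put("user#002", "info:name", "Lin")
    store.put("user#002", "info:email", "lin@example.com")
    store.put("video#001", "meta:title", "Intro")
    print(store.get("user#002"))
    print([key for key, _ in store.scan("user#", "user#~")])
```

Because keys are sorted, a range scan only touches the rows between the two bounds, while any query that is not driven by the row key has to walk every row.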
When not to use HBase:
- For relational analytics
- Full table scans
- Data that must be aggregated or analyzed by rows instead of columns
Neo4j:
Neo4j is a graph database that uses the property graph data model (data is stored as a graph of nodes and relationships, both of which can have properties).
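As a rough illustration of the property graph model, the sketch below stores nodes and relationships as small Python objects, each carrying its own properties. It is not the Neo4j driver or Cypher; `PropertyGraph`, `Node`, `Relationship`, and the `FRIEND_OF` relationship type are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}            # node_id -> Node
        self.relationships = []    # directed, typed edges

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)

    def relate(self, start, end, rel_type, **properties):
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def neighbors(self, node_id, rel_type):
        """Nodes reached from node_id over outgoing relationships of rel_type."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_node(1, ["Person"], name="Ada")
    g.add_node(2, ["Person"], name="Lin")
    g.relate(1, 2, "FRIEND_OF", since=2015)
    print([n.properties["name"] for n in g.neighbors(1, "FRIEND_OF")])
```

Queries in this model follow relationships from a starting node rather than joining tables, which is why traversal-heavy use cases suit a graph database.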
Key features:
- Supports full ACID (Atomicity, Consistency, Isolation and Durability) rules
- Supports indexes by using Apache Lucene
- Schema free, bottom-up data model design
- High scalability is achieved through compact storage and in-memory caching of graphs
When to use:
- Master data management
- Network and IT Operations
- Real time recommendations
- Fraud detection
- Social networks (like Facebook)
When not to use:
- Bulk queries/Scans
- If your application requires Partitioning & Sharding of data
Have a look at the comparison of various NoSQL technologies in this article
Sources:
Wikipedia, SlideShare, Cloudera, Tutorials Point, Neo4j