Question
I would like to know whether anyone has experience with how the size of JSON keys affects speed or optimization in a document store database like MongoDB or Elasticsearch.
For example, say I have two documents:
doc1: { keeeeeey1: 'abc', keeeeeeey2: 'xyz' }
doc2: { k1: 'abc', k2: 'xyz' }
Let's say I have 10 million records. Storing the data in the doc1 format would then mean a larger DB file size than storing it in the doc2 format.
Other than that, what are the disadvantages or negative effects in terms of speed, RAM, or any other optimization?
Answer 1:
You correctly noticed that the documents will have different sizes. You will save at least 15 bytes per document (60% for similar documents) if you decide to adopt the second schema. For your 10 million records this adds up to something like 140MB. This gives you the following advantages:
- HDD savings. The only problem is that, given current HDD prices, this is mostly irrelevant.
- RAM savings. Unlike disk space, this can matter for indexing. In MongoDB, the working set of indexes should fit in RAM to achieve good performance. So if you have indexes on these two fields, you will save not only 140MB of HDD space but also 140MB of potential RAM space (which is actually noticeable).
- I/O. Many bottlenecks come from the limits of the input/output system (the speed of reading from and writing to disk is limited). For your documents, this means that with schema 2 you can potentially read/write twice as many documents per second.
- Network. In many situations the network is even slower than disk I/O, and if your DB server is on a different machine than your application server, the data has to be sent over the wire. Here too you will be able to send twice as many documents in the same time.
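The 15-byte and 140MB figures above can be checked with a quick calculation. This is only a rough sketch: it assumes each key name is stored uncompressed once per document, and the variable names are made up for illustration. It counts only the bytes taken by the key names themselves.

```javascript
// Key names from the question, built with repeat() so the lengths are explicit:
// 'k' + 6 x 'e' + 'y1' (9 characters) and 'k' + 7 x 'e' + 'y2' (10 characters).
const longKeys = ['k' + 'e'.repeat(6) + 'y1', 'k' + 'e'.repeat(7) + 'y2'];
const shortKeys = ['k1', 'k2'];

// Total UTF-8 bytes used by the key names of one document.
const encoder = new TextEncoder();
const keyBytes = (keys) =>
  keys.reduce((sum, key) => sum + encoder.encode(key).length, 0);

// Assumption: every document repeats its key names in full.
const savedPerDoc = keyBytes(longKeys) - keyBytes(shortKeys);
const docCount = 10_000_000;
const savedTotalBytes = savedPerDoc * docCount;
const savedMiB = savedTotalBytes / (1024 * 1024);

console.log(savedPerDoc);         // 15
console.log(savedTotalBytes);     // 150000000
console.log(savedMiB.toFixed(1)); // 143.1
```

So the saving is about 150 million bytes, or roughly 143 MiB, which matches the "something like 140MB" estimate. Real on-disk numbers will differ if the storage engine compresses data.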
Having covered the advantages, I also have to mention the disadvantages of short keys:
- Readability of the database. When you run db.coll.findOne() and see {_id: 1, t: 13423, a: 3, b: 0.2}, it is pretty hard to understand what exactly is stored there.
- Readability of the application. This is similar to the database, but at least here there is a solution: with mapping logic that transforms currentDate to c and price to p, you can write clean code and still have a short schema.
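A minimal sketch of such mapping logic might look like the following. The names FIELD_MAP, REVERSE_MAP, renameKeys, toStored and fromStored are hypothetical and not part of any MongoDB or Elasticsearch API, and the sketch only handles top-level keys. The application would call toStored before writing and fromStored after reading.

```javascript
// Hypothetical mapping from readable application names to short stored keys.
const FIELD_MAP = new Map([
  ['currentDate', 'c'],
  ['price', 'p'],
]);

// Inverse mapping, used when reading documents back from the database.
const REVERSE_MAP = new Map(
  [...FIELD_MAP].map(([longName, shortName]) => [shortName, longName])
);

// Rename top-level keys using the given map; unmapped keys (e.g. _id) pass through.
function renameKeys(doc, map) {
  return Object.fromEntries(
    Object.entries(doc).map(([key, value]) => [
      map.has(key) ? map.get(key) : key,
      value,
    ])
  );
}

const toStored = (appDoc) => renameKeys(appDoc, FIELD_MAP);
const fromStored = (dbDoc) => renameKeys(dbDoc, REVERSE_MAP);

const stored = toStored({ _id: 1, currentDate: 13423, price: 0.2 });
console.log(stored);             // { _id: 1, c: 13423, p: 0.2 }
console.log(fromStored(stored)); // { _id: 1, currentDate: 13423, price: 0.2 }
```

Keeping the map in one place means the rest of the application code only ever sees the long names, while the database only stores the short ones.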
Source: https://stackoverflow.com/questions/28492785/cost-of-keys-in-json-document-database-mongodb-elasticsearch