When creating the mappings for an index that can search through multiple books, is it preferable to use nested mappings like the one below, or documents with a parent-child relationship?
book: {
  properties: {
    isbn: { //- ISBN of the book
      type: 'string' //- 9783791535661
    },
    title: { //- Title of the book
      type: 'string' //- Alice in Wonderland
    },
    author: { //- Author of the book (maybe should be an array)
      type: 'string' //- Lewis Carroll
    },
    category: { //- Category of the book (maybe should be an array)
      type: 'string' //- Fantasy
    },
    toc: { //- Array of the chapters in the book
      type: 'nested',
      properties: {
        html: { //- HTML content of a chapter
          type: 'string' //- <!DOCTYPE html><html>...</html>
        },
        title: { //- Title of the chapter
          type: 'string' //- Down the Rabbit Hole
        },
        fileName: { //- File name of this chapter
          type: 'string' //- chapter_1.html
        },
        firstPage: { //- The first page of this chapter
          type: 'integer' //- 3
        },
        numberOfPages: { //- How many pages are in this chapter
          type: 'integer' //- 27
        },
        sections: { //- An array of all of the sections within a chapter
          type: 'nested',
          properties: {
            html: { //- The HTML content of a section
              type: 'string' //- <section>...</section>
            },
            title: { //- The title of a section
              type: 'string' //- section number 2 or something
            },
            figures: { //- Array of the figures within a section
              type: 'nested',
              properties: {
                html: { //- HTML content of a figure
                  type: 'string' //- <figure>...</figure>
                },
                caption: { //- The name of a figure
                  type: 'string' //- Figure 1
                },
                id: { //- Id of a figure
                  type: 'string' //- figure4
                }
              }
            },
            paragraphs: { //- Array of the paragraphs within a section
              type: 'nested',
              properties: {
                html: { //- HTML content of a paragraph
                  type: 'string' //- <p>...</p>
                },
                id: { //- Id of a paragraph
                  type: 'string' //- paragraph3
                }
              }
            }
          }
        }
      }
    }
  }
}
The HTML of an entire book is approximately 250 kB. I would want to query things such as:
- the best-matching paragraph, including its nearest paragraphs on either side
- the best-matching section from a single book, including any child sections
- the best figure, given that it is inside a section with a matching title
- etc.
I don't really know the specifics of the queries I want to perform, but it is important to have enough flexibility to try out very unusual ones without having to change my mappings too much.
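As a rough illustration of the first query on that list against the nested mapping above, here is a minimal sketch, assuming a recent Elasticsearch version and a JavaScript client that accepts plain request bodies. The helpers buildBestParagraphQuery and withNeighbours are hypothetical names. The sketch nests one nested query per level and requests inner_hits on the innermost one; nested inner hits report a _nested.offset, which could be used to pick the neighbouring paragraphs out of the book's _source. How inner hits are returned for multi-level nested queries can vary between versions, so treat that part as something to verify.

```javascript
// Hypothetical helper: search body for the best-matching paragraph inside
// the multi-level nested objects toc -> toc.sections -> toc.sections.paragraphs.
function buildBestParagraphQuery(text) {
  return {
    size: 1,
    query: {
      nested: {
        path: 'toc',
        query: {
          nested: {
            path: 'toc.sections',
            query: {
              nested: {
                path: 'toc.sections.paragraphs',
                query: { match: { 'toc.sections.paragraphs.html': text } },
                // Ask for the matching paragraph object(s), not just the book.
                inner_hits: { size: 1 }
              }
            }
          }
        }
      }
    }
  };
}

// Hypothetical helper: given a section's paragraphs array and the position of
// the matching paragraph (e.g. from the inner hit's _nested.offset), return
// that paragraph plus up to `radius` neighbours on each side.
function withNeighbours(paragraphs, offset, radius = 1) {
  const start = Math.max(0, offset - radius);
  const end = Math.min(paragraphs.length, offset + radius + 1);
  return paragraphs.slice(start, end);
}
```

For example, withNeighbours(['p0', 'p1', 'p2', 'p3'], 2) returns ['p1', 'p2', 'p3']. Note that the search hit itself is still the whole book document; only the inner hits point at the paragraph.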
If you use the nested type, everything will be contained in the same _source document, which for big books can be quite a mouthful.
Whereas if you use parent/child docs for each chapter and/or section, you might end up with smaller chunks that are more chewable...
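A minimal sketch of what the parent/child variant could look like, assuming Elasticsearch 6 or later, where parent/child is expressed with a join field (older versions used a _parent mapping on child types instead) and where the string type from the question's mapping has been replaced by text. The field name book_relation, the ID scheme, and the toJoinDocs helper are hypothetical. Every chapter and section document is routed with the book's ID, because a parent, its children and its grandchildren must live on the same shard.

```javascript
// Hypothetical mapping: a join field with two levels, book -> chapter -> section.
const bookIndexMapping = {
  mappings: {
    properties: {
      book_relation: {
        type: 'join',
        relations: { book: 'chapter', chapter: 'section' }
      },
      title: { type: 'text' },
      html: { type: 'text' }
    }
  }
};

// Hypothetical helper: split one book object (shaped like the question's
// mapping) into separate documents, each with an id, a routing value and a body.
function toJoinDocs(book) {
  const docs = [{
    id: book.isbn,
    routing: book.isbn,
    source: { title: book.title, book_relation: 'book' }
  }];
  book.toc.forEach((chapter, c) => {
    const chapterId = `${book.isbn}-c${c}`;
    docs.push({
      id: chapterId,
      routing: book.isbn, // same shard as the book
      source: { title: chapter.title, book_relation: { name: 'chapter', parent: book.isbn } }
    });
    chapter.sections.forEach((section, s) => {
      docs.push({
        id: `${chapterId}-s${s}`,
        routing: book.isbn, // grandchildren are routed by the root book, not the chapter
        source: {
          title: section.title,
          html: section.html,
          book_relation: { name: 'section', parent: chapterId }
        }
      });
    });
  });
  return docs;
}
```

Each element could then be sent as one index request with its id and routing. Multi-level joins are supported but add query-time cost, so it may be worth keeping the hierarchy as shallow as the queries allow.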
As always, it heavily depends on the queries you will want to make, so you should first think about all the use cases you will want to support and then you'll be better armed to figure out which approach is best.
There's another approach which uses neither nested nor parent/child, and which simply involves denormalization. Concretely, you pick the smallest "entity" you want to consider, e.g. a section, and then simply create standalone documents for each section. In those section documents, you'd have fields for the book title, author, chapter title, section title, etc.
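The denormalized approach could look roughly like this: a sketch, assuming the book arrives as a plain object shaped like the mapping in the question (toc containing sections). The toSectionDocs helper and the generated field names are hypothetical. Each section becomes a standalone document carrying copies of its book and chapter fields.

```javascript
// Hypothetical helper: flatten one book into one standalone document per section,
// copying the book- and chapter-level fields into every section document.
function toSectionDocs(book) {
  const docs = [];
  book.toc.forEach((chapter, c) => {
    chapter.sections.forEach((section, s) => {
      docs.push({
        id: `${book.isbn}-${c}-${s}`,
        bookTitle: book.title,
        author: book.author,
        category: book.category,
        chapterTitle: chapter.title,
        chapterIndex: c,
        sectionTitle: section.title,
        sectionIndex: s,
        html: section.html
      });
    });
  });
  return docs;
}
```

The trade-off is duplication: if book-level data such as the author changes, every section document of that book has to be reindexed. In return, each search hit is already the section you were looking for.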
You can try each approach in its own index and see how it works for your use cases.
nested is basically a way of stuffing everything into the same document. That can be useful for searching, but it makes certain things considerably harder.
For example, if you're trying to find a particular section of a chapter, your query will return the correct document: the whole book. I would imagine that's probably not what you're looking for, and thus a parent/child relationship would be the appropriate way to go.
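With parent/child documents, a search for a section can return the section itself rather than the whole book. A hedged sketch of such a request body, assuming a join field in which chapter is the parent of section and in which both carry title and html fields (all hypothetical names, as is the sectionsInChapterQuery helper): the has_parent query keeps only documents whose parent chapter matches, and since only sections have chapter parents, the hits are section documents.

```javascript
// Hypothetical helper: find sections whose html matches `text`
// and whose parent chapter's title matches `chapterTitle`.
function sectionsInChapterQuery(chapterTitle, text) {
  return {
    query: {
      bool: {
        // Relevance comes from the section's own content.
        must: [{ match: { html: text } }],
        // Restrict to children of a matching chapter (no scoring in filter context).
        filter: [{
          has_parent: {
            parent_type: 'chapter',
            query: { match: { title: chapterTitle } }
          }
        }]
      }
    }
  };
}
```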
Or just don't bother, and treat book, chapter and section as separate types within an index, which you query and 'join' on demand.
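Joining on demand means doing it in the application with two requests: first search the sections, then fetch the referenced books by ID (for example with a multi-get). A minimal sketch of the client-side part, assuming each section document stores a hypothetical bookId field; the helper names are also hypothetical and the actual search calls are left out. Note that from Elasticsearch 6 onward an index can hold only one mapping type, so on current versions "separate types" would in practice mean separate indices.

```javascript
// Hypothetical helper: unique book IDs referenced by the section hits,
// to be fetched in a second request.
function bookIdsFor(sectionHits) {
  return [...new Set(sectionHits.map(s => s.bookId))];
}

// Hypothetical helper: attach each section hit to its book (or null if missing).
function joinSectionsToBooks(sectionHits, books) {
  const byId = new Map(books.map(b => [b.id, b]));
  return sectionHits.map(s => ({ ...s, book: byId.get(s.bookId) ?? null }));
}
```

This keeps each index simple, at the cost of an extra round trip and of doing the join logic yourself.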
Source: https://stackoverflow.com/questions/35153578/map-a-book-in-elasticsearch-with-many-levels-nested-vs-parent-child-relationshi