I have a set of .docx files that were auto-generated from a set of PDFs.
I want to convert these documents into a specific JSON structure for later use.
I also need the resulting data to be indexed.
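The target JSON structure and the kind of index aren't specified above, so here is a minimal sketch under stated assumptions. It uses only the Python standard library and relies on the fact that a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. The schema (`id`, `source`, `paragraphs`) and the simple inverted word index are hypothetical placeholders, as are the function names `docx_paragraphs`, `docx_to_record`, `build_index` and `convert_folder`. Headers, footnotes, table structure and images are ignored.

```python
import json
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the non-empty paragraph texts of a .docx file, in document order."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):  # includes paragraphs inside table cells
        # Join all text runs (<w:t>) of the paragraph; tabs and breaks are dropped
        text = "".join(t.text or "" for t in p.iter(W_NS + "t")).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


def docx_to_record(path, doc_id):
    """Build one JSON-serializable record (HYPOTHETICAL schema, adapt as needed)."""
    return {
        "id": doc_id,
        "source": Path(path).name,
        "paragraphs": [
            {"index": i, "text": t} for i, t in enumerate(docx_paragraphs(path))
        ],
    }


def build_index(records):
    """Inverted index: lowercase word -> sorted [doc_id, paragraph_index] pairs."""
    index = defaultdict(set)
    for rec in records:
        for para in rec["paragraphs"]:
            for word in re.findall(r"\w+", para["text"].lower()):
                index[word].add((rec["id"], para["index"]))
    return {word: [list(hit) for hit in sorted(hits)] for word, hits in index.items()}


def convert_folder(folder, out_path):
    """Convert every .docx in `folder` and write documents + index to one JSON file."""
    files = sorted(Path(folder).glob("*.docx"))
    records = [docx_to_record(p, i) for i, p in enumerate(files)]
    payload = {"documents": records, "index": build_index(records)}
    Path(out_path).write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return payload
```

Calling `convert_folder("docs", "out.json")` would write every document plus the word index into a single file. Because PDF-to-DOCX conversion often splits or merges paragraphs unpredictably, the paragraph-level granularity here is only a starting point; for larger collections a dedicated search engine or database index may be a better fit than a JSON dictionary.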