What is the optimal way to merge a few lines or a few words into a large file using Node.js?

Submitted by ∥☆過路亽.° on 2021-02-11 14:54:20

Question


I would appreciate insight from anyone who can suggest the best (or at least a better) approach to editing large files, anywhere from 1MB to 200MB, using Node.js.

Our process needs to merge lines into an existing file on the filesystem. We receive the changed data in the following format, and it needs to be merged into the file at the position defined in the change details.

[{"range":{"startLineNumber":3,"startColumn":3,"endLineNumber":3,"endColumn":3},"rangeLength":0,"text":"\n","rangeOffset":4,"forceMoveMarkers":false},{"range":{"startLineNumber":4,"startColumn":1,"endLineNumber":4,"endColumn":1},"rangeLength":0,"text":"\n","rangeOffset":5,"forceMoveMarkers":false},{"range":{"startLineNumber":5,"startColumn":1,"endLineNumber":5,"endColumn":1},"rangeLength":0,"text":"\n","rangeOffset":6,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":1,"endLineNumber":6,"endColumn":1},"rangeLength":0,"text":"f","rangeOffset":7,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":2,"endLineNumber":6,"endColumn":2},"rangeLength":0,"text":"a","rangeOffset":8,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":3,"endLineNumber":6,"endColumn":3},"rangeLength":0,"text":"s","rangeOffset":9,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":4,"endLineNumber":6,"endColumn":4},"rangeLength":0,"text":"d","rangeOffset":10,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":5,"endLineNumber":6,"endColumn":5},"rangeLength":0,"text":"f","rangeOffset":11,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":6,"endLineNumber":6,"endColumn":6},"rangeLength":0,"text":"a","rangeOffset":12,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":7,"endLineNumber":6,"endColumn":7},"rangeLength":0,"text":"s","rangeOffset":13,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":8,"endLineNumber":6,"endColumn":8},"rangeLength":0,"text":"f","rangeOffset":14,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":9,"endLineNumber":6,"endColumn":9},"rangeLength":0,"text":"s","rangeOffset":15,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":10,"endLineNumber":6,"endColumn":10},"rangeLength":0,"text":"a","rangeOffset":16,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":11,"endLineNumber":6,"endColumn":11},"rangeLength":0,"text":"f","rangeOffset":17,"forceMoveMarkers":false},{"range":{"startLineNumber":6,"startColumn":12,"endLineNumber":6,"endColumn":12},"rangeLength":0,"text":"s","rangeOffset":18,"forceMoveMarkers":false}]

Simply opening the full file and merging those details would work, but it breaks down if we receive change details very frequently: the file gets opened and held in memory many times, which can cause out-of-memory issues and is also very inefficient.

There is a similar question aimed specifically at C# here. If we open the file in stream mode, is there a similar example in Node.js?


Answer 1:


I would appreciate insight from anyone who can suggest the best (or at least a better) approach to editing large files, anywhere from 1MB to 200MB, using Node.js.

Our process needs to merge lines into an existing file on the filesystem. We receive the changed data in the following format, and it needs to be merged into the file at the position defined in the change details.

General-purpose OS file systems do not directly support inserting data into the middle of a file. So, if you have a flat file and you want to insert data into it starting at a particular line number, you have to do the following steps (a code sketch follows the list):

  1. Open the file and start reading from the beginning.
  2. As you read data from the file, count lines until you reach the desired line number.
  3. Then, if you're inserting new data, you need to read some more and buffer into memory the amount of data you intend to insert.
  4. Then do a write to the file at the position of insertion of the data to insert.
  5. Now, using another buffer the size of the data you inserted, alternate between reading the next chunk of the file and writing out the previously buffered chunk.
  6. Continue until the end of the file is reached and all data has been written back to the file (after the newly inserted data).
  7. This has the effect of rewriting all the data after the insertion point back to the file so it will now correctly be in its new location in the file.
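Here is a rough sketch of those steps in Node.js using fs/promises. It is not production code: the function and parameter names are illustrative, and instead of the two-buffer leapfrog described in steps 5 and 6 it shifts the tail of the file from the end backwards, which achieves the same rewrite with less bookkeeping:

    const fs = require('fs/promises');

    // Insert `insertText` before 1-based line `lineNumber` in the file at `path`,
    // without loading the whole file into memory.
    async function insertAtLine(path, lineNumber, insertText, chunkSize = 64 * 1024) {
      const handle = await fs.open(path, 'r+');
      try {
        const { size } = await handle.stat();

        // Steps 1-2: scan forward, counting newlines, to find the byte offset
        // where the target line starts.
        let insertPos = 0;
        let linesSeen = 1;
        const scanBuf = Buffer.alloc(chunkSize);
        scan: while (insertPos < size && linesSeen < lineNumber) {
          const { bytesRead } = await handle.read(scanBuf, 0, chunkSize, insertPos);
          if (bytesRead === 0) break;
          for (let i = 0; i < bytesRead; i++) {
            if (scanBuf[i] === 0x0a) {            // '\n'
              linesSeen++;
              if (linesSeen === lineNumber) {
                insertPos += i + 1;
                break scan;
              }
            }
          }
          insertPos += bytesRead;
        }

        // Steps 3-6: shift everything after insertPos toward the end of the file
        // by the length of the inserted data, copying chunks from EOF backwards
        // so nothing is overwritten before it has been read.
        const insertBuf = Buffer.from(insertText, 'utf8');
        const shiftBuf = Buffer.alloc(chunkSize);
        let readEnd = size;                        // exclusive end of the next chunk to move
        while (readEnd > insertPos) {
          const readStart = Math.max(insertPos, readEnd - chunkSize);
          const length = readEnd - readStart;
          await handle.read(shiftBuf, 0, length, readStart);
          await handle.write(shiftBuf, 0, length, readStart + insertBuf.length);
          readEnd = readStart;
        }

        // Step 4: write the new data into the gap that was opened up.
        await handle.write(insertBuf, 0, insertBuf.length, insertPos);
      } finally {
        await handle.close();
      }
    }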

As you can tell, this is not at all efficient for large files: you have to read the entire file one buffer at a time, and you have to rewrite the inserted data plus everything after the insertion point.

In node.js, you can use features in the fs module to carry out all these steps, but you have to write the logic to connect them yourself, as there is no built-in feature that inserts new data into a file while pushing the existing data after the insertion point further back.

There is a similar question aimed specifically at C# here. If we open the file in stream mode, is there a similar example in Node.js?

The C# example you reference appears to just be appending new data onto the end of the file. That's trivial to do in pretty much any file system library. In node.js, you can do that with fs.appendFile() or you can open any file handle in append mode and then write to it.
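For completeness, a minimal append sketch (the path and contents are placeholders):

    const fs = require('fs/promises');

    // Appending is cheap because no existing bytes have to move.
    async function appendLines(path, lines) {
      await fs.appendFile(path, lines.join('\n') + '\n', 'utf8');
    }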


To insert data into a file more efficiently, you would need a more efficient storage scheme than a single flat file for all the data. For example, if you stored the data in pieces of approximately 100 lines each, then an insert would only require rewriting a portion of one block, plus perhaps a cleanup process that rebalances block boundaries whenever a block gets far too big or too small.

For efficient line management, you would need to maintain an accurate index of how many lines each piece contains and, of course, what order the pieces go in. This would let you insert data at a roughly fixed cost no matter how large the whole file is, since the most you would ever need to rewrite is one or two blocks of data, even if the entire content is hundreds of GB.
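To make that concrete, a hypothetical shape for such an index might look like the following. None of these names are an existing API; they only illustrate the bookkeeping described above:

    // Hypothetical block index for data split into ~100-line pieces.
    const index = {
      blocks: [
        { file: 'data.0001.block', lineCount: 100 },
        { file: 'data.0002.block', lineCount: 98 },   // grows/shrinks as edits land
        { file: 'data.0003.block', lineCount: 103 },
      ],
    };

    // Map a global 1-based line number to the piece that contains it, so an
    // insert only has to rewrite that one small file.
    function locateLine(index, lineNumber) {
      let remaining = lineNumber;
      for (const block of index.blocks) {
        if (remaining <= block.lineCount) {
          return { file: block.file, lineInBlock: remaining };
        }
        remaining -= block.lineCount;
      }
      return null; // past the end of the data
    }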

Note, you would essentially be building a new file system on top of the OS file system in order to give yourself more efficient inserts or deletions within the overall data. Obviously, the chunks of data could also be stored in a database too and managed there.


Note, if this project is really an editor, editing a line-based text structure is a very well-studied problem, and you could also study the architectures used in previous projects for further ideas. It's a bit beyond the scope of a typical answer here to weigh the pros and cons of the various architectures. If your system is also a client/server editor, where change instructions are sent from a client to a server, that affects the design as well, since you may weigh tradeoffs differently in terms of the number of transactions or the amount of data sent between client and server.

If some other language has an optimal way of doing this, then I think it would be better to find that option, since you are saying Node.js might not have one.

This doesn't really have anything to do with the language you choose. This is about how modern and typical operating systems store data in files.




Answer 2:


In the fs module there is a function named appendFile. It lets you append data to your file. Link.



Source: https://stackoverflow.com/questions/58682879/what-is-the-optimal-way-of-merge-few-lines-or-few-words-in-the-large-file-using
