Currently I have the below code for reading an InputStream. I am storing the whole file into a StringBuilder variable and processing this string afterwards.
There are basically four ways to do file processing:

1. Stream-Based Processing (the java.io.InputStream model): optionally wrap the stream in a BufferedReader, iterate and read the next available text from the stream (blocking until some becomes available if none is), and process each piece of text independently as it's read (catering for widely varying sizes of text pieces). See the first sketch after this list.

2. Chunk-Based Non-Blocking Processing (the java.nio.channels.Channel model): create a set of fixed-size buffers (the "chunks" to be processed), read into each buffer in turn without blocking (the NIO API delegates to native I/O, using fast O/S-level threads), and have your main processing thread pick up each buffer once it is filled and process the fixed-size chunk, while other buffers continue to load asynchronously. See the second sketch after this list.

3. Part File Processing (including line-by-line processing; can leverage 1 or 2 to isolate or build up each "part"): break your file format down into semantically meaningful sub-parts (if possible; breaking into lines often is), iterate through stream pieces or chunks and build up content in memory until the next part is complete, then process each part as soon as it's built. The first sketch below doubles as an example of this, using lines as the parts.

4. Entire File Processing (the java.nio.file.Files model): read the entire file into memory in one operation and process the complete contents.
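To make options 1 and 3 concrete, here's a minimal sketch that wraps the stream in a BufferedReader and treats each line as a "part"; the processLine handler and UTF-8 charset are assumptions for illustration:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class LineProcessor {
        // Options 1 & 3 combined: block for the next line, process it,
        // let it go - only one "part" is held in memory at a time.
        static void processAll(InputStream in) throws IOException {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) { // blocks until a line is available
                    processLine(line);                       // hypothetical per-part handler
                }
            }
        }

        static void processLine(String line) {
            // your per-part logic here
        }
    }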
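And a minimal sketch of option 2, simplified to a single reusable buffer on a synchronous FileChannel; the chunk size and the processChunk handler are assumptions, and you'd need AsynchronousFileChannel (or multiple buffers and threads) to actually overlap loading with processing:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ChunkProcessor {
        // Option 2 (simplified): read fixed-size chunks through a channel
        // into a reusable buffer; only one chunk is in memory at a time.
        static void processAll(Path path) throws IOException {
            ByteBuffer buffer = ByteBuffer.allocateDirect(8192); // one fixed-size "chunk"
            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                while (channel.read(buffer) != -1) { // fill the buffer (last chunk may be partial)
                    buffer.flip();                   // switch the buffer to read mode
                    processChunk(buffer);            // hypothetical chunk handler
                    buffer.clear();                  // reuse the buffer for the next read
                }
            }
        }

        static void processChunk(ByteBuffer chunk) {
            // your fixed-size-chunk logic here
        }
    }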
Which one should you use?
It depends on your file contents and the type of processing you require.
From a resource-use efficiency perspective (best to worst): 1, 2, 3, 4.
From a processing speed & efficiency perspective (best to worst): 2, 1, 3, 4.
From an ease of programming perspective (best to worst): 4, 3, 1, 2.
However, some types of processing might require more than the smallest piece of text (ruling out 1, and maybe 2), and some file formats may not have internal parts (ruling out 3).
You're doing 4. I suggest you shift to 3 (or lower), if you can.
Under option 4, there's only one way to avoid denial of service (DoS): limit the size before the file is read into memory (or, for that matter, copied to your file system). It's too late once it's read in. If that's not possible, then try 3, 2 or 1.
Limiting File Size
Often the file is uploaded via an HTML form.
If uploading using the Servlet @MultipartConfig annotation and request.getPart(name).getInputStream(), you have control over how much data you read from the stream. Also, request.getPart(name).getSize() returns the file size in advance, and if it's small enough you can call Part.write(fileName) to write the file to disk.
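A minimal sketch of that approach; the form field name "file", the 1 MB limit and the location are illustrative assumptions, not prescriptions:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.annotation.MultipartConfig;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.Part;

    @WebServlet("/upload")
    @MultipartConfig(location = "/tmp", maxFileSize = 1024 * 1024) // container enforces the limit
    public class UploadServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            // getPart throws IllegalStateException if maxFileSize was exceeded,
            // so the explicit check below is belt-and-braces.
            Part part = request.getPart("file");     // "file" is the form field name
            if (part.getSize() > 1024 * 1024) {      // size known before any read
                response.sendError(HttpServletResponse.SC_REQUEST_ENTITY_TOO_LARGE);
                return;
            }
            part.write("upload.bin");                // written relative to 'location'
        }
    }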
If uploading using JSF, then JSF 2.2 (very new) has the standard <h:inputFile> component (javax.faces.component.html.HtmlInputFile), which has an attribute for maxLength; pre-JSF 2.2 implementations have similar custom components (e.g. Tomahawk has <t:inputFileUpload> with a maxLength attribute; PrimeFaces has <p:fileUpload> with a sizeLimit attribute).
Alternatives to Read Entire File
Your code, which uses InputStream, StringBuilder, etc., is an efficient way to read the entire file, but it's not necessarily the simplest way (fewest lines of code).
Junior or average developers could get the misapprehension that you're doing efficient stream-based processing when in fact you're reading the entire file, so include appropriate comments.
If you want less code, you could try one of the following:
    List<String> stringList = java.nio.file.Files.readAllLines(path, charset);
or
    byte[] byteContents = java.nio.file.Files.readAllBytes(path);
But they require care, or they can be inefficient in resource usage. If you use readAllLines and then concatenate the List elements into a single String, you consume double the memory (the List elements plus the concatenated String). Similarly, if you use readAllBytes followed by decoding to a String (new String(byteContents, charset)), then again you're using "double" the memory. So it's best to process directly against the List or byte[], unless you limit your files to a small enough size.
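For example, a minimal sketch contrasting the two styles; the file name and the per-line handler are assumptions:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    public class WholeFileDemo {
        public static void main(String[] args) throws Exception {
            Path path = Paths.get("input.txt");  // hypothetical file

            List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);

            // Memory-friendly: process each element of the List directly.
            for (String line : lines) {
                handle(line);  // hypothetical per-line handler
            }

            // Memory-hungry: concatenating holds the List and the String
            // at once, roughly doubling the footprint.
            StringBuilder sb = new StringBuilder();
            for (String line : lines) {
                sb.append(line).append('\n');
            }
            String whole = sb.toString();
            System.out.println(whole.length());
        }

        static void handle(String line) { /* per-line logic */ }
    }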