large-files

Are there any good workarounds to the GitHub 100MB file size limit for text files?

我的梦境 submitted on 2019-11-30 08:09:07
I have a 190 MB plain text file that I want to track on GitHub. The text file is a pronunciation lexicon for our text-to-speech engine. We regularly add and modify lines in the file, and the diffs are fairly small, so it's perfect for git in that sense. However, GitHub has a strict 100 MB file size limit in place. I have tried the GitHub Large File Storage service, but that uploads a new version of the entire 190 MB file every time it changes - so that would quickly grow to many gigabytes if I go down that path. I would like to keep the file as one file instead of splitting it

Streaming large images using ASP.Net Webapi

微笑、不失礼 submitted on 2019-11-30 06:57:11
Question: We are trying to return large image files using ASP.NET Web API and are using the following code to stream the bytes to the client. public class RetrieveAssetController : ApiController { // GET api/retrieveasset/5 public HttpResponseMessage GetAsset(int id) { HttpResponseMessage httpResponseMessage = new HttpResponseMessage(); string filePath = "SomeImageFile.jpg"; MemoryStream memoryStream = new MemoryStream(); FileStream file = new FileStream(filePath, FileMode.Open, FileAccess.Read); byte[]
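The usual cure for this pattern is to avoid buffering the whole image in a MemoryStream and instead hand Web API a stream that is copied to the response as it is read. A minimal sketch of that idea, assuming classic ASP.NET Web API 2 (the controller name, file path, and MIME type are placeholders, not taken from the question):

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Web.Http;

public class StreamingAssetController : ApiController
{
    // GET api/streamingasset/5
    public HttpResponseMessage GetAsset(int id)
    {
        string filePath = "SomeImageFile.jpg";   // placeholder path

        // The FileStream is handed straight to StreamContent, which copies it to
        // the response in chunks, so the whole image never sits in memory at once.
        var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);

        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StreamContent(stream)
        };
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");
        return response;
    }
}
```

Web API disposes the content (and with it the FileStream) after the response has been sent, so no explicit cleanup is needed in the action.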

Computing MD5SUM of large files in C#

寵の児 submitted on 2019-11-30 06:46:57
Question: I am using the following code to compute the MD5SUM of a file - byte[] b = System.IO.File.ReadAllBytes(file); string sum = BitConverter.ToString(new MD5CryptoServiceProvider().ComputeHash(b)); This works fine normally, but if I encounter a large file (~1GB) - e.g. an ISO image or a DVD VOB file - I get an Out of Memory exception. However, I am able to compute the MD5SUM of the same file in Cygwin in about 10 seconds. Please suggest how I can get this to work for big files in my program. Thanks Answer 1: I
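The usual way around the OutOfMemoryException is to hash a stream instead of a byte array: ComputeHash has an overload that reads its input incrementally, so memory use stays constant regardless of file size. A minimal sketch along those lines (the helper name is illustrative):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class Md5Helper
{
    public static string Md5Sum(string file)
    {
        using (var md5 = MD5.Create())
        // ComputeHash(Stream) reads the file in small blocks, so even a multi-GB
        // ISO or VOB file never has to be loaded into memory in one piece.
        using (var stream = File.OpenRead(file))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash);
        }
    }
}
```

Wrapping the FileStream in a BufferedStream with a larger buffer can often bring the runtime close to what a native md5sum achieves.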

Charting massive amounts of data

别来无恙 submitted on 2019-11-30 06:42:32
We are currently using ZedGraph to draw a line chart of some data. The input data comes from a file of arbitrary size, so we do not know the maximum number of data points in advance. However, by opening the file and reading the header, we can find out how many data points are in the file. The file format is essentially [time (double), value (double)]. However, the entries are not uniform on the time axis. There may not be any points between, say, t = 0 sec and t = 10 sec, but there might be 100K entries between t = 10 sec and t = 11 sec, and so on. As an example, our test dataset
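A common way to make this tractable is to decimate before handing anything to ZedGraph: bucket the samples by time into roughly as many buckets as there are horizontal pixels and keep only the minimum and maximum of each bucket, so spikes stay visible but the chart never receives more than a few thousand points. A rough sketch of that idea (the Sample type and bucket count are illustrative, not from the question):

```csharp
using System.Collections.Generic;

// Illustrative sample type; the question's data is [time (double), value (double)].
public readonly struct Sample
{
    public Sample(double time, double value) { Time = time; Value = value; }
    public readonly double Time;
    public readonly double Value;
}

public static class Downsampler
{
    // Reduce an already time-sorted series to at most 2 * buckets points by
    // keeping the min and max of each time bucket, so spikes remain visible.
    public static List<Sample> MinMaxDecimate(IReadOnlyList<Sample> data, int buckets)
    {
        if (data.Count <= 2 * buckets)
            return new List<Sample>(data);        // already small enough to plot as-is

        var result = new List<Sample>(2 * buckets);
        double start = data[0].Time;
        double width = (data[data.Count - 1].Time - start) / buckets;
        int i = 0;

        for (int b = 0; b < buckets && i < data.Count; b++)
        {
            double end = (b == buckets - 1) ? double.MaxValue : start + (b + 1) * width;
            int bucketStart = i, minIdx = i, maxIdx = i;

            // Scan every sample that falls into this time bucket, remembering extremes.
            for (; i < data.Count && data[i].Time <= end; i++)
            {
                if (data[i].Value < data[minIdx].Value) minIdx = i;
                if (data[i].Value > data[maxIdx].Value) maxIdx = i;
            }
            if (i == bucketStart) continue;       // no samples fell into this bucket

            // Emit the extremes in time order so the plotted line stays monotone in x.
            if (minIdx <= maxIdx) { result.Add(data[minIdx]); result.Add(data[maxIdx]); }
            else { result.Add(data[maxIdx]); result.Add(data[minIdx]); }
        }
        return result;
    }
}
```

Bucketing by time rather than by sample index means a dense burst such as 100K entries between t = 10 sec and t = 11 sec collapses to a couple of points per pixel, while sparse regions are left untouched.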

How best to use XPath with very large XML files in .NET?

纵饮孤独 submitted on 2019-11-30 06:36:52
Question: I need to do some processing on fairly large XML files (large here being potentially upwards of a gigabyte) in C#, including performing some complex XPath queries. The problem I have is that the standard way I would normally do this, through the System.Xml libraries, likes to load the whole file into memory before it does anything with it, which can cause memory problems with files of this size. I don't need to update the files at all, just read them and query the data contained in
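The usual compromise is to stream the document with XmlReader and materialize only one record's subtree at a time, running the XPath query against that small fragment instead of the whole gigabyte. A rough sketch of that pattern, assuming the interesting nodes are <record> elements (the element name and the query are placeholders):

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

static class LargeXmlQuery
{
    public static void Run(string path)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(path, settings))
        {
            // Walk the file forward-only; nothing outside the current node is kept.
            while (reader.ReadToFollowing("record"))                 // placeholder element name
            {
                // Load just this subtree so XPath can be evaluated against it.
                using (var subtree = reader.ReadSubtree())
                {
                    var doc = new XPathDocument(subtree);
                    var nav = doc.CreateNavigator();
                    var hit = nav.SelectSingleNode("record/name[@lang='en']");  // placeholder query
                    if (hit != null)
                        Console.WriteLine(hit.Value);
                }
            }
        }
    }
}
```

Memory use is then bounded by the largest single record rather than by the size of the file.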

Memory-efficient way to iterate over part of a large file

孤者浪人 submitted on 2019-11-30 05:22:23
Question: I normally avoid reading files like this: with open(file) as f: list_of_lines = f.readlines() and use this type of code instead: f = open(file) for line in f: # do something Unless I only have to iterate over a few lines in a file (and I know which lines those are), in which case I think it is easier to take slices of list_of_lines. Now this has come back to bite me. I have a HUGE file (reading it into memory is not possible), but I don't need to iterate over all of the lines, just a few of them.

Large file upload in Flask

孤街浪徒 submitted on 2019-11-30 05:17:26
I am attempting to implement a Flask application for uploading files. These files could be very large, for example almost 2 GB in size. I have implemented the server-side processing function like this: @app.route("/upload/<filename>", methods=["POST", "PUT"]) def upload_process(filename): filename = secure_filename(filename) fileFullPath = os.path.join(application.config['UPLOAD_FOLDER'], filename) with open(fileFullPath, "wb") as f: chunk_size = 4096 while True: chunk = flask.request.stream.read(chunk_size) if len(chunk) == 0: return f.write(chunk) return jsonify({'filename': filename}) As for browser

Sort very large text file in PowerShell

拜拜、爱过 submitted on 2019-11-30 04:19:09
Question: I have standard Apache log files, between 500 MB and 2 GB in size. I need to sort the lines in them (each line starts with a date in yyyy-MM-dd hh:mm:ss format, so no special treatment is necessary for sorting). The simplest and most obvious thing that comes to mind is Get-Content unsorted.txt | sort | get-unique > sorted.txt I am guessing (without having tried it) that doing this using Get-Content would take forever on my 1 GB files. I don't quite know my way around System.IO.StreamReader, but I'm curious if an
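The question is about PowerShell, but since it already mentions System.IO.StreamReader, here is a rough C# sketch of the underlying idea: an external merge sort that reads the log with a StreamReader, spills sorted chunks to temporary files, and then merges them, so memory use is bounded by the chunk size rather than the file size (the chunk size and temp-file handling are illustrative, and de-duplication is left out):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ExternalSort
{
    public static void SortFile(string input, string output, int chunkLines = 1_000_000)
    {
        var chunkFiles = new List<string>();

        // Pass 1: read the log line by line, sort each chunk in memory, spill it to disk.
        using (var reader = new StreamReader(input))
        {
            var buffer = new List<string>(chunkLines);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                buffer.Add(line);
                if (buffer.Count >= chunkLines) chunkFiles.Add(WriteChunk(buffer));
            }
            if (buffer.Count > 0) chunkFiles.Add(WriteChunk(buffer));
        }

        // Pass 2: merge the sorted chunks, always emitting the smallest head line.
        var readers = chunkFiles.Select(f => new StreamReader(f)).ToList();
        try
        {
            var heads = readers.Select(r => r.ReadLine()).ToList();
            using (var writer = new StreamWriter(output))
            {
                while (true)
                {
                    int best = -1;
                    for (int i = 0; i < heads.Count; i++)
                        if (heads[i] != null &&
                            (best < 0 || string.CompareOrdinal(heads[i], heads[best]) < 0))
                            best = i;
                    if (best < 0) break;                    // every chunk is exhausted
                    writer.WriteLine(heads[best]);
                    heads[best] = readers[best].ReadLine(); // refill that chunk's head
                }
            }
        }
        finally
        {
            foreach (var r in readers) r.Dispose();
            foreach (var f in chunkFiles) File.Delete(f);
        }
    }

    static string WriteChunk(List<string> buffer)
    {
        buffer.Sort(StringComparer.Ordinal);   // yyyy-MM-dd hh:mm:ss sorts correctly as text
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, buffer);
        buffer.Clear();
        return path;
    }
}
```

Because every line starts with a yyyy-MM-dd hh:mm:ss timestamp, ordinal string comparison gives chronological order without any parsing.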

How can I process a large file via CSVParser?

筅森魡賤 submitted on 2019-11-30 02:57:23
Question: I have a large .csv file (about 300 MB), which is read from a remote host and parsed into a target file, but I don't need to copy all the lines to the target file. While copying, I need to read each line from the source and, if it passes some predicate, add the line to the target file. I suppose that Apache Commons CSV (apache.commons.csv) can only parse the whole file: CSVFormat csvFileFormat = CSVFormat.EXCEL.withHeader(); CSVParser csvFileParser = new CSVParser("filePath", csvFileFormat); List

How to efficiently write large files to disk on background thread (Swift)

烈酒焚心 submitted on 2019-11-29 19:46:45
Update: I have resolved and removed the distracting error. Please read the entire post and feel free to leave comments if any questions remain. Background: I am attempting to write relatively large files (video) to disk on iOS using Swift 2.0, GCD, and a completion handler. I would like to know if there is a more efficient way to perform this task. The task needs to be done without blocking the main UI, while using completion logic, and also ensuring that the operation happens as quickly as possible. I have custom objects with an NSData property, so I am currently experimenting using an extension