large-files

Java: InputStream too slow to read huge files

最后都变了 · Posted on 2019-11-27 16:05:27
Question: I have to read a 53 MB file character by character. When I do it in C++ using ifstream, it completes in milliseconds, but using Java InputStream it takes several minutes. Is it normal for Java to be this slow, or am I missing something? Also, I need to complete the program in Java (it uses servlets, from which I have to call the functions that process these characters). I was thinking maybe of writing the file-processing part in C or C++ and then using the Java Native Interface to interface these…
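
The slowdown in cases like this usually comes from unbuffered single-byte reads, each of which goes all the way to the underlying stream. The question excerpt does not include the reading loop, so the following is only a minimal Java sketch of the usual fix, wrapping the file in a BufferedReader so that character-by-character reads are served from memory:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class CharByChar {
        public static void main(String[] args) throws IOException {
            // Buffer the underlying file so each read() hits memory, not the OS.
            // The buffer size (64 KB here) is an arbitrary choice.
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]), 64 * 1024)) {
                int c;
                long count = 0;
                while ((c = in.read()) != -1) {   // still character by character
                    count++;                      // process (char) c here
                }
                System.out.println("Read " + count + " characters");
            }
        }
    }

With a buffer in place, reading a file of this size character by character typically takes well under a second rather than minutes.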

Extracting data between two tags in HTML file

自作多情 · Posted on 2019-11-27 15:52:21
I've got a HUUUGE HTML file here saved on my system, which contains data from a product catalogue. The data is structured so that each product record's name sits between two tags, <name> and </name>. Each product has up to 3 attributes: name, productID, and color, but not all products will have all of these attributes. How would I go about extracting this data for each product without mixing up the product attributes? The file is also 50 megabytes! Code example: <name>'hat'</name> blah blah blah <prodId>'1829493'</prodId> blah blah blah <color>'cyan'</color> blah blah blah blah blah blah…
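
Since the question is truncated above, here is only a rough sketch of one common approach: stream the file and match the three tags with a regular expression, starting a new record whenever another <name> appears. It assumes each tag and its value sit on a single line, which may not hold for the real file, and it is written in Java to keep a single language for the examples added on this page:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CatalogueScan {
        // Captures the tag name and its value, e.g. <name>'hat'</name>
        private static final Pattern TAG =
                Pattern.compile("<(name|prodId|color)>(.*?)</\\1>");

        public static void main(String[] args) throws IOException {
            Map<String, String> product = new LinkedHashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    Matcher m = TAG.matcher(line);
                    while (m.find()) {
                        // A new <name> starts the next product record
                        if (m.group(1).equals("name") && !product.isEmpty()) {
                            System.out.println(product);
                            product.clear();
                        }
                        product.put(m.group(1), m.group(2));
                    }
                }
            }
            if (!product.isEmpty()) System.out.println(product);
        }
    }

Because the file is only read line by line, memory stays flat regardless of the catalogue's size, and missing attributes simply never appear in a record's map.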

How can I insert large files in MySQL db using PHP?

感情迁移 · Posted on 2019-11-27 15:36:56
I want to upload a large file, maximum size 10 MB, to my MySQL database. Using .htaccess I raised PHP's own file upload limit to "10485760" = 10 MB, and I am able to upload files up to 10 MB without any problem. But I cannot insert the file into the database if it is more than 1 MB in size. I am using file_get_contents to read all the file data and pass it to the insert query as a string to be inserted into a LONGBLOB field. Files bigger than 1 MB are not added to the database, although I can use print_r($_FILES) to make sure that the file was uploaded correctly. Any help will be appreciated and I…
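
A 1 MB cutoff on the database side usually points at the MySQL server's max_allowed_packet setting (historically 1 MB by default) rather than PHP's upload limits, so raising that on the server is the first thing to check. As a hedged illustration of the other half of the idea, streaming the blob instead of splicing its bytes into the SQL text, here is a Java/JDBC sketch (Java is the single language used for examples added on this page; the table name uploads and the connection details are placeholders, not taken from the question):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BlobInsert {
        public static void main(String[] args) throws Exception {
            Path file = Path.of(args[0]);
            // Placeholder connection details; the MySQL JDBC driver must be on the classpath.
            try (Connection con = DriverManager.getConnection(
                         "jdbc:mysql://localhost/test", "user", "password");
                 InputStream data = Files.newInputStream(file);
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO uploads (name, content) VALUES (?, ?)")) {
                ps.setString(1, file.getFileName().toString());
                // Stream the file into the LONGBLOB column instead of building
                // one huge SQL string with the raw bytes spliced into it.
                ps.setBinaryStream(2, data, Files.size(file));
                ps.executeUpdate();
            }
        }
    }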

How Can I Efficiently Read The First Few Lines of Many Files in Delphi

折月煮酒 · Posted on 2019-11-27 15:12:20
问题 I have a "Find Files" function in my program that will find text files with the .ged suffix that my program reads. I display the found results in an explorer-like window that looks like this: I use the standard FindFirst / FindNext methods, and this works very quickly. The 584 files shown above are found and displayed within a couple of seconds. What I'd now like to do is add two columns to the display that shows the "Source" and "Version" that are contained in each of these files. This

Parsing extremely large XML files in PHP

不羁岁月 · Posted on 2019-11-27 14:49:58
I need to parse XML files 40 GB in size, then normalize them and insert them into a MySQL database. How much of the file I need to store in the database is not clear, nor do I know the XML structure. Which parser should I use, and how would you go about doing this? hakre: In PHP, you can read extremely large XML files with XMLReader: $reader = new XMLReader(); $reader->open($xmlfile); Extremely large XML files should be stored in a compressed format on disk; at least this makes sense, as XML has a high compression ratio, for example gzipped as large.xml.gz. PHP supports that…
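
For comparison with the PHP snippet above, the same pull-parsing idea is available in Java as StAX; the sketch below (Java is the single language used for examples added on this page) walks a gzipped XML file element by element without ever holding the whole document in memory, and the file name large.xml.gz is just the example from the answer:

    import java.io.FileInputStream;
    import java.util.zip.GZIPInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StreamXml {
        public static void main(String[] args) throws Exception {
            // Decompress on the fly while parsing, so neither the uncompressed
            // file nor the parsed document ever has to exist in full.
            try (GZIPInputStream in =
                         new GZIPInputStream(new FileInputStream("large.xml.gz"))) {
                XMLStreamReader reader =
                        XMLInputFactory.newInstance().createXMLStreamReader(in);
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // One element at a time; this is where rows would be
                        // normalized and inserted into MySQL.
                        System.out.println(reader.getLocalName());
                    }
                }
                reader.close();
            }
        }
    }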

Git lfs - “this exceeds GitHub's file size limit of 100.00 MB”

倖福魔咒の · Posted on 2019-11-27 14:36:06
I have some csv files that are larger than GitHub's file size limit of 100.00 MB. I have been trying to use the Git Large File Storage extension: https://git-lfs.github.com/ From LFS: "Large file versioning - Version large files—even those as large as a couple GB in size—with Git." I have applied the following to the folders of concern: git lfs track "*.csv" However, when I push: remote: error: File Time-Delay-ftn/Raw-count-data-minor-roads1.csv is 445.93 MB; this exceeds GitHub's file size limit of 100.00 MB remote: error: File Time-Delay-ftn/Raw-count-data-major-roads.csv is 295.42 MB; this…

Fast Search to see if a String Exists in Large Files with Delphi

自作多情 · Posted on 2019-11-27 13:34:00
Question: I have a FindFile routine in my program which lists files, but if the "Containing Text" field is filled in, it should only list files containing that text. If the "Containing Text" field is entered, I search each file found for the text. My current method of doing that is: var FileContents: TStringList; begin FileContents.LoadFromFile(Filepath); if Pos(TextToFind, FileContents.Text) = 0 then Found := false else Found := true; The above code is simple, and it generally works okay…
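
Loading every file into a TStringList just to call Pos means each file is read completely into memory even when the text occurs in its first line. As a hedged sketch of the obvious alternative, scanning line by line and stopping at the first hit, here is the idea in Java (the single language used for examples added on this page); note that, unlike Pos over the whole text, it would miss a match that spans a line break:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ContainsText {
        /** Returns true as soon as any line of the file contains the text. */
        static boolean fileContains(Path file, String textToFind) throws IOException {
            try (BufferedReader in = Files.newBufferedReader(file)) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.contains(textToFind)) {
                        return true;   // early exit: the rest of the file is never read
                    }
                }
            }
            return false;
        }

        public static void main(String[] args) throws IOException {
            System.out.println(fileContains(Path.of(args[0]), args[1]));
        }
    }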

Large file upload with WebSocket

旧时模样 · Posted on 2019-11-27 11:23:08
I'm trying to upload large files (at least 500 MB, preferably up to a few GB) using the WebSocket API. The problem is that I can't figure out how to write "send this slice of the file, release the resources used, then repeat". I was hoping I could avoid using something like Flash/Silverlight for this. Currently, I'm working with something along the lines of: function FileSlicer(file) { // randomly picked 1MB slices, // I don't think this size is important for this experiment this.sliceSize = 1024*1024; this.slices = Math.ceil(file.size / this.sliceSize); this.currentSlice = 0; this.getNextSlice…

Binary search in a sorted (memory-mapped?) file in Java

落花浮王杯 · Posted on 2019-11-27 10:28:17
I am struggling to port a Perl program to Java, and learning Java as I go. A central component of the original program is a Perl module that does string-prefix lookups in a 500+ GB sorted text file using binary search (essentially, "seek" to a byte offset in the middle of the file, backtrack to the nearest newline, compare the line prefix with the search string, "seek" to half/double that byte offset, repeat until found...). I have experimented with several database solutions but found that nothing beats this in sheer lookup speed with data sets of this size. Do you know of any existing Java library…
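
Below is a minimal Java sketch of the seek-and-realign binary search the question describes, using RandomAccessFile rather than a MappedByteBuffer because a single mapping cannot cover a 500+ GB file (each MappedByteBuffer is limited to 2 GB). It assumes single-byte text sorted in plain byte order, since RandomAccessFile.readLine() does not decode UTF-8, and it returns only one matching line even if several share the prefix:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class PrefixSearch {
        /** Returns a line starting with prefix from a sorted text file, or null. */
        static String find(RandomAccessFile f, String prefix) throws IOException {
            long lo = 0, hi = f.length();
            // Shrink [lo, hi] to the smallest offset whose following full line is >= prefix.
            while (lo < hi) {
                long mid = lo + (hi - lo) / 2;
                f.seek(mid);
                f.readLine();                       // skip the (possibly partial) line at mid
                String next = f.readLine();         // first full line starting after mid
                if (next == null || next.compareTo(prefix) >= 0) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            f.seek(lo);
            String candidate = f.readLine();        // at lo == 0 this is the very first line
            if (lo > 0 || (candidate != null && candidate.compareTo(prefix) < 0)) {
                candidate = f.readLine();           // otherwise the match is the next line
            }
            return (candidate != null && candidate.startsWith(prefix)) ? candidate : null;
        }

        public static void main(String[] args) throws IOException {
            try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
                System.out.println(find(f, args[1]));
            }
        }
    }

Each probe costs one seek plus reading at most two lines, so a lookup in a 500 GB file is on the order of 40 seeks, which matches the question's observation that little else competes on raw lookup speed.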

Searching for a string in a large text file - profiling various methods in Python

 ̄綄美尐妖づ · Posted on 2019-11-27 10:28:06
This question has been asked many times. After spending some time reading the answers, I did some quick profiling to try out the various methods mentioned previously... I have a 600 MB file with 6 million lines of strings (category paths from the DMOZ project). The entry on each line is unique. I want to load the file once and keep searching for matches in the data. The three methods I tried are listed below with the time taken to load the file, the search time for a negative match, and memory usage in Task Manager. 1) set: (i) data = set(f.read().splitlines()) (ii) result = search_str in data Load time ~ 10s…