large-files

How can I process a large file via CSVParser?

眉间皱痕 submitted on 2019-11-30 18:39:26
I have a large .csv file (about 300 MB) which is read from a remote host and parsed into a target file, but I don't need to copy all the lines to the target file. While copying, I need to read each line from the source and, if it passes some predicate, add the line to the target file. I suppose that Apache CSV (apache.commons.csv) can only parse the whole file:

CSVFormat csvFileFormat = CSVFormat.EXCEL.withHeader();
CSVParser csvFileParser = new CSVParser("filePath", csvFileFormat);
List<CSVRecord> csvRecords = csvFileParser.getRecords();

so I can't use BufferedReader. Based on my code, a new …
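The assumption in the excerpt is not quite right: CSVParser does not have to materialise every record via getRecords(). It implements Iterable&lt;CSVRecord&gt;, so records can be streamed one at a time and written out only when they match. A minimal sketch of that pattern, with hypothetical file names and a placeholder passesPredicate() filter:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

public class FilterCsv {

    public static void main(String[] args) throws IOException {
        CSVFormat format = CSVFormat.EXCEL.withHeader();

        try (Reader in = Files.newBufferedReader(Paths.get("source.csv"));
             Writer out = Files.newBufferedWriter(Paths.get("target.csv"));
             CSVParser parser = new CSVParser(in, format);
             CSVPrinter printer = new CSVPrinter(out, CSVFormat.EXCEL)) {

            // Re-emit the header row consumed by withHeader().
            printer.printRecord(parser.getHeaderMap().keySet());

            // CSVParser is Iterable, so records are pulled lazily, one at a time.
            for (CSVRecord record : parser) {
                if (passesPredicate(record)) {    // hypothetical filter
                    printer.printRecord(record);  // CSVRecord is Iterable<String>
                }
            }
        }
    }

    // Hypothetical predicate; replace with the real condition on each line.
    private static boolean passesPredicate(CSVRecord record) {
        return !record.get(0).isEmpty();
    }
}
```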

R: How to quickly read large .dta files without RAM Limitations

百般思念 submitted on 2019-11-30 17:52:59
Question: I have a 10 GB .dta Stata file and I am trying to read it into 64-bit R 3.3.1. I am working on a virtual machine with about 130 GB of RAM (4 TB HD), and the .dta file has about 3 million rows and somewhere between 400 and 800 variables. I know the data.table package is the fastest way to read in .txt and .csv files, but does anyone have a recommendation for reading large-ish .dta files into R? Reading the file into Stata as a .dta file takes about 20-30 seconds, although I need to set my working memory …
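A common recommendation is haven::read_dta() (or readstata13::read.dta13()), followed by conversion to a data.table. A minimal sketch, assuming the haven and data.table packages are installed; "large.dta" and the variable names in col_select are placeholders, and col_select needs haven >= 2.0:

```r
library(haven)
library(data.table)

# Read the Stata file; col_select limits the read to the variables actually
# needed, which matters with 400-800 columns in a 10 GB file.
dat <- read_dta("large.dta", col_select = c(var1, var2, var3))

setDT(dat)   # convert in place to a data.table for fast subsequent work
```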

How to use XMLReader/DOMDocument with large XML file and prevent 500 error

青春壹個敷衍的年華 submitted on 2019-11-30 17:04:18
I have an XML file that is approximately 12 MB and contains about 16,000 products. I need to process it into a database; however, at about 6,000 rows it dies with a 500 error. I'm using the Kohana framework (version 3), just in case that has anything to do with it. Here's the code I have inside the controller:

$xml = new XMLReader();
$xml->open("path/to/file.xml");
$doc = new DOMDocument;

// Skip ahead to the first <product>
while ($xml->read() && $xml->name !== 'product');

// Loop through <product>'s
while ($xml->name == 'product') {
    $node = simplexml_import_dom($doc->importNode($xml->expand(), …
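The usual fixes are to lift PHP's execution-time limit, jump between &lt;product&gt; siblings with XMLReader::next() so memory stays flat, and free each node after use. A hedged sketch of that loop; the file path and the saveProduct() helper are placeholders rather than the poster's code:

```php
<?php
set_time_limit(0);   // a 500 after ~6000 rows is often just max_execution_time

$xml = new XMLReader();
$xml->open('path/to/file.xml');
$doc = new DOMDocument();

// Skip ahead to the first <product>.
while ($xml->read() && $xml->name !== 'product');

while ($xml->name === 'product') {
    // expand() materialises only the current <product>, not the whole file.
    $node = simplexml_import_dom($doc->importNode($xml->expand(), true));

    saveProduct($node);          // hypothetical: one INSERT per product

    $xml->next('product');       // jump straight to the next sibling
    unset($node);
}

$xml->close();

function saveProduct($product)
{
    // Insert into the database here, ideally with a prepared statement
    // and transactions committed in batches.
}
```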

Getting an exception when trying to upload large files

笑着哭i submitted on 2019-11-30 16:08:36
I'm using wsHttpBinding for my service:

<wsHttpBinding>
  <binding name="wsHttpBinding_Windows"
           maxBufferPoolSize="9223372036854775807"
           maxReceivedMessageSize="2147483647">
    <readerQuotas maxArrayLength="2147483647"
                  maxBytesPerRead="2147483647"
                  maxStringContentLength="2147483647"
                  maxNameTableCharCount="2147483647"/>
    <security mode="Message">
      <message clientCredentialType="Windows"/>
    </security>
  </binding>
</wsHttpBinding>

<behavior name="ServiceBehavior">
  <dataContractSerializer maxItemsInObjectGraph="6553600"/>
  <serviceThrottling maxConcurrentCalls="2147483647"
                     maxConcurrentInstances="2147483647" …
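One commonly suggested direction (not taken from the excerpt itself): wsHttpBinding with message security buffers the whole message and does not support streaming, so large uploads are usually moved to a separate endpoint with transferMode="Streamed", and the IIS/ASP.NET request limits are raised as well. A hedged configuration sketch with illustrative names and values:

```xml
<basicHttpBinding>
  <!-- Streaming endpoint for large uploads; wsHttpBinding cannot stream. -->
  <binding name="streamedUpload"
           transferMode="Streamed"
           maxReceivedMessageSize="2147483647"
           receiveTimeout="00:10:00" />
</basicHttpBinding>

<!-- IIS/ASP.NET cap request sizes independently of WCF. -->
<system.web>
  <!-- maxRequestLength is in kilobytes (2097151 KB is roughly 2 GB). -->
  <httpRuntime maxRequestLength="2097151" executionTimeout="3600" />
</system.web>
<system.webServer>
  <security>
    <requestFiltering>
      <!-- maxAllowedContentLength is in bytes. -->
      <requestLimits maxAllowedContentLength="2147483647" />
    </requestFiltering>
  </security>
</system.webServer>
```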

Parsing a large (9 GB) file using Python

耗尽温柔 submitted on 2019-11-30 15:32:50
I have a large text file that I need to parse into a pipe-delimited text file using Python. The file looks like this (basically):

product/productId: D7SDF9S9
review/userId: asdf9uas0d8u9f
review/score: 5.0
review/some text here

product/productId: D39F99
review/userId: fasd9fasd9f9f
review/score: 4.1
review/some text here

Each record is separated by two newline characters (\n\n). I have written a parser below:

with open("largefile.txt", "r") as myfile:
    fullstr = myfile.read()

allsplits = re.split("\n\n", fullstr)
articles = []
for i, s in enumerate(allsplits[0:]):
    splits = re.split("\n.*?: ", s) …
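Reading 9 GB with a single myfile.read() and then re.split is what exhausts memory; the records can instead be yielded one at a time as blank-line-delimited blocks. A minimal sketch under the same "key: value" layout shown above; the file name and the output format are placeholders:

```python
import re

def records(path):
    """Yield one record (list of lines) at a time instead of reading 9 GB at once."""
    block = []
    with open(path, "r") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                block.append(line)
            elif block:          # blank line ends the current record
                yield block
                block = []
        if block:                # last record if the file lacks a trailing blank line
            yield block

field = re.compile(r"^[\w/]+:\s*")   # strips prefixes like "review/score: "

for rec in records("largefile.txt"):
    values = [field.sub("", line) for line in rec]
    # Process one record at a time, e.g. write it out pipe-delimited.
    print("|".join(values))
```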

encrypting and/or decrypting large files (AES) on a memory and storage constrained system, with “catastrophe recovery”

梦想的初衷 submitted on 2019-11-30 14:22:32
Question: I have a fairly generic question, so please pardon me if it is a bit vague. Let's assume a file of 1 GB that needs to be encrypted and later decrypted on a given system. The problem is that the system has less than 512 MB of free memory and about 1.5 GB of storage space (give or take), so with the file "onboard" we have roughly 500 MB of "hard drive scratch space" and less than 512 MB of RAM to "play with". The system is not unlikely to experience an "unscheduled power down" at any moment during …

iostream and large file support

狂风中的少年 submitted on 2019-11-30 14:04:34
I'm trying to find a definitive answer and can't, so I'm hoping someone might know. I'm developing a C++ app using GCC 4.x on Linux (a 32-bit OS). This app needs to be able to read files larger than 2 GB. I would really like to use iostreams rather than FILE pointers, but I can't find out whether the large-file #defines (_LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, _FILE_OFFSET_BITS=64) have any effect on the iostream headers. I'm compiling on a 32-bit system. Any pointers would be helpful.

vladr: This has already been decided for you when libstdc++ was compiled, and normally depends on whether or not _GLIBCXX_USE …
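A quick way to see what the installed libstdc++ gives you is to check the width of std::streamoff and then try a seek past 2 GB; on 32-bit Linux it is common to also compile with -D_FILE_OFFSET_BITS=64 so the underlying C library calls match. A minimal sketch (the file name is a placeholder):

```cpp
// Compile with: g++ -D_FILE_OFFSET_BITS=64 check_lfs.cpp
#include <fstream>
#include <iostream>

int main() {
    // 8 means 64-bit offsets are available to the iostream layer.
    std::cout << "sizeof(std::streamoff) = " << sizeof(std::streamoff) << '\n';

    std::ifstream in("huge.bin", std::ios::binary);
    if (!in) return 1;

    in.seekg(std::streamoff(3) * 1024 * 1024 * 1024);   // seek past the 2 GB mark
    char byte;
    if (in.read(&byte, 1))
        std::cout << "read one byte at a 3 GB offset\n";
    return 0;
}
```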

How to read specific lines of a large csv file

瘦欲@ submitted on 2019-11-30 14:04:20
I am trying to read some specific rows of a large CSV file, and I don't want to load the whole file into memory. The indices of the specific rows are given in a list L = [2, 5, 15, 98, ...] and my CSV file looks like this:

Col 1, Col 2, Col3
row11, row12, row13
row21, row22, row23
row31, row32, row33
...

Using the ideas mentioned here, I use the following command to read the rows:

with open('~/file.csv') as f:
    r = csv.DictReader(f)  # I need to read it as a dictionary for my purpose
    for i in L:
        for row in enumerate(r):
            print row[i]

I immediately get the following error: IndexError Traceback (most …
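The inner loop in the excerpt iterates over (index, row) tuples from enumerate rather than selecting rows, which is what triggers the IndexError. A minimal streaming sketch, assuming the values in L are 0-based positions of the data rows and that dictionary-style rows are still wanted; the file name is a placeholder:

```python
import csv

L = [2, 5, 15, 98]
wanted = set(L)

rows = {}
with open('file.csv', newline='') as f:
    reader = csv.DictReader(f)          # streams one row at a time
    for i, row in enumerate(reader):
        if i in wanted:
            rows[i] = row
            if len(rows) == len(wanted):
                break                   # stop early once every index is found

for i in L:
    print(rows[i])
```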

Large file support in C++

£可爱£侵袭症+ submitted on 2019-11-30 09:53:23
The 64-bit file API is different on each platform: on Windows it is _fseeki64, on Linux it is fseeko, on FreeBSD it is yet another similar call ... How can I most effectively make this more convenient and portable? Are there any useful examples?

Most POSIX-based platforms support the "_FILE_OFFSET_BITS" preprocessor symbol. Setting it to 64 will cause the off_t type to be 64 bits instead of 32, and file manipulation functions like lseek() will automatically support the 64-bit offset through some preprocessor magic. From a compile-time point of view, adding 64-bit file offset support in this manner is fairly …
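One common way to make this portable (a sketch, not the only approach) is a pair of thin wrappers that map to _fseeki64/_ftelli64 on Windows and fseeko/ftello elsewhere, with POSIX builds compiled with -D_FILE_OFFSET_BITS=64 so off_t is 64 bits:

```cpp
#include <cstdint>
#include <cstdio>
#if !defined(_WIN32)
#  include <sys/types.h>   // off_t
#endif

#if defined(_WIN32)
inline int portable_fseek(FILE* f, std::int64_t offset, int origin) {
    return _fseeki64(f, offset, origin);      // MSVC 64-bit seek
}
inline std::int64_t portable_ftell(FILE* f) {
    return _ftelli64(f);
}
#else
inline int portable_fseek(FILE* f, std::int64_t offset, int origin) {
    return fseeko(f, static_cast<off_t>(offset), origin);   // POSIX 64-bit seek
}
inline std::int64_t portable_ftell(FILE* f) {
    return static_cast<std::int64_t>(ftello(f));
}
#endif
```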

Stream parse 4 GB XML file in PHP

天大地大妈咪最大 submitted on 2019-11-30 08:42:32
Question: I'm trying to do the following and need some help: I want to stream-parse a large XML file (4 GB) with PHP. I can't use SimpleXML or DOM because they load the entire file into memory, so I need something that can stream the file. How can I do this in PHP? What I am trying to do is navigate through a series of <doc> elements and write some of their children to a new XML file. The XML file I am trying to parse looks like this:

<feed>
  <doc>
    <title>Title of first doc is here</title>
    <url …
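XMLReader can pull one <doc> at a time while XMLWriter streams the reduced output, so memory stays bounded regardless of the 4 GB input. A hedged sketch reusing the <feed>/<doc>/<title> names from the snippet above; the file names and the choice of which children to copy are illustrative:

```php
<?php
$reader = new XMLReader();
$reader->open('huge-feed.xml');

$dom = new DOMDocument();

$writer = new XMLWriter();
$writer->openUri('reduced-feed.xml');
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('feed');

// Advance to the first <doc>.
while ($reader->read() && $reader->name !== 'doc');

while ($reader->name === 'doc') {
    // expand() builds a DOM fragment for just this <doc>.
    $doc = simplexml_import_dom($dom->importNode($reader->expand(), true));

    $writer->startElement('doc');
    $writer->writeElement('title', (string) $doc->title);
    $writer->endElement();

    $reader->next('doc');   // skip straight to the next sibling <doc>
}

$writer->endElement();      // </feed>
$writer->endDocument();
$writer->flush();
$reader->close();
```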