large-files

Read lines by number from a large file

Submitted by 此生再无相见时 on 2019-11-27 10:20:58
Question: I have a file with 15 million lines (it will not fit in memory). I also have a small vector of line numbers, the lines that I want to extract. How can I read those lines out in one pass? I was hoping for a C function that does it in one pass.

Answer 1: The trick is to use a connection AND open it before read.table; each call then continues reading from where the previous one stopped:

```r
con <- file('filename')
open(con)
read.table(con, skip = 5, nrow = 1)   # 6th line
read.table(con, skip = 20, nrow = 1)  # 27th line
...
close(con)
```

You may also try scan; it is faster and gives more control.
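
The answer is R-specific, but the one-pass idea carries over directly. A minimal Python sketch of the same technique (read_lines and wanted are my own names, not from the thread):

```python
def read_lines(path, wanted):
    """Single pass over the file, collecting only the 1-based line numbers in wanted."""
    wanted = set(wanted)
    found = {}
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            if i in wanted:
                found[i] = line
                if len(found) == len(wanted):
                    break  # every requested line has been seen; stop early
    return found

print(read_lines('filename', [6, 27]))
```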

Bash - How to find the largest file in a directory and its subdirectories?

Submitted by 南笙酒味 on 2019-11-27 10:07:44
We're just starting a UNIX class and are learning a variety of Bash commands. Our assignment involves performing various commands on a directory that has a number of folders under it as well. I know how to list and count all the regular files from the root folder using:

```bash
find . -type f | wc -l
```

But I'd like to know where to go from there in order to find the largest file in the whole directory. I've seen something regarding the du command, but we haven't learned that, so within the repertoire of things we've learned I assume we need to somehow connect it to the ls -t command. And pardon me if my
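
For contrast with the shell pipeline approach, the same search is a few lines of Python. A rough sketch (largest_file is my own name) that walks the tree and tracks the biggest regular file seen:

```python
import os

def largest_file(root="."):
    """Recursively walk root and return (path, size) of the largest regular file."""
    biggest, biggest_size = None, -1
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # broken symlink, permission error, etc.
            if size > biggest_size:
                biggest, biggest_size = path, size
    return biggest, biggest_size

print(largest_file("."))
```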

Python Random Access File

Submitted by 别来无恙 on 2019-11-27 08:55:54
Is there a Python file type for accessing random lines without traversing the whole file? I need to search within a large file; reading the whole thing into memory wouldn't be possible. Any types or methods would be appreciated.

This seems like just the sort of thing mmap was designed for. An mmap object creates a string-like interface to a file:

```python
>>> from mmap import mmap
>>> f = open("bonnie.txt", "wb")
>>> f.write("My Bonnie lies over the ocean.")
>>> f.close()
>>> f = open("bonnie.txt", "r+b")
>>> mm = mmap(f.fileno(), 0)
>>> print mm[3:9]
Bonnie
```

In case you were wondering, mmap objects can also be assigned to: >>>
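
Building on that answer, a small sketch of line-oriented random access (nth_line is my own helper, not from the thread; written for Python 3). The mmap is indexed like a string, and the kernel pages in only the regions that find touches, so the file is never copied into memory wholesale:

```python
import mmap

def nth_line(path, n):
    """Return line n (0-based) via an mmap, without reading the file into a string."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        start = 0
        for _ in range(n):
            start = mm.find(b"\n", start) + 1
            if start == 0:  # find() returned -1: the file has fewer than n lines
                raise IndexError("fewer than %d lines" % (n + 1))
        end = mm.find(b"\n", start)
        return mm[start:end if end != -1 else mm.size()]
```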

Python random N lines from large file (no duplicate lines)

Submitted by 主宰稳场 on 2019-11-27 08:09:33
Question: I need to use Python to take N random lines from a large txt file. These files are basically tab-delimited tables. My task has the following constraints:

- These files may contain headers (some have multi-line headers), and the headers need to appear in the output in the same order.
- Each line can be taken only once.
- The largest file currently is about 150 GB (about 60 000 000 lines).
- Lines are roughly the same length within a file, but may vary between different files.

I will usually be taking 5000 random
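
The question is cut off here, but the constraints describe a classic one-pass selection problem. A hedged sketch using reservoir sampling (assuming the number of header lines is known and passed as n_header; both names are mine):

```python
import random

def sample_lines(path, n, n_header=0):
    """One pass: keep headers in order, then choose n distinct data lines
    uniformly at random without loading the file into memory."""
    headers, reservoir = [], []
    with open(path) as f:
        for i, line in enumerate(f):
            if i < n_header:
                headers.append(line)
                continue
            k = i - n_header  # index among data lines only
            if k < n:
                reservoir.append(line)
            else:
                j = random.randint(0, k)  # replace with probability n/(k+1)
                if j < n:
                    reservoir[j] = line
    return headers + reservoir
```

Each data line ends up in the reservoir with equal probability, and no line can appear twice because a replacement always evicts a previous pick.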

How do I read a large CSV file with Scala Stream class?

Submitted by 大憨熊 on 2019-11-27 07:24:32
How do I read a large CSV file (> 1 GB) with a Scala Stream? Do you have a code example? Or would you use a different way to read a large CSV file without loading it into memory first?

Just use Source.fromFile(...).getLines, as you already stated. That returns an Iterator, which is already lazy. (You'd use a Stream as a lazy collection only where you wanted previously retrieved values to be memoized, so you can read them again.) If you're getting memory problems, then the problem will lie in what you're doing after getLines. Any operation like toList, which forces a strict collection, will cause the
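
The lazy-versus-strict distinction the answer draws is not Scala-specific. A minimal Python analogue of the same advice (count_wide_rows is my own example name): iterate the file object directly and the program holds one line at a time; wrap the reader in list() and everything is materialized at once, exactly like toList forcing a strict collection.

```python
import csv

def count_wide_rows(path, min_cols):
    """Stream a large CSV row by row; only one row is in memory at a time."""
    with open(path, newline="") as f:
        reader = csv.reader(f)  # lazy, like getLines returning an Iterator
        return sum(1 for row in reader if len(row) >= min_cols)
```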

Python: How to read huge text file into memory

Submitted by 流过昼夜 on 2019-11-27 06:38:16
I'm using Python 2.6 on a Mac Mini with 1 GB RAM. I want to read in a huge text file:

```
$ ls -l links.csv; file links.csv; tail links.csv
-rw-r--r--  1 user  user  469904280 30 Nov 22:42 links.csv
links.csv: ASCII text, with CRLF line terminators
4757187,59883
4757187,99822
4757187,66546
4757187,638452
4757187,4627959
4757187,312826
4757187,6143
4757187,6141
4757187,3081726
4757187,58197
```

So each line in the file consists of a tuple of two comma-separated integer values. I want to read in the whole file and sort it according to the second column. I know that I could do the sorting without reading the
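
The entry is cut off, but the stated goal (sorting ~470 MB of integer pairs on a 1 GB machine) is mostly a question of representation: stored as fixed-width machine integers rather than Python objects, the whole table fits easily. A hedged sketch with numpy, not necessarily what the original answer proposed:

```python
import numpy as np

# Two int32 columns: ~36M rows * 2 * 4 bytes is roughly 290 MB,
# versus several GB for a list of Python int tuples.
data = np.loadtxt("links.csv", dtype=np.int32, delimiter=",")

# Reorder rows by the second column; argsort yields a permutation of row indices.
data = data[np.argsort(data[:, 1])]

np.savetxt("links_sorted.csv", data, fmt="%d", delimiter=",")
```

(np.loadtxt's own parsing overhead can still hurt on a machine this small; reading and converting the file in chunks would be the next refinement.)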

Working with huge files in VIM

Submitted by 你离开我真会死。 on 2019-11-27 05:58:07
I tried opening a huge (~2GB) file in VIM, but it choked. I don't actually need to edit the file, just jump around efficiently. How can I go about working with very large files in VIM?

Florian: I had a 12GB file to edit today. The vim LargeFile plugin did not work for me; it still used up all my memory and then printed an error message :-(. I could not use hexedit either, as it cannot insert anything, just overwrite. Here is an alternative approach: you split the file, edit the parts, and then recombine it. You still need twice the disk space, though. Grep for something surrounding the line
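
The split-edit-recombine workflow is easy to script. A rough Python sketch (split_file and the chunk size are my own choices; the Unix split and cat commands do the same job):

```python
def split_file(path, lines_per_chunk=1000000):
    """Write the file out as numbered parts of lines_per_chunk lines each."""
    out, part, n = None, 0, 0
    with open(path, "rb") as f:
        for line in f:
            if out is None or n == lines_per_chunk:
                if out:
                    out.close()
                out = open("%s.part%04d" % (path, part), "wb")
                part, n = part + 1, 0
            out.write(line)
            n += 1
    if out:
        out.close()

# After editing the interesting part, recombine with: cat big.txt.part* > big.txt
```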

How can I read lines from the end of file in Perl?

Submitted by 一曲冷凌霜 on 2019-11-27 05:50:24
Question: I am working on a Perl script to read a CSV file and do some calculations. The CSV file has only two columns, something like below:

One    Two
1.00   44.000
3.00   55.000

Now, this CSV file is very big; it can be anywhere from 10 MB to 2 GB. Currently I am working with a CSV file of size 700 MB. I tried to open this file in Notepad and Excel, but it looks like no software is going to open it. I want to read, say, the last 1000 lines from the CSV file and see the values. How can I do that? I cannot open the file in Notepad or any other
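
The standard Perl answer here is the File::ReadBackwards CPAN module, which iterates lines from the end of the file. The underlying technique, seeking to the end and reading fixed-size blocks backwards, looks like this as a Python sketch (tail_lines is my own name):

```python
import os

def tail_lines(path, n=1000, block=64 * 1024):
    """Return the last n lines, reading blocks from the end so that only
    a few blocks are ever in memory, never the whole file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data  # prepend the block just read
        return data.splitlines()[-n:]
```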

gitignore by file size?

Submitted by 北慕城南 on 2019-11-27 05:13:41
Question: I'm trying to implement Git to manage creative assets (Photoshop, Illustrator, Maya, etc.), and I'd like to exclude files from Git based on file size rather than extension, location, etc. For example, I don't want to exclude all .avi files, but there are a handful of massive 1GB+ .avi files in random directories that I don't want to commit. Any suggestions?

Answer 1: I'm new to .gitignore, so there may be better ways to do this, but I've been excluding files by file size using:

find . -size +1G |
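
The answer is cut off after the pipe, but presumably the matching paths end up appended to .gitignore. The same idea as a Python sketch (the 1 GiB threshold mirrors find's +1G; the path handling is my own):

```python
import os

THRESHOLD = 1 << 30  # 1 GiB, like find's -size +1G

with open(".gitignore", "a") as out:
    for dirpath, dirnames, filenames in os.walk("."):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into .git
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > THRESHOLD:
                    # strip the leading "./" so the pattern is repo-root relative
                    out.write(path[2:] if path.startswith("./") else path)
                    out.write("\n")
            except OSError:
                continue  # unreadable file or dangling symlink
```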

Using PHP to download files, not working on large files? [duplicate]

Submitted by 时光总嘲笑我的痴心妄想 on 2019-11-27 05:12:39
This question already has an answer here: Downloading large files reliably in PHP (13 answers)

I'm using PHP to download files, rather than having the file itself open in a new window. It seems to work OK for smaller files, but does not work for large files (I need this to work on very large files). Here's the code I have to download the file:

```php
function downloadFile($file) {
    if (file_exists($file)) {
        // download file
        header('Content-Description: File Transfer');
        header('Content-Type: application/octet-stream');
        header('Content-Disposition: attachment; filename=' . basename($file));
        header('Content
```
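
The snippet is truncated before the part that actually sends the file body, but the usual failure mode with large downloads is buffering the entire file, e.g. readfile() behind PHP output buffering, or building the whole payload in a string. The fix in any language is to stream bounded chunks (in PHP, a loop of fread/echo/flush). A minimal Python sketch of the principle (names are mine, not the PHP answer's):

```python
def stream_file(path, write, chunk_size=8192):
    """Send a file of any size in constant memory: read a bounded chunk,
    hand it to the response writer, repeat until EOF."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            write(chunk)
```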