large-files

Finding k-largest elements of a very large file (while k is very LARGE)

若如初见. Submitted on 2019-12-03 03:23:19
Question: Let's assume that we have a very large file which contains billions of integers, and we want to find the k largest elements of these values. The tricky part is that k itself is very large too, which means we cannot keep k elements in memory (for example, we have a file with 100 billion elements and we want to find the 10 billion largest elements). How can we do this in O(n)? What I thought: we start reading the file and we check it against another file which keeps the k largest elements (sorted in
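
One possible O(n) approach, sketched here with assumptions the question doesn't fix (non-negative 32-bit integers, one per line, k no larger than the element count, and illustrative names throughout): two counting passes locate the value of the k-th largest element using only a 65,536-entry histogram in memory, and a final pass streams out everything above that threshold plus just enough equal elements, so k items never have to sit in RAM.

```python
# Sketch only: assumes non-negative 32-bit integers, one per line, and k <= n.
def kth_largest_threshold(path, k):
    """Return (threshold value, how many elements equal to it belong in the output)."""
    # Pass 1: histogram on the top 16 bits; 65,536 counters fit easily in memory.
    hi_counts = [0] * (1 << 16)
    with open(path) as f:
        for line in f:
            hi_counts[int(line) >> 16] += 1
    seen = 0                                     # elements strictly above the current bucket
    for hi in range((1 << 16) - 1, -1, -1):
        if seen + hi_counts[hi] >= k:
            break
        seen += hi_counts[hi]
    # Pass 2: within the deciding bucket, histogram the low 16 bits.
    lo_counts = [0] * (1 << 16)
    with open(path) as f:
        for line in f:
            v = int(line)
            if v >> 16 == hi:
                lo_counts[v & 0xFFFF] += 1
    for lo in range((1 << 16) - 1, -1, -1):
        if seen + lo_counts[lo] >= k:
            return (hi << 16) | lo, k - seen
        seen += lo_counts[lo]

def write_k_largest(path, out_path, k):
    threshold, equal_needed = kth_largest_threshold(path, k)
    with open(path) as f, open(out_path, "w") as out:
        for line in f:
            v = int(line)
            if v > threshold or (v == threshold and equal_needed > 0):
                out.write(line)
                if v == threshold:
                    equal_needed -= 1
```

The threshold-then-filter step is what keeps the memory footprint independent of k; wider integers can be handled the same way with more radix-style digit passes.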

How can you concatenate two huge files with very little spare disk space? [closed]

北城余情 Submitted on 2019-12-03 03:08:42
Suppose that you have two huge files (several GB) that you want to concatenate together, but that you have very little spare disk space (let's say a couple hundred MB). That is, given file1 and file2, you want to end up with a single file which is the result of concatenating file1 and file2 together byte-for-byte, and delete the original files. You can't do the obvious cat file2 >> file1; rm file2, since in between the
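
One way to do this with roughly one chunk of spare space, sketched in Python purely as an illustration: move file2's data into its final position starting from the back, shrinking file2 with truncate after each chunk. This assumes a filesystem with sparse-file support (ext4, XFS, NTFS and the like), so the temporary gap between the old end of file1 and the chunks written near the new end costs no disk blocks; it is also destructive if interrupted, so treat it as the idea rather than a hardened tool.

```python
import os

def concat_in_place(file1, file2, chunk=64 * 1024 * 1024):
    """Append file2 to file1 using only ~one chunk of extra disk space (sparse FS assumed)."""
    size1 = os.path.getsize(file1)
    size2 = os.path.getsize(file2)
    with open(file1, "r+b") as dst, open(file2, "r+b") as src:
        remaining = size2
        while remaining > 0:
            n = min(chunk, remaining)
            # Read the last n bytes still left in file2 ...
            src.seek(remaining - n)
            data = src.read(n)
            # ... write them at their final offset in file1 (the gap below is a hole) ...
            dst.seek(size1 + remaining - n)
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())
            # ... and give those bytes of file2 back to the filesystem immediately.
            src.truncate(remaining - n)
            remaining -= n
    os.remove(file2)
```

Each iteration grows file1 by one chunk and shrinks file2 by the same amount, so peak extra usage stays around the chunk size; when the loop finishes, the hole has been filled exactly and file2 is empty.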

grep -f alternative for huge files

萝らか妹 Submitted on 2019-12-03 03:04:08
grep -F -f file1 file2, where file1 is 90 MB (2.5 million lines, one word per line) and file2 is 45 GB. That command doesn't actually produce anything whatsoever, no matter how long I leave it running. Clearly, this is beyond grep's scope; it seems grep can't handle that many queries from the -f option. However, the following does produce the desired result: head file1 > file3; grep -F -f file3 file2. I have doubts about whether sed or awk would be appropriate alternatives either, given the file sizes. I am at a loss for alternatives... please help. Is it worth it to learn some SQL commands? Is it
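
A minimal sketch of one common alternative, assuming the goal is to keep lines of file2 that contain any file1 word as a whole whitespace-separated token. Note the simplification: grep -F matches substrings, so if genuine substring matching against millions of patterns is needed, an Aho-Corasick automaton (for example the pyahocorasick package) is the usual tool instead. The output file name here is made up.

```python
# Illustrative only: file1/file2 are the question's names, matches.txt is hypothetical.
with open("file1") as f:
    patterns = set(line.strip() for line in f)       # ~2.5M short strings fit in RAM

with open("file2") as f, open("matches.txt", "w") as out:
    for line in f:                                    # one streaming pass over the 45 GB file
        if any(token in patterns for token in line.split()):
            out.write(line)
```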

What is different with PushStreamContent between web api & web api 2?

你离开我真会死。 Submitted on 2019-12-03 02:52:24
I've created two identical Web API projects, one in VS 2012 and another in VS 2013, both targeting .NET Framework 4.5. The projects are based on Filip W's video download tutorial found here: http://www.strathweb.com/2013/01/asynchronously-streaming-video-with-asp-net-web-api/ Copying and pasting the code from the tutorial into the VS 2012 project (using Web API 1?) produces no errors (after I add the proper 'using' statements). However, when I follow the same steps in the VS 2013 project I get the following two errors: Error 1 The call is ambiguous between the following methods or properties

Error tokenizing data. C error: out of memory (pandas, Python, large CSV file)

本秂侑毒 Submitted on 2019-12-02 23:54:20
I have a large CSV file of 3.5 GB and I want to read it using pandas. This is my code: import pandas as pd tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False) df = pd.concat(tp, ignore_index=True) I get this error: pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8771)() pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)() pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)() pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()
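
The chunked iterator only helps if the chunks are never glued back together: pd.concat(tp) rebuilds the full 3.5 GB (plus overhead) in memory, and a chunksize of 20,000,000 rows is itself enormous. A hedged sketch of the usual pattern follows, with a smaller chunksize and a per-chunk reduction; the column name and aggregation are placeholders, not part of the question.

```python
import pandas as pd

# Placeholder aggregation: reduce each chunk to something small instead of
# concatenating the raw chunks back into one giant DataFrame.
pieces = []
for chunk in pd.read_csv("train_2011_2012_2013.csv", sep=";", chunksize=500000):
    pieces.append(chunk.groupby("some_column").size())   # "some_column" is hypothetical

result = pd.concat(pieces).groupby(level=0).sum()
```

If the whole frame really is needed in memory at once, passing explicit dtype and usecols arguments to read_csv usually shrinks it far more than low_memory does.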

Parsing large (20GB) text file with python - reading in 2 lines as 1

不羁的心 Submitted on 2019-12-02 21:59:41
I'm parsing a 20 GB file and outputting lines that meet a certain condition to another file; however, occasionally Python will read in 2 lines at once and concatenate them. inputFileHandle = open(inputFileName, 'r') row = 0 for line in inputFileHandle: row = row + 1 if line_meets_condition: outputFileHandle.write(line) else: lstIgnoredRows.append(row) I've checked the line endings in the source file and they check out as line feeds (ASCII char 10). Pulling out the problem rows and parsing them in isolation works as expected. Am I hitting some python limitation here? The position in the file of
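
A diagnostic sketch rather than a diagnosis: reading the file in binary splits records only on b'\n' and sidesteps any newline translation, and recording byte offsets makes it possible to pull the suspect region out with dd or a hex viewer to see exactly which bytes sit where the "missing" line break should be. The predicate and file names below are placeholders standing in for the question's own.

```python
# Hypothetical names: line_meets_condition and the paths mirror the question.
inputFileName = "big_input.txt"

def line_meets_condition(text):
    return "needle" in text              # placeholder predicate

offset = 0
suspects = []
with open(inputFileName, "rb") as src, open("filtered.out", "wb") as dst:
    for raw in src:                      # binary mode: records end only at b'\n'
        text = raw.decode("ascii", errors="replace")
        if line_meets_condition(text):
            dst.write(raw)
        if b"\r" in raw.rstrip(b"\r\n"): # stray carriage return inside a record
            suspects.append(offset)
        offset += len(raw)

print("records with embedded \\r start at byte offsets:", suspects[:10])
```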

How do I download a large file (via HTTP) in .NET?

霸气de小男生 Submitted on 2019-12-02 20:18:37
I need to download a large file (2 GB) over HTTP in a C# console application. Problem is, after about 1.2 GB, the application runs out of memory. Here's the code I'm using: WebClient request = new WebClient(); request.Credentials = new NetworkCredential(username, password); byte[] fileData = request.DownloadData(baseURL + fName); As you can see... I'm reading the file directly into memory. I'm pretty sure I could solve this if I were to read the data back from HTTP in chunks and write it to a file on disk. How could I do this? If you use WebClient.DownloadFile you could save it directly into a
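
The direction the excerpt points in is the right one: stream the response to disk instead of buffering the whole body, either with WebClient.DownloadFile or by copying the response stream to a FileStream in chunks. For illustration only, here is the same chunked-streaming idea sketched in Python; the URL, credentials and file name are placeholders for the question's baseURL, NetworkCredential and fName.

```python
import shutil
import urllib.request

# Placeholders standing in for the question's baseURL + fName and credentials.
url = "https://example.com/files/huge.bin"
username, password = "user", "secret"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

with opener.open(url) as response, open("huge.bin", "wb") as out:
    shutil.copyfileobj(response, out, length=1024 * 1024)   # stream in 1 MB chunks
```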

What's the best way to load large JSON lists in Python?

橙三吉。 Submitted on 2019-12-02 06:15:59
Question: I have access to a set of files (around 80-800 MB each). Unfortunately, there's only one line in every file. The line contains exactly one JSON object (a list of lists). What's the best way to load and parse it into smaller JSON objects? Answer 1: There is already a similar post here. Here is the solution they proposed: import json with open('file.json') as infile: o = json.load(infile) chunkSize = 1000 for i in xrange(0, len(o), chunkSize): with open('file_' + str(i//chunkSize) + '.json', 'w') as
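
For completeness, here is a runnable Python 3 rendering of the pattern the quoted answer sketches (range instead of xrange; the file names and chunk size are the example's own placeholders). It still loads the single 80-800 MB list once with json.load, which is normally acceptable at that size, and then writes it back out in smaller pieces.

```python
import json

with open("file.json") as infile:
    data = json.load(infile)                     # the one huge list of lists

chunk_size = 1000
for i in range(0, len(data), chunk_size):
    with open("file_{}.json".format(i // chunk_size), "w") as outfile:
        json.dump(data[i:i + chunk_size], outfile)
```

If even a single file were too big to hold in memory, an incremental parser such as ijson can stream the items of the top-level list instead of loading it all at once.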

How to read millions of rows from a text file and insert them into a table quickly

China☆狼群 Submitted on 2019-12-01 19:38:56
I have gone through the Insert 2 million rows into SQL Server quickly link and found that I can do this by using bulk insert. So I am trying to create the DataTable (code as below), but as this is a huge file (more than 300K rows) I am getting an OutOfMemoryException in my code: string line; DataTable data = new DataTable(); string[] columns = null; bool isInserted = false; using (TextReader tr = new StreamReader(_fileName, Encoding.Default)) { if (columns == null) { line = tr.ReadLine(); columns = line.Split(','); } for (int iColCount = 0; iColCount < columns.Count(); iColCount++) { data