large-files

Is it possible to store only a checksum of a large file in git?

北战南征 submitted on 2019-12-06 14:03:02
I'm a bioinformatician currently extracting normal-sized sequences from genomic files. Some of the genomic files are large enough that I don't want to put them into the main git repository, whereas I'm putting the extracted sequences into git. Is it possible to tell git, "Here's a large file - don't store the whole file, just take its checksum, and let me know if that file is missing or modified"? If that's not possible, I guess I'll have to either git-ignore the large files or, as suggested in this question, store them in a submodule. I wrote a script that does this sort of thing. You put file
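The script mentioned at the end of the excerpt is cut off, but the underlying idea (git-ignore the data files and commit only a small checksum manifest) can be sketched in a few lines of Python. The file names and the manifest name below are made up for illustration; this is a sketch, not the answerer's actual script:

```python
import hashlib
import os
import sys

# Hypothetical names: the genomic files stay out of git, only the manifest is committed.
LARGE_FILES = ["chr1.fa.gz", "chr2.fa.gz"]
MANIFEST = "large-files.sha256"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-gigabyte files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest():
    """Record the current checksums; commit this small file instead of the data."""
    with open(MANIFEST, "w") as out:
        for name in LARGE_FILES:
            out.write(f"{sha256_of(name)}  {name}\n")

def check_manifest():
    """Report files that are missing or whose checksum no longer matches the manifest."""
    with open(MANIFEST) as fh:
        for line in fh:
            digest, name = line.rstrip("\n").split(maxsplit=1)
            if not os.path.exists(name):
                print(f"MISSING: {name}")
            elif sha256_of(name) != digest:
                print(f"MODIFIED: {name}")

if __name__ == "__main__":
    write_manifest() if "write" in sys.argv[1:] else check_manifest()
```

Running the check from a pre-commit hook could provide the "let me know if that file is missing or modified" behaviour the question asks for.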

gcc/g++: error when compiling large file

China☆狼群 submitted on 2019-12-06 13:47:20
I have an auto-generated C++ source file, around 40 MB in size. It largely consists of push_back commands for some vectors and the string constants that are to be pushed. When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that the command-line switches --param ggc-min-expand=0 --param ggc-min-heapsize=4096 may solve the problem. They, however, only seem to work when optimization is turned on. 1) Is this really the solution that I am looking for? 2) Or is there a faster, better (compiling takes ages

XML streaming with XProc

孤者浪人 submitted on 2019-12-06 11:05:51
Question: I'm playing with XProc, the XML pipeline language, and http://xmlcalabash.com/. I'd like to find an example of streaming large XML documents. For example, given the following huge XML document: <Books> <Book> <title>Book-1</title> </Book> <Book> <title>Book-2</title> </Book> <Book> <title>Book-3</title> </Book> <!-- many many.... --> <Book> <title>Book-N</title> </Book> </Books> How should I proceed to loop (streaming) over x->N documents like <Books> <Book> <title>Book-x</title> </Book> <
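The question is specifically about XProc, and the excerpt is cut off before any pipeline is shown. Purely to illustrate the streaming pattern being asked about (visit each <Book>, then throw it away so memory stays flat), here is the same loop in Python's standard library; the file name is hypothetical and this is not an XProc answer:

```python
import xml.etree.ElementTree as ET

def book_titles(path):
    """Stream over <Book> elements one at a time instead of loading the whole tree."""
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "Book":
            yield elem.findtext("title")
            elem.clear()  # discard the element's children so huge files stay cheap

for title in book_titles("books.xml"):  # hypothetical file holding the <Books> document above
    print(title)
```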

How to copy a large file in Windows XP?

牧云@^-^@ submitted on 2019-12-06 03:01:30
I have a large file in Windows XP - it's 38 GB (a VM image). I cannot seem to copy it. Dragging it on the desktop gives the error "Insufficient system resources exist to complete the requested service". Using Java, FileChannel.transferTo(0, fileSize, dest) fails for all files > 2 GB. Using Java, FileChannel.transferTo() in chunks of 100 MB fails after ~18 GB with java.io.IOException: Insufficient system resources exist to complete the requested service at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.FileDispatcher.write(FileDispatcher.java:44) at sun.nio.ch.IOUtil.writeFromNativeBuffer
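The excerpt is cut off before any answer, but a commonly suggested workaround is to do the copy yourself in modest chunks so no single read or write is huge. A minimal Python sketch, with source and destination paths invented for illustration:

```python
import shutil

SRC = r"D:\vms\big-image.vmdk"      # hypothetical source path
DST = r"E:\backup\big-image.vmdk"   # hypothetical destination path

# copyfileobj moves the data in fixed-size chunks (10 MB per read/write here),
# so the whole 38 GB file is never requested as a single transfer.
with open(SRC, "rb") as src, open(DST, "wb") as dst:
    shutil.copyfileobj(src, dst, length=10 * 1024 * 1024)
```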

HttpClient throws OutOfMemory exception when TransferEncodingChunked is not set

偶尔善良 submitted on 2019-12-06 02:41:55
In order to support uploads of large (actually very large, up to several gigabytes) files with progress reporting, we started using HttpClient with PushStreamContent, as described here. It works in a straightforward way: we copy bytes between two streams. Here is a code example: private void PushContent(Stream src, Stream dest, int length) { const int bufferLength = 1024*1024*10; var buffer = new byte[bufferLength]; var pos = 0; while (pos < length) { var bytes = Math.Min(bufferLength, length - pos); src.Read(buffer, 0, bytes); dest.Write(buffer, 0, bytes); pos += bufferLength; dest.Flush(); Console
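The C# snippet above is truncated. As a point of comparison only (not the poster's code), the same "stream the body in chunks instead of buffering it" idea can be sketched in Python with requests, which switches to Transfer-Encoding: chunked when the body is a generator; the URL and file name are placeholders:

```python
import requests

def file_chunks(path, chunk_size=10 * 1024 * 1024):
    """Yield the file 10 MB at a time so the full body never sits in memory."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return
            yield chunk

# A generator body makes requests send the request with chunked transfer encoding,
# roughly the equivalent of setting TransferEncodingChunked = true on HttpClient.
resp = requests.post("https://example.com/upload", data=file_chunks("huge-upload.bin"))
print(resp.status_code)
```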

Binary search over a huge file with unknown line length

元气小坏坏 submitted on 2019-12-06 01:20:33
I'm working with huge CSV data files. Each file contains millions of records, and each record has a key. The records are sorted by their key. I don't want to go over the whole file when searching for certain data. I've seen this solution: Reading Huge File in Python. But it suggests that you use the same line length throughout the file, which is not the case for me. I thought about adding padding to each line and then keeping a fixed line length, but I'd like to know if there is a better way to do it. I'm working with Python. You don't have to have a fixed width record because you don't have to
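The answer above is cut off, but the usual technique does not need fixed-width lines: binary search on byte offsets, and after each seek discard the partial line you landed in before reading a whole one. A hedged Python sketch, assuming the key is the first comma-separated field and the file is sorted by that field's string order:

```python
import os

def _next_line(fh, offset):
    """Seek to offset, skip the (possibly partial) line we landed in, return the next full line."""
    fh.seek(offset)
    fh.readline()
    return fh.readline()

def find_record(path, target_key):
    """Binary search a CSV sorted by its first field; line lengths may vary."""
    with open(path, "rb") as fh:
        size = fh.seek(0, os.SEEK_END)
        lo, hi = 0, size
        while lo < hi:
            mid = (lo + hi) // 2
            line = _next_line(fh, mid)
            # An empty read means we are past the last line: treat its key as "too big".
            if line and line.split(b",", 1)[0].decode() < target_key:
                lo = mid + 1
            else:
                hi = mid
        line = _next_line(fh, lo)
        if line and line.split(b",", 1)[0].decode() == target_key:
            return line.decode().rstrip("\r\n")
        # The seek-and-skip trick never inspects the very first line, so test it directly.
        fh.seek(0)
        first = fh.readline()
        if first.split(b",", 1)[0].decode() == target_key:
            return first.decode().rstrip("\r\n")
    return None

print(find_record("records.csv", "GENE0042"))  # hypothetical file and key
```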

Does fread fail for large files?

天大地大妈咪最大 submitted on 2019-12-06 00:51:45
Question: I have to analyze a 16 GB file. I am reading through the file sequentially using fread() and fseek(). Is it feasible? Will fread() work for such a large file? Answer 1: You don't mention a language, so I'm going to assume C. I don't see any problems with fread, but fseek and ftell may have issues. Those functions use long int as the data type to hold the file position, rather than something intelligent like fpos_t or even size_t. This means that they can fail to work on a file over 2 GB, and

Need help designing a more efficient search algorithm

橙三吉。 submitted on 2019-12-06 00:43:47
Question: I have a problem from the biology domain. Right now I have 4 VERY LARGE files (each with 0.1 billion lines), but the structure is rather simple: each line of these files has only 2 fields, both standing for a type of gene. My goal is to design an efficient algorithm that achieves the following: find a circle within the contents of these 4 files. The circle is defined as: field #1 in a line in file 1 == field #1 in a line in file 2 and field #2 in a line in file 2 == field #1 in a line in
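The circle definition is cut off above, so a faithful implementation is not possible from the excerpt alone. As a heavily hedged sketch of the basic machinery only (field positions, tab separators, and the file1-to-file2 chaining below are all assumptions), one file can be indexed in a hash map so the next file is streamed against it instead of doing nested full scans:

```python
from collections import defaultdict

def index_by_field(path, field):
    """Map the chosen field to the set of partner genes on the same line (tab-separated assumed)."""
    index = defaultdict(set)
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            index[fields[field]].add(fields[1 - field])
    return index

# One hop of the chain: stream file1 and look each gene up in an index of file2.
# With ~10^8 lines per file an in-memory dict may not fit; sorting the files and
# merge-joining them would be the disk-friendly variant of the same idea.
file2_by_first = index_by_field("file2.txt", 0)   # hypothetical file name

with open("file1.txt") as fh:                     # hypothetical file name
    for line in fh:
        gene_a, gene_b = line.rstrip("\n").split("\t")[:2]
        for partner in file2_by_first.get(gene_a, ()):
            print(f"{gene_a} -> {partner} (via file2)")
```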

Remote Linux server to remote Linux server large sparse file copy - How To?

我是研究僧i submitted on 2019-12-05 22:53:50
I have two twin CentOS 5.4 servers with VMware Server installed on each. What is the most reliable and fastest method for copying virtual machine files from one server to the other, assuming that I always use sparse files for my VMware virtual machines? The VM files are a pain to copy since they are very large (50 GB), but since they are sparse files I think something can be done to improve the speed of the copy. If you want to copy large data quickly, rsync over SSH is not for you. As running an rsync daemon for quick one-shot copying is also overkill, yer olde tar and nc do the trick as
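The answer is cut off right after naming the tools, but the usual tar-over-netcat pipeline looks roughly like the commands below. Host name, port and paths are placeholders, GNU tar's -S/--sparse flag is what keeps the holes from being expanded on the wire, and the exact listen syntax differs between netcat variants:

```
# on the receiving server (traditional netcat syntax; some variants drop the -p)
nc -l -p 7000 | tar -xSf - -C /var/lib/vmware/

# on the sending server
tar -cSf - /var/lib/vmware/myvm/ | nc receiving-host 7000
```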

Does GitLab support large files via git-annex or otherwise?

北战南征 submitted on 2019-12-05 22:47:08
Question: I run a GitLab instance and would like to allow my users to upload files of almost any size. It is well known that git still has problems with large files. I am aware of approaches to circumvent this issue by storing the files somewhere else and versioning just the metadata, e.g. git-annex, git-media and git-fat. Are any of these integrated into GitLab, or would it be easy to do so? Answer 1: As of February 18, 2015, git-annex is supported in GitLab 7.8 Enterprise Edition. Answer 2: This is discussed and