large-files

Parsing Large XML file with Python lxml and Iterparse

Submitted by 戏子无情 on 2019-12-01 07:28:47
Question: I'm attempting to write a parser using lxml and the iterparse method to step through a very large XML file containing many items. My file is of the format:

<item>
  <title>Item 1</title>
  <desc>Description 1</desc>
  <url>
    <item>http://www.url1.com</item>
  </url>
</item>
<item>
  <title>Item 2</title>
  <desc>Description 2</desc>
  <url>
    <item>http://www.url2.com</item>
  </url>
</item>

and so far my solution is:

from lxml import etree
context = etree.iterparse(MYFILE, tag='item')
for event, elem in
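
For context, here is a minimal sketch of how such an iterparse loop is commonly finished; the file name items.xml, the choice to extract only the title and URL, and a root element wrapping the <item> blocks are assumptions, not part of the original question. Clearing each element after it has been processed is what keeps memory usage flat on very large files.

from lxml import etree

def iter_items(path):
    # iterparse fires an 'end' event for every closing <item> tag, which
    # includes the nested <item> inside <url>; skip those inner ones.
    context = etree.iterparse(path, events=('end',), tag='item')
    for event, elem in context:
        parent = elem.getparent()
        if parent is not None and parent.tag == 'url':
            continue
        yield elem.findtext('title'), elem.findtext('url/item')
        # Free memory: clear the element and drop already-processed siblings.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]

for title, url in iter_items('items.xml'):  # hypothetical file name
    print(title, url)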

How to Send Large File From Client To Server Using WCF?

Submitted by 梦想的初衷 on 2019-12-01 05:44:24
How to send a large file from the client to the server using WCF in C#? Below is the configuration code:

<system.serviceModel>
  <bindings>
    <basicHttpBinding>
      <binding name="HttpStreaming_IStreamingSample" maxReceivedMessageSize="67108864" transferMode="Streamed">
      </binding>
    </basicHttpBinding>
  </bindings>
  <client>
    <endpoint address="http://localhost:4127/StreamingSample.svc" binding="basicHttpBinding" bindingConfiguration="HttpStreaming_IStreamingSample" contract="StreamingSample.IStreamingSample" name="HttpStreaming_IStreamingSample" />
  </client>
</system.serviceModel>

You need to check out streaming, as

What libraries are available for manipulating super large images in .Net

Submitted by 梦想与她 on 2019-12-01 05:29:26
Question: I have some really large files, for example a 320 MB TIFF file with 14000 x 9000 pixels. The operations I need to perform are basically scaling the images to get smaller versions and breaking the image into tiles. My code works fine with small files and I use the .Net Bitmap objects, but I will occasionally get Out of Memory exceptions for larger files. I've tried using the FreeImage library's FreeImageBitmap but have the same problems. I'm using something like the following to scale the

most efficient way to find partial string matches in large file of strings (python)

Submitted by 大憨熊 on 2019-12-01 05:23:18
I downloaded the Wikipedia article titles file, which contains the name of every Wikipedia article. I need to search for all the article titles that may be a possible match. For example, I might have the word "hockey", but the Wikipedia article for hockey that I would want is "Ice_hockey". It should be a case-insensitive search too. I'm using Python; is there a more efficient way than just doing a line-by-line search? I'll be performing this search 500 or 1000 times per minute ideally. If line-by-line is my only option, are there some optimizations I can do within this? I think there
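
One commonly suggested direction, sketched below rather than taken from the thread: read the titles file once, keep a lowercased copy of every title in memory, and reuse that list for all subsequent queries, so each search becomes a pure in-memory substring scan. The file name enwiki-all-titles-in-ns0 and the exact matching rule are assumptions.

def load_titles(path):
    # Read the titles file once; store (lowercased, original) pairs so
    # matching can be case-insensitive while results keep their casing.
    titles = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            title = line.rstrip('\n')
            titles.append((title.lower(), title))
    return titles

def search(titles, word):
    # Case-insensitive substring match: "hockey" matches "Ice_hockey".
    word = word.lower()
    return [original for lowered, original in titles if word in lowered]

titles = load_titles('enwiki-all-titles-in-ns0')  # hypothetical file name
print(search(titles, 'hockey')[:10])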

How to quickly search through a .csv file in Python

Submitted by 无人久伴 on 2019-12-01 05:04:31
Question: I'm reading a 6-million-entry .csv file with Python, and I want to be able to search through this file for a particular entry. Are there any tricks to search the entire file? Should you read the whole thing into a dictionary, or should you perform a search every time? I tried loading it into a dictionary, but that took ages, so I'm currently searching through the whole file every time, which seems wasteful. Could I possibly exploit the fact that the list is alphabetically ordered? (e.g. if the search word
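
If the rows really are sorted on the search column, one option (a sketch under that assumption, not an answer from the thread) is to build a small in-memory index of keys and byte offsets in one pass, then binary-search it with the bisect module; the file name, the key being the first column, and the naive comma splitting are all assumptions.

from bisect import bisect_left

def build_index(path, key_col=0):
    # One pass over the file: remember each row's key and its byte offset
    # instead of keeping the full rows in memory. Assumes the rows are
    # already sorted on key_col and fields contain no quoted commas.
    keys, offsets = [], []
    with open(path, 'rb') as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            keys.append(line.decode('utf-8').rstrip('\r\n').split(',')[key_col])
            offsets.append(pos)
    return keys, offsets

def lookup(path, keys, offsets, word):
    # Binary search over the sorted keys, then seek straight to the row.
    i = bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        with open(path, 'rb') as f:
            f.seek(offsets[i])
            return f.readline().decode('utf-8').rstrip('\r\n').split(',')
    return None

keys, offsets = build_index('entries.csv')   # hypothetical file name
print(lookup('entries.csv', keys, offsets, 'some_entry'))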

PHP fwrite() for writing a large string to file

Submitted by 拟墨画扇 on 2019-12-01 04:13:10
Question: I have to write a large string (10 MB) to a file, and I am using this line to achieve that:

fwrite($file, $content);

The problem is that the whole string is not written to the file; it is cut off at a specific limit, and fwrite always returns 7933594.

Answer 1: Yes, the fwrite function is limited in length, and for large files you may split the content into smaller pieces like the following:

$file = fopen("file.json", "w");
$pieces = str_split($content, 1024 * 4);
foreach ($pieces as $piece) {
    fwrite($file,

On Windows _fseeki64 does not seek to SEEK_END correctly for large files

Submitted by 强颜欢笑 on 2019-12-01 01:05:39
I have reduced the problem to the following basic function, which should simply print the number of bytes in the file. When I execute it for a file of 83886080 bytes (80 MB) it prints the correct number. However, for a file of 4815060992 bytes (4.48 GB) it prints 520093696, which is way too low. It seems to have something to do with the SEEK_END option, because if I set the pointer to 4815060992 bytes manually (e.g. _fseeki64(fp, (__int64)4815060992, SEEK_SET)), _ftelli64 does return the correct position. So a workaround would be to get the proper file size without using SEEK_END; how is this done?

Memory-efficient way to iterate over part of a large file

Submitted by 雨燕双飞 on 2019-11-30 21:17:23
I normally avoid reading files like this:

with open(file) as f:
    list_of_lines = f.readlines()

and use this type of code instead:

f = open(file)
for line in f:
    # do something

The exception is when I only have to iterate over a few lines in a file (and I know which lines those are); then I think it is easier to take slices of list_of_lines. Now this has come back to bite me. I have a HUGE file (reading it into memory is not possible), but I don't need to iterate over all of the lines, just a few of them. I have code completed that finds where my first line is and finds how many lines after that I need to
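
A common way to handle exactly this situation (a sketch, assuming the starting line index and the number of lines wanted are already known, as the question implies) is itertools.islice, which skips and yields lines lazily without ever loading the whole file:

from itertools import islice

def lines_slice(path, start, count):
    # Lazily yield `count` lines beginning at zero-based line `start`;
    # islice consumes and discards the earlier lines without storing them.
    with open(path) as f:
        yield from islice(f, start, start + count)

# Hypothetical usage: 50 lines starting at line 1,000,000 of a huge file.
for line in lines_slice('huge_file.txt', 1_000_000, 50):
    pass  # do something with line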

NumPy reading file with filtering lines on the fly

Submitted by ぐ巨炮叔叔 on 2019-11-30 19:57:20
I have a large array of numbers written in a CSV file and need to load only a slice of that array. Conceptually I want to call np.genfromtxt() and then row-slice the resulting array, but the file is so large that it may not fit in RAM, and the number of relevant rows might be small, so there is no need to parse every line. MATLAB has the function textscan(), which can take a file descriptor and read only a chunk of the file. Is there anything like that in NumPy? For now, I defined the following function that reads only the lines that satisfy the given condition:

def genfromtxt_cond(fname, cond=
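
As an illustration of the general idea (a sketch, not the poster's genfromtxt_cond), np.genfromtxt can also consume a generator of lines, so a filter can be applied on the fly without materializing the skipped rows. The file name, the condition on the first column, and the comma delimiter below are assumptions.

import numpy as np

def filtered_lines(path, cond):
    # Yield only the CSV lines whose first field satisfies `cond`;
    # skipped lines are never handed to NumPy at all.
    # (Some older NumPy versions expect bytes rather than str here.)
    with open(path) as f:
        for line in f:
            if cond(float(line.split(',', 1)[0])):
                yield line

# Hypothetical usage: keep only rows whose first column is below 100.
data = np.genfromtxt(filtered_lines('data.csv', lambda x: x < 100.0),
                     delimiter=',')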