large-data

Why does MongoDB take up so much space?

Submitted by 孤街浪徒 on 2019-11-27 02:43:35
Question: I am trying to store records with a set of doubles and ints (around 15-20) in MongoDB. The records mostly (99.99%) have the same structure. When I store the data in ROOT, which is a very structured data-storage format, the file is around 2.5 GB for 22.5 million records. With Mongo, however, the database size (from the command show dbs) is around 21 GB, whereas the data size (from db.collection.stats()) is around 13 GB. This is a huge overhead (to clarify: 13 GB vs 2.5 GB, and I'm not even talking about…
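Part of the gap is structural: MongoDB stores each document as BSON, which repeats every field name (plus per-field type and length bytes) in every record, while a columnar format like ROOT stores each column name once for the whole file. A toy back-of-envelope sketch (the field names below are hypothetical, and real BSON adds further per-document overhead this model ignores):

```python
# Rough model of why document storage is bigger than columnar storage for
# 22.5M near-identical records: field names are stored again in every record.
field_names = ["px", "py", "pz", "energy", "charge"]  # hypothetical fields
n_records = 22_500_000
value_bytes = 8  # assume one double per field

per_record_payload = len(field_names) * value_bytes
per_record_keys = sum(len(name) + 1 for name in field_names)  # name + NUL byte

columnar_total = n_records * per_record_payload
document_total = n_records * (per_record_payload + per_record_keys)

print(f"payload only:       {columnar_total / 1e9:.1f} GB")
print(f"with repeated keys: {document_total / 1e9:.1f} GB")
```

Preallocated data files and the mandatory _id index typically account for much of the remaining gap between the data size and the size reported by show dbs.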

Read large data from csv file in php [duplicate]

Submitted by 我是研究僧i on 2019-11-27 02:06:28
This question already has an answer here: file_get_contents => PHP Fatal error: Allowed memory exhausted (3 answers). In PHP, I am reading a CSV and checking against MySQL whether its records are present in my table or not. The CSV has about 25,000 records, and when I run my code it displays a "Service Unavailable" error after 2m 10s (onload: 2m 10s). Here is the code I have added: // to set the memory limit & execution time ini_set('memory_limit', '512M'); ini_set('max_execution_time', '180'); // function to read the csv file function readCSV($csvFile) { $file_handle = fopen($csvFile, 'r'); while (!feof($file_handle)) { set_time…
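The same streaming idea sketched in Python (the table and column names are made up for illustration): read the CSV row by row and check existence in batches, rather than one round trip per record, which is usually what pushes a 25,000-row job past the time limit.

```python
import csv
import io
import sqlite3

def missing_records(csv_file, conn, batch_size=500):
    """Return ids from the CSV that are not present in the 'records' table."""
    def flush(batch):
        placeholders = ",".join("?" * len(batch))
        found = {row[0] for row in conn.execute(
            f"SELECT id FROM records WHERE id IN ({placeholders})", batch)}
        return [i for i in batch if i not in found]

    missing, batch = [], []
    for row in csv.reader(csv_file):       # streams: one row in memory at a time
        batch.append(row[0])
        if len(batch) >= batch_size:       # one query per batch, not per row
            missing.extend(flush(batch))
            batch = []
    if batch:
        missing.extend(flush(batch))
    return missing
```

The same two moves apply directly in PHP: keep fgetcsv in the loop (already streaming) and collect ids into an IN (...) query per batch instead of issuing a SELECT per line.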

How to read large (~20 GB) xml file in R?

Submitted by 人盡茶涼 on 2019-11-26 23:30:08
Question: I want to read data from a large XML file (20 GB) and manipulate it. I tried to use xmlParse(), but it gave me a memory issue before loading. Is there an efficient way to do this? My data dump looks like this: <tags> <row Id="106929" TagName="moto-360" Count="1"/> <row Id="106930" TagName="n1ql" Count="1"/> <row Id="106931" TagName="fable" Count="1" ExcerptPostId="25824355" WikiPostId="25824354"/> <row Id="106932" TagName="deeplearning4j" Count="1"/> <row Id="106933" TagName="pystache" Count=…
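In R the usual answer is event-driven parsing (e.g. XML::xmlEventParse), which handles each node as it streams past instead of building the whole tree. The technique is easiest to show with Python's xml.etree.ElementTree.iterparse: each <row> is processed as soon as its end tag is read and then discarded, so memory use stays flat regardless of file size.

```python
import io
import xml.etree.ElementTree as ET

def iter_rows(xml_source):
    """Yield each <row>'s attributes without ever holding the full tree."""
    for event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # free the element so the in-memory tree stays tiny

# Tiny stand-in for the 20 GB dump, using rows from the question:
sample = io.StringIO(
    '<tags><row Id="106929" TagName="moto-360" Count="1"/>'
    '<row Id="106930" TagName="n1ql" Count="1"/></tags>')
rows = list(iter_rows(sample))
```

With a real file you would pass the filename to iterparse and consume the generator incrementally (e.g. writing batches to a database) rather than materializing a list.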

All k nearest neighbors in 2D, C++

Submitted by 扶醉桌前 on 2019-11-26 20:45:36
Question: I need to find, for each point of the data set, all of its nearest neighbors. The data set contains approx. 10 million 2D points. The data are close to a grid, but do not form a precise grid. This, in my opinion, rules out KD-trees, where the basic assumption is that no two points share an x coordinate or a y coordinate. I need a fast algorithm, O(n) or better (but not too difficult to implement :-)), to solve this problem. Since Boost is not standardized, I do…
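When points are roughly evenly spread, as near-grid data is, spatial hashing into cells gives O(n) all-pairs nearest-neighbor search with no Boost and no tree. A minimal pure-Python sketch (the cell size is an assumption you would tune to the grid spacing, and it only searches the 3x3 block of cells around each point, so the true nearest neighbor must lie within one cell's distance):

```python
from collections import defaultdict

def build_grid(points, cell):
    """Hash each point index into the cell containing it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def nearest(points, grid, cell, i):
    """Index of the nearest neighbor of point i among the 3x3 adjacent cells."""
    x, y = points[i]
    cx, cy = int(x // cell), int(y // cell)
    best, best_d = None, float("inf")
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j == i:
                    continue
                d = (points[j][0] - x) ** 2 + (points[j][1] - y) ** 2
                if d < best_d:
                    best, best_d = j, d
    return best

points = [(0.0, 0.0), (1.1, 0.0), (0.0, 1.0), (5.0, 5.0)]
grid = build_grid(points, cell=2.0)
```

Each query touches a constant number of cells holding a constant expected number of points, so the whole all-pairs pass is linear in n; for 10 million points the grid itself is the only O(n) memory cost.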

Add lines to a file

Submitted by 荒凉一梦 on 2019-11-26 20:34:12
Question: I'm new to R. I'm trying to add new lines to a file holding my existing data in R. The problem is that my data has about 30,000 rows and 13,000 cols. I already tried to add a line with the writeLines function, but the resulting file contains only the line added. Answer 1: Have you tried using the write function? line="blah text blah blah etc etc" write(line,file="myfile",append=TRUE) Answer 2: write.table, write.csv and others all have the append= argument, which appends when append=TRUE and usually overwrites…
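The same append-versus-overwrite distinction, sketched in Python terms for comparison: opening a file in append mode ("a") keeps the existing contents, while write mode ("w") truncates first, which is the trap the asker hit with writeLines.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "myfile"
    path.write_text("existing data\n")          # like the original file
    with open(path, "a") as f:                  # "a" = append, never truncate
        f.write("blah text blah blah etc etc\n")
    lines = path.read_text().splitlines()
```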

How to read only lines that fulfil a condition from a csv into R?

Submitted by 随声附和 on 2019-11-26 20:17:35
I am trying to read a large csv file into R. I only want to read and work with the rows that fulfil a particular condition (e.g. Variable2 >= 3), which is a much smaller dataset. I want to read these lines directly into a data frame, rather than loading the whole dataset and then selecting by the condition, since the whole dataset does not easily fit into memory. Answer: You could use the read.csv.sql function in the sqldf package and filter using an SQL select. From the help page of read.csv.sql: library(sqldf) write.csv(iris, "iris.csv", quote = FALSE, row.names = FALSE)…
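The same filter-while-reading idea in a Python sketch (column names follow the example in the question): rows are tested as they stream past, so only the matching subset is ever held in memory, never the whole file.

```python
import csv
import io

def filtered_rows(f, predicate):
    """Stream a CSV and keep only the rows satisfying predicate."""
    reader = csv.DictReader(f)          # reads one row at a time
    return [row for row in reader if predicate(row)]

data = io.StringIO("Variable1,Variable2\na,1\nb,3\nc,5\n")
subset = filtered_rows(data, lambda r: int(r["Variable2"]) >= 3)
```

This is also what read.csv.sql does under the hood: it routes the file through an SQLite database so the WHERE clause runs before anything lands in R's memory.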

How to plot with a png as background? [duplicate]

Submitted by 我是研究僧i on 2019-11-26 18:49:19
This question already has an answer here: Overlay data onto background image (3 answers). I made a plot with 3 million points and saved it as a PNG. It took a few hours, and I would like to avoid re-drawing all the points. How can I generate a new plot that has this PNG as a background? Answer: Try this: library(png) # Replace the directory and file information with your info ima <- readPNG("C:\\Documents and Settings\\Bill\\Data\\R\\Data\\Images\\sun.png") # Set up the plot area plot(1:2, type='n', main="Plotting Over an Image", xlab="x", ylab="y") # Get the plot information so the image will fill the plot…

What causes a Python segmentation fault?

Submitted by 点点圈 on 2019-11-26 17:21:37
I am implementing Kosaraju's Strongly Connected Component (SCC) graph search algorithm in Python. The program runs great on small data sets, but when I run it on a super-large graph (more than 800,000 nodes), it says "Segmentation Fault". What might be the cause of it? Thank you! Additional info: first I got this error when running on the super-large data set: "RuntimeError: maximum recursion depth exceeded in cmp". Then I raised the recursion limit using sys.setrecursionlimit(50000), but got a "Segmentation fault". Believe me, it's not an infinite loop; it runs correctly on relatively smaller data. It is…
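The likely cause: CPython's recursion limit exists to protect the fixed-size C stack, and sys.setrecursionlimit(50000) lets the interpreter recurse straight through it, turning the polite RuntimeError into a hard segmentation fault. The standard fix is to rewrite the DFS with an explicit stack, which has no depth limit. A sketch:

```python
def dfs_order(graph, start):
    """Depth-first traversal using an explicit list as the stack."""
    order, seen, stack = [], {start}, [start]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return order

# A path graph far deeper than the default ~1000-frame recursion limit;
# recursive DFS would need sys.setrecursionlimit and risk the same crash.
deep = {i: [i + 1] for i in range(100_000)}
visited = dfs_order(deep, 0)
```

Kosaraju's algorithm needs the finishing order of the first DFS pass; that also falls out of an explicit-stack version by pushing a post-visit marker per node, so neither pass has to recurse.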

Parallel.ForEach can cause an “Out Of Memory” exception if working with an enumerable with a large object

Submitted by 浪尽此生 on 2019-11-26 15:51:57
I am trying to migrate a database where images were stored in the database to records in the database pointing at files on the hard drive. I was trying to use Parallel.ForEach to speed up the process, using this method to query out the data. However, I noticed that I was getting an OutOfMemory exception. I know Parallel.ForEach queries a batch of enumerables to mitigate the cost of overhead, if there is one, of spacing the queries out (so your source is more likely to have the next record cached in memory if you do a bunch of queries at once instead of spacing them out). The issue is due to…
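In C# the usual fix is reportedly to disable that read-ahead buffering, e.g. Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering) plus a bounded MaxDegreeOfParallelism, so large items are pulled only as workers free up. The same bound-the-in-flight-work idea sketched in Python (the function and names are illustrative, not the C# API):

```python
import concurrent.futures
import threading

def process_bounded(items, worker, max_in_flight=4):
    """Run worker over items with at most max_in_flight items alive at once."""
    gate = threading.Semaphore(max_in_flight)
    futures = []
    with concurrent.futures.ThreadPoolExecutor(max_in_flight) as pool:
        for item in items:
            gate.acquire()                        # block until a slot frees up
            fut = pool.submit(worker, item)
            fut.add_done_callback(lambda _: gate.release())
            futures.append(fut)
        return [f.result() for f in futures]      # results in input order

processed = process_bounded(range(10), lambda n: n * n, max_in_flight=2)
```

With images as the items, the semaphore guarantees only max_in_flight of them exist at a time, so memory stays bounded no matter how long the source enumerable is.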
