large-data

QCompleter for large models

烂漫一生 submitted on 2019-12-02 02:20:44
QCompleter is noticeably slow on large data sets (large models): when I start typing characters into a QComboBox it takes a few seconds for the auto-complete popup with suggestions to appear, and after the 2nd character QCompleter does not react to key presses for a few seconds either. Subsequent characters work fine. The model size is about 100K records. Is it possible to improve QCompleter performance, or to show the popup only after the 2nd or 3rd input character? Are there some good examples? Aleksey Kontsevich: The solution appears to be similar to this one: https://stackoverflow.com/a/33404207/630169 , as QCompleter also uses a QListView in its popup(). So
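An illustrative PyQt5 sketch of that idea (not the linked answer's actual code; the model contents and widget setup below are made up): give the completer a plain QListView popup with uniform item sizes and batched layout, so it does not have to measure all 100K rows up front.

    # Sketch: speed up QCompleter's popup for a large model by tuning the
    # QListView it uses. Data and sizes here are illustrative only.
    import sys
    from PyQt5 import QtCore, QtWidgets

    app = QtWidgets.QApplication(sys.argv)

    combo = QtWidgets.QComboBox()
    combo.setEditable(True)

    model = QtCore.QStringListModel(["item %06d" % i for i in range(100000)])

    completer = QtWidgets.QCompleter()
    completer.setModel(model)
    completer.setCaseSensitivity(QtCore.Qt.CaseInsensitive)

    view = QtWidgets.QListView()
    view.setUniformItemSizes(True)                   # skip per-row size hints
    view.setLayoutMode(QtWidgets.QListView.Batched)  # lay out rows in batches
    completer.setPopup(view)                         # use the tuned view as the popup

    combo.setCompleter(completer)
    combo.show()
    sys.exit(app.exec_())

Showing the popup only after the 2nd or 3rd character would additionally require calling completer.complete() yourself from the line edit's textEdited signal instead of relying on the default behaviour.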

Using Globals instead of passing large arrays in Matlab

こ雲淡風輕ζ submitted on 2019-12-02 01:54:06
Question: I am using large arrays (about 70 MB each) and am worried about passing them to functions. My understanding is that Matlab uses pass-by-value function arguments, making local copies for the called function. As a dirty workaround, I've been declaring the large arrays as global and manually de-allocating them when the computations are completed. My question: is there a way to use pointers in Matlab? This is how I would do it in C/C++. If not, are there other, more memory-efficient methods? I've read that globals are generally a bad idea.
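Purely to illustrate the reference-style passing the question is asking about, here is a Python/NumPy sketch (an analogy, not MATLAB behaviour): array arguments are passed as references, so a ~70 MB array is not copied when handed to a function, and in-place edits are visible to the caller.

    # Sketch (Python/NumPy analogy): no copy is made when a large array is
    # passed to a function; in-place operations modify the caller's data.
    import numpy as np

    def normalize_in_place(x):
        x -= x.mean()                  # operates on the caller's buffer
        x /= x.std()

    big = np.random.rand(3000, 3000)       # roughly 70 MB of float64
    buffer_before = big.ctypes.data        # address of the underlying memory
    normalize_in_place(big)
    assert big.ctypes.data == buffer_before   # same buffer, nothing was copied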

Designing an external memory sorting algorithm

女生的网名这么多〃 submitted on 2019-12-02 01:22:25
I have a very large list stored in external memory that needs to be sorted. Assuming this list is too large for internal memory, what major factors should be considered in designing an external sorting algorithm? Before you go building your own external sort, you might look at the tools your operating system supplies. Windows has SORT.EXE, which works well enough on some text files, although it has ... idiosyncrasies. The GNU sort, too, works pretty well. You could give either of those a try on a subset of your data to see if they'll do what you need. Otherwise . . . The external sort is a
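For illustration, here is a minimal Python sketch of a classic external merge sort (split the input into sorted runs that fit in memory, then k-way merge the runs); the file paths and chunk size are made up:

    # External merge sort sketch for a large text file with one record per line
    # (assumes every line ends with a newline). Pass 1 writes sorted runs to
    # temp files; pass 2 merges the runs with a heap.
    import heapq
    import os
    import tempfile

    def external_sort(in_path, out_path, max_lines_in_memory=1_000_000):
        run_paths = []
        with open(in_path) as src:
            while True:
                chunk = [line for _, line in zip(range(max_lines_in_memory), src)]
                if not chunk:
                    break
                chunk.sort()                         # in-memory sort of one run
                fd, run_path = tempfile.mkstemp(text=True)
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)
                run_paths.append(run_path)
        runs = [open(p) for p in run_paths]
        try:
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*runs))   # k-way merge of sorted runs
        finally:
            for f in runs:
                f.close()
            for p in run_paths:
                os.remove(p)

    external_sort("big_input.txt", "sorted_output.txt")

The sketch also shows the main design factors such an algorithm has to balance: run size (how much fits in RAM), the number of runs (the fan-in of the merge), and keeping I/O on the run files sequential.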

Moving large number of large files in git repository

自作多情 submitted on 2019-12-02 01:08:27
Question: My repository has a large number of large files. They are mostly data (text). Sometimes I need to move these files to another location due to refactoring or packaging. I use the git mv command to "rename" the paths of the files, but it seems inefficient in that the size of the commit (the actual diff size) is huge, the same as with rm plus git add. Are there other ways to reduce the commit size? Or should I just add them to .gitignore and upload them as a zip file upstream? Thank you for the answers. FYI,

MATLAB randomly permuting columns differently

早过忘川 submitted on 2019-12-01 20:56:23
I have a very large matrix A with N rows and M columns. I basically want to do the following operation,

    for k = 1:N
        A(k,:) = A(k,randperm(M));
    end

but fast and efficiently. (Both M and N are very large, and this is only an inner loop in a more massive outer loop.) More context: I am trying to implement a permutation test for a correlation matrix ( http://en.wikipedia.org/wiki/Resampling_%28statistics%29 ). My data is very large and I am very impatient. If anyone knows of a fast way to implement such a test, I would also be grateful for your input! Do I have any hope of avoiding doing this
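One common way to avoid the explicit loop is to argsort a matrix of random keys row by row and use the result as an independent column permutation for each row. Here it is sketched in NumPy rather than MATLAB, purely to illustrate the idea (the same sort-random-keys trick carries over to MATLAB's sort):

    # Sketch: independently permute the columns of every row, no per-row loop.
    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 1000, 500                               # illustrative sizes
    A = rng.standard_normal((N, M))

    idx = np.argsort(rng.random((N, M)), axis=1)   # a random permutation per row
    A_permuted = np.take_along_axis(A, idx, axis=1)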

Using Globals instead of passing large arrays in Matlab

隐身守侯 submitted on 2019-12-01 20:52:38
I am using large arrays (about 70 MB each) and am worried about passing them to functions. My understanding is that Matlab uses pass-by-value function arguments, making local copies for the called function. As a dirty workaround, I've been declaring the large arrays as global and manually de-allocating them when the computations are completed. My question: is there a way to use pointers in Matlab? This is how I would do it in C/C++. If not, are there other, more memory-efficient methods? I've read that globals are generally a bad idea. @mutzmatron answered my question in a comment, so this is a repost:

Insert large amount of data to BigQuery via bigquery-python library

怎甘沉沦 submitted on 2019-12-01 16:47:45
I have large CSV and Excel files; I read them and dynamically create the needed CREATE TABLE script depending on the fields and types they contain, then insert the data into the created table. I have read this and understood that I should send the data with jobs.insert() instead of tabledata.insertAll() for large amounts of data. This is how I call it (it works for smaller files, not large ones):

    result = client.push_rows(datasetname, table_name, insertObject)  # insertObject is a list of dictionaries

When I use the library's push_rows it gives this error on Windows: [Errno 10054] An existing connection was forcibly closed by the remote host.
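A sketch of the load-job route using the official google-cloud-bigquery client rather than the bigquery-python library used in the question (the table ID, file name, and CSV options are all assumptions):

    # Sketch: insert a large CSV via a load job instead of streaming inserts.
    # Names are illustrative; schema autodetection stands in for the dynamic
    # CREATE TABLE script described in the question.
    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my_project.my_dataset.my_table"      # hypothetical

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )

    with open("large_file.csv", "rb") as f:
        load_job = client.load_table_from_file(f, table_id, job_config=job_config)

    load_job.result()                                # block until the job finishes
    print(client.get_table(table_id).num_rows, "rows loaded")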

R: xmlEventParse with Large, Varying-node XML Input and Conversion to Data Frame

时光总嘲笑我的痴心妄想 submitted on 2019-12-01 14:37:20
I have ~100 XML files of publication data, each > 10 GB, formatted like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <records xmlns="http://website">
      <REC rid="this is a test">
        <UID>ABCD123</UID>
        <data_1>
          <fullrecord_metadata>
            <references count="3">
              <reference>
                <uid>ABCD2345</uid>
              </reference>
              <reference>
                <uid>ABCD3456</uid>
              </reference>
              <reference>
                <uid>ABCD4567</uid>
              </reference>
            </references>
          </fullrecord_metadata>
        </data_1>
      </REC>
      <REC rid="this is a test">
        <UID>XYZ0987</UID>
        <data_1>
          <fullrecord_metadata>
            <references count="N">
            </references>
          </fullrecord_metadata>
        </data_1>
      </REC>
    </records>
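The question targets R's xmlEventParse, but the same streaming, event-driven idea can be sketched in Python with iterparse (the file name is an assumption); each <REC> is processed and then cleared, so memory stays bounded no matter how many <reference> nodes a record has:

    # Streaming-parse sketch: extract each record's UID and its reference uids
    # from a very large XML file without loading the whole document.
    import xml.etree.ElementTree as ET

    NS = "{http://website}"                 # namespace from the sample records
    rows = []

    for event, elem in ET.iterparse("publications.xml", events=("end",)):
        if elem.tag == NS + "REC":
            uid = elem.findtext(NS + "UID")
            ref_uids = [r.text for r in elem.iter(NS + "uid")]
            rows.append((uid, ref_uids))    # or write straight to disk for 10 GB inputs
            elem.clear()                    # free the processed subtree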

R: xmlEventParse with Large, Varying-node XML Input and Conversion to Data Frame

筅森魡賤 submitted on 2019-12-01 13:26:31
Question: I have ~100 XML files of publication data, each > 10 GB, formatted like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <records xmlns="http://website">
      <REC rid="this is a test">
        <UID>ABCD123</UID>
        <data_1>
          <fullrecord_metadata>
            <references count="3">
              <reference>
                <uid>ABCD2345</uid>
              </reference>
              <reference>
                <uid>ABCD3456</uid>
              </reference>
              <reference>
                <uid>ABCD4567</uid>
              </reference>
            </references>
          </fullrecord_metadata>
        </data_1>
      </REC>
      <REC rid="this is a test">
        <UID>XYZ0987</UID>
        <data_1>

R could not allocate memory on ff procedure. How come?

痞子三分冷 submitted on 2019-12-01 11:23:11
I'm working on a 64-bit Windows Server 2008 machine with an Intel Xeon processor and 24 GB of RAM. I'm having trouble trying to read a particular TSV (tab-delimited) file of 11 GB (>24 million rows, 20 columns). My usual companion, read.table, has failed me. I'm currently trying the ff package, through this procedure:

    > df <- read.delim.ffdf(file = "data.tsv",
    +                       header = TRUE,
    +                       VERBOSE = TRUE,
    +                       first.rows = 1e3,
    +                       next.rows = 1e6,
    +                       na.strings = c("", NA),
    +                       colClasses = c("NUMERO_PROCESSO" = "factor"))

This works fine for about 6 million records, but then I get an error, as you can see:
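The question concerns R's ff package, but as an aside the same bounded-memory, chunked-reading pattern can be sketched in Python/pandas (the file and column names are taken from the snippet above; the chunk size mirrors next.rows):

    # Sketch: read a huge TSV in 1e6-row chunks so memory stays bounded.
    # Shown only as an analogy to ff's first.rows/next.rows mechanism.
    import pandas as pd

    reader = pd.read_csv(
        "data.tsv",
        sep="\t",
        na_values=[""],
        dtype={"NUMERO_PROCESSO": "category"},   # like colClasses = "factor"
        chunksize=1_000_000,                     # like next.rows = 1e6
    )

    n_rows = 0
    for chunk in reader:
        n_rows += len(chunk)       # replace with real per-chunk processing
    print(n_rows, "rows read")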