large-data

Optimising HDF5 dataset for Read/Write speed

Submitted by 孤街醉人 on 2020-01-14 06:03:31
Question: I'm currently running an experiment where I scan a target spatially and grab an oscilloscope trace at each discrete pixel. Generally my trace lengths are 200 kpts. After scanning the entire target, I assemble these time-domain signals spatially and essentially play back a movie of what was scanned. My scan area is 330x220 pixels in size, so the entire dataset is larger than the RAM on the computer I have to use. To start with I was just saving each oscilloscope trace as a numpy array and then after…
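A minimal h5py sketch of the usual tuning knob here, assuming float32 traces and the 330x220 pixel, 200 kpt geometry from the question (the chunk sizes are illustrative, not the poster's): HDF5 reads and writes whole chunks, so the chunk shape should match the dominant access pattern.

```python
import numpy as np
import h5py

ny, nx, nt = 330, 220, 200_000

with h5py.File("scan.h5", "w") as f:
    # A chunk of (ny, nx, 8) holds every pixel for 8 time samples (~2.3 MB),
    # so reading one "movie frame" dset[:, :, t] touches a single chunk.
    # The trade-off: writing one full trace dset[iy, ix, :] then touches
    # many chunks, so chunk along whichever axis you read most often.
    dset = f.create_dataset("traces", shape=(ny, nx, nt), dtype="f4",
                            chunks=(ny, nx, 8))

with h5py.File("scan.h5", "r") as f:
    frame = f["traces"][:, :, 1000]   # one playback frame: one chunk read
```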

Computing the null space of a bigmatrix in R

Submitted by 会有一股神秘感。 on 2020-01-13 07:48:07
Question: I cannot find any function or package to calculate the null space or QR decomposition of a big.matrix (from library(bigmemory)) in R. For example:

    library(bigmemory)
    a <- big.matrix(1000000, 1000, type='double', init=0)

I tried the following but got the errors shown. How can I find the null space of a bigmemory object?

    a.qr <- Matrix::qr(a)
    # Error in as.vector(data) :
    #   no method for coercing this S4 class to a vector
    q.null <- MASS::Null(a)
    # Error in as.vector(data) :
    #   no method for coercing this S4 class to a vector
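One out-of-core approach worth sketching (my own suggestion, not from the thread, and in NumPy rather than R/bigmemory): null(A) equals null(AᵀA), and for a 1,000,000 x 1,000 matrix the 1,000 x 1,000 cross-product fits in RAM even though A does not, so AᵀA can be accumulated over row blocks and eigendecomposed.

```python
import numpy as np

n_rows, n_cols, block = 1_000_000, 1_000, 10_000

def read_rows(start, stop):
    # Placeholder for reading rows [start, stop) of the on-disk matrix.
    return np.zeros((stop - start, n_cols))

ata = np.zeros((n_cols, n_cols))
for start in range(0, n_rows, block):
    b = read_rows(start, min(start + block, n_rows))
    ata += b.T @ b                     # accumulate the small Gram matrix

# Note: forming A'A squares the condition number; fine for a sketch.
w, v = np.linalg.eigh(ata)             # symmetric PSD, so eigh applies
tol = w.max() * n_cols * np.finfo(float).eps
null_basis = v[:, w <= tol]            # columns spanning null(A)
```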

Is it possible to save only half of a symmetric matrix to save the memory?

Submitted by 穿精又带淫゛_ on 2020-01-13 03:08:14
Question: There is a large matrix that is used in an Ax=b type problem. A is symmetric. Is there any algorithm that lets us save only half of the matrix and do operations like x = A\b on it?

Answer 1: You'll only save half the memory, but you can do this by creating a flat version of the matrix, saving that, then unflattening it. The extra time required probably doesn't make the saving worthwhile, mind:

    % pretend this is symmetric...
    A = rand(10, 10);
    % store it as a flat list
    flatA = [];
    for k = 1:size(A,1)
        flatA =…
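A NumPy rendering of the same idea as Answer 1 (a sketch, not a drop-in for the MATLAB above): pack the upper triangle into a flat array of n(n+1)/2 entries, then unpack before solving.

```python
import numpy as np

def pack_symmetric(A):
    iu = np.triu_indices(A.shape[0])
    return A[iu]                      # n*(n+1)/2 entries instead of n*n

def unpack_symmetric(packed, n):
    A = np.zeros((n, n), dtype=packed.dtype)
    iu = np.triu_indices(n)
    A[iu] = packed                    # fill the upper triangle
    A.T[iu] = packed                  # mirror it into the lower triangle
    return A

n = 10
M = np.random.rand(n, n)
A = (M + M.T) / 2                     # make a symmetric test matrix
b = np.random.rand(n)
x = np.linalg.solve(unpack_symmetric(pack_symmetric(A), n), b)
```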

Components with large datasets run slow on IE11/Edge only

Submitted by 血红的双手。 on 2020-01-12 07:46:30
Question: Consider the code below, <GridBody Rows={rows} />, and imagine that rows.length would amount to any value of 2000 or more, with each row array having about 8 columns in this example. I use a more expanded version of this code to render a part of a table that has been bottlenecking my web application.

    var GridBody = React.createClass({
        render: function () {
            return <tbody>
                {this.props.Rows.map((row, rowKey) => {
                    return this.renderRow(row, rowKey);
                })}
            </tbody>;
        },
        renderRow: function (row, rowKey) {…
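The question is cut off above, but for tables of this size the standard remedy is windowing (virtualization): render only the rows inside the viewport. The index arithmetic at the heart of it is language-neutral; this Python sketch is illustrative only, with made-up numbers, not the poster's code.

```python
def visible_rows(scroll_top, viewport_height, row_height, total_rows,
                 overscan=5):
    # First and last row indices that intersect the viewport, padded by
    # `overscan` rows so fast scrolling doesn't show blank gaps.
    first = max(scroll_top // row_height - overscan, 0)
    last = min((scroll_top + viewport_height) // row_height + overscan,
               total_rows - 1)
    return first, last                 # render only rows[first:last + 1]

# e.g. 2000 rows of 24 px each, 600 px viewport, scrolled to 4800 px:
print(visible_rows(4800, 600, 24, 2000))   # -> (195, 230), ~36 rows not 2000
```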

Improve Query Performance From a Large HDFStore Table with Pandas

Submitted by 柔情痞子 on 2020-01-11 19:50:39
Question: I have a large (~160 million rows) dataframe that I've stored to disk with something like this:

    def fillStore(store, tablename):
        files = glob.glob('201312*.csv')
        names = ["ts", "c_id", "f_id", "resp_id", "resp_len", "s_id"]
        for f in files:
            df = pd.read_csv(f, parse_dates=True, index_col=0, names=names)
            store.append(tablename, df, format='table', data_columns=['c_id','f_id'])

The table has a time index and I will query using c_id and f_id in addition to times (via the index). I have another…
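A sketch of the query side this setup enables (the table name, dates, and values are placeholders, not from the post): because c_id and f_id were declared as data_columns, HDFStore can evaluate the where clause on disk instead of loading all ~160 million rows.

```python
import pandas as pd

with pd.HDFStore("store.h5") as store:
    # Filter on the time index and an indexed data column in one pass.
    df = store.select(
        "mytable",
        where="index >= '2013-12-01' & index < '2013-12-02' & c_id = 5",
    )

    # For results that are themselves too big for RAM, iterate in chunks.
    for chunk in store.select("mytable", where="f_id = 7",
                              chunksize=500_000):
        pass  # process each chunk
```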

SQL Joins Taking forever on table with 60 matches

Submitted by 烂漫一生 on 2020-01-06 12:10:39
Question: I have a query that makes joins between 4 tables. Matching my input params on a table-by-table basis yields the following:

    TrialBal - 8 million records matching entity and pl_date.
    Join to ActDetail - execution is about 85 secs; the row count is 8,672,175 (with GROUP BY, the row count is 1... for now). ActDetail would return zero rows on an inner join.
    Join to CalendarEngine - there are only 60 matching records in this table (pl_date & entity), but when this is introduced to the SQL, the query just…

How can I validate my 3,000,000 line long XML file?

Submitted by 有些话、适合烂在心里 on 2020-01-06 06:36:24
Question: I have an XML file. It is nearly correct, but it is not: "Error on line 302211. Extra content at the end of the document." I've spent literally two days trying to debug this, but the file is so big it's nearly impossible. Is there anything I can do? Here are the relevant lines as well (I include the 2 lines before the error; the error begins on the <seg> tag):

    <tu>
        <tuv xml:lang="en">
            <prop type="feed"></prop>
            <seg>
                <bpt i="1" x="1" type="feed"> test </bpt> To switch on computer: <ept i="1"> > <…
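One approach (my suggestion, not from the question): stream-parse the file so memory use stays flat and the parser pinpoints the first failing line for you. This sketch assumes lxml and a placeholder filename.

```python
from lxml import etree

try:
    # iterparse streams the document element by element instead of
    # building the whole 3M-line tree in memory.
    for _event, elem in etree.iterparse("big.xml", events=("end",)):
        elem.clear()                  # free elements we are done with
except etree.XMLSyntaxError as err:
    # Prints the message with line/column, e.g.
    # "Extra content at the end of the document, line 302211, column 5"
    print(err)
```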

Sending data through socket, receiving with spaces

Submitted by 試著忘記壹切 on 2020-01-05 20:22:48
Question: I'm using C++ with Qt4 for this. When I try to send large HTML files (in this case, 8 KB), the process of sending and receiving works well, but the received file comes with spaces between each character of the HTML file. Here is an example; the file is sent like this:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
    <html><head><meta name="qrichtext" content="1" /><style type="text/css">
    p, li { white-space: pre-wrap; }
    </style></head><body style="…
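A guess at the cause, since the post is truncated and this is not a confirmed diagnosis: text written as 16-bit (for example, a QString serialized as UTF-16) but read back as 8-bit shows a NUL byte between every ASCII character, which many viewers render as a space. A quick Python illustration of the byte pattern:

```python
# What a raw UTF-16LE dump of ASCII-range text looks like on the wire.
data = "<html>".encode("utf-16-le")

print(list(data[:8]))             # [60, 0, 104, 0, 116, 0, 109, 0]
# Those interleaved 0 bytes are the "spaces" an 8-bit reader sees.

print(data.decode("utf-16-le"))   # '<html>' once decoded with the right codec
```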

Updating a line in a large text file using Scala

Submitted by 拟墨画扇 on 2020-01-05 12:32:36
Question: I have a large text file, around 43 GB, in .ttl format containing triples of the form:

    <http://www.wikidata.org/entity/Q1001> <http://www.w3.org/2002/07/owl#sameAs> <http://la.dbpedia.org/resource/Mahatma_Gandhi> .
    <http://www.wikidata.org/entity/Q1001> <http://www.w3.org/2002/07/owl#sameAs> <http://lad.dbpedia.org/resource/Mohandas_Gandhi> .

and I want to find the fastest way to update a specific line inside the file without rewriting all the text that follows, either by updating it in place or by deleting it and appending it…
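Whatever the language, the constraint is byte-level: a file is a flat byte sequence, so a line can be overwritten in place only when the replacement has exactly the same byte length; anything longer or shorter forces rewriting everything after it. A Python sketch of the in-place overwrite (the offset and names are placeholders, and the approach is mine, not the poster's):

```python
def overwrite_line(path, offset, old_line, new_line):
    old = old_line.encode("utf-8")
    new = new_line.encode("utf-8")
    if len(new) != len(old):
        raise ValueError("in-place update needs an equal-length replacement")
    with open(path, "r+b") as f:
        f.seek(offset)                # byte offset of the line, found earlier
        f.write(new)                  # overwrites exactly len(new) bytes

# The usual workaround when the replacement is shorter: pad it with
# trailing spaces up to the old line's byte length before writing.
```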