transpose

PySpark Dataframe cast two columns into new column of tuples based value of a third column

允我心安 submitted on 2019-12-06 12:28:36
Question: As the subject describes, I have a PySpark DataFrame in which I need to cast two columns into a new column that is a list of tuples, based on the value of a third column. This cast will reduce or flatten the DataFrame by a key value (product id in this case), and the result is one row per key. There are hundreds of millions of rows in this DataFrame, with 37M unique product ids. Therefore I need a way to do the transformation on the Spark cluster without bringing back any data to the driver (Jupyter

MATLAB: Block matrix multiplying without loops

点点圈 submitted on 2019-12-06 11:55:14
I have a block matrix [A B C...] and a matrix D (all 2-dimensional). D has dimensions y-by-y, and A, B, C, etc. are each z-by-y. Basically, what I want to compute is the matrix [D*(A'); D*(B'); D*(C');...], where X' refers to the transpose of X. However, I want to accomplish this without loops for speed considerations. I have been playing with the reshape command for several hours now, and I know how to use it in other cases, but this use case is different from the other ones and I cannot figure it out. I also would like to avoid using multi-dimensional matrices if at all possible. Honestly

How to transpose lines to column for only 7 rows at a time in file

蹲街弑〆低调 submitted on 2019-12-06 07:41:24
Please help, I have a text file that looks something like this:

    ID: 000001
    Name: John Smith
    Email: jsmith@ibm.com
    Company: IBM
    blah1: a
    blah2: b
    blah3: c
    ID: 000002
    Name: Jane Doe
    Email: jdoe@ibm.com
    Company: IBM
    blah1: a
    blah2: b
    blah3: c
    ID:000003
    .
    .
    .
    etc.

Notice that each customer's info is in 7 rows. The ID:000002 marks the start of the next customer, 000003 the next customer, and so on. I would like my output file to be like this (instead of each customer's data in the next rows, to have each ID and subsequent 7 rows to be transposed to columns):

    ID: 000001,Name: John Smith
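
The reshaping described in the question can be sketched in Python as follows. This is a minimal illustration, not an answer from the original thread: the function name `transpose_records` and its parameter `fields_per_record` are hypothetical, and it assumes every record has exactly 7 non-empty lines in a fixed order.

```python
from itertools import islice


def transpose_records(lines, fields_per_record=7):
    """Join every `fields_per_record` non-empty lines into one comma-separated row.

    Assumption: each record occupies exactly `fields_per_record` lines.
    """
    # Drop blank lines and surrounding whitespace, lazily.
    cleaned = (line.strip() for line in lines if line.strip())
    rows = []
    while True:
        # Pull the next group of lines from the shared generator.
        chunk = list(islice(cleaned, fields_per_record))
        if not chunk:
            break
        rows.append(",".join(chunk))
    return rows


sample = """\
ID: 000001
Name: John Smith
Email: jsmith@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
ID: 000002
Name: Jane Doe
Email: jdoe@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
"""

rows = transpose_records(sample.splitlines())
for row in rows:
    print(row)
```

Because the generator is consumed in fixed-size chunks, the same function works on an open file object, so a large input does not need to be loaded into memory at once.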

Transpose and widen Data

六眼飞鱼酱① submitted on 2019-12-06 06:10:00
My pandas data frame looks as follows:

    Country Code  1960      1961      1962      1963      1964      1965      1966 1967 1968 ... 2015
    ABW           2.615300  2.734390  2.678430  2.929920  2.963250  3.060540  ... 4.349760
    AFG           0.249760  0.218480  0.210840  0.217240  0.211410  0.209910  ... 0.671330
    ALB           NaN       NaN       NaN       NaN       NaN       NaN       NaN NaN NaN ... 1.12214
    ...

How can I transpose it so that it looks as follows?

    Country_Code  Year  Econometric_Metric
    ABW           1960  2.615300
    ABW           1961  2.734390
    ABW           1962  2.678430
    ...
    ABW           2015  4.349760
    AFG           1960  0.249760
    AFG           1961  0.218480
    AFG           1962  0.210840
    ...
    AFG           2015  0.671330
    ALB           1960  NaN
    ALB           1961  NaN
    ALB           1962  NaN
    ALB           2015  1
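
The wide-to-long reshape asked for here is what `DataFrame.melt` does in pandas. To show the underlying logic without depending on pandas, here is a plain-Python sketch. The function name `wide_to_long` is hypothetical, and the sample uses only the first three year columns from the question.

```python
import math


def wide_to_long(header, rows):
    """Turn one row per country into one (code, year, value) tuple per cell.

    Assumption: header[0] names the id column and the rest are year labels.
    """
    years = [int(label) for label in header[1:]]
    long_rows = []
    for row in rows:
        code, values = row[0], row[1:]
        # Pair each year label with the value in the matching column.
        for year, value in zip(years, values):
            long_rows.append((code, year, value))
    return long_rows


header = ["Country Code", "1960", "1961", "1962"]
wide = [
    ["ABW", 2.615300, 2.734390, 2.678430],
    ["AFG", 0.249760, 0.218480, 0.210840],
    ["ALB", math.nan, math.nan, math.nan],
]

long_rows = wide_to_long(header, wide)
for code, year, value in long_rows:
    print(code, year, value)
```

Missing values are carried through unchanged, so the NaN cells for ALB stay NaN in the long form.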

transpose 1D array of leading dimension N

北城余情 submitted on 2019-12-06 04:56:41
Question: How can I transpose a 1D array of leading dimension N, without extra space? Any language is fine.

Answer 1: My solution for 1D in-place matrix transposition:

    mn = M*N; /* M rows and N columns */
    q = mn - 1;
    i = 0; /* Index of 1D array that represents the matrix */
    do {
        k = (i*M) % q;
        while (k>i) k = (M*k) % q;
        if (k!=i) Swap(k, i);
    } while ( ++i <= (mn -2) );
    /* Update row and column */
    matrix.M = N;
    matrix.N = M;

Answer 2: Transposing a non-square matrix in-place represented as a linear array is a bit
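
For readers who want to try the loop from Answer 1, here is a direct Python port. It is a sketch under two assumptions: the array is stored row-major, and `Swap(k, i)` in the original exchanges the elements at indices `k` and `i`. The function name `transpose_in_place` is hypothetical, and the guard for arrays of fewer than two elements is an addition, since `q` would otherwise be zero.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a row-major rows x cols matrix stored in the flat list `a`.

    Returns the new (rows, cols) shape. Uses O(1) extra space.
    """
    mn = rows * cols
    if len(a) != mn:
        raise ValueError("array length does not match rows * cols")
    q = mn - 1
    if q <= 0:
        # Zero or one element: nothing to move (added guard, avoids modulo by zero).
        return cols, rows
    for i in range(mn - 1):  # i = 0 .. mn - 2, as in the do/while loop
        k = (i * rows) % q
        # Follow the permutation until it reaches an index not greater than i.
        while k > i:
            k = (rows * k) % q
        if k != i:
            a[k], a[i] = a[i], a[k]
    return cols, rows


matrix = [0, 1, 2,
          3, 4, 5]  # 2 rows, 3 columns
shape = transpose_in_place(matrix, 2, 3)
print(shape, matrix)  # (3, 2) [0, 3, 1, 4, 2, 5]
```

The inner `while` loop revisits indices, so the running time is not linear in general; the saving is in memory, not in time.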

How to unpivot in BigQuery?

本小妞迷上赌 submitted on 2019-12-06 03:06:58
Question: Not sure what functions to call, but transpose is the closest thing I can think of. I have a table in BigQuery that is configured like this: but I want to query a table that is configured like this: What does the SQL code look like for creating this table? Thanks!

Answer 1: Use the UNION of tables (with ',' in BigQuery), plus some column aliasing:

    SELECT Location, Size, Quantity
    FROM (
      SELECT Location, 'Small' as Size, Small as Quantity FROM [table]
    ), (
      SELECT Location, 'Medium' as Size, Medium

Need to convert columns to rows in R

▼魔方 西西 submitted on 2019-12-06 02:42:34
Question: I have data that looks like

    a b c
    1 5 4
    3 6 1
    2 5 3

I want to convert all the columns to rows and get output like

    r1 r2 r3 r4
    a  1  3  2
    b  5  6  5
    c  4  1  3

Thanks in advance.

Answer 1: We can transpose the dataset and convert to data.frame with the first column as the row names.

    m1 <- t(df1)
    d2 <- data.frame(r1= row.names(m1), m1, row.names=NULL)

EDIT: Included the row.names argument in the data.frame call (from @Richard Scriven's comment). Or as @Ananda Mahto mentioned, we can use names

Transpose and group data

邮差的信 submitted on 2019-12-06 00:44:17
I need to transpose two columns into rows, grouped by the first column; here is an example. From this:

    A    B
    IP1  21
    IP1  22
    IP1  23
    IP2  80
    IP2  443
    IP3  21
    IP3  22
    IP3  23
    IP3  80
    IP3  443

To this:

    A    B   C    D   E   F
    IP1  21  22   23
    IP2  80  443
    IP3  21  22   23  80  443

How can I do this? Can I avoid the use of macros and VBA?

You had better use VBA, but if you really need a formula solution: First, you need to create a unique list: D2=IFERROR(INDEX($A$1:$A$19, MATCH(0, COUNTIF($D$1:D1, $A$1:$A$19), 0)),0) And drag it down to copy. Then, we need to look up the 1st, 2nd, 3rd, etc. match: E2=IFERROR(INDEX($B$1:$B$19, SMALL(IF($D2=$A$1:
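
Outside the spreadsheet, the same "unique keys, then the n-th match per key" idea reduces to grouping values by key. The sketch below is in Python rather than Excel or VBA and is only an illustration of the logic. The function name `group_to_rows` is hypothetical, and it assumes the input is a list of (A, B) pairs in sheet order.

```python
def group_to_rows(pairs):
    """Collect the B values of each A key into one row, keeping first-seen order."""
    grouped = {}
    for key, value in pairs:
        # dict preserves insertion order, so keys appear as in the input.
        grouped.setdefault(key, []).append(value)
    return [[key, *values] for key, values in grouped.items()]


pairs = [
    ("IP1", 21), ("IP1", 22), ("IP1", 23),
    ("IP2", 80), ("IP2", 443),
    ("IP3", 21), ("IP3", 22), ("IP3", 23), ("IP3", 80), ("IP3", 443),
]

table = group_to_rows(pairs)
for row in table:
    print(*row)
```

Rows end up with different lengths, matching the ragged layout in the desired output; pad them if a rectangular table is required.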

Non-square matrix transpose with shared mem in CUDA

回眸只為那壹抹淺笑 submitted on 2019-12-05 22:07:05
I was trying to get a variation of the SDK matrix transpose sample working for all kinds of sizes. Briefly, I have to take an input array (double *a) and write it to two different parts (you will notice the different offsets) of a bigger matrix (double *tab). I'm storing the data in row-major format, so I'm using this macro for indexing:

    #define IDX2L(i,j,ld) (((i)*(ld))+(j)) // 0 based index +row-major format

This is the simple code I use.

    __global__ void cuda_a_Coalesced(double *tab, int tab_rows, int a_rows, double *a)
    {
        __shared__ double tile[16*(16+1)];
        int col = threadIdx.x + blockIdx.x * blockDim

Finding the transpose of a very, very large matrix

拟墨画扇 submitted on 2019-12-05 22:02:35
I have this huge 2-dimensional array of data. It is stored in row order: A(1,1) A(1,2) A(1,3) ..... A(n-2,n) A(n-1,n) A(n,n) I want to rearrange it into column order A(1,1) A(2,1) A(3,1) ..... A(n,n-2) A(n,n-1) A(n,n) The data set is rather large, more than will fit in the RAM of a computer. (n is about 10,000, but each data item takes about 1K of space.) Does anyone know slick or efficient algorithms to do this?

Create n empty files (reserve enough space for n elements, if you can). Iterate through your original matrix. Append element (i,j) to file j. Once you are done with that, append the
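
The file-per-column approach in the answer can be sketched in Python as below. The excerpt is cut off before its final step, so the concatenation of the column files in order is an assumption about how it ends. The function name `transpose_on_disk` and the fixed `record_size` are also hypothetical, and the sketch keeps all n column files open at once, which for n around 10,000 may exceed the operating system's open-file limit.

```python
import os
import shutil
import tempfile
from contextlib import ExitStack


def transpose_on_disk(src_path, dst_path, n, record_size):
    """Rewrite an n x n row-major file of fixed-size records in column-major order.

    Only one record is held in memory at a time.
    """
    with tempfile.TemporaryDirectory() as tmp, ExitStack() as stack:
        # One scratch file per column (assumption: n stays below the fd limit).
        col_files = [
            stack.enter_context(open(os.path.join(tmp, f"col_{j}.bin"), "w+b"))
            for j in range(n)
        ]
        with open(src_path, "rb") as src:
            for i in range(n):
                for j in range(n):
                    record = src.read(record_size)
                    if len(record) != record_size:
                        raise ValueError(f"source ended early at element ({i}, {j})")
                    # Element (i, j) is appended to the file for column j.
                    col_files[j].write(record)
        # Concatenate the column files in order to form the transposed matrix.
        with open(dst_path, "wb") as dst:
            for f in col_files:
                f.seek(0)
                shutil.copyfileobj(f, dst)
```

For example, a 3-by-3 file holding the two-byte records 11 12 13 21 22 23 31 32 33 comes out as 11 21 31 12 22 32 13 23 33. If the open-file limit is a problem, the columns can be processed in batches at the cost of reading the source once per batch.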