matrix-multiplication

Can UIPinchGestureRecognizer and UIPanGestureRecognizer Be Merged?

Deadly submitted on 2019-11-30 09:45:06
I am struggling to figure out whether it is possible to create a single combined gesture recognizer that merges UIPinchGestureRecognizer with UIPanGestureRecognizer. I am using pan for view translation and pinch for view scaling. I do incremental matrix concatenation to derive a resultant final transformation matrix that is applied to the view; this matrix carries both scale and translation. Using separate gesture recognizers leads to jittery movement/scaling, which is not what I want. Thus, I want to handle concatenation of scale and translation once, within a single gesture. Can someone
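Independent of UIKit, the incremental concatenation the question describes can be sketched with plain 3x3 homogeneous affine matrices (the same math CGAffineTransform performs). This is an illustrative sketch, not the UIKit API; the helper names are made up here:

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous translation matrix."""
    m = np.eye(3)
    m[0, 2], m[1, 2] = tx, ty
    return m

def scaling(s):
    """3x3 homogeneous uniform-scale matrix."""
    m = np.eye(3)
    m[0, 0] = m[1, 1] = s
    return m

# Accumulate incremental deltas from both gestures into one matrix,
# so scale and translation are always composed in a single place.
current = np.eye(3)
for delta in (translation(10, 5), scaling(2.0), translation(-3, 0)):
    current = delta @ current

# Apply the combined transform to a point in homogeneous coordinates.
point = current @ np.array([1.0, 1.0, 1.0])
```

Because every update multiplies into one accumulated matrix, pinch and pan deltas cannot fight each other between frames, which is the usual source of the jitter.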

Efficient Algorithms for Computing a matrix times its transpose [closed]

女生的网名这么多〃 submitted on 2019-11-30 09:20:17
For a class, my teacher posed the question of the algorithmic cost of multiplying a matrix by its transpose. With the standard three-loop matrix multiplication algorithm, the efficiency is O(n^3), and I wonder whether there is a way to manipulate or take advantage of matrix * matrix-transpose to get a faster algorithm. I understand that when you multiply a matrix by its transpose you have to calculate less of the result because it is symmetric, but I can't think of how to derive an algorithm that would take less than O(n^3). I know there are algorithms like Coppersmith and Strassen
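The symmetry observation halves the constant but not the asymptotic cost: only the upper triangle of G = A·Aᵀ needs to be computed explicitly. A minimal sketch (illustrative, not an optimized routine):

```python
import numpy as np

def gram(a):
    """Compute G = A @ A.T, exploiting symmetry: only the upper
    triangle is computed explicitly, roughly halving the scalar
    multiplies, though the cost remains O(n^3) asymptotically."""
    n = a.shape[0]
    g = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            s = float(a[i] @ a[j])   # dot product of rows i and j
            g[i, j] = s
            g[j, i] = s              # mirror into the lower triangle
    return g

a = np.arange(6.0).reshape(2, 3)
```

Sub-cubic bounds come from fast matrix multiplication (Strassen and successors) applied to the full product, not from symmetry alone.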

How to multiply an Nx1 column by a 1xM row in R to get an NxM matrix?

谁说胖子不能爱 submitted on 2019-11-30 09:09:50
Question: I want to do a simple column (Nx1) times row (1xM) multiplication, resulting in an (NxM) matrix. Code, where I create a row by a sequence and a column by transposing a similar row: row1 <- seq(1:6) col1 <- t(seq(1:6)) col1 * row1 Output, which indicates that R treats these more like plain vectors than matrices: [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 4 9 16 25 36 Expected output: an NxM matrix. OS: Debian 8.5, Linux kernel: 4.6 backports, Hardware: Asus Zenbook UX303UA Answer 1: In this case using outer would be a more natural
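For comparison, here is the same column-times-row product written in NumPy (the analogue of R's `outer`); this is an illustration in a different language, not the R answer itself:

```python
import numpy as np

col = np.arange(1, 7).reshape(-1, 1)  # Nx1 column vector
row = np.arange(1, 7).reshape(1, -1)  # 1xM row vector

# The matrix product of an (Nx1) by a (1xM) is the full NxM outer
# product; elementwise `*` with broadcasting gives the same result here.
m = col @ row
```

In R itself, `outer(1:6, 1:6)` or `matrix(1:6) %*% t(matrix(1:6))` produces the expected NxM matrix; the original snippet multiplied two objects that R treated elementwise with recycling rather than as a matrix product.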

Laderman's 3x3 matrix multiplication with only 23 multiplications, is it worth it?

▼魔方 西西 submitted on 2019-11-30 08:14:54
Take the product of two 3x3 matrices, A*B=C. Naively this requires 27 multiplications using the standard algorithm. If one were clever, it can be done using only 23 multiplications, a result found in 1973 by Laderman. The technique involves saving intermediate steps and combining them in the right way. Now let's fix a language and a type, say C++ with elements of double. If the Laderman algorithm were hard-coded versus the simple triple loop, could we expect a modern compiler to erase the difference between the algorithms? Notes about this question: This is a programming
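For reference, the 27-multiplication baseline the question benchmarks against is just the standard triple loop specialized to 3x3. Laderman's 23-product formulas are not reproduced here; this sketch only fixes the baseline:

```python
def mul3x3(a, b):
    """Standard 3x3 matrix product: 3*3*3 = 27 scalar multiplies.
    This is the baseline a hard-coded Laderman scheme (23 multiplies,
    but more additions and temporaries) would be measured against."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

In practice the interesting comparison is not multiply count but how well the compiler vectorizes each form, since Laderman trades multiplications for extra additions and temporaries.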

How to do transposed matrix multiplication in cuBLAS

回眸只為那壹抹淺笑 submitted on 2019-11-30 07:37:13
Question: The problem is simple: I have two matrices, A and B, both M by N, where M >> N. I want to first take the transpose of A and then multiply it by B (A^T * B), putting the result into C, which is N by N. I have everything set up for A and B, but how do I call cublasSgemm properly without it returning the wrong answer? I understand that cuBLAS has a cublasOperation_t enum for transposing things beforehand, but somehow I'm not quite using it correctly. My matrices A and B are in row-major order, i
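The usual stumbling block is that cuBLAS assumes column-major storage: a row-major MxN buffer, read column-major, is exactly the NxM transpose. The reinterpretation can be demonstrated with NumPy (an illustration of the memory layout, not a cuBLAS call):

```python
import numpy as np

M, N = 5, 3
A = np.arange(M * N, dtype=np.float32).reshape(M, N)  # row-major MxN
buf = A.ravel()                                        # the flat buffer cuBLAS would see

# Reading the same buffer column-major ('F' order) as NxM yields A^T:
A_as_colmajor = buf.reshape(N, M, order='F')
```

So with row-major inputs, A already "looks transposed" to cuBLAS, and the right `cublasOperation_t` flags and leading dimensions follow from tracking which matrix cuBLAS actually sees; often A^T * B needs no explicit transpose at all.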

Why can't my CPU maintain peak performance in HPC

删除回忆录丶 submitted on 2019-11-30 07:06:33
I have developed a high-performance Cholesky factorization routine, which should peak at around 10.5 GFLOPs on a single CPU (without hyperthreading). But there is a phenomenon I don't understand when I test its performance. In my experiment, I measured performance with increasing matrix dimension N, from 250 up to 10000. In my algorithm I have applied caching (with a tuned blocking factor), and data are always accessed with unit stride during computation, so cache performance is optimal; TLB and paging problems are eliminated; I have 8GB of available RAM, and the
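A quick way to see the achieved-versus-peak curve the question describes is to measure GFLOP/s across sizes; a dense Cholesky costs about N^3/3 floating-point operations. This is an illustrative measurement harness (using NumPy's factorization as a stand-in for the custom routine), not the questioner's code:

```python
import time
import numpy as np

def cholesky_gflops(n, reps=3):
    """Return achieved GFLOP/s of one Cholesky factorization of an
    n x n matrix, using the ~n**3/3 flop count of dense Cholesky."""
    a = np.random.rand(n, n)
    spd = a @ a.T + n * np.eye(n)      # symmetric positive definite
    best = float('inf')
    for _ in range(reps):
        t0 = time.perf_counter()
        np.linalg.cholesky(spd)
        best = min(best, time.perf_counter() - t0)
    return (n ** 3 / 3) / best / 1e9

rate_small = cholesky_gflops(200)
```

Plotting this over N from 250 to 10000 typically shows a plateau followed by a fall-off once the working set outgrows the last-level cache, at which point the routine is bound by memory bandwidth rather than arithmetic throughput.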

OpenMP C++ Matrix Multiplication run slower in parallel

此生再无相见时 submitted on 2019-11-30 06:39:42
Question: I'm learning the basics of parallel execution of for loops using OpenMP. Sadly, my parallel program runs 10x slower than the serial version. What am I doing wrong? Am I missing some barriers? double **basicMultiply(double **A, double **B, int size) { int i, j, k; double **res = createMatrix(size); omp_set_num_threads(4); #pragma omp parallel for private(k) for (i = 0; i < size; i++) { for (j = 0; j < size; j++) { for (k = 0; k < size; k++) { res[i][j] += A[i][k] * B[k][j]; } } } return res; } Thank
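The likely culprit in the snippet above is that only `k` is listed as private, so every thread races on the shared `i` and `j`. Declaring the loop indices inside the loops makes them private automatically. A self-contained corrected sketch (allocation inlined here instead of the question's `createMatrix` helper):

```c
#include <stdlib.h>

/* Corrected version: loop indices declared inside the loops are
 * automatically private to each thread. In the original, the shared
 * j (and i) caused data races and cache-line ping-pong, which easily
 * makes the parallel version slower than the serial one. */
double **basicMultiply(double **A, double **B, int size) {
    double **res = malloc(size * sizeof *res);
    for (int i = 0; i < size; i++)
        res[i] = calloc(size, sizeof **res);   /* zero-initialized */

    #pragma omp parallel for
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            for (int k = 0; k < size; k++)
                res[i][j] += A[i][k] * B[k][j];
    return res;
}
```

Equivalently, keeping the original declarations and writing `private(i, j, k)` on the pragma fixes the same race.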

How to implement fast image filters on iOS platform

社会主义新天地 submitted on 2019-11-30 05:10:43
I am working on an iOS application where the user can apply a certain set of photo filters. Each filter is basically a set of Photoshop actions with specific parameters. These actions are: levels adjustment, brightness/contrast, hue/saturation, and single and multiple overlay. I've reproduced all these actions in my code using arithmetic expressions looping through all the pixels in the image. But when I run my app on an iPhone 4, each filter takes about 3-4 seconds to apply, which is quite a long time for the user to wait. The image size is 640 x 640 px, which is @2x of my view size because it's displayed on a Retina
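On iOS the usual fix is to hand the per-pixel work to Accelerate, Core Image, or the GPU rather than looping in application code. The core idea for point operations like levels and brightness/contrast is a 256-entry lookup table applied in one vectorized pass; sketched here in NumPy as an illustration of the technique, with made-up parameter names:

```python
import numpy as np

def brightness_contrast_lut(brightness, contrast):
    """Build a 256-entry lookup table for a brightness/contrast
    adjustment: the per-level arithmetic runs 256 times total,
    instead of once per pixel."""
    levels = np.arange(256, dtype=np.float32)
    adjusted = (levels - 127.5) * contrast + 127.5 + brightness
    return np.clip(adjusted, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (640, 640), dtype=np.uint8)
lut = brightness_contrast_lut(brightness=10, contrast=1.2)
out = lut[img]   # one vectorized gather applies the filter to every pixel
```

Chained point filters can be fused by composing their tables into a single LUT, so the whole filter stack still costs one pass over the image.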

Matrix multiplication using hdf5

那年仲夏 submitted on 2019-11-30 04:33:28
I'm trying to multiply two big matrices within a memory limit using hdf5 (pytables), but the function numpy.dot seems to give me an error: ValueError: array is too big. I need to do the matrix multiplication myself, maybe blockwise, or is there another Python function similar to numpy.dot? import numpy as np import time import tables import cProfile import numexpr as ne n_row=10000 n_col=100 n_batch=10 rows = n_row cols = n_col batches = n_batch atom = tables.UInt8Atom() #? filters = tables.Filters(complevel=9, complib='blosc') # tune parameters fileName_a = 'C:\carray_a.h5' shape_a = (rows*batches,
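The blockwise idea can be sketched with plain in-memory arrays standing in for the on-disk pytables carrays (the function name and block size are illustrative):

```python
import numpy as np

def blockwise_dot(a, b, block=256):
    """Multiply a (MxK) by b (KxN) one row-block of `a` at a time, so
    only one block-sized slice of `a` and of the result is touched per
    step. With pytables, each slice would be a read from / write to the
    HDF5 file instead of an in-memory view."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.empty((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, block):
        out[i:i + block] = a[i:i + block] @ b
    return out
```

If only `a` is too large for RAM this is enough; if `b` is also huge, the loop tiles over both operands and accumulates partial products per output block.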

Why does the order of loops in a matrix multiply algorithm affect performance? [duplicate]

匆匆过客 submitted on 2019-11-30 03:55:14
This question already has an answer here: Why does the order of the loops affect performance when iterating over a 2D array? I am given two functions for finding the product of two matrices: void MultiplyMatrices_1(int **a, int **b, int **c, int n){ for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) for (int k = 0; k < n; k++) c[i][j] = c[i][j] + a[i][k]*b[k][j]; } void MultiplyMatrices_2(int **a, int **b, int **c, int n){ for (int i = 0; i < n; i++) for (int k = 0; k < n; k++) for (int j = 0; j < n; j++) c[i][j] = c[i][j] + a[i][k]*b[k][j]; } I ran and profiled two executables
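Both orders compute identical results; they differ only in memory-access pattern. In the i-k-j order the innermost loop walks `b` and `c` with unit stride, which is far friendlier to the cache for row-major storage. A self-contained version using flat arrays (a simplification of the question's `int **` signature):

```c
/* ijk order: the inner loop reads b[k][j] with k varying, striding
 * down a column of b, so every iteration touches a new cache line. */
void multiply_ijk(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

/* ikj order: the inner loop reads b[k][j] with j varying, so both b
 * and c are accessed with unit stride along a row. */
void multiply_ikj(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}
```

For large n the ikj variant is typically several times faster purely from better spatial locality, which is what the profiles in the question show.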