vectorization

How do I align cv::Mat for AVX-512/AVX2 architectures?

Submitted by 房东的猫 on 2019-12-11 05:47:26
Question: Disclaimer: I'm a SIMD newbie, so forgive this filthy peasant if he asks some bad questions. From my understanding, AVX-512 architectures can process up to 16 float variables at once, while AVX2 handles "only" 8. In order to take full advantage of this, the data should be aligned. As I found out here, this can be done with: For AVX-512: alignas(64) float a[16]; For AVX2: alignas(32) float a[8]; Ok, so my first question is: since 32 is a factor of 64, why don't we always use alignas(64), also for AVX2 architectures
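A note and a sketch, since the linked answer did not survive the scrape: one 512-bit vector covers 64 bytes and one 256-bit vector covers 32, hence alignas(64) and alignas(32); over-aligning is harmless for correctness, at worst wasteful for small objects. For the NumPy-based entries below, here is a minimal Python sketch of the same idea, over-allocating and slicing to obtain an aligned buffer (aligned_empty is a hypothetical helper, not a NumPy API):

```python
import numpy as np

def aligned_empty(n, alignment=64, dtype=np.float32):
    """Return an uninitialized 1-D array of n elements whose data pointer
    is aligned to `alignment` bytes (over-allocate, then slice)."""
    itemsize = np.dtype(dtype).itemsize
    buf = np.empty(n * itemsize + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment   # bytes to skip to reach alignment
    return buf[offset:offset + n * itemsize].view(dtype)

a = aligned_empty(16, alignment=64)   # 16 floats: one AVX-512 register's worth
assert a.ctypes.data % 64 == 0
```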

List comprehension in R: map, not filter

Submitted by 筅森魡賤 on 2019-12-11 04:07:30
Question: So, this question tells how to perform a list comprehension in R that filters out values. I'm wondering: what is the standard R way of writing a list comprehension that generates new values? Basically, if we have a function f and a vector x, I want the list f(el) for el in x (this is like map in functional programming). In Python this would just be [f(el) for el in x]. How do I write this in R, in the standard way? The problem is, right now I have for-loops: result = c(0) for (i in 1
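The standard R idioms for map are lapply(x, f) (list result) and sapply(x, f) (simplified result). For reference, the Python comprehension the asker mentions, spelled out with a hypothetical f:

```python
def f(el):
    return el * el              # any per-element function

x = [1, 2, 3, 4, 5]
result = [f(el) for el in x]    # equivalently: list(map(f, x))
print(result)                   # [1, 4, 9, 16, 25]
```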

Efficiently taking the absolute value of an integer vector in C

Submitted by 喜你入骨 on 2019-12-11 03:28:05
Question: The task is to set each element of a C integer array to its absolute value, and I'm trying to do it as efficiently as possible. Below is a progression of optimizations that I've made. Please tell me whether these are actually optimizations at all, and whether any more can be made! The first parameter of the function is an integer array, and the second is the integer size of that array. Here's the standard implementation: void absolute (int array[], int n){ for(int i = 0; i < n; i++) if(array[i] <
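The excerpt stops at the branchy baseline. One classic candidate for this optimization progression is the branchless two's-complement identity |x| = (x ^ (x >> 31)) - (x >> 31) for 32-bit int; below is a minimal sketch of it in NumPy (the example language used on this page), assuming two's-complement int32. Whether it actually beats the naive loop has to be measured, since compilers often emit this trick, or a vectorized instruction, on their own:

```python
import numpy as np

arr = np.array([3, -1, -7, 0, 42, -2], dtype=np.int32)

# mask is 0 for non-negative values and -1 (all ones) for negative ones.
mask = arr >> 31                  # arithmetic right shift on signed int32
result = (arr ^ mask) - mask      # for negative x: (~x) + 1 == -x; else x
assert np.array_equal(result, np.abs(arr))
```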

Pandas: an alternative to apply for adding a new column with many conditions

Submitted by 余生颓废 on 2019-12-11 03:09:42
Question: I have two dataframes, let's say df and map_dum. df holds a single sales column:

sales: 5, 10, 9, 7, 1, 1, -1, 2, 9, 8, 1, 3, 10, -2, 8, 5, 9, 6, 10, -1, 5, 3

And here is map_dum:

class   more_than_or_equal_to   less_than
  -1                   -1000           0
   1                       0           2
   2                       2           4
   3                       4           6
   4                       6           8
   5                       8          10
   6                      10        1000

My goal is to add a new column to df, column class. In order to do so, I have to check the value in df['sales'] lies in between which values in map
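The excerpt is cut off, but the pattern, binning a numeric column by an interval lookup table, has a vectorized pandas idiom. A minimal sketch with pd.cut, assuming (as the sample shows) that the map_dum intervals are contiguous half-open [more_than_or_equal_to, less_than) ranges:

```python
import pandas as pd

df = pd.DataFrame({"sales": [5, 10, 9, 7, 1, -1, -2]})
map_dum = pd.DataFrame({
    "class":                 [-1, 1, 2, 3, 4, 5, 6],
    "more_than_or_equal_to": [-1000, 0, 2, 4, 6, 8, 10],
    "less_than":             [0, 2, 4, 6, 8, 10, 1000],
})

# Contiguous [lower, upper) intervals collapse into one list of bin edges,
# so a single vectorized pd.cut replaces the per-row condition checks.
edges = map_dum["more_than_or_equal_to"].tolist() + [map_dum["less_than"].iloc[-1]]
df["class"] = pd.cut(df["sales"], bins=edges, right=False,
                     labels=map_dum["class"].tolist())
print(df)
```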

How to vectorize a for loop in R

Submitted by 无人久伴 on 2019-12-11 03:07:41
Question: I'm trying to clean this code up and was wondering if anybody has suggestions on how to run this in R without a loop. I have a dataset called data with 100 variables and 200,000 observations. What I want to do is essentially expand the dataset by multiplying each observation by a specific scalar and then combining the results. In the end, I need a data set with 800,000 observations (I have four categories to create) and 101 variables. Here's a loop that I wrote that does this, but it
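The loop is cut off, but the described operation (multiply the whole dataset by one scalar per category, stack the four copies, and tag each block) is a single broadcast. Below is a NumPy sketch with hypothetical multipliers and a shrunken stand-in dataset; in R itself, something along the lines of kronecker(scalars, as.matrix(data)), plus a category column, does the same stacking without a loop:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 100))              # stand-in for the 200,000 x 100 dataset
scalars = np.array([0.5, 1.0, 1.5, 2.0])   # hypothetical multiplier per category

# Broadcast (4,1,1) * (rows,cols) -> (4,rows,cols), then merge the block axis.
expanded = (scalars[:, None, None] * data).reshape(-1, data.shape[1])
category = np.repeat(np.arange(1, 5), data.shape[0])   # the 101st column
result = np.column_stack([expanded, category])
print(result.shape)                        # (800, 101): 4 stacked copies
```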

Broadcasting a function on a 2-dimensional numpy array

Submitted by 自闭症网瘾萝莉.ら on 2019-12-11 03:03:22
Question: I would like to improve the speed of my code by computing a function once on a numpy array instead of calling a function of this Python library inside a for loop. My function looks like the following:

import numpy as np
import galsim
from math import *

M200 = 1e14
conc = 6.9

def func(M200, conc):
    halo_z = 0.2
    halo_pos = [1200., 3769.7]
    halo_pos = galsim.PositionD(x=halo_pos[0], y=halo_pos[1])
    nfw = galsim.NFWHalo(mass=M200, conc=conc, redshift=halo_z,
                         halo_pos=halo_pos, omega_m=0.3, omega_lam=0
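Assuming the goal is to evaluate func over grids of M200 and conc without an explicit Python for loop, np.vectorize is the drop-in tool; the sketch below swaps the galsim body for a hypothetical stand-in so it runs self-contained:

```python
import numpy as np

def func(m200, conc):
    # Stand-in body; the question's version builds a galsim.NFWHalo here.
    return (m200 / 1e14) ** (1.0 / 3.0) / conc

m_grid = np.logspace(13, 15, num=4)     # hypothetical grid of M200 values
c_grid = np.linspace(4.0, 9.0, num=3)   # hypothetical grid of concentrations

# np.vectorize broadcasts a scalar function over arrays. Note it is only a
# convenience (a Python-level loop), not true SIMD; real speedups come from
# expressing the function body itself in array operations where possible.
vfunc = np.vectorize(func)
result = vfunc(m_grid[:, None], c_grid[None, :])
print(result.shape)                     # (4, 3): every (M200, conc) pair
```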

Manually vectorized code 10x slower than auto-optimized - what did I do wrong?

Submitted by 耗尽温柔 on 2019-12-11 02:48:25
Question: I'm trying to learn how to exploit vectorization with gcc. I followed this tutorial by Erik Holk (with source code here); I just modified it to use double. I used this dotproduct to compute the multiplication of randomly generated square 1200x1200 matrices of doubles (300x300 double4). I checked that the results are the same. But what really surprised me is that the simple dotproduct was actually 10x faster than my manually vectorized one. Maybe double4 is too big for SSE (it would need AVX2?)
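Two hedged observations. First, double4 is a 256-bit type, so a compiler targeting only SSE must split every double4 operation into two 128-bit halves, which is one plausible source of the slowdown the asker suspects. Second, the general moral, that mature optimized code paths usually beat a first hand-rolled SIMD attempt, is easy to reproduce; here is a small Python timing sketch of naive loop versus optimized library call (timings are machine-dependent):

```python
import numpy as np
import time

n = 200
A = np.random.rand(n, n)
B = np.random.rand(n, n)

def naive_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):            # textbook triple loop: no SIMD, poor cache use
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

t0 = time.perf_counter(); C1 = naive_matmul(A, B); t1 = time.perf_counter()
t2 = time.perf_counter(); C2 = A @ B;              t3 = time.perf_counter()
assert np.allclose(C1, C2)
print(f"naive: {t1 - t0:.3f}s   optimized (BLAS): {t3 - t2:.5f}s")
```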

Normalizing a numpy array

Submitted by 大兔子大兔子 on 2019-12-11 02:23:23
Question: Given an array, I want to normalize it such that each row sums to 1. I currently have the following code:

import numpy
w = numpy.array([[0, 1, 0, 1, 0, 0],
                 [1, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 1],
                 [1, 0, 0, 0, 1, 0],
                 [0, 0, 0, 1, 0, 1],
                 [0, 1, 1, 0, 1, 0]], dtype=float)

def rownormalize(array):
    i = 0
    for row in array:
        array[i,:] = array[i,:] / sum(row)
        i += 1

I have two questions: 1) The code works, but I'm wondering if there's a more elegant way. 2) How can I convert the data type into a float
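The elegant version the asker is after is a single broadcast division, and the dtype question is answered by astype; a minimal sketch:

```python
import numpy as np

w = np.array([[0, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 1],
              [1, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 1, 0, 1, 0]], dtype=float)

# Question 1: divide by the row sums in one broadcast; keepdims keeps (6, 1)
# so each row is divided by its own sum.
w_norm = w / w.sum(axis=1, keepdims=True)
assert np.allclose(w_norm.sum(axis=1), 1.0)

# Question 2: an integer array is converted with astype, e.g. w.astype(float).
```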

Unwanted sorting behavior in the result of a vector-concatenation function

Submitted by 妖精的绣舞 on 2019-12-11 02:15:32
Question: I apply a simple anonymous function returning c(x, x+5) to the sequence 1:5. I expect to see c(1,6,2,7,3,8,4,9,5,10) (the concatenation of the per-element subresults), but instead the result vector comes out as if it had been sorted. What is doing that, and how do I prevent it?

> (function(x) c(x,x+5)) (1:5)
[1] 1 2 3 4 5 6 7 8 9 10

However, applying the function to each individual argument gives the right pieces:

> (function(x) c(x,x+5)) (1)
[1] 1 6
> (function(x) c(x,x+5)) (2)
[1] 2 7
...
> (function(x) c(x,x+5)) (5)
[1] 5 10

Answer 1: You
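Nothing is sorting here: R evaluates x + 5 on the whole vector, so c(x, x + 5) concatenates the block 1:5 with the block 6:10, which merely looks sorted. The standard R fix for interleaving is c(rbind(x, x + 5)), since c() reads the 2x5 matrix column by column. The same distinction sketched in NumPy, this page's example language:

```python
import numpy as np

x = np.arange(1, 6)

# What the asker saw: both halves are computed on the whole vector first,
# so the result is the block 1:5 followed by 6:10 (concatenation, not sorting).
whole = np.concatenate([x, x + 5])                 # [1 2 3 4 5 6 7 8 9 10]

# What the asker wanted: pair element i with element i + 5, then flatten.
interleaved = np.column_stack([x, x + 5]).ravel()  # [1 6 2 7 3 8 4 9 5 10]
```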

How does Postgres handle the bit data type?

Submitted by 大兔子大兔子 on 2019-12-11 02:07:22
Question: I have a table with a column vector of type bit(2000). How does the db engine handle AND and OR operations over these values? Does it simply divide them into 32-bit chunks (or 64, respectively), compare each chunk separately, and in the end concatenate the results, or does it simply handle them as two strings? My point is to predict which use case would be faster. I have key-value data (user-item):

userID | itemID
U1     | I1
U1     | Ix
Un     | Ij

For each user I want to calculate a list of
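On the storage question: Postgres keeps bit(n) values as packed byte arrays and, going by the server's varbit code, implements & and | with a chunk-wise loop rather than string comparison, so each operation costs on the order of n/8 byte operations. A rough Python analogue, since Python's arbitrary-precision ints also store bits in machine words and run bitwise operations word by word:

```python
# Two 2000-bit "item vectors" encoded as arbitrary-precision ints.
a = int("1011" * 500, 2)
b = int("1101" * 500, 2)

intersection = a & b                     # one word-wise pass over both values
union = a | b
overlap = bin(intersection).count("1")   # popcount: number of shared set bits
print(overlap)
```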