Function crashes when using _mm_load_pd


I have the following function:

template <typename T>
void SSE_vectormult(T * A, T * B, int size)
{

    __m128d a;
    __m128d b;
    __m128d c;
    do


                      
3 answers

滥情空心, 2020-12-12 04:56
                                                                       
Your data is not guaranteed to be 16-byte aligned, which the aligned SSE load _mm_load_pd requires. Either use the unaligned load _mm_loadu_pd:

    a = _mm_loadu_pd(A);
    ...
    a = _mm_loadu_pd(A2ptr);
    b = _mm_loadu_pd(B2ptr);


or make sure that your data is correctly aligned where possible, e.g. for statics or locals:

    alignas(16) double A2[2], B2[2], C[2];    // C++11, or C11 with <stdalign.h>


or without C++11, using compiler-specific language extensions:

    __attribute__ ((aligned(16))) double A2[2], B2[2], C[2];   // gcc/clang/ICC/et al

    __declspec (align(16))        double A2[2], B2[2], C[2];   // MSVC


You could use #ifdef to #define an ALIGN(x) macro that works on the target compiler.
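
As a minimal sketch of such a macro (the name ALIGN comes from the sentence above; the helper function aligned_pair_product_sum is hypothetical and only there to show the macro in use), assuming an x86 target with SSE2:

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_load_pd, _mm_mul_pd, _mm_store_pd
#include <cassert>
#include <cstdint>

// Pick whichever alignment syntax the compiler understands.
#if defined(_MSC_VER)
#  define ALIGN(x) __declspec(align(x))
#elif defined(__GNUC__)            // also defined by clang and ICC
#  define ALIGN(x) __attribute__((aligned(x)))
#else
#  define ALIGN(x) alignas(x)      // C++11 fallback
#endif

// Hypothetical helper: multiplies two pairs of doubles element-wise with
// aligned loads/stores and returns the sum of the two products.
inline double aligned_pair_product_sum(double a0, double a1, double b0, double b1)
{
    ALIGN(16) double A2[2] = {a0, a1};
    ALIGN(16) double B2[2] = {b0, b1};
    ALIGN(16) double C[2];

    // The locals are 16-byte aligned, so the aligned load is safe here.
    assert(reinterpret_cast<std::uintptr_t>(A2) % 16 == 0);

    _mm_store_pd(C, _mm_mul_pd(_mm_load_pd(A2), _mm_load_pd(B2)));
    return C[0] + C[1];
}
```

The same ALIGN(16) prefix can then be used on statics and locals regardless of which compiler builds the code.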
                                                                        
                                                        
            
            
              
                
野趣味, 2020-12-12 05:14
                                                                       
If you look at http://msdn.microsoft.com/en-us/library/cww3b12t(v=vs.90).aspx you can see that the function _mm_load_pd is declared as:

__m128d _mm_load_pd (double *p);


So in your code A should be of type double*, but A is of type T*, where T is a template parameter. Make sure that you are calling your SSE_vectormult function with the right template argument, or just remove the template and use double instead.
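
If you want to keep the template, one option is to turn a wrong template argument into a compile-time error. This is only a sketch: load_two is a hypothetical helper, not part of your function, and it assumes C++11 for static_assert. It uses _mm_loadu_pd so that alignment does not matter for the illustration.

```cpp
#include <emmintrin.h>   // SSE2: __m128d, _mm_loadu_pd
#include <cassert>
#include <type_traits>

// Hypothetical helper: loads two doubles starting at p.
// Instantiating it with anything other than T = double fails to compile.
template <typename T>
__m128d load_two(const T* p)
{
    static_assert(std::is_same<T, double>::value,
                  "load_two requires T = double, because _mm_loadu_pd reads doubles");
    assert(p != nullptr);        // runtime precondition
    return _mm_loadu_pd(p);      // unaligned load, valid for any double*
}
```

With this in place, calling load_two on a float* or int* is rejected by the compiler instead of misbehaving at run time.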
                                                                        
                                                        
            
            
              
                
旧时难觅i, 2020-12-12 05:19
            
                                                                       
Let me try to answer why your code works on Linux and not on Windows. In code compiled in 64-bit mode the stack is aligned to 16 bytes. In code compiled in 32-bit mode, however, the stack is only 4-byte aligned on Windows and is not guaranteed to be 16-byte aligned on Linux.

GCC defaults to 64-bit mode on 64-bit systems, whereas MSVC defaults to 32-bit mode even on 64-bit systems. So my guess is that you did not compile your code in 64-bit mode on Windows. Since _mm_load_pd and _mm_store_pd both need 16-byte aligned addresses, the code crashes.
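
To check this guess on your own build, something like the following could help. It is a rough diagnostic sketch; pointer_bits, misalignment_16 and report_alignment are hypothetical names introduced here.

```cpp
#include <cstdint>
#include <cstdio>

// Width of a pointer in bits: 32 for a 32-bit build, 64 for a 64-bit build.
inline int pointer_bits()
{
    return static_cast<int>(sizeof(void*) * 8);
}

// Remainder of the address modulo 16; 0 means the pointer is 16-byte aligned.
inline unsigned misalignment_16(const void* p)
{
    return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p) % 16);
}

// Prints the build mode and the alignment remainder for one pointer.
inline void report_alignment(const char* name, const void* p)
{
    std::printf("%s: %d-bit build, address %% 16 = %u\n",
                name, pointer_bits(), misalignment_16(p));
}
```

Calling report_alignment("A", A) right before the _mm_load_pd should show a non-zero remainder in the crashing build if the explanation above is correct.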

You have at least three different solutions to get your code working in Windows as well. 


1. Compile your code in 64-bit mode.
2. Use unaligned loads and stores (e.g. _mm_storeu_pd).
3. Align the data yourself as Paul R suggested.


The third solution is the best one, since your code will then work on 32-bit systems and will also run well on older systems where unaligned loads/stores are much slower.
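
Solutions 2 and 3 can also be combined. The sketch below is not your function (its body is not shown in full in the question); multiply_in_place and is_aligned_16 are hypothetical names, and I am assuming an element-wise A[i] *= B[i]. It takes the aligned path when both pointers happen to be 16-byte aligned and falls back to unaligned loads/stores otherwise.

```cpp
#include <emmintrin.h>  // SSE2: _mm_load_pd, _mm_loadu_pd, _mm_mul_pd, _mm_store_pd, _mm_storeu_pd
#include <cstddef>
#include <cstdint>

inline bool is_aligned_16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical element-wise multiply: A[i] = A[i] * B[i] for i in [0, size).
inline void multiply_in_place(double* A, const double* B, std::size_t size)
{
    std::size_t i = 0;
    if (is_aligned_16(A) && is_aligned_16(B)) {
        // Both bases are aligned and i stays even, so A + i and B + i stay aligned.
        for (; i + 2 <= size; i += 2)
            _mm_store_pd(A + i, _mm_mul_pd(_mm_load_pd(A + i), _mm_load_pd(B + i)));
    } else {
        // Safe for any address, possibly slower on older CPUs.
        for (; i + 2 <= size; i += 2)
            _mm_storeu_pd(A + i, _mm_mul_pd(_mm_loadu_pd(A + i), _mm_loadu_pd(B + i)));
    }
    // Scalar tail for an odd element count.
    for (; i < size; ++i)
        A[i] *= B[i];
}
```

If you control all the call sites, aligning the buffers and keeping only the aligned path is simpler; the runtime check is mainly useful when the pointers come from code you do not control.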
                                                                        
                                                        
            
            
              
                