SpaCy: how to load Google news word2vec vectors?


面向向阳花 2020-12-25 13:48

I've tried several methods of loading the Google News word2vec vectors (https://code.google.com/archive/p/word2vec/):

en_nlp = spacy.load('en',vector=Fals


      
      
        
4 answers

独厮守ぢ (original poster) 2020-12-25 14:17

For spaCy 1.x, load the Google News vectors into gensim and convert them to a new format (each line in the .txt contains a single vector: string, vec):

from gensim.models import KeyedVectors

# Load the original binary Google News vectors.
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
# KeyedVectors is itself the vector store, so save directly (no .wv needed).
model.save_word2vec_format('googlenews.txt')
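
For orientation, the exported text file follows the plain word2vec text layout: a header line with the vocabulary size and the vector dimensionality, then one line per word holding the token followed by its space-separated components. The sketch below is only an illustration and is not part of gensim or spaCy; the helper name `read_word2vec_text` and the toy file are hypothetical, and it assumes tokens contain no spaces.

```python
import os
import tempfile


def read_word2vec_text(path):
    """Parse a word2vec text file into (vocab_size, dim, {word: vector}).

    Hypothetical helper for illustration; assumes tokens contain no spaces.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        # Header line: "<vocab_size> <dim>"
        vocab_size, dim = (int(part) for part in handle.readline().split())
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != dim + 1:
                raise ValueError("unexpected number of fields: %r" % line)
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vocab_size, dim, vectors


# Toy file standing in for googlenews.txt (2 words, 3 dimensions).
toy_path = os.path.join(tempfile.mkdtemp(), "toy_vectors.txt")
with open(toy_path, "w", encoding="utf-8") as handle:
    handle.write("2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")

vocab_size, dim, vectors = read_word2vec_text(toy_path)
print(vocab_size, dim, vectors["queen"])
```

The header line is the only line that does not match the "string, vec" shape described above.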


Remove the first line of the .txt:

tail -n +2 googlenews.txt > googlenews.new && mv -f googlenews.new googlenews.txt


Compress the txt as .bz2:

bzip2 googlenews.txt
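
On a system without `tail` and `bzip2`, the two shell steps above can be approximated in Python with the standard library alone. This is a sketch rather than part of the original recipe; the function name `strip_header_and_compress` is hypothetical, and the toy file stands in for the real `googlenews.txt`.

```python
import bz2
import os
import tempfile


def strip_header_and_compress(txt_path, bz2_path):
    """Copy txt_path to bz2_path, skipping the first (header) line."""
    with open(txt_path, "rb") as src, bz2.open(bz2_path, "wb") as dst:
        src.readline()  # discard the "<vocab_size> <dim>" header
        for line in src:  # stream line by line; the real file is large
            dst.write(line)


# Toy input standing in for googlenews.txt.
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "toy_vectors.txt")
with open(txt_path, "wb") as handle:
    handle.write(b"2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")

bz2_path = txt_path + ".bz2"
strip_header_and_compress(txt_path, bz2_path)

# Read the archive back to confirm the header is gone.
with bz2.open(bz2_path, "rb") as handle:
    compressed_lines = handle.read().splitlines()
print(compressed_lines)
```

Unlike the `bzip2` command, this sketch leaves the uncompressed .txt in place, so delete it afterwards if disk space matters.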


Create a spaCy-compatible binary file:

import spacy.vocab
spacy.vocab.write_binary_vectors('googlenews.txt.bz2','googlenews.bin')


Move googlenews.bin to /lib/python/site-packages/spacy/data/en_google-1.0.0/vocab/googlenews.bin in your Python environment.

Then load the word vectors:

import spacy
nlp = spacy.load('en',vectors='en_google')


or load them later:

nlp.vocab.load_vectors_from_bin_loc('googlenews.bin')

    
             
                                                        
            
            
              
                