R tm package invalid input in 'utf8towcs'


逝去的感伤 2020-11-29 01:47

I'm trying to use the tm package in R to perform some text analysis. I tried the following:

require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <


      
      
        
独厮守ぢ (original poster) 2020-11-29 01:56

I have often run into this issue, and this Stack Overflow post is always what comes up first. I have used the top solution before, but it can strip out characters and replace them with garbage (like converting "it’s" to "itâ€™s").

I have found that there is actually a much better solution for this! If you install the stringi package, you can replace tolower() with stri_trans_tolower() and then everything should work fine.
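
As a rough illustration of that swap, here is a minimal sketch. It assumes a tm version that provides content_transformer(), and it uses two made-up in-memory strings in place of the files under 'tmp/'; the variable names docs, corpus and lowered are hypothetical and not from the original question.

```r
library(tm)       # VCorpus, VectorSource, tm_map, content_transformer
library(stringi)  # stri_trans_tolower

# Hypothetical sample documents standing in for the files read from 'tmp/'.
docs <- c("It\u2019s a TEST document.", "Another EXAMPLE Here.")
corpus <- VCorpus(VectorSource(docs))

# Use stri_trans_tolower() where tolower() would normally go.
# content_transformer() wraps it so each element stays a tm document object.
corpus <- tm_map(corpus, content_transformer(stri_trans_tolower))

# Pull the text back out as a plain character vector to inspect the result.
lowered <- vapply(corpus, function(d) paste(as.character(d), collapse = " "),
                  character(1))
print(lowered)
```

The same tm_map() call should apply unchanged to a corpus built with DirSource(); how stringi treats bytes that are not valid in the declared encoding is not shown here and is worth checking on your own data.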
    
             
                                                        
            
            
              
                