How to remove illegal characters so a dataframe can write to Excel

前端未结

关注

 7  862

I am trying to write a dataframe to an Excel spreadsheet using ExcelWriter, but it keeps returning an error:

openpyxl.utils.exceptions.IllegalCharacterError


                      
              相关标签:


      
      
        
          7条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  执笔经年        
                
              
                            
                2020-12-03 05:22
              
            
            
                                                                       
Based on Haipeng Su's answer, I added a function that does this:

dataframe = dataframe.applymap(lambda x: x.encode('unicode_escape').
                 decode('utf-8') if isinstance(x, str) else x)


Basically, it escapes the unicode characters if they exist. It worked and I can now write to Excel spreadsheets again!
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  鱼传尺愫        
                
              
                            
                2020-12-03 05:23
              
            
            
                                                                       
If you're still struggling to clean up the characters, this worked well for me:

import xlwings as xw
import pandas as pd
df = pd.read_pickle('C:\\Users\\User1\\picked_DataFrame_notWriting.df')
topath = 'C:\\Users\\User1\\tryAgain.xlsx'
wb = xw.Book(topath)
ws = wb.sheets['Data']
ws.range('A1').options(index=False).value = df
wb.save()
wb.close()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  孤街浪徒        
                
              
                            
                2020-12-03 05:28
              
            
            
                                                                       
The same problem happened to me. I solved it as follows:


install python package xlsxwriter: 


pip install xlsxwriter



replace the default engine 'openpyxl' with 'xlsxwriter':


dataframe.to_excel("file.xlsx", engine='xlsxwriter')

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  情深已故        
                
              
                            
                2020-12-03 05:28
              
            
            
                                                                       
try a different excel writer engine solved my problem. 

writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  逝去的感伤        
                
              
                            
                2020-12-03 05:34
              
            
            
                                                                       
If you don't want to install another Excel writer engine (e.g. xlsxwriter), you may try to remove these illegal characters by looking for the pattern which causes the IllegalCharacterError error to be raised.
Open cell.py which is found at /path/to/your/python/site-packages/openpyxl/cell/, look for check_string function, you'll see it is using a defined regular expression pattern ILLEGAL_CHARACTERS_RE to find those illegal characters. Trying to locate its definition you'll see this line:
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')
This line is what you need to remove those characters. Copy this line to your program and execute the below code before your dataframe is written to Excel:
dataframe = dataframe.applymap(lambda x: ILLEGAL_CHARACTERS_RE.sub(r'', x) if isinstance(x, str) else x)
The above line will remove those characters in every cell.

But the origin of these characters may be a problem. As you say, the dataframe comes from three Excel spreadsheets. If the source Excel spreadsheets contains those characters, you will still face this problem. So if you can control the generation process of source spreadsheets, try to remove these characters there to begin with.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  盖世英雄少女心        
                
              
                            
                2020-12-03 05:35
              
            
            
                                                                       
I was also struggling with some weird characters in a data frame when writing the data frame to html or csv. For example, for characters with accent, I can't write to html file, so I need to convert the characters into characters without the accent. 

My method may not be the best, but it helps me to convert unicode string into ascii compatible.

# install unidecode first 
from unidecode import unidecode

def FormatString(s):
if isinstance(s, unicode):
  try:
    s.encode('ascii')
    return s
  except:
    return unidecode(s)
else:
  return s

df2 = df1.applymap(FormatString) 


In your situation, if you just want to get rid of the illegal characters by changing return unidecode(s) to return 'StringYouWantToReplace'.

Hope this can give me some ideas to deal with your problems. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复