Pandas dataframe: truncate string fields

I have a DataFrame and would like to truncate each field to at most 20 characters. I've been naively trying the following:

df = df.astype(str).apply(lambda x:

3 Answers

死守一世寂寞, 2021-01-03 23:32

You can use the .str.slice() method:

Demo:

In [177]: df = pd.DataFrame({
     ...:   'a': pd.util.testing.rands_array(30, 10),
     ...:   'b': pd.util.testing.rands_array(30, 10),
     ...: })
     ...:

In [178]: df
Out[178]:
                                a                               b
0  Mlf6nOsC8S6vv8OxW5ZOWifg3EoqAb  XSGLdkaewwZlNeZ4uTTivi2nMQFc6S
1  0E4XCBaYFBTSalUMPGpXmke6dQGbkW  KlHuVhbNgQL9HLHYQq3fEdqEIciOhX
2  URODJeLA0uLvcKBEXPyrmnnNU40MDl  NaY8LURHjgmT1pRrDnbPAeLZq3ANaL
3  OYA1ahlwVtEVnDOAkZgxNkbvZ7W8Rf  mIzkeLhM7SqYH17vGDzL6DJjSYftGs
4  uFC1shE02UfxS0VhDASmF8vh9XxFYX  fQOxjDjFehTNT27seOtCAAPW0as9Up
5  Ja33vQym6L0Ko2Kcf8cg7OMBKMitg5  iGdCvYTyZlR23NeeTAjG1PoL8mWm3j
6  iNZdXaVpB4zXClxTLt738DY7i6xs6p  q9VKg5fZdItmUpZiQrR6XW5WHmd33l
7  WWnViRRMPkbXNQOHeqGmzETDpGPRl9  t3I8Ve3ybCJcXajF8pydnwNZQWslTN
8  5oMFy2PBe1zUIE3XdraMwlrd5MKcx2  gSLtgXJwiS1HugLORXherFT4l1k5QV
9  weV8BlyJrtRbWpSCxSbj8cSyZxusFR  ylLWort9o8mHWQQ3JB1Twb0xRbLhot

In [179]: df.apply(lambda x: x.str.slice(0, 20))
Out[179]:
                      a                     b
0  Mlf6nOsC8S6vv8OxW5ZO  XSGLdkaewwZlNeZ4uTTi
1  0E4XCBaYFBTSalUMPGpX  KlHuVhbNgQL9HLHYQq3f
2  URODJeLA0uLvcKBEXPyr  NaY8LURHjgmT1pRrDnbP
3  OYA1ahlwVtEVnDOAkZgx  mIzkeLhM7SqYH17vGDzL
4  uFC1shE02UfxS0VhDASm  fQOxjDjFehTNT27seOtC
5  Ja33vQym6L0Ko2Kcf8cg  iGdCvYTyZlR23NeeTAjG
6  iNZdXaVpB4zXClxTLt73  q9VKg5fZdItmUpZiQrR6
7  WWnViRRMPkbXNQOHeqGm  t3I8Ve3ybCJcXajF8pyd
8  5oMFy2PBe1zUIE3XdraM  gSLtgXJwiS1HugLORXhe
9  weV8BlyJrtRbWpSCxSbj  ylLWort9o8mHWQQ3JB1T
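
The pd.util.testing helpers used in the demo may not be available in newer pandas releases. As a rough sketch, the same check can be run with sample strings built from the standard library. The make_random_strings helper and the seeds are assumptions for illustration, not part of pandas.

```python
import random
import string

import pandas as pd


def make_random_strings(count, length, seed=0):
    """Hypothetical helper: build `count` random alphanumeric strings."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


df = pd.DataFrame({
    "a": make_random_strings(10, 30, seed=1),
    "b": make_random_strings(10, 30, seed=2),
})

# Apply .str.slice column by column; each value keeps at most 20 characters.
truncated = df.apply(lambda col: col.str.slice(0, 20))

print(truncated["a"].str.len().max())  # 20
```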

耶瑟儿～, 2021-01-03 23:53

I think you need the .str accessor to index into each string:

df = df.astype(str).apply(lambda x: x.str[:20])


Sample:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9],
                   'D': [1, 3, 5],
                   'E': [5, 3, 6],
                   'F': [7, 4, 3]}) * 1000

print(df)
      A     B     C     D     E     F
0  1000  4000  7000  1000  5000  7000
1  2000  5000  8000  3000  3000  4000
2  3000  6000  9000  5000  6000  3000

df = df.astype(str).apply(lambda x: x.str[:2])
print(df)
    A   B   C   D   E   F
0  10  40  70  10  50  70
1  20  50  80  30  30  40
2  30  60  90  50  60  30


Another solution with applymap (deprecated since pandas 2.1 in favor of DataFrame.map):

df = df.astype(str).applymap(lambda x: x[:2])
print(df)
    A   B   C   D   E   F
0  10  40  70  10  50  70
1  20  50  80  30  30  40
2  30  60  90  50  60  30
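
Both variants call astype(str) first, which converts every column to text, numbers included. Where that is not wanted, one possible variation is to shorten only the values that are already strings. This is a sketch under that assumption; truncate_strings and the sample data are hypothetical names made up for illustration.

```python
import pandas as pd


def truncate_strings(frame, max_len):
    """Hypothetical helper: shorten str values, leave everything else untouched."""
    def shorten(value):
        return value[:max_len] if isinstance(value, str) else value

    # Series.map applies `shorten` to each element of each column.
    return frame.apply(lambda col: col.map(shorten))


df = pd.DataFrame({
    "name": ["alexander", "bo", None],
    "count": [1000, 2000, 3000],
})

result = truncate_strings(df, 4)
print(result)
```

The numeric column keeps its values as numbers, and the missing entry stays missing rather than becoming text.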


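The problem with your solution is that apply passes each whole column to the lambda as a Series, so x[:20] selects only the first 20 rows of each column instead of the first 20 characters of each value.

You can check this with a custom function: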

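def f(x):
    print(x)       # x is a whole column (a Series), not a single value
    print(x[:2])   # so this slice keeps the first 2 rows of that column
    return x[:2]

df = df.astype(str).apply(f)
print(df)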
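
The difference between the two kinds of slice can also be seen side by side. The following is a minimal sketch on a small made-up frame; the names by_rows and by_chars are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["1000", "2000", "3000"],
                   "B": ["4000", "5000", "6000"]})

# Slicing the Series itself keeps the first 2 rows; the values are unchanged.
by_rows = df.apply(lambda col: col[:2])

# Slicing through the .str accessor keeps the first 2 characters of every value.
by_chars = df.apply(lambda col: col.str[:2])

print(by_rows.shape)   # (2, 2)
print(by_chars.shape)  # (3, 2)
```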

情深已故, 2021-01-03 23:58

A simple one-liner to trim a long string field in a pandas DataFrame:

df['short_str'] = df['long_str'].str.slice(0,3)
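
A minimal sketch of that one-liner on made-up data, reusing the column names from the line above. Values already shorter than the limit come back unchanged.

```python
import pandas as pd

# Column names follow the one-liner; the sample values are invented.
df = pd.DataFrame({"long_str": ["abcdefgh", "xy", "pandas"]})

df["short_str"] = df["long_str"].str.slice(0, 3)

print(df["short_str"].tolist())  # ['abc', 'xy', 'pan']
```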
