Pandas dataframe: truncate string fields

傲寒 2021-01-03 23:07

I have a dataframe and would like to truncate each field to at most 20 characters. I've been naively trying the following:

df = df.astype(str).apply(lambda x: x[:20])
3 Answers
  • 2021-01-03 23:32

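    You can use the .str.slice() method.

    Demo (note: pd.util.testing, used below to generate random strings, was removed in pandas 2.0; any columns of strings will do):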

    In [177]: df = pd.DataFrame({
         ...:   'a': pd.util.testing.rands_array(30, 10),
         ...:   'b': pd.util.testing.rands_array(30, 10),
         ...: })
         ...:
    
    In [178]: df
    Out[178]:
                                    a                               b
    0  Mlf6nOsC8S6vv8OxW5ZOWifg3EoqAb  XSGLdkaewwZlNeZ4uTTivi2nMQFc6S
    1  0E4XCBaYFBTSalUMPGpXmke6dQGbkW  KlHuVhbNgQL9HLHYQq3fEdqEIciOhX
    2  URODJeLA0uLvcKBEXPyrmnnNU40MDl  NaY8LURHjgmT1pRrDnbPAeLZq3ANaL
    3  OYA1ahlwVtEVnDOAkZgxNkbvZ7W8Rf  mIzkeLhM7SqYH17vGDzL6DJjSYftGs
    4  uFC1shE02UfxS0VhDASmF8vh9XxFYX  fQOxjDjFehTNT27seOtCAAPW0as9Up
    5  Ja33vQym6L0Ko2Kcf8cg7OMBKMitg5  iGdCvYTyZlR23NeeTAjG1PoL8mWm3j
    6  iNZdXaVpB4zXClxTLt738DY7i6xs6p  q9VKg5fZdItmUpZiQrR6XW5WHmd33l
    7  WWnViRRMPkbXNQOHeqGmzETDpGPRl9  t3I8Ve3ybCJcXajF8pydnwNZQWslTN
    8  5oMFy2PBe1zUIE3XdraMwlrd5MKcx2  gSLtgXJwiS1HugLORXherFT4l1k5QV
    9  weV8BlyJrtRbWpSCxSbj8cSyZxusFR  ylLWort9o8mHWQQ3JB1Twb0xRbLhot
    
    In [179]: df.apply(lambda x: x.str.slice(0, 20))
    Out[179]:
                          a                     b
    0  Mlf6nOsC8S6vv8OxW5ZO  XSGLdkaewwZlNeZ4uTTi
    1  0E4XCBaYFBTSalUMPGpX  KlHuVhbNgQL9HLHYQq3f
    2  URODJeLA0uLvcKBEXPyr  NaY8LURHjgmT1pRrDnbP
    3  OYA1ahlwVtEVnDOAkZgx  mIzkeLhM7SqYH17vGDzL
    4  uFC1shE02UfxS0VhDASm  fQOxjDjFehTNT27seOtC
    5  Ja33vQym6L0Ko2Kcf8cg  iGdCvYTyZlR23NeeTAjG
    6  iNZdXaVpB4zXClxTLt73  q9VKg5fZdItmUpZiQrR6
    7  WWnViRRMPkbXNQOHeqGm  t3I8Ve3ybCJcXajF8pyd
    8  5oMFy2PBe1zUIE3XdraM  gSLtgXJwiS1HugLORXhe
    9  weV8BlyJrtRbWpSCxSbj  ylLWort9o8mHWQQ3JB1T
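A self-contained version of the demo, using fixed strings made up for illustration instead of the random ones, so it runs on any recent pandas. It also shows why the question converts with astype(str) first: a purely numeric column has no .str accessor and raises AttributeError.

```python
import pandas as pd

# Fixed example strings (made up for illustration) instead of random ones
df = pd.DataFrame({
    "a": ["abcdefghijklmnopqrstuvwxyz", "short"],
    "b": ["0123456789" * 3, "x" * 25],
})

# apply() passes each column to the lambda as a Series;
# .str.slice(0, 20) then cuts every string in that column to 20 characters
truncated = df.apply(lambda col: col.str.slice(0, 20))
print(truncated)

# A numeric column has no .str accessor, which is why the question
# converts everything with astype(str) first
nums = pd.DataFrame({"n": [123456, 7]})
try:
    nums["n"].str.slice(0, 3)
except AttributeError as err:
    print("AttributeError:", err)

short_nums = nums.astype(str).apply(lambda col: col.str.slice(0, 3))
print(short_nums)
```

Strings already shorter than the limit ("short" above) are returned unchanged.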
    
  • 2021-01-03 23:53

    You need the .str accessor so that the indexing applies to the characters of each string rather than to the rows:

    df = df.astype(str).apply(lambda x: x.str[:20])
    

    Sample:

    df = pd.DataFrame({'A':[1,2,3],
                       'B':[4,5,6],
                       'C':[7,8,9],
                       'D':[1,3,5],
                       'E':[5,3,6],
                       'F':[7,4,3]}) * 1000
    
    print (df)
          A     B     C     D     E     F
    0  1000  4000  7000  1000  5000  7000
    1  2000  5000  8000  3000  3000  4000
    2  3000  6000  9000  5000  6000  3000
    
    df = df.astype(str).apply(lambda x: x.str[:2])
    print (df)
        A   B   C   D   E   F
    0  10  40  70  10  50  70
    1  20  50  80  30  30  40
    2  30  60  90  50  60  30
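The .str[:n] indexing is shorthand for .str.slice(0, n). A quick check on a cut-down version of the sample frame (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}) * 1000

as_text = df.astype(str)  # '1000', '2000', ...
via_index = as_text.apply(lambda x: x.str[:2])          # .str[] indexes into each string
via_slice = as_text.apply(lambda x: x.str.slice(0, 2))  # explicit slice method

print(via_index)
# Both spellings produce identical frames
print(via_index.equals(via_slice))
```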
    

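    Another solution uses applymap, which calls the function on each element (since pandas 2.1, applymap is deprecated in favor of the equivalent DataFrame.map):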

    df = df.astype(str).applymap(lambda x: x[:2])
    print (df)
        A   B   C   D   E   F
    0  10  40  70  10  50  70
    1  20  50  80  30  30  40
    2  30  60  90  50  60  30
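A minimal sketch of the elementwise approach that works across pandas versions, assuming only that DataFrame.map exists from 2.1 onward and applymap before that. Here the lambda receives each cell as a plain Python str, so s[:2] slices characters.

```python
import pandas as pd

df = pd.DataFrame({'A': [1000, 2000], 'B': [4000, 5000]})
text = df.astype(str)

# pandas >= 2.1 provides DataFrame.map (elementwise); older versions only have applymap
if hasattr(pd.DataFrame, "map"):
    short = text.map(lambda s: s[:2])
else:
    short = text.applymap(lambda s: s[:2])

print(short)
```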
    

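    The problem with your solution is that x[:20] selects only the first 20 rows of each column: apply passes each column to the lambda as a Series, and slicing a Series slices rows, not characters.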

    You can verify this with a custom function:

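    def f(x):
        # apply passes each column in as a whole Series
        print (x)
        # slicing the Series selects rows, not characters
        print (x[:2])
    
    # f returns None, so call apply only for its printed output instead of reassigning df
    df.astype(str).apply(f)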
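The difference between slicing rows and slicing characters, side by side on a one-column frame made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['1000', '2000', '3000']})

# Inside apply, x is a whole column (a Series), so x[:2] slices ROWS
rows = df.apply(lambda x: x[:2])
print(rows)   # two rows, strings untouched

# x.str[:2] slices the CHARACTERS of every string in the column
chars = df.apply(lambda x: x.str[:2])
print(chars)  # three rows, two characters each
```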
    
  • 2021-01-03 23:58

    A simple one-liner to trim a single long string column in a pandas DataFrame:

    df['short_str'] = df['long_str'].str.slice(0,3)
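If a frame mixes text and numeric columns and only the text should be cut, the one-liner can be applied per column. This is a sketch: truncate_text_columns is a hypothetical helper name, and it assumes pandas.api.types.is_string_dtype identifies the text columns (in older pandas it returns True for any object column, and .str.slice turns non-string entries in such a column into NaN).

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def truncate_text_columns(df, n):
    """Hypothetical helper: cut string columns to n characters,
    leaving numeric columns untouched."""
    out = df.copy()
    for col in out.columns:
        if is_string_dtype(out[col]):
            out[col] = out[col].str.slice(0, n)
    return out


df = pd.DataFrame({"long_str": ["abcdefgh", "xyz"], "num": [12345, 6]})
result = truncate_text_columns(df, 3)
print(result)
```

Unlike astype(str), this keeps the numeric columns as numbers.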
    