Set max string length in pandas

梦毁少年i 2020-12-18 12:19

I want my dataframe to auto-truncate strings which are longer than a certain length.

Basically:

pd.set_option('auto_truncate_string_exceeding_this_l


        
3 answers
  • 2020-12-18 12:55

    You can use the converters argument of read_csv. Say you want to truncate the column named abc: pass a dictionary that maps the column to a function, like

    import pandas as pd

    def auto_truncate(val):
        # keep at most the first 255 characters
        return val[:255]
    df = pd.read_csv('file.csv', converters={'abc': auto_truncate})
    

    If the columns need different maximum lengths:

    df = pd.read_csv('file.csv', converters={'abc': lambda x: x[:255], 'xyz': lambda x: x[:512]})
    

    Make sure the column holds strings, since the converter slices each value. A column index can also be used instead of a name as the key in the converters dict.
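To make the converters approach easy to try without a file on disk, here is a minimal sketch. The in-memory CSV text, the lengths 5 and 3, and the helper name `make_truncator` are illustrative assumptions, not part of the answer above.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for 'file.csv'.
csv_text = "abc,xyz,n\nabcdefghij,klmnopqrst,1\nshort,tiny,2\n"


def make_truncator(max_len):
    """Return a converter that keeps at most max_len characters."""
    def truncate(val):
        return val[:max_len]
    return truncate


# Each listed column gets its own limit; column 'n' is parsed as usual.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"abc": make_truncator(5), "xyz": make_truncator(3)},
)
print(df)
```

Building the converter from a small factory avoids repeating one lambda per length when many columns are involved.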

  • 2020-12-18 12:57

    pd.set_option('display.max_colwidth', 255)
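Note that display.max_colwidth is a display option: it shortens what is printed, while the values stored in the dataframe keep their full length. A small sketch of the difference, using a made-up 40-character value and a width of 10 chosen only for illustration:

```python
import pandas as pd

# One made-up value that is 40 characters long.
df = pd.DataFrame({"a": ["x" * 40]})

# Apply the option only inside this block so global settings stay untouched.
with pd.option_context("display.max_colwidth", 10):
    shown = repr(df)

print(shown)                  # the printed cell is shortened with "..."
print(len(df.loc[0, "a"]))    # the stored string is unchanged
```

If the goal is to change the data itself, the option is not enough and the strings have to be sliced.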

  • 2020-12-18 13:01

    I'm not sure you can do this for the whole df at once, but the following works after loading:

    In [21]:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": ['jasjdhadasd'] * 5, "b": np.arange(5)})
    df
    Out[21]:
                 a  b
    0  jasjdhadasd  0
    1  jasjdhadasd  1
    2  jasjdhadasd  2
    3  jasjdhadasd  3
    4  jasjdhadasd  4
    In [22]:
    
    for col in df:
        # only slice columns that hold strings
        if pd.api.types.is_string_dtype(df[col]):
            df[col] = df[col].str.slice(0, 5)
    df
    Out[22]:
           a  b
    0  jasjd  0
    1  jasjd  1
    2  jasjd  2
    3  jasjd  3
    4  jasjd  4
    

    EDIT

    I think if you specified the dtypes in the args to read_csv then you could set the max length:

    df = pd.read_csv('file.csv', dtype=(str, maxlen))

    I will try this and confirm shortly

    UPDATE

    Sadly, you cannot specify the length. Passing dtype=(str, 5) raises an error:

    NotImplementedError: the dtype <U5 is not supported for parsing
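Since the length cannot be set through dtype, one option is to load the file normally and truncate right afterwards. The sketch below wraps that in a helper; the name `truncate_string_columns` and the in-memory CSV are assumptions made for illustration, and the example data contains no missing values.

```python
import io

import pandas as pd
from pandas.api.types import is_string_dtype


def truncate_string_columns(df, max_len):
    """Return a copy of df with every string column cut to max_len characters."""
    out = df.copy()
    for col in out.columns:
        if is_string_dtype(out[col]):
            out[col] = out[col].str.slice(0, max_len)
    return out


# Hypothetical in-memory CSV standing in for 'file.csv'.
csv_text = "a,b\njasjdhadasd,0\njasjdhadasd,1\n"
df = truncate_string_columns(pd.read_csv(io.StringIO(csv_text)), 5)
print(df)
```

Returning a copy leaves the loaded dataframe untouched, which can help when the full strings are still needed elsewhere.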
