Remove rows from a pandas DataFrame whose sentences are longer than a certain word count

落花浮王杯 submitted on 2020-08-27 06:31:25

Question


I want to remove the rows of a pandas DataFrame in which the string in a particular column contains more words than a desired limit.

For example:

Input frame:

X    Y
0    Hi how are you.
1    An apple
2    glass of water
3    I like to watch movie

Now, say I want to remove the rows whose string in column 'Y' contains 4 or more words.

The desired output frame must be:

X    Y
1    An apple
2    glass of water

The rows with values 0 and 3 in column 'X' are removed because their strings in column 'Y' contain 4 and 5 words respectively.


Answer 1:


First split the values on whitespace, get the number of words in each row with Series.str.len, then invert the condition from >= to < and apply it with Series.lt for boolean indexing:

df = df[df['Y'].str.split().str.len().lt(4)]
# alternative: build the >= 4 mask with Series.ge and invert it with ~
# df = df[~df['Y'].str.split().str.len().ge(4)]
print(df)
   X               Y
1  1        An apple
2  2  glass of water
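
For a self-contained check, the approach above can be reproduced with the sample frame from the question. This is only a minimal sketch: the names `max_words`, `word_counts` and `filtered` are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "X": [0, 1, 2, 3],
    "Y": ["Hi how are you.", "An apple", "glass of water", "I like to watch movie"],
})

max_words = 4  # illustrative threshold: drop rows with this many words or more

# str.split() with no arguments splits on runs of whitespace;
# str.len() then gives the length of each resulting list, i.e. the word count
word_counts = df["Y"].str.split().str.len()

# Keep only rows whose word count is strictly below the threshold
filtered = df[word_counts.lt(max_words)]
print(filtered)
```

Keeping the word counts in a separate variable makes it easy to inspect them before filtering, or to reuse them with a different threshold.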



Answer 2:


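You can count the runs of whitespace instead. Assuming the strings have no leading or trailing whitespace, fewer than 3 runs means fewer than 4 words: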

df[df.Y.str.count(r'\s+').lt(3)]

   X               Y
1  1        An apple
2  2  glass of water
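
Counting whitespace runs and counting split tokens can disagree when a string has leading or trailing whitespace. The sketch below compares the two on a small made-up Series (the data and variable names are assumptions for illustration, not from the original answers) and shows that stripping the strings first brings the counts back in line.

```python
import pandas as pd

# Made-up strings: one clean, one padded with spaces, one with repeated spaces
s = pd.Series(["An apple", "  glass of water  ", "one  two   three"])

# Word count via split: leading/trailing whitespace is ignored
by_split = s.str.split().str.len()

# Word count via whitespace runs + 1: padding adds extra runs
raw_count = s.str.count(r"\s+") + 1

# Stripping first removes the padding, so the two methods agree again
stripped_count = s.str.strip().str.count(r"\s+") + 1

print(pd.DataFrame({"split": by_split, "raw": raw_count, "stripped": stripped_count}))
```

If the column may contain padded strings, the split-based count is the safer of the two, or the strings can be stripped before counting.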


Source: https://stackoverflow.com/questions/56563681/remove-the-rows-from-pandas-dataframe-that-has-sentences-longer-than-certain-wo
