Question
I have written the following code to request pages and use regex to look for strings that resemble interest rates. The overall code works; however, it creates multiple empty DataFrames, and I can't get the code to drop them to clean up my output. I have tried .dropna, .drop, and .empty to discard the empty DataFrames, but the output remains unchanged: the empty DataFrames keep printing alongside the information I already have. Is there a method I am not aware of that could get rid of these empty frames? Code and output below:
import datetime
import re

import pandas as pd
import requests as r
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.marcus.com/us/en/personal-loans',
                 'https://www.discover.com/personal-loans/']
# Cycle through the links and find APR rates (fixed or variable) using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    # Every text node that contains a digit followed by a percent sign
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = []
        # Keep only ranges such as "6.99% to 24.99%" or "6.99% - 24.99%"
        matches.extend(re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string))
        sint = pd.Series(matches)
        qdate = pd.Series([datetime.datetime.now()]*len(sint))
        slink = pd.Series([link]*len(sint))
        df = pd.concat([qdate,sint,slink],axis=1)
        df.columns = ['Date','Interest Rate', 'URL']
        print(df)
Output:
...
0 ...
1 ...
[2 rows x 3 columns]
...
0 ...
[1 rows x 3 columns]
...
0 ...
1 ...
2 ...
3 ...
[4 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
...
0 ...
[1 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Answer 1:
How about you just don't print/use the empty ones?
if df.empty:
    continue
Or
if not df.empty:
    print(df)
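For context, here is a minimal sketch of how the `df.empty` check could sit inside the inner loop, with the non-empty frames collected and combined at the end. It runs on hard-coded text instead of live pages: the `snippets` list, the `example.com` URL, and the helper name `frames_for_snippets` are all hypothetical and only stand in for the text nodes the question pulls out of BeautifulSoup.

```python
import datetime
import re

import pandas as pd

# Same range pattern as in the question, compiled once.
RATE_RANGE = re.compile(r'\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', re.IGNORECASE)


def frames_for_snippets(snippets, link):
    """Build one DataFrame per text snippet, skipping snippets with no matches."""
    frames = []
    for text in snippets:
        matches = RATE_RANGE.findall(text)
        df = pd.DataFrame({
            'Date': [datetime.datetime.now()] * len(matches),
            'Interest Rate': matches,
            'URL': [link] * len(matches),
        })
        if df.empty:
            # No rate range in this snippet, so there is nothing to keep.
            continue
        frames.append(df)
    return frames


# Hypothetical stand-ins for the scraped text nodes.
snippets = ['APR from 6.99% to 24.99%', 'Save 5% today', 'Rates 7.5% - 19.9% fixed']
frames = frames_for_snippets(snippets, 'https://example.com/loans')

# pd.concat raises ValueError on an empty list, so guard it in real use.
result = pd.concat(frames, ignore_index=True)
print(result)
```

The middle snippet matches the question's first filter (`[0-9]%`) but contains no range, which is the case that produces the empty frames in the original output.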
Answer 2:
if df.dropna(how='all').empty:
    continue
As per https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.Series.empty.html, a DataFrame that contains only NaNs still returns False for .empty, so if that matters, it is good to call dropna first. Use how='any' to drop a row if it contains any NaN, or how='all' to drop a row only if every value in it is NaN (probably what you want). Passing axis=1 applies the same rule to columns instead of rows.
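A small sketch of the difference, using two made-up frames (the names `no_rows` and `only_nan` are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

columns = ['Date', 'Interest Rate', 'URL']

# A frame with column labels but no rows at all.
no_rows = pd.DataFrame(columns=columns)

# A frame with one row in which every value is NaN.
only_nan = pd.DataFrame({name: [np.nan] for name in columns})

print(no_rows.empty)                      # True
print(only_nan.empty)                     # False: the NaN row still counts as a row
print(only_nan.dropna(how='all').empty)   # True once the all-NaN row is dropped
```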
Source: https://stackoverflow.com/questions/51052506/removing-empty-dataframes-with-pandas