Based on the list below, I have to create a DataFrame with "state" and "region" columns:
Original data:
Alabama[edit]
Auburn (Auburn Universi
Shortest version I could think of:
import pandas as pd

lst = []
with open('university_towns.txt', 'r', newline='\n') as infile:
    for line in infile:
        if '[edit]' in line:
            # State header line, e.g. "Alabama[edit]": keep the text before '['
            state = line.split('[')[0]
        else:
            # Town line: keep the first word (the text before the first space)
            lst.append([state, line.split(' ')[0]])

df = pd.DataFrame(lst, columns=['State', 'RegionName'])
print(df)
Produces on my machine (Python 3.6):
      State    RegionName
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
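One caveat: splitting on the first space keeps only the first word, so a town name made of several words would be cut short. Below is a minimal sketch of a variant that splits on " (" instead, assuming every town line has the form "Town (Institution)". The function name parse_towns and the sample lines are made up for illustration and are not taken from the real file.

```python
import io


def parse_towns(lines):
    """Collect [state, town] pairs from 'State[edit]' / 'Town (Institution)' lines."""
    rows = []
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith('[edit]'):
            # State header: drop the trailing "[edit]" marker
            state = line[:-len('[edit]')]
        else:
            # Keep everything before the first " (" so multi-word names survive
            rows.append([state, line.split(' (')[0]])
    return rows


# Hypothetical sample input, NOT the contents of university_towns.txt
sample = io.StringIO(
    "StateA[edit]\n"
    "Spring Hill (Example College)\n"
    "Oakfield (Example University)\n"
    "\n"
    "StateB[edit]\n"
    "Lake View (Example Institute)\n"
)

rows = parse_towns(sample)
print(rows)
```

The resulting list can be passed to pd.DataFrame(rows, columns=['State', 'RegionName']) in the same way as lst in the code above. An open file object works in place of the StringIO sample, since both yield one line per iteration.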