Question
I have a .txt dataset like this:
user_000044 2009-04-24 13:47:07 Spandau Ballet Through The Barricades
I have to read the last two columns: Spandau Ballet as one value and Through The Barricades as another. How can I do this?
I need to create two arrays, artists = [] and tracks = [], and fill them in a loop, but I don't know how to pick out that portion of each line.
Can someone help me?
Answer 1:
You are probably better off using the pandas module to load the content of the .txt into a pandas DataFrame and proceeding from there. If you're unfamiliar with it, a DataFrame is about as close to an Excel sheet as you can get in Python. pandas will handle reading the lines for you, so you don't have to write your own loop.
Assuming your text file has four tab-separated columns, this would look like:
# IPython for demo:
import pandas as pd
df = pd.read_csv('ballet.txt', sep='\t', header=None, names=['artists', 'tracks'], usecols=[2, 3])
# usecols limits the DataFrame to the 3rd and 4th columns of your .txt (zero-based indices 2 and 3)
Your DataFrame could then look like:
df
# Out:
          artists                  tracks
0  Spandau Ballet  Through The Barricades
1   Berlin Ballet               Swan Lake
Access single columns by column names:
df.artists # or by their index e.g. df.iloc[:, 0]
# Out:
0    Spandau Ballet
1     Berlin Ballet
Name: artists, dtype: object
You can still put the data into an array at this point, but I can't think of a reason you'd really want to once you're aware of the alternatives.
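If plain Python lists are what you need after all, each column can be converted with tolist(). This is a minimal sketch, assuming a tab-separated file with four columns. The in-memory sample text, its second row, and the column labels user and timestamp are made up for illustration and stand in for your real file:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real .txt file
# (assumed: tab-separated, four columns, no header row).
sample = (
    "user_000044\t2009-04-24 13:47:07\tSpandau Ballet\tThrough The Barricades\n"
    "user_000045\t2009-04-25 09:12:30\tBerlin Ballet\tSwan Lake\n"
)

df = pd.read_csv(
    io.StringIO(sample),  # replace with the path to your file
    sep="\t",
    header=None,
    names=["user", "timestamp", "artists", "tracks"],  # labels are assumptions
    usecols=["artists", "tracks"],  # keep only the last two columns
)

# Convert each column (a pandas Series) to an ordinary Python list.
artists = df["artists"].tolist()
tracks = df["tracks"].tolist()
print(artists)
print(tracks)
```

Naming all four columns and then selecting by name is a design choice made here so the selection does not depend on remembering zero-based positions.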
Answer 2:
If the columns in your file are separated by tabs, you can use np.loadtxt (a NumPy function) as follows:
import numpy as np
artists, tracks = np.loadtxt("myfile.txt", delimiter="\t", dtype=str, usecols=[2, 3], unpack=True)
usecols is zero-based, so for a four-column file [2, 3] selects the third and fourth columns. This returns two NumPy arrays. Optionally, you can convert these arrays into ordinary Python lists of strings:
artists = [str(s) for s in artists]
tracks = [str(s) for s in tracks]
Answer 3:
An option using plain Python and no third-party packages:
with open('dataset.txt', 'r') as f:
    data = f.readlines()
artists = []
tracks = []
for line in data:
    # columns are assumed to be separated by two spaces; keep the last two
    artist, track = line.split(' '*2)[-2::]
    artists.append(artist.strip())
    tracks.append(track.strip())
print(artists)
print(tracks)
output:
['Spandau Ballet']
['Through The Barricades']
[-2::] takes the last two columns of each line; adjust the slice to get other columns if needed.
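If the file turns out to be tab-separated rather than separated by runs of spaces, the same idea can be sketched with the standard-library csv module. The helper name read_last_two_columns and the in-memory sample line are hypothetical and only serve the illustration:

```python
import csv
import io

# Hypothetical sample standing in for the real file (assumed tab-separated).
sample = "user_000044\t2009-04-24 13:47:07\tSpandau Ballet\tThrough The Barricades\n"


def read_last_two_columns(lines):
    """Return (artists, tracks) built from the last two fields of each row."""
    artists, tracks = [], []
    # QUOTE_NONE keeps quote characters in titles as literal text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:  # skip blank or malformed lines
            continue
        artist, track = row[-2:]
        artists.append(artist.strip())
        tracks.append(track.strip())
    return artists, tracks


artists, tracks = read_last_two_columns(io.StringIO(sample))
print(artists)
print(tracks)
```

For a real file, pass an open file object instead of the StringIO, for example one obtained from open('dataset.txt', newline='').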
Source: https://stackoverflow.com/questions/51431017/how-can-i-read-specific-colums-from-a-txt-file-in-python