I do a lot of data analysis in Perl and am trying to replicate this work in Python using pandas, NumPy, matplotlib, etc.
The general workflow goes as follows:
Hopefully this helps you get started?
import os
import sys

def regex_match(line):
    # stand-in for a real pattern; swap in re.search(pattern, line) as needed
    return 'LOOPS' in line

my_hash = {}
for fd in os.listdir(sys.argv[1]):  # for each file in this directory
    with open(os.path.join(sys.argv[1], fd)) as fh:
        for line in fh:  # get each line of the file
            if regex_match(line):  # if it's a line I want
                fields = line.rstrip('\n').split('\t')  # get the data I want
                # store the data; float() assumes the third column is numeric
                my_hash[fields[1]] = float(fields[2])

for key, value in my_hash.items():  # data science can go here?
    print(key, value * 12)  # stand-in for do_something(key, value * 12)
# plots
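Since the question mentions pandas, here is a rough sketch of the same loop collecting the matches into a DataFrame instead of a dict. The function names load_matches and scale_values, the column names key and value, and the assumption that the third tab-separated field is numeric are all made up for illustration; adjust them to your data.

```python
import os

import pandas as pd


def load_matches(directory, token="LOOPS"):
    """Collect (key, value) pairs from matching tab-separated lines.

    Hypothetical helper: assumes each matching line has at least three
    tab-separated fields and that the third one parses as a number.
    """
    rows = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path) as fh:
            for line in fh:
                if token in line:
                    fields = line.rstrip("\n").split("\t")
                    rows.append({"key": fields[1], "value": float(fields[2])})
    return pd.DataFrame(rows, columns=["key", "value"])


def scale_values(df, factor=12):
    """Return a copy with a 'scaled' column (value * factor)."""
    out = df.copy()
    out["scaled"] = out["value"] * factor
    return out
```

You could then call scale_values(load_matches(sys.argv[1])) and, if matplotlib is installed, try something like df.plot(x="key", y="scaled", kind="bar") on the result as a first plot. Unlike the dict version, this keeps duplicate keys as separate rows rather than overwriting them.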
P.S. Make the first line
#!/usr/bin/python
(or whatever path "which python" returns) so the script can be run as an executable.