Select records based on the specific index string value and then remove subsequent fields by python

主宰稳场 提交于 2019-12-11 20:06:10

问题


I have a .csv file named file01.csv that contains many records. Some records are required and some are not. I find that the required records has a string variable “Mi”, but it is not exist into the unnecessary records. So, I want to select the required records based on string value “Mi” in the field for every records.

Finally I want to delete the subsequent fields of each record from the field that contains value “Mi”. Any suggestion and advice is appreciated.

Optional:

  1. In addition, I want to delete the first column.

  2. Split column BB into two column named as a_id, and c_id. Separate the value by _ (underscore) and left side will go to a_id, and right side will go to c_id.

My fileO.csv is as follows:

   AA      BB       CC       DD     EE      FF    GG
   1       1_1.csv  (=0      =10"   27"     =57   "Mi"
   0.97    0.9      0.8      NaN    0.9     od    0.2
   2       1_3.csv  (=0      =10"   27"     "Mi"  0.5
   0.97    0.5      0.8      NaN    0.9     od    0.4
   3       1_6.csv  (=0      =10"   "Mi"     =53  cnt
   0.97    0.9      0.8      NaN    0.9     od    0.6
   4       2_6.csv  No Bi    000    000     000   000
   5       2_8.csv  No Bi    000    000     000   000
   6       6_9.csv  less     000    000     000   000
   7       7_9.csv  s(=0     =26"   =46"    "Mi"  121     

My Expected results files (outFile.csv):

a_id    b_id    CC    DD    EE    FF    GG             
1       1       0     10    27    57              
1       3       0     10    27
1       6       0     10 
7       9       0     26    46  

回答1:


The following approach should work fine using Python csv module:

import csv
import re
import string

output_header = ['a_id', 'b_id', 'CC', 'DD', 'EE', 'FF', 'GG']

sanitise_table = string.maketrans("","")
nodigits_table = sanitise_table.translate(sanitise_table, string.digits)

def find_mi(row):
    for index, col in enumerate(row):
        if col.find('Mi') != -1:
            return index
    return -1

def sanitise_cell(cell):
    return cell.translate(sanitise_table, nodigits_table)       # Keep digits

f_input = open('fileO.csv', 'rb')
f_output = open('outFile.csv', 'wb')

csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)

input_header = next(f_input)
csv_output.writerow(output_header)

for row in csv_input:
    #print '%2d  %s' % (len(row), row)  
    if len(row) >= 2:
        bb = re.match(r'(\d+)__(\d+).0\.csv', row[1])
        mi = find_mi(row)

        if bb and mi != -1:
            row[:] = row[:mi] + [''] * (len(row) - mi)
            row[:] = [sanitise_cell(col) for col in row]
            row[0] = bb.group(1)
            row[1] = bb.group(2)
            csv_output.writerow(row)

f_input.close()
f_output.close()

outFile.csv will contain the following:

a_id,b_id,CC,DD,EE,FF,GG
1,1,0,10,27,57,
1,3,0,10,27,,
1,6,0,10,,,
7,9,0,26,46,,

Tested using Python 2.6.6



来源:https://stackoverflow.com/questions/34176960/select-records-based-on-the-specific-index-string-value-and-then-remove-subseque

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!