sorting numerically by first row

Posted by 瘦欲@ on 2019-12-02 11:10:51

Use this to convert old Mac OS carriage returns to newlines before sorting:

tr '\r' '\n' < myfile.txt | sort
Jean-François Fabre

As stated here, you can run into problems with this (and, judging by the other follow-up/duplicate question you asked, you did). For a numerical sort, pass -n to sort:

tr '\r' '\n' < myfile.txt | sort -n

It works fine here on MSYS, but on some platforms you may have to run this first:

export LC_CTYPE=C

Otherwise tr will treat the file as a text file, and will probably flag it as corrupt once it reaches the maximum line length.

Obviously I could not test it, but I'm confident it will solve the problem given what I read on the linked answer.
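
To illustrate why the -n flag matters, here is a minimal Python sketch of the two commands above. The sample bytes in raw are made up for the illustration (they are not taken from the real file), and the plain text comparison only approximates what sort does in the C locale:

```python
# Hypothetical file content using old Mac OS line endings (\r).
raw = b"10\tP3\r9\tP2\r100\tP4\r2\tP1\r"

# Equivalent of: tr '\r' '\n'
lines = raw.replace(b"\r", b"\n").decode("ascii").splitlines()

# Roughly what plain `sort` does: compare lines as text.
as_text = sorted(lines)

# Roughly what `sort -n` does: compare by the leading number
# (here: the first tab-separated field).
as_numbers = sorted(lines, key=lambda line: int(line.split("\t")[0]))

print(as_text)     # ['10\tP3', '100\tP4', '2\tP1', '9\tP2']
print(as_numbers)  # ['2\tP1', '9\tP2', '10\tP3', '100\tP4']
```

With the text comparison, "100" lands before "2" because the lines are compared character by character; the numeric key gives the expected order.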

A Python approach (compatible with Python 2 and 3) that sidesteps the shell problems and is portable. I noticed that the input file contains some 0x8C bytes (exotic dots), which probably confuse the tr command. The code below handles them by replacing every non-ASCII byte with a plain dot:

import csv,sys

# read the file as binary, as it is not really text
with open("Proteins.txt","rb") as f:
    data = bytearray(f.read())
    # replace 0x8c char by classical dots
    for i,c in enumerate(data):
        if c>0x7F: # non-ascii: replace by dot
            data[i] = ord(".")

    # convert to list of ASCII strings (split using the old MAC separator)
    lines = "".join(map(chr,data)).split("\r")

    # treat our lines as input for CSV reader
    cr = csv.reader(lines,delimiter='\t',quotechar='"')

    # read all the lines in a list    
    rows = list(cr)
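    # perform the sort (tricky): on the first column, numerically.
    # int() accepts leading zeros in strings ("007" -> 7), so nothing has to
    # be stripped (strip("0") would also eat trailing zeros: "10" -> "1").
    # non-numerical or empty rows (e.g. a header) are put at the top

    rows = sorted(rows,key=lambda x : (1,int(x[0])) if x and x[0].isdigit() else (0,0))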

# write back the file as a nice, legal, ASCII tsv file

if sys.version_info < (3,):
    f = open("Proteins_sorted_2.txt","wb")
else:
    f = open("Proteins_sorted_2.txt","w",newline='')

cw = csv.writer(f,delimiter='\t',quotechar='"')
cw.writerows(rows)
f.close()
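
Since the sort key is the delicate part, here is a small sketch that exercises one possible key on a few in-memory rows. The function name first_column_key and the sample rows are hypothetical. It assumes the first column holds non-negative integers, possibly with leading zeros, and that anything else (a header, a blank row) should stay at the top:

```python
def first_column_key(row):
    """Sort key: non-numeric or empty rows first, then numeric order."""
    if row and row[0].isdigit():
        # int() accepts leading zeros in strings, so "010" becomes 10
        return (1, int(row[0]))
    # headers and blank rows share one key; sorted() is stable,
    # so they keep their original relative order
    return (0, 0)

# Hypothetical rows, as csv.reader might return them
rows = [["010", "c"], ["ID", "name"], ["7", "b"], ["0", "a"], []]
sorted_rows = sorted(rows, key=first_column_key)

for row in sorted_rows:
    print(row)
```

Returning a tuple keeps the non-numeric rows in a separate group instead of mixing them with rows whose value is 0.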