How to compare two txt files and then apply changes in one of them

折月煮酒 提交于 2019-12-11 19:08:18

问题


I am trying to merge two text (PDB) files. One (bigger one) contains full set of data describing the protein, second one contains very small set of data changing just small part (set of coordinates).

Example:

Basic file (part):

ATOM    605  CD2 LEU A  92      11.727  14.051  55.011  1.00 75.51      4pxz C  
ATOM    606  N   ARG A  93      10.555  10.636  58.260  1.00 62.79      4pxz N  
ATOM    607  CA  ARG A  93      11.357   9.429  58.493  1.00 59.89      4pxz C  
ATOM    608  C   ARG A  93      10.429   8.207  58.562  1.00 62.83      4pxz C  
ATOM    609  O   ARG A  93      10.760   7.168  57.994  1.00 61.39      4pxz O  
ATOM    610  CB  ARG A  93      12.236   9.564  59.757  1.00 58.23      4pxz C  
ATOM    611  CG  ARG A  93      13.088   8.333  60.120  1.00 60.51      4pxz C  
ATOM    612  CD  ARG A  93      13.985   7.822  58.995  1.00 61.21      4pxz C  
ATOM    613  NE  ARG A  93      14.503   6.485  59.295  1.00 60.36      4pxz N  
ATOM    614  CZ  ARG A  93      15.012   5.642  58.400  1.00 66.21      4pxz C  
ATOM    615  NH1 ARG A  93      15.074   5.979  57.116  1.00 52.54      4pxz N  
ATOM    616  NH2 ARG A  93      15.455   4.453  58.780  1.00 48.93      4pxz N  
ATOM    617  N   THR A  94       9.247   8.357  59.192  1.00 60.68      4pxz N  
ATOM    618  CA  THR A  94       8.227   7.305  59.271  1.00 59.92      4pxz C

Auxillary file (with set of coordinates to be substituted):

ATOM     39  CA  ARG A  93      11.357   9.429  58.493  1.00 59.89      hatp C  
ATOM     40  CB  ARG A  93      12.236   9.564  59.757  1.00 58.23      hatp C  
ATOM     41  CG  ARG A  93      11.569   9.166  61.087  1.00 60.51      hatp C  
ATOM     42  CD  ARG A  93      12.319   8.102  61.886  1.00 61.21      hatp C  
ATOM     43  NE  ARG A  93      11.978   6.754  61.425  1.00 60.36      hatp N  
ATOM     44  CZ  ARG A  93      11.731   5.714  62.217  1.00 66.21      hatp C  
ATOM     45  NH2 ARG A  93      11.430   4.535  61.694  1.00 48.93      hatp N  
ATOM     46  NH1 ARG A  93      11.793   5.843  63.538  1.00 52.54      hatp N  

Expected outcome: -> Changed coordinates <-

ATOM    604  CD1 LEU A  92       9.685  13.033  54.000  1.00 73.10      4pxz C
ATOM    605  CD2 LEU A  92      11.727  14.051  55.011  1.00 75.51      4pxz C
ATOM    606  N   ARG A  93      10.555  10.636  58.260  1.00 62.79      4pxz N
ATOM    607  CA  ARG A  93   -> 11.357   9.429  58.493<- 1.00 59.89      4pxz C
ATOM    608  C   ARG A  93      10.429   8.207  58.562  1.00 62.83      4pxz C
ATOM    609  O   ARG A  93      10.760   7.168  57.994  1.00 61.39      4pxz O
ATOM    610  CB  ARG A  93   -> 12.236   9.564  59.757<- 1.00 58.23      4pxz C
ATOM    611  CG  ARG A  93   -> 11.569   9.166  61.087<- 1.00 60.51      4pxz C
ATOM    612  CD  ARG A  93   -> 12.319   8.102  61.886<- 1.00 61.21      4pxz C
ATOM    613  NE  ARG A  93   -> 11.978   6.754  61.425<- 1.00 60.36      4pxz N
ATOM    614  CZ  ARG A  93   -> 11.731   5.714  62.217<- 1.00 66.21      4pxz C
ATOM    615  NH1 ARG A  93   -> 11.793   5.843  63.538<- 1.00 52.54      4pxz N
ATOM    616  NH2 ARG A  93   -> 11.430   4.535  61.694<- 1.00 48.93      4pxz N
ATOM    617  N   THR A  94       9.247   8.357  59.192  1.00 60.68      4pxz N
ATOM    618  CA  THR A  94       8.227   7.305  59.271  1.00 59.92      4pxz C

I tried doing so by means of:

  • Building a lists and appending every line as single entry for both files

  • Extracting atom type, name of residue, chain and residue number from both files (eg. CD1 LEU A 92, respecitvely) and appending to another lists

  • Comparing extract lists

  • Writing a file containing mixed lists from point 1, accordingly to point 3.

Code:

import re

aminoacid_pattern = re.compile(r"\w.{2,3}.\b(\w[A-Z]\w*)\b\s.\s\d+")
coords_pattern = re.compile(r"\w.{2,3}.\b(\w[A-Z]\w*)\b\s.\s\d+")

class fileSaver:
    protein = "4pxzclean.pdb"
    flexres = "ARGA93.pdb.tmp"
    def __init__(self):
        pass

    def aminoacid_to_substitute(self, flexres, data = []):

        with open(flexres, 'r') as flex:
                for line in flex:
                    if aminoacid_pattern != None:
                        data.append(line)
        return data

    def parse_rigid(self, rigidprot, test = []):

        with open(rigidprot, 'r') as rigid:
            for line in rigid:
                if aminoacid_pattern != None:
                    test.append(line)
        return test

class fileComparer:
    def __init__(self):
        pass

    def compare_data(self, data_flex, data_rigid, cleanflex = [], cleanrigid = []):

        for el in data_flex:
            if aminoacid_pattern != None:
                cleanflex.append(re.findall(r".\w\s+\w.{2,3}\s\w\s*\d{2,3}",str(el)))


        for el in data_rigid:
            if aminoacid_pattern != None:
                cleanrigid.append(re.findall(r".\w\s+\w.{2,3}\s\w\s*\d{2,3}",str(el)))

        with open("test.txt", 'a+') as test:
            for rig_el in data_rigid:
                for flex_el in data_flex:
                    for rg_el in cleanrigid:
                        if rg_el not in cleanflex:
                            test.write(rig_el)
                        if rg_el in cleanflex:
                            test.write(flex_el)



if __name__ == '__main__':
    initialize = fileSaver()
    flex = initialize.aminoacid_to_substitute("ARGA93.pdb.tmp")
    rigid = initialize.parse_rigid("4pxzclean.pdb")
    comparer = fileComparer()
    comparer.compare_data(flex,rigid)

Unfortunately, it gives infinitely long file, without any changed line. Could you tell me where is a mistake?

来源:https://stackoverflow.com/questions/58390928/how-to-compare-two-txt-files-and-then-apply-changes-in-one-of-them

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!