问题
I am trying to merge two text (PDB) files. One (bigger one) contains full set of data describing the protein, second one contains very small set of data changing just small part (set of coordinates).
Example:
Basic file (part):
ATOM 605 CD2 LEU A 92 11.727 14.051 55.011 1.00 75.51 4pxz C
ATOM 606 N ARG A 93 10.555 10.636 58.260 1.00 62.79 4pxz N
ATOM 607 CA ARG A 93 11.357 9.429 58.493 1.00 59.89 4pxz C
ATOM 608 C ARG A 93 10.429 8.207 58.562 1.00 62.83 4pxz C
ATOM 609 O ARG A 93 10.760 7.168 57.994 1.00 61.39 4pxz O
ATOM 610 CB ARG A 93 12.236 9.564 59.757 1.00 58.23 4pxz C
ATOM 611 CG ARG A 93 13.088 8.333 60.120 1.00 60.51 4pxz C
ATOM 612 CD ARG A 93 13.985 7.822 58.995 1.00 61.21 4pxz C
ATOM 613 NE ARG A 93 14.503 6.485 59.295 1.00 60.36 4pxz N
ATOM 614 CZ ARG A 93 15.012 5.642 58.400 1.00 66.21 4pxz C
ATOM 615 NH1 ARG A 93 15.074 5.979 57.116 1.00 52.54 4pxz N
ATOM 616 NH2 ARG A 93 15.455 4.453 58.780 1.00 48.93 4pxz N
ATOM 617 N THR A 94 9.247 8.357 59.192 1.00 60.68 4pxz N
ATOM 618 CA THR A 94 8.227 7.305 59.271 1.00 59.92 4pxz C
Auxillary file (with set of coordinates to be substituted):
ATOM 39 CA ARG A 93 11.357 9.429 58.493 1.00 59.89 hatp C
ATOM 40 CB ARG A 93 12.236 9.564 59.757 1.00 58.23 hatp C
ATOM 41 CG ARG A 93 11.569 9.166 61.087 1.00 60.51 hatp C
ATOM 42 CD ARG A 93 12.319 8.102 61.886 1.00 61.21 hatp C
ATOM 43 NE ARG A 93 11.978 6.754 61.425 1.00 60.36 hatp N
ATOM 44 CZ ARG A 93 11.731 5.714 62.217 1.00 66.21 hatp C
ATOM 45 NH2 ARG A 93 11.430 4.535 61.694 1.00 48.93 hatp N
ATOM 46 NH1 ARG A 93 11.793 5.843 63.538 1.00 52.54 hatp N
Expected outcome: -> Changed coordinates <-
ATOM 604 CD1 LEU A 92 9.685 13.033 54.000 1.00 73.10 4pxz C
ATOM 605 CD2 LEU A 92 11.727 14.051 55.011 1.00 75.51 4pxz C
ATOM 606 N ARG A 93 10.555 10.636 58.260 1.00 62.79 4pxz N
ATOM 607 CA ARG A 93 -> 11.357 9.429 58.493<- 1.00 59.89 4pxz C
ATOM 608 C ARG A 93 10.429 8.207 58.562 1.00 62.83 4pxz C
ATOM 609 O ARG A 93 10.760 7.168 57.994 1.00 61.39 4pxz O
ATOM 610 CB ARG A 93 -> 12.236 9.564 59.757<- 1.00 58.23 4pxz C
ATOM 611 CG ARG A 93 -> 11.569 9.166 61.087<- 1.00 60.51 4pxz C
ATOM 612 CD ARG A 93 -> 12.319 8.102 61.886<- 1.00 61.21 4pxz C
ATOM 613 NE ARG A 93 -> 11.978 6.754 61.425<- 1.00 60.36 4pxz N
ATOM 614 CZ ARG A 93 -> 11.731 5.714 62.217<- 1.00 66.21 4pxz C
ATOM 615 NH1 ARG A 93 -> 11.793 5.843 63.538<- 1.00 52.54 4pxz N
ATOM 616 NH2 ARG A 93 -> 11.430 4.535 61.694<- 1.00 48.93 4pxz N
ATOM 617 N THR A 94 9.247 8.357 59.192 1.00 60.68 4pxz N
ATOM 618 CA THR A 94 8.227 7.305 59.271 1.00 59.92 4pxz C
I tried doing so by means of:
Building a lists and appending every line as single entry for both files
Extracting atom type, name of residue, chain and residue number from both files (eg. CD1 LEU A 92, respecitvely) and appending to another lists
Comparing extract lists
Writing a file containing mixed lists from point 1, accordingly to point 3.
Code:
import re
aminoacid_pattern = re.compile(r"\w.{2,3}.\b(\w[A-Z]\w*)\b\s.\s\d+")
coords_pattern = re.compile(r"\w.{2,3}.\b(\w[A-Z]\w*)\b\s.\s\d+")
class fileSaver:
protein = "4pxzclean.pdb"
flexres = "ARGA93.pdb.tmp"
def __init__(self):
pass
def aminoacid_to_substitute(self, flexres, data = []):
with open(flexres, 'r') as flex:
for line in flex:
if aminoacid_pattern != None:
data.append(line)
return data
def parse_rigid(self, rigidprot, test = []):
with open(rigidprot, 'r') as rigid:
for line in rigid:
if aminoacid_pattern != None:
test.append(line)
return test
class fileComparer:
def __init__(self):
pass
def compare_data(self, data_flex, data_rigid, cleanflex = [], cleanrigid = []):
for el in data_flex:
if aminoacid_pattern != None:
cleanflex.append(re.findall(r".\w\s+\w.{2,3}\s\w\s*\d{2,3}",str(el)))
for el in data_rigid:
if aminoacid_pattern != None:
cleanrigid.append(re.findall(r".\w\s+\w.{2,3}\s\w\s*\d{2,3}",str(el)))
with open("test.txt", 'a+') as test:
for rig_el in data_rigid:
for flex_el in data_flex:
for rg_el in cleanrigid:
if rg_el not in cleanflex:
test.write(rig_el)
if rg_el in cleanflex:
test.write(flex_el)
if __name__ == '__main__':
initialize = fileSaver()
flex = initialize.aminoacid_to_substitute("ARGA93.pdb.tmp")
rigid = initialize.parse_rigid("4pxzclean.pdb")
comparer = fileComparer()
comparer.compare_data(flex,rigid)
Unfortunately, it gives infinitely long file, without any changed line. Could you tell me where is a mistake?
来源:https://stackoverflow.com/questions/58390928/how-to-compare-two-txt-files-and-then-apply-changes-in-one-of-them