Match Numbers in Two Files & Get Results: Results of Comparing .lw & .pw Files

  • #1
Bala06
Dear Members

I would like to match numbers in two files with the extensions .lw and .pw, and write out the results for the matching entries.

For example, the .lw file contains data as

59880 SPC X2d12G 4714 UNK X 900B

and .pw file has

59474 SPC X2c8bG 991 ILE A 118B
59726 SPC X2cdfG 1803 SER A 168B
59876 SPC X2d11G 4055 ASP A 356B
59879 SPC X2d12G 3849 ASN A 344B

I want to match on the identifier "X2d12G" and write the output like this (desired result):

431-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]
453-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]

Since the number of attachments is limited, I couldn't attach the file 453-hydrogen-bond-frame_lw.txt. It contains the same data as 431-hydrogen-bond-frame_lw.txt.

Whenever I run my Python code, I don't get the expected result.

I run the Python script as (python water_cont.py > summary.txt).

I'm posting the Python code for your reference.

Code:
#! /usr/bin/env python
import sys, os, math, glob
#  Run as:  python water_cont.py 
#  This script will provide the summary of waters along the trajectory.  
#  Use it after the run of python_water_cont.py
#  Bala 28 May. 2011
#

def read_lig_wat(file):
    file = open (file, "r")
    data=file.readlines()
    atom_number1 = map(lambda x: int(x[0:7]), data)
    resname1 = map(lambda x: x[10:13], data)
    res_number1=map(lambda x: x[17:23], data)
    atom_number2=map(lambda x: int(x[27:36]), data)
    resname2=map(lambda x: x[37:40], data)
    res_number2= map(lambda x: x[44:50], data)
    return atom_number1, resname1, res_number1, atom_number2, resname2, res_number2

def read_prot_wat(file1):
    file1 = open (file, "r")
    data1=file.readlines()
    atom_number11 = map(lambda x: int(x[0:7]), data)
    resname11 = map(lambda x: x[10:13], data)
    res_number11=map(lambda x: x[17:23], data)
    atom_number22=map(lambda x: int(x[27:36]), data)
    resname22=map(lambda x: x[37:40], data)
    res_number22= map(lambda x: x[44:50], data)
    return atom_number11, resname11, res_number11, atom_number22, resname22, res_number22 

 
for filename in glob.glob1("/home/water", "*.lw"):
   atom_number1, resname1, res_number1, atom_number2, resname2, res_number2 =read_lig_wat(filename)
#   column_file=summary+".lw"
#   file2=open( column_file, "w")
   text=len(atom_number1)

for filename in glob.glob1("/home/water", "*.pw"):
   atom_number11, resname11, res_number11, atom_number22, resname22, res_number22 =read_lig_wat(filename)
#   column_file=filename+".lw"
#   file2=open( column_file, "w")
   text1=len(atom_number11)

   List=[]

   for i in range(text):
      for j in range(text1):
          
#         print  res_number2[i], res_number22[j]
#         if res_number2[i]==res_number11[j]:
#             print res_number1[i], res_number2[i]
         if res_number1[i]==res_number11[j] or res_number1[i]==res_number22[j]\
            or res_number2[i]==res_number11[j] or res_number2[i]==res_number22[j]:
#            print atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i]

            List.append((atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i], atom_number11[j], resname11[j], res_number11[j], atom_number22[j], resname22[j], res_number22[j]))
#            print List
            print filename, List

#             file2.write("%5i%8s%11s%8i%8s%11s%5i%8s%11s%8i%8s%11s \n" % (atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i], atom_number11[i], resname11[i], res_number11[i], atom_number22[i], resname22[i], res_number22[i]  ))

Kindly advise.

Many Thanks
Balaji
 

Attachments

  • 431-hydrogen-bond-frame_lw.txt (53 bytes)
  • 431-hydrogen-bond-frame_pw.txt (21.8 KB)
  • 453-hydrogen-bond-frame_pw.txt (23.2 KB)
  • #2
First off: I think using slices in this way is a very bad idea, and not the way most Python programmers would do it. Rather than expecting fixed-width character fields (!), a more sensible approach is to do the readlines() and then call split() on each line, which returns a list of the whitespace-delimited tokens in that line. For starters, what if ONE LINE in your file is malformed and has, say, one whitespace character too many? Also, if I try to run the program "in my head" (I haven't tried to run it on disk yet), the very first thing I notice is that your sample inputs begin with a five-character ID, yet when you parse the files you first attempt to grab a seven-character token from the beginning of the string. Are you sure this is correct?
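
To make the split() suggestion concrete, here is a minimal sketch in Python 3 (the posted script is Python 2). The function name parse_pw_line and the variable names are hypothetical, and the sketch assumes every .pw line has exactly seven whitespace-delimited tokens, as in the sample lines from post #1. Files in which two columns run together without a space would need different handling.

```python
def parse_pw_line(line):
    """Split one .pw-style line into fields (hypothetical helper).

    Assumed layout, taken from the sample lines in post #1:
    water atom number, water residue name, water ID,
    atom number, residue name, chain, residue number.
    """
    tokens = line.split()  # splits on any run of whitespace
    if len(tokens) != 7:
        raise ValueError("expected 7 fields, got %d: %r" % (len(tokens), line))
    water_atom, water_resname, water_id, atom, resname, chain, resnum = tokens
    # Re-join chain and residue number so the field reads like 'A 344B'.
    return (int(water_atom), water_resname, water_id,
            int(atom), resname, chain + " " + resnum)


print(parse_pw_line("59879 SPC X2d12G 3849 ASN A 344B"))
```

Because split() ignores the amount of whitespace between tokens, a line with an extra space parses the same way, while a line with a missing field raises an error instead of silently producing shifted columns.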

Second off, even if you were to use slices in this way, I think there is something bad about your repeated "map lambda" construction. A rule of thumb: if you find yourself repeating yourself in a computer program, that is a good place to factor the repetition out into a function. I would be very uncomfortable if this were my program until I had moved that repeated map lambda construction into a separate slice_out_field(0, 7, data) function. The problem is that you've copied and pasted this so many times: what if there were an error in one of your copies? It would be very easy to overlook.
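
A helper along those lines might look like the sketch below (Python 3). Only the name slice_out_field comes from the paragraph above. The convert parameter and the column offsets are assumptions chosen to fit the single-spaced sample line from post #1, and the real files may use wider columns.

```python
def slice_out_field(start, end, lines, convert=str.strip):
    """Return column [start:end] of every line, passed through convert.

    A list comprehension is used instead of map() so the result is a
    real list; in Python 3, map() returns an iterator and len() on it fails.
    """
    return [convert(line[start:end]) for line in lines]


# Offsets below fit this single-spaced sample only (an assumption).
lines = ["59879 SPC X2d12G 3849 ASN A 344B"]
atom_numbers = slice_out_field(0, 5, lines, int)
resnames = slice_out_field(6, 9, lines)
water_ids = slice_out_field(10, 16, lines)
print(atom_numbers, resnames, water_ids)
```

With the slicing in one place, a wrong offset shows up as a single bad call rather than hiding among near-identical lambda lines.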

Third off: you say "I'm not getting the result as expected". What result are you getting instead?

I think you should start by just rewriting this to use a more conventional text-parsing method like split(), or even better a regular expression (these are easy to use in Python and well suited to your problem). You have what looks to me like error-prone code, and you are trying to chase a mysterious error in it... cleaning things up is a good first step. It is probably fixable as is, though, if you give us some more information (what is it doing now instead of working, and why is it 0:7 then 10:13 and not 0:5 and then 6:9?).
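
As an illustration of the regular-expression route, here is a hedged sketch in Python 3 using the standard re module. The pattern, the group names, and the function name parse_with_regex are all hypothetical; the pattern is written against the sample .pw lines in post #1 and would need adjusting if the real files differ.

```python
import re

# One .pw-style line: two integer atom numbers, two residue names,
# a water ID token, and a "chain residue-number" pair such as 'A 118B'.
PW_LINE = re.compile(
    r"^\s*(?P<water_atom>\d+)\s+(?P<water_resname>\w+)\s+(?P<water_id>\S+)"
    r"\s+(?P<atom>\d+)\s+(?P<resname>\w+)\s+(?P<residue>\w\s+\S+)\s*$"
)


def parse_with_regex(line):
    """Return a dict of named fields, or None if the line does not fit."""
    match = PW_LINE.match(line)
    if match is None:
        return None
    return match.groupdict()


fields = parse_with_regex("59474 SPC X2c8bG 991 ILE A 118B")
print(fields["water_id"], fields["residue"])
```

A line that does not fit the pattern returns None, so malformed lines can be reported or skipped explicitly instead of being sliced at the wrong offsets.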
 
  • #3
Dear Python Users

I have to match the text "X2d12G" in the two files .lw and .pw, and write the contents of both files as output.

There is a small correction to the .lw file. It should look like this:
" 4714 UNK X 900B 59880 SPC X2d12G"

The format in the attachment was not correct.

For example, the output should look like this:

431-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]

Now, when I run the script it doesn't produce any output (the output file size is 0 KB).

Kindly advise.

Many Thanks
Balaji
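
For reference, the matching described in this thread could be sketched as follows in Python 3. This is not the poster's script: the function names are hypothetical, the .lw layout is taken from the corrected line in post #3, the .pw layout from the sample lines in post #1, and both are assumed to be whitespace-delimited with seven tokens per line.

```python
from collections import defaultdict


def parse_lw_line(line):
    # Assumed .lw layout: atom, resname, chain, resnum, water atom, water resname, water ID.
    atom, resname, chain, resnum, water_atom, water_resname, water_id = line.split()
    return (int(atom), resname, chain + " " + resnum,
            int(water_atom), water_resname, water_id)


def parse_pw_line(line):
    # Assumed .pw layout: water atom, water resname, water ID, atom, resname, chain, resnum.
    water_atom, water_resname, water_id, atom, resname, chain, resnum = line.split()
    return (int(water_atom), water_resname, water_id,
            int(atom), resname, chain + " " + resnum)


def match_by_water_id(lw_lines, pw_lines):
    """Pair every .lw record with every .pw record sharing its water ID."""
    pw_by_water = defaultdict(list)
    for line in pw_lines:
        if line.strip():  # skip blank lines
            record = parse_pw_line(line)
            pw_by_water[record[2]].append(record)
    matches = []
    for line in lw_lines:
        if line.strip():
            lw = parse_lw_line(line)
            for pw in pw_by_water.get(lw[5], []):
                matches.append(lw + pw)
    return matches


lw_lines = [" 4714 UNK X 900B 59880 SPC X2d12G"]
pw_lines = [
    "59474 SPC X2c8bG 991 ILE A 118B",
    "59726 SPC X2cdfG 1803 SER A 168B",
    "59876 SPC X2d11G 4055 ASP A 356B",
    "59879 SPC X2d12G 3849 ASN A 344B",
]
print(match_by_water_id(lw_lines, pw_lines))
```

With the sample lines from the thread, the single match carries atom number 3849 for ASN A 344B, because that is what the sample .pw line contains; the 4186 in the desired output presumably comes from a different line of the attached file.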
 

