Strange vanishing of CR in strings coming from a copy of a file's content passed to raw_input()

旧时模样 submitted on 2019-12-25 19:40:37

Question


While tracking down what looked like a bug, I ran into some odd behaviour of the raw_input() function in Python 2.7:

it removes the CR character of each CR LF pair, but only from strings obtained by manually copying a file's content (via the clipboard). Strings passed to raw_input() that are copied from a display of the very same strings don't lose their CR characters. Lone CR characters remain untouched in all cases. A CR (carriage return) is the \r character.

Rather than rely on a muddled description, here is some code that walks through what must be done to observe this; you only need to follow its instructions.

The point is the Text object: it has 7 characters instead of the 8 that were passed to raw_input() to create it.

To verify that the argument passed to raw_input() really had 8 characters, I created another file, PASTED.txt, with the same content. It is indeed awkward to be sure of anything in this problem, as pasting into a Notepad++ window showed me: all kinds of line endings (\r, \n, \r\n) appear as CR LF at the ends of the lines in such a window.

Using Ctrl-A to select the whole content of a file is recommended.

I am left wondering whether I made a mistake in my code or in my understanding, or whether this is a real feature of Python.

I hope for comments and enlightenment from you.

with open('PRIM.txt','wb') as f:
    f.write('A\rB\nC\r\nD')
print "  1) A file with name 'PRIM.txt' has just been created with content A\\rB\\nC\\r\\nD"
raw_input("  Open this file and copy manually its CONTENT in the clipboard.\n"+\
          "    --when done, press Enter to continue-- ")


print "\n  2) Paste this CONTENT in a Notepad++ window "+\
      "     and see the symbols at the extremities of the lines."
raw_input("    --when done, press Enter to continue-- ")


Text = raw_input("\n  3) Paste this CONTENT here and press a key : ")
print ("     An object Text has just been created with this pasted value of CONTENT.")


with open('PASTED.txt','wb') as f:
    f.write('')
print "\n  4) An empty file 'PASTED.txt' has just been created."
print "     Paste manually in this file the PRIM's CONTENT and shut this file."
raw_input("     --when done, press Enter to continue-- ")


print "\n  5) Enter the copy of this display of A\\rB\\nC\\r\\nD : \nA\rB\nC\r\nD"
DSP = raw_input('please, enter it on the following line :\n')
print "    An object DSP has just been created with this pasted value of this copied display"


print '\n----------'
with open('PRIM.txt','rb') as fv:
    verif = fv.read()
print "The read content of the file 'PRIM.txt' obtained by open() and read() : "+repr(verif)
print "len of the read content of the file 'PRIM.txt'  ==",len(verif)


print '\n----------'
print "The file PASTED.txt received by pasting the manually copied CONTENT of PRIM.txt"
with open('PASTED.txt','rb') as f:
    cpd = f.read()
    print "The read content of the file 'PASTED.txt' obtained by open() and read() "+\
          "is now : "+repr(cpd)
    print "its len is==",len(cpd)


print '\n----------'
print 'The object Text received through raw_input() the manually copied CONTENT of PRIM.txt'
print "value of Text=="+repr(Text)+\
      "\nText.split('\\r\\n')==",Text.split('\r\n')
print 'len of Text==',len(Text)


print '\n----------'
print "The object DSP received  through raw_input() the copy of the display of A\\rB\\nC\\r\\nD" 
print "value of DSP==",repr(DSP)
print 'len of DSP==',len(DSP)

My OS is Windows. I wonder if the same is observed on other operating systems.


Answer 1:


sys.stdin is opened in text mode (you can check this by displaying sys.stdin.mode and seeing that it is 'r'). If you open any file in text mode in Python, then the platform native line ending (\r\n for Windows) will be converted to a simple line feed (\n) in the Python string.

You can see this in operation by opening your PASTED.txt file using mode 'r' instead of 'rb'.
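The effect described above can be reproduced without a Windows console. The following is a minimal sketch in Python 3 syntax, not the interpreter's actual implementation: the helper name emulate_windows_text_mode is hypothetical, and it simply assumes that text mode on Windows collapses each CR LF pair into LF while leaving lone CR and LF characters alone, as this answer states.

```python
def emulate_windows_text_mode(raw):
    """Hypothetical helper: collapse each CR LF pair into LF.

    Lone CR and lone LF bytes are left untouched, which is the
    behaviour this answer attributes to text mode on Windows.
    """
    return raw.replace(b"\r\n", b"\n")


original = b"A\rB\nC\r\nD"          # the 8 bytes written to PRIM.txt
translated = emulate_windows_text_mode(original)

print(len(original), repr(original))
print(len(translated), repr(translated))
```

Under that assumption, the 8 characters of PRIM.txt become the 7 characters observed in Text.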




Answer 2:


After posting, I could look up from my code, and I noticed that the modification of data copied from a file and passed to raw_input() is the same as the newline modification Python performs when it reads data directly from a file, as shown here:

with open("TestWindows.txt", 'wb') as f:
    f.write("PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  ")

print "\n- Following string have been written in TestWindows.txt in mode 'wb' :\n"+\
      "PACIFIC \\r  ARCTIC \\n  ATLANTIC \\r\\n  "


print "\n- data got by reading the file TestWindows.txt in 'rb' mode :"
with open("TestWindows.txt", 'rb') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'r' mode :"
with open("TestWindows.txt", 'r') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'rU' mode :"
with open("TestWindows.txt", 'rU') as f:
    print "    repr(data)==",repr(f.read())

result:

- Following string have been written in TestWindows.txt in mode 'wb' :
PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  

- data got by reading the file TestWindows.txt in 'rb' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  '

- data got by reading the file TestWindows.txt in 'r' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \n  '

- data got by reading the file TestWindows.txt in 'rU' mode :
    repr(data)== 'PACIFIC \n  ARCTIC \n  ATLANTIC \n  '
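The 'rb' and 'rU' results above do not depend on the platform and can be reproduced with io.open and its newline parameter, which works in Python 2.7 and Python 3. This is only a sketch: the temporary file and the ascii encoding are choices made for the illustration, and the plain 'r' result is not reproduced because it is specific to Windows text mode in Python 2.

```python
import io
import os
import tempfile

data = b"PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  "

# Write the bytes to a temporary file (an assumption of this sketch;
# the original experiment used TestWindows.txt).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "wb") as f:
        f.write(data)

    # newline="" disables newline translation, like reading in 'rb' mode.
    with io.open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()

    # newline=None enables universal newlines, like 'rU' mode:
    # \r, \n and \r\n all come back as \n.
    with io.open(path, "r", encoding="ascii", newline=None) as f:
        universal = f.read()
finally:
    os.remove(path)

print(repr(untranslated))
print(repr(universal))
```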

First, the file PASTED.txt has the same content as PRIM.txt, since it results from copying PRIM.txt's content and pasting it into PASTED.txt without passing through a Python string. So when data goes from one file to another via the clipboard only, it isn't modified. This proves that the content of PRIM.txt sits uncorrupted in the clipboard where the copy put it.

Secondly, data going from a file to a Python string via the clipboard and raw_input() is modified; hence the modification takes place between the clipboard and the Python string. So I thought that raw_input() might interpret data received from the clipboard in the same way the Python interpreter does when it reads data from a file.

Then I elaborated on the idea that \r\n is replaced with \n because data of a "Windows nature" becomes data of a "Python nature", and that the clipboard doesn't modify data because it is under the control of the Windows operating system.

Alas, data copied from the screen and passed to raw_input() doesn't undergo the \r\n transformation even though it too passes through the Windows clipboard, which breaks my little theory.

Then I thought that Python might know the nature of the data not from its source but from information contained in the data itself; such information is a 'format'. I found the following page about the Windows clipboard, and there are indeed several formats for the information it records:

http://msdn.microsoft.com/en-us/library/ms648709(v=vs.85).aspx

Maybe the explanation of Python's modification of \r\n is linked to these clipboard formats, and maybe not. But I don't understand all this well enough, and I am far from sure.

Is anybody able to explain all the above observations?

.

.

Thank you for your answer, ncoghlan. But I don't think it's the reason:

  • sys.stdin has no attribute mode

  • sys.stdin refers to the keyboard, as far as I understand. However, in my code, the data doesn't come from typing on the keyboard but from pasting via the clipboard. That's different.
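Whether sys.stdin exposes a mode attribute can be checked defensively. The sketch below, in Python 3 syntax, assumes (the thread does not confirm this) that some environments replace sys.stdin with their own object that lacks the attribute; stream_mode and FakeStdin are hypothetical names introduced only for the illustration.

```python
import sys


def stream_mode(stream):
    """Return the stream's 'mode' attribute, or None when it has none."""
    return getattr(stream, "mode", None)


class FakeStdin(object):
    """Hypothetical stand-in for a replacement stdin without 'mode'."""

    def readline(self):
        return ""


print("sys.stdin mode: %r" % (stream_mode(sys.stdin),))
print("FakeStdin mode: %r" % (stream_mode(FakeStdin()),))
```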

The key point is that I don't understand how the Python interpreter could differentiate between clipboard data copied from a file and clipboard data copied from the screen.



Source: https://stackoverflow.com/questions/5060220/strange-vanishing-of-cr-in-strings-coming-from-a-copy-of-a-files-content-passed
