Question
I've been trying to resolve this issue for over a week and could really do with some help.
We are using an HTTP request to post files to an API. Most files come out OK, but docx files end up corrupted.
After much research I'm fairly sure that something in my binary post is adding extra bytes to the file.
The streams are being closed, and I think I've got the boundaries and headers right.
Are there any obvious mistakes in the code below, or can anybody point me in the right direction for a fix? Why is extra data being added to the file? Are the HTTP headers the issue, or am I reading the stream incorrectly? What is the most likely cause?
(I have tried to examine the extra data in the docx file to find out where it comes from, without success. There are many docx repair tools, but none that I've come across report what the error was; they just fix the file. I have also tried the Open XML SDK 2.0 for Microsoft Office, but it won't open the corrupt file, so I can't compare it with a fixed one.)
Code:
Sub PostTheFile(CVFile, fullFilePath, PostToURL)
    strBoundary = "---------------------------9849436581144108930470211272"
    strRequestStart = "--" & strBoundary & vbCrlf &_
        "Content-Disposition: attachment; name=""file""; filename=""" & CVFile & """" & vbcrlf & vbcrlf
    strRequestEnd = vbCrLf & "--" & strBoundary & "--"
    Set stream = Server.CreateObject("ADODB.Stream")
    stream.Type = adTypeBinary
    stream.Mode = adModeReadWrite
    stream.Open
    stream.Write StringToBinary(strRequestStart)
    stream.Write ReadBinaryFile(fullFilePath)
    stream.Write StringToBinary(strRequestEnd)
    stream.Position = 0
    BINARYPOST= stream.read
    stream.Close
    Set stream = Nothing
    Set httpRequest = Server.CreateObject("MSXML2.ServerXMLHTTP.6.0")
    httpRequest.Open "PATCH", PostToURL, False, "username", "pw"
    httpRequest.setRequestHeader "Content-Type", "multipart/form-data; boundary=""" & strBoundary & """"
    httpRequest.Send BINARYPOST
    Response.write "httpRequest.status: " & httpRequest.status
    Set httpRequest = Nothing
End Sub
Function StringToBinary(input)
    dim stream
    set stream = Server.CreateObject("ADODB.Stream")
    stream.Charset = "UTF-8"
    stream.Type = adTypeText
    stream.Mode = adModeReadWrite
    stream.Open
    stream.WriteText input
    stream.Position = 0
    stream.Type = adTypeBinary
    StringToBinary = stream.Read
    stream.Close
    set stream = Nothing
End Function
Function ReadBinaryFile(fullFilePath)
    dim stream
    set stream = Server.CreateObject("ADODB.Stream")
    stream.Type = 1
    stream.Open()
    stream.LoadFromFile(fullFilePath)
    ReadBinaryFile = stream.Read()
    stream.Close
    set stream = nothing
end function
Links to Files
Here are links to the files before and after going through the API. I kept them really simple.
http://fresherandprosper.com/cvsamples/testcv.corrupted.docx
http://fresherandprosper.com/cvsamples/testcv.notcorrupted.docx
Update
After Edi9999's fantastic help (see below) I thought my problems were over. All I had to do was figure out how I was generating the unwanted additional sequence in my code and remove it.
But I couldn't seem to nail WHAT to remove from my code. Nothing worked as expected.
Then I realised... each time I posted the file, the ending sequence came out slightly different.
0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00
And the exact same file, using the exact same code posted 30 seconds later:
0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 00
And again, a few minutes later:
0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
Maybe this deserves a new question. But there's already about 6 relating to this issue so I'm reluctant to add yet another one.
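For what it's worth, the three dumps above all stop partway through the ZIP end-of-central-directory record, which begins with 504b 0506 and is 22 bytes long when the archive carries no comment. A small offline check can report how many bytes are missing or surplus after each upload. This is only a rough Python sketch run outside the ASP code; describe_zip_tail is a hypothetical helper name, and it assumes the last 504b 0506 found in the data is the real record.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # the bytes 50 4b 05 06
EOCD_FIXED_SIZE = 22            # record size when the archive comment is empty


def describe_zip_tail(data: bytes) -> str:
    """Say whether a ZIP/docx byte string is truncated, complete, or padded."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        return "no end-of-central-directory signature found"
    available = len(data) - pos
    if available < EOCD_FIXED_SIZE:
        return f"truncated: {EOCD_FIXED_SIZE - available} byte(s) missing"
    # The last field of the fixed part is the 2-byte little-endian comment length.
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    extra = available - EOCD_FIXED_SIZE - comment_len
    if extra > 0:
        return f"{extra} unexpected byte(s) after the record"
    if extra < 0:
        return f"truncated: {-extra} comment byte(s) missing"
    return "complete"


if __name__ == "__main__":
    # The full record as it appears at the end of the uncorrupted file.
    record = bytes.fromhex("504b0506000000000b000b00c1020000ed2400000000")
    for cut in (12, 15, 18, 22):  # the first three match the dumps above
        print(cut, describe_zip_tail(record[:cut]))
```

For a real file, read it with open(path, "rb").read() and pass the result to describe_zip_tail.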
Answer 1:
Here is what I tried with your docx files:
- I opened them with Word; the corrupted one was indeed corrupt.
- I unzipped both files; the extracted contents were fully identical.
- I looked at the sizes of the two docx files; they were different.
So I looked at the raw bytes. The beginning of the two files is identical:
504b 0304 1400 0600 0800 0000 2100 ddfc
9537 6601 0000 2005 0000 1300 0802 5b43
6f6e 7465 6e74 5f54 7970 6573 5d2e 786d
6c20 a204 0228 a000 0200 0000 0000 0000
But at the end:
Uncorrupted file
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
0000 0000
Corrupted file
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
0000 0000 0a2d 2d2d 2d2d 2d2d 2d2d
As you can see, there is an extra sequence: 0a2d 2d2d 2d2d 2d2d 2d2d. The rest of the file is identical, and when I delete this sequence the file is no longer corrupted.
Converted to ASCII, 0a2d 2d2d 2d2d 2d2d 2d2d is a line feed followed by nine hyphens (\n---------).
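If you want to repeat this comparison yourself, a few lines of Python are enough. This is a rough sketch, not something I ran against your server; surplus_tail is a made-up helper name, and it assumes the corrupted file is the good file plus appended bytes, which is what the dumps above suggest.

```python
def surplus_tail(good: bytes, bad: bytes) -> bytes:
    """Return the bytes that `bad` carries beyond `good`.

    Raises ValueError when `good` is not a prefix of `bad`, because then the
    difference is not a pure append and needs a proper byte-level diff.
    """
    if not bad.startswith(good):
        raise ValueError("files differ before the end; not a pure append")
    return bad[len(good):]


if __name__ == "__main__":
    # Tails copied from the two hex dumps above.
    good_tail = bytes.fromhex("ed2400000000")
    bad_tail = bytes.fromhex("ed2400000000" + "0a" + "2d" * 9)
    extra = surplus_tail(good_tail, bad_tail)
    print(extra.hex(" "))  # the surplus bytes in hex
    print(repr(extra))     # the same bytes shown as an ASCII literal
```

With the real files, read both with open(name, "rb").read() (for example testcv.notcorrupted.docx and testcv.corrupted.docx) and pass them in the same order.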
This probably comes from the closing boundary: strRequestEnd = vbCrLf & "--" & strBoundary & "--"
However, since I don't fully understand what your code does, please explain this portion of it in more detail if you want more help.
Hope this helps
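To show what the closing boundary is supposed to look like on the wire, here is a small Python sketch of multipart framing. It is an illustration under assumptions, not a description of what your API does: build_multipart and extract_payload are hypothetical names, the part uses the form-data disposition that RFC 7578 specifies (your code sends attachment), and the receiver is assumed to treat the CRLF before the closing delimiter as part of the delimiter rather than part of the file.

```python
CRLF = b"\r\n"


def build_multipart(boundary: str, field: str, filename: str, payload: bytes) -> bytes:
    """Frame one file as a single-part multipart/form-data body."""
    b = boundary.encode("ascii")
    disposition = f'Content-Disposition: form-data; name="{field}"; filename="{filename}"'
    head = (
        b"--" + b + CRLF
        + disposition.encode("utf-8") + CRLF
        + b"Content-Type: application/octet-stream" + CRLF
        + CRLF  # blank line separates the part headers from the payload
    )
    # The CRLF in front of the closing delimiter belongs to the delimiter.
    tail = CRLF + b"--" + b + b"--" + CRLF
    return head + payload + tail


def extract_payload(body: bytes, boundary: str) -> bytes:
    """Recover the payload of a single-part body built by build_multipart."""
    b = boundary.encode("ascii")
    start = body.index(CRLF + CRLF) + len(CRLF + CRLF)
    end = body.rindex(CRLF + b"--" + b + b"--")
    return body[start:end]


if __name__ == "__main__":
    boundary = "---------------------------9849436581144108930470211272"
    payload = bytes.fromhex("504b0506000000000b000b00c1020000ed2400000000")
    body = build_multipart(boundary, "file", "testcv.docx", payload)
    print(extract_payload(body, boundary) == payload)
```

If the receiving side cuts the payload anywhere other than at the CRLF that precedes the closing delimiter, part of that delimiter ends up in the saved file, which would look like the surplus bytes in the dump.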
Source: https://stackoverflow.com/questions/18243668/what-is-wrong-with-this-binary-file-transfer-corrupting-docx-files