Exact same file and code. So why does the binary of my docx file always end differently?

∥☆過路亽.° submitted on 2019-12-24 14:21:52

Question


We take a (non-corrupted) .docx file from our server and post it via an HTTP request to an API. When we download it from the API, it comes out corrupted. I'm 99% sure that this is down to the code that posts the file, not the API.

It turns out the corrupted file had some extra characters in the binary. I thought it would be pretty easy to find out where they came from and remove them. Boy, was I wrong.

I've since realised that every time we post the file, the binary ending is slightly different. We're using the exact same file, using the exact same code.

What could account for this difference?

Example Binary Endings

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 

30 seconds later:

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 00

Another 30 seconds later:

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24

Posting Code

Sub PostTheFile(CVFile, fullFilePath, PostToURL)
    ' adTypeBinary, adModeReadWrite, etc. are assumed to be defined elsewhere (e.g. via adovbs.inc).

    ' Multipart boundary, part header and closing delimiter.
    strBoundary = "---------------------------9849436581144108930470211272"
    strRequestStart = "--" & strBoundary & vbCrLf & _
        "Content-Disposition: attachment; name=""file""; filename=""" & CVFile & """" & vbCrLf & vbCrLf
    strRequestEnd = vbCrLf & "--" & strBoundary & "--"

    ' Assemble the request body: part header, raw file bytes, closing boundary.
    Set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = adTypeBinary
        stream.Mode = adModeReadWrite
        stream.Open
        stream.Write StringToBinary(strRequestStart)
        stream.Write ReadBinaryFile(fullFilePath)
        stream.Write StringToBinary(strRequestEnd)
        stream.Position = 0
        BINARYPOST = stream.Read
        stream.Close

    Set stream = Nothing

    ' Send the assembled body synchronously and print the HTTP status.
    Set httpRequest = Server.CreateObject("MSXML2.ServerXMLHTTP.6.0")
        httpRequest.Open "PATCH", PostToURL, False, "username", "pw"
        httpRequest.setRequestHeader "Content-Type", "multipart/form-data; boundary=""" & strBoundary & """"
        httpRequest.Send BINARYPOST
        Response.Write "httpRequest.status: " & httpRequest.status
    Set httpRequest = Nothing
End Sub


' Converts a string to bytes by writing it to a text stream and reading it back as binary.
' Note: a UTF-8 text stream starts with a BOM (EF BB BF), which ends up in the returned bytes.
Function StringToBinary(input)
    Dim stream
    Set stream = Server.CreateObject("ADODB.Stream")
        stream.Charset = "UTF-8"
        stream.Type = adTypeText
        stream.Mode = adModeReadWrite
        stream.Open
        stream.WriteText input
        stream.Position = 0
        stream.Type = adTypeBinary
        StringToBinary = stream.Read
        stream.Close
    Set stream = Nothing
End Function

' Reads the whole file at fullFilePath and returns its bytes.
Function ReadBinaryFile(fullFilePath)
    Dim stream
    Set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = 1 ' adTypeBinary
        stream.Open
        stream.LoadFromFile fullFilePath
        ReadBinaryFile = stream.Read
        stream.Close
    Set stream = Nothing
End Function

Update

We played with a few different boundaries and Charsets.

With UTF-8, a byte order mark (BOM) was also getting into the output:

http://wikipedia.org/wiki/Byte_order_mark
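
One way to keep the BOM out of the multipart header and footer is to skip the first three bytes after switching the stream to binary. This is only a sketch, assuming the extra bytes are the UTF-8 BOM (EF BB BF) that ADODB.Stream writes at the start of a UTF-8 text stream. StringToBinaryNoBOM is a hypothetical name, and plain CreateObject is used so the function can also be tried outside ASP:

```vbscript
' Hypothetical variant of StringToBinary that drops the leading UTF-8 BOM.
Function StringToBinaryNoBOM(text)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 2              ' adTypeText
    stream.Charset = "UTF-8"
    stream.Open
    stream.WriteText text
    stream.Position = 0          ' Type can only be changed at position 0
    stream.Type = 1              ' adTypeBinary
    stream.Position = 3          ' skip the three BOM bytes EF BB BF
    StringToBinaryNoBOM = stream.Read
    stream.Close
    Set stream = Nothing
End Function
```

With this in place, the two StringToBinary calls in PostTheFile could be swapped for StringToBinaryNoBOM. Whether that also affects the zero padding is untested.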

Now the issue is clearly the addition of (a seemingly random amount of) NULL / zero padding.

E.g. The first time it adds 13 sets of "00". Hit refresh and the second time it will add 8. A third time it adds 7. Each time with the exact same file and code.
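
To pin down where the padding first appears, it may help to count the trailing zero bytes at each stage (the file as read from disk, the assembled request body, the downloaded copy) and see where the count changes. A rough sketch; CountTrailingZeroBytes is a hypothetical helper, not part of the original code:

```vbscript
' Hypothetical helper: counts the 0x00 bytes at the end of a binary value,
' for example the result of ADODB.Stream.Read.
Function CountTrailingZeroBytes(bytes)
    Dim i, count
    count = 0
    ' Stream.Read returns Null for an empty stream; treat that as no padding.
    If Not IsNull(bytes) Then
        For i = LenB(bytes) To 1 Step -1
            If AscB(MidB(bytes, i, 1)) <> 0 Then Exit For
            count = count + 1
        Next
    End If
    CountTrailingZeroBytes = count
End Function
```

For example, `Response.Write CountTrailingZeroBytes(ReadBinaryFile(fullFilePath))` reports the count for the file on disk. A valid .docx can itself end in a few zero bytes, so only a difference between stages is meaningful.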

Suggestion - How Likely is This?

The destination URL for the post is https - so a friend suggested that our server may have recognised this and added random padding as part of the encryption. This sounds kind of unlikely to me, but I don't have any better suggestions.


Answer 1:


I have found a similar question:

Error in downloaded pdf file - ASP classic

Here are some tips that come from there:

  • Set the Stream's .Mode property to 3 (adModeReadWrite).
  • Set Response.ContentType to the file's MIME type ("xxx/xxx").
  • Before you start adding response headers, call Response.Clear, just to be sure you're not sending extra markup (this seems very similar).
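
Applied to a download page, those three tips might look like the sketch below. SendFileToBrowser and its parameters are hypothetical names, the content type is passed in rather than guessed, and Response.Clear assumes response buffering is enabled:

```vbscript
' Hypothetical download helper for a Classic ASP page.
Sub SendFileToBrowser(fullFilePath, downloadName, contentType)
    Dim stream
    Set stream = Server.CreateObject("ADODB.Stream")
    stream.Mode = 3              ' adModeReadWrite; must be set before Open
    stream.Type = 1              ' adTypeBinary
    stream.Open
    stream.LoadFromFile fullFilePath

    Response.Clear               ' discard any markup already buffered
    Response.ContentType = contentType
    Response.AddHeader "Content-Disposition", _
        "attachment; filename=""" & downloadName & """"
    Response.BinaryWrite stream.Read

    stream.Close
    Set stream = Nothing
    Response.End                 ' stop so nothing is appended after the file
End Sub
```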

Hope this helps :-)



Source: https://stackoverflow.com/questions/18341803/exact-same-file-and-code-so-why-does-the-binary-of-my-docx-file-always-end-diff
