I need to store a huge amount of binary data in a file, but I also want to read and write that file's header in XML format.
Yes, I could just store the binary
This is not natively supported by JAXB, since you do not want to serialize the binary data to XML, but it can usually be handled at a higher level when using JAXB. The way I do this with web services (SOAP and REST) is to use MIME multipart/mixed messages (see the multipart specification). Originally designed for email, the format works well for sending XML together with binary data, and most web service frameworks, such as Axis or Jersey, support it almost transparently.
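To make the multipart/mixed layout concrete, here is a minimal sketch that assembles such a body by hand using only the JDK. The class name `MultipartSketch`, the methods `writePart` and `build`, the boundary string, and the XML snippet are all illustrative assumptions, not part of any framework. It also assumes the boundary never occurs inside a part, which a real library checks for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MultipartSketch {
    // Appends text as ASCII bytes (delimiters and headers are plain ASCII).
    private static void ascii(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    // Writes one part: boundary line, Content-Type header, blank line, raw body, CRLF.
    static void writePart(ByteArrayOutputStream out, String boundary,
                          String contentType, byte[] body) {
        ascii(out, "--" + boundary + "\r\n");
        ascii(out, "Content-Type: " + contentType + "\r\n\r\n");
        out.write(body, 0, body.length); // the bytes go in unchanged, no text encoding
        ascii(out, "\r\n");
    }

    // Builds a two-part body: an XML header part followed by a binary part.
    public static byte[] build(String boundary, String xmlHeader, byte[] binary) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePart(out, boundary, "application/xml",
                  xmlHeader.getBytes(StandardCharsets.UTF_8));
        writePart(out, boundary, "application/octet-stream", binary);
        ascii(out, "--" + boundary + "--\r\n"); // closing delimiter
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<book><title>wild stuff</title></book>"; // hypothetical header
        byte[] binary = new byte[110917];                      // placeholder payload
        byte[] body = build("Boundary_1", xml, binary);
        System.out.println("total bytes: " + body.length
                + ", overhead: " + (body.length - binary.length));
    }
}
```

The envelope size depends only on the boundary, the headers, and the XML part, not on the size of the binary payload. The same byte layout could in principle be written to a file instead of an HTTP request, though that is a suggestion rather than something the frameworks above do for you.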
Here is an example of sending an object as XML together with a binary file to a REST web service, using Jersey with the jersey-multipart extension.
XML object
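import javax.xml.bind.annotation.XmlRootElement;

// JAXB-annotated bean; Jersey marshals it to XML for the first body part.
@XmlRootElement
public class Book {
    private String title;
    private String author;
    private int year;
    // getters and setters...
}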
Client
byte[] bin = ...; // some binary data
Book b = new Book();
b.setAuthor("John");
b.setTitle("wild stuff");
b.setYear(2012);

// First part: the Book as XML. Second part: the raw bytes.
MultiPart multiPart = new MultiPart();
multiPart.bodyPart(new BodyPart(b, MediaType.APPLICATION_XML_TYPE));
multiPart.bodyPart(new BodyPart(bin, MediaType.APPLICATION_OCTET_STREAM_TYPE));

// service is assumed to be a Jersey client WebResource for the server's base URI.
ClientResponse response = service.path("rest").path("multipart")
        .type(MultiPartMediaTypes.MULTIPART_MIXED)
        .post(ClientResponse.class, multiPart);
Server
@POST
@Consumes(MultiPartMediaTypes.MULTIPART_MIXED)
public Response post(MultiPart multiPart) {
    // Print the media type of each received part.
    for (BodyPart part : multiPart.getBodyParts()) {
        System.out.println(part.getMediaType());
    }
    return Response.status(Response.Status.ACCEPTED)
            .entity("Attachments processed successfully.")
            .type(MediaType.TEXT_PLAIN).build();
}
I tried sending a file of 110917 bytes. In Wireshark, you can see that the data is sent directly over HTTP, like this:
Hypertext Transfer Protocol
POST /org.etics.test.rest.server/rest/multipart HTTP/1.1\r\n
Content-Type: multipart/mixed; boundary=Boundary_1_353042220_1343207087422\r\n
MIME-Version: 1.0\r\n
User-Agent: Java/1.7.0_04\r\n
Host: localhost:8080\r\n
Accept: text/html, image/gif, image/jpeg\r\n
Connection: keep-alive\r\n
Content-Length: 111243\r\n
\r\n
[Full request URI: http://localhost:8080/org.etics.test.rest.server/rest/multipart]
MIME Multipart Media Encapsulation, Type: multipart/mixed, Boundary: "Boundary_1_353042220_1343207087422"
[Type: multipart/mixed]
First boundary: --Boundary_1_353042220_1343207087422\r\n
Encapsulated multipart part: (application/xml)
Content-Type: application/xml\r\n\r\n
eXtensible Markup Language
John
wild stuff
2012
Boundary: \r\n--Boundary_1_353042220_1343207087422\r\n
Encapsulated multipart part: (application/octet-stream)
Content-Type: application/octet-stream\r\n\r\n
Media Type
Media Type: application/octet-stream (110917 bytes)
Last boundary: \r\n--Boundary_1_353042220_1343207087422--\r\n
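As you can see, the binary data is sent as an octet-stream, with no wasted space, contrary to what happens when binary data is sent inline in the XML. There is just the very low-overhead MIME envelope: the Content-Length is 111243 for a 110917-byte payload, so the XML part and the MIME delimiters together add only 326 bytes. With SOAP the principle is the same (except that the message also carries the SOAP envelope).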
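To put a rough number on the comparison with inline XML, the sketch below assumes the inline alternative is Base64 text (which is how JAXB maps a `byte[]` field by default) and compares its size with the figures from the capture above. The class name `OverheadComparison` and the method `base64Length` are illustrative, and the XML tags that would surround the Base64 text are ignored.

```java
class OverheadComparison {
    // Base64 emits 4 characters for every 3-byte group, padding the last group.
    public static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        long payload = 110917;       // binary size reported in the capture
        long contentLength = 111243; // Content-Length header in the capture

        long multipartOverhead = contentLength - payload;
        long base64Overhead = base64Length(payload) - payload;

        System.out.println("multipart overhead: " + multipartOverhead + " bytes");
        // prints "multipart overhead: 326 bytes"
        System.out.println("Base64 overhead: " + base64Overhead + " bytes");
        // prints "Base64 overhead: 36975 bytes"
    }
}
```

Under these assumptions, Base64 adds about a third to the payload, while the multipart envelope stays at a few hundred bytes regardless of the file size.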