Mime4j: DefaultMessageBuilder fails to parse mbox content

Posted by 五迷三道 on 2019-12-23 10:53:09

Question


I downloaded the mime4j 0.8.0 snapshot from Subversion and built it with Maven. The relevant jars I generated can be found here.

Now I am trying to parse a toy mbox file from the mime4j tests.

I use this sample code. Briefly:

final File mbox = new File("c:\\mbox.rlug");
int count = 0;
for (CharBufferWrapper message : MboxIterator.fromFile(mbox).charset(ENCODER.charset()).build()) {
    System.out.println(messageSummary(message.asInputStream(ENCODER.charset())));
    count++;
}
System.out.println("Found " + count + " messages");

plus this helper method:

private static String messageSummary(InputStream messageBytes) throws IOException, MimeException {
    MessageBuilder builder = new DefaultMessageBuilder();
    Message message = builder.parseMessage(messageBytes);
    return String.format("\nMessage %s \n" +
            "Sent by:\t%s\n" +
            "To:\t%s\n",
            message.getSubject(),
            message.getSender(),
            message.getTo());
}

The output is:

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Found 5 messages

There are indeed 5 messages, but why are all fields null?


Answer 1:


Based on @zvisofer's answer, I found the guilty piece of code in BufferedLineReaderInputStream:

@Override
public int readLine(final ByteArrayBuffer dst)
        throws MaxLineLimitException, IOException {
    if (dst == null) {
        throw new IllegalArgumentException("Buffer may not be null");
    }
    if (!readAllowed()) return -1;

    int total = 0;
    boolean found = false;
    int bytesRead = 0;
    while (!found) {
        if (!hasBufferedData()) {
            bytesRead = fillBuffer();
            if (bytesRead == -1) {
                break;
            }
        }
        int i = indexOf((byte)'\n');
        int chunk;
        if (i != -1) {
            found = true;
            chunk = i + 1 - pos();
        } else {
            chunk = length();
        }
        if (chunk > 0) {
            dst.append(buf(), pos(), chunk);
            skip(chunk);
            total += chunk;
        }
        if (this.maxLineLen > 0 && dst.length() >= this.maxLineLen) {
            throw new MaxLineLimitException("Maximum line length limit exceeded");
        }
    }
    if (total == 0 && bytesRead == -1) {
        return -1;
    } else {
        return total;
    }
}

The best course would be to report the bug, but here is a workaround. It is a little dirty, but it works.

Create the class org.apache.james.mime4j.io.BufferedLineReaderInputStream in your project, starting from a copy of the original.

Replace the method public int readLine(final ByteArrayBuffer dst) with this one:

@Override
public int readLine(final ByteArrayBuffer dst)
        throws MaxLineLimitException, IOException {
    if (dst == null) {
        throw new IllegalArgumentException("Buffer may not be null");
    }
    if (!readAllowed()) return -1;

    int total = 0;
    boolean found = false;
    int bytesRead = 0;
    while (!found) {
        if (!hasBufferedData()) {
            bytesRead = fillBuffer();
            if (bytesRead == -1) {
                break;
            }
        }

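        int chunk;
        // Look for '\r' first; this assumes every '\r' is immediately
        // followed by '\n' and that both bytes are already buffered.
        int i = indexOf((byte)'\r');
        if (i != -1) {
            found = true;
            chunk = i + 2 - pos(); // consume "\r\n"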
        } else {
            i = indexOf((byte)'\n');
            if (i != -1) {
                found = true;
                chunk = i + 1 - pos();
            } else {
                chunk = length();
            }
        }
        if (chunk > 0) {
            dst.append(buf(), pos(), chunk);
            skip(chunk);
            total += chunk;
        }
        if (this.maxLineLen > 0 && dst.length() >= this.maxLineLen) {
            throw new MaxLineLimitException("Maximum line length limit exceeded");
        }
    }
    if (total == 0 && bytesRead == -1) {
        return -1;
    } else {
        return total;
    }
}

Enjoy both Unix and DOS files :)
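
The patched method computes i + 2 - pos() as soon as it sees '\r', which presumes that the following '\n' is already in the buffer. As a standalone illustration only (this is not mime4j code, and the names LineEndScanner and lineLength are hypothetical), a scan that accepts both \n and \r\n while checking the buffer bounds might look like this:

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: a bounds-checked line scan, not part of mime4j. */
final class LineEndScanner {

    private LineEndScanner() {
    }

    /**
     * Returns the length of the first line in buf[pos..limit), terminator
     * included, or -1 if no complete line is buffered yet.
     * Recognised terminators: "\n" and "\r\n".
     */
    static int lineLength(byte[] buf, int pos, int limit) {
        for (int i = pos; i < limit; i++) {
            if (buf[i] == '\n') {
                return i + 1 - pos;      // bare LF
            }
            if (buf[i] == '\r') {
                if (i + 1 == limit) {
                    return -1;           // CR is the last buffered byte: more data needed
                }
                if (buf[i + 1] == '\n') {
                    return i + 2 - pos;  // CRLF pair
                }
                // A CR not followed by LF is treated as ordinary data.
            }
        }
        return -1;                       // no terminator in the buffered bytes
    }

    public static void main(String[] args) {
        byte[] data = "Subject: hi\r\nTo: x@example.org\n".getBytes(StandardCharsets.US_ASCII);
        int pos = 0;
        int len;
        // Walk the buffer one line at a time until no complete line remains.
        while ((len = lineLength(data, pos, data.length)) != -1) {
            System.out.println("line of " + len + " bytes at offset " + pos);
            pos += len;
        }
    }
}
```

A caller would keep filling its buffer while the method returns -1 and, at end of stream, treat whatever bytes remain as the final line.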




Answer 2:


I found the problem.

DefaultMessageBuilder fails to parse mbox files that use the Windows line separator \r\n. After replacing it with the UNIX line separator \n, parsing works.

This is a critical issue, since the mbox files downloaded from Gmail use \r\n.
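
The replacement described above can be done in code before the bytes reach the parser. Here is a minimal sketch using only the JDK. The class name CrlfNormalizer is hypothetical, and the stream variant buffers the whole input in memory, so it is only suitable for small files or single messages:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative only: rewrites "\r\n" as "\n"; not part of mime4j. */
final class CrlfNormalizer {

    private CrlfNormalizer() {
    }

    /** Returns a copy of data in which every "\r\n" pair is replaced by "\n". */
    static byte[] normalize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            boolean crBeforeLf = data[i] == '\r'
                    && i + 1 < data.length
                    && data[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(data[i]);   // drop only a CR that directly precedes an LF
            }
        }
        return out.toByteArray();
    }

    /** Reads the stream fully and returns a new in-memory stream with normalized line endings. */
    static InputStream normalize(InputStream in) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            raw.write(buf, 0, n);
        }
        return new ByteArrayInputStream(normalize(raw.toByteArray()));
    }

    public static void main(String[] args) {
        byte[] dos = "Subject: hi\r\n\r\nbody\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] unix = normalize(dos);
        System.out.println(dos.length + " bytes in, " + unix.length + " bytes out");
    }
}
```

In the question's loop this would mean passing CrlfNormalizer.normalize(message.asInputStream(ENCODER.charset())) to messageSummary. Whether normalizing each message is enough, or the mbox file has to be converted before iterating, is not settled by the answers here. Note also that the sketch rewrites every \r\n pair, including any inside message bodies.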




Answer 3:


I downloaded your jar files, along with the sample code and the sample mbox file that you pointed to, compiled the sample (with no changes), and ran it against the sample mbox file.

It worked as expected: the fields contained the expected data, not nulls. This was on a Mac with Java 1.6.0_65, and also with 1.8.0_11.

Output was as follows:

$ java -cp .:apache-mime4j-core-0.8.0-SNAPSHOT.jar:apache-mime4j-dom-0.8.0-SNAPSHOT.jar:apache-mime4j-mbox-iterator-0.8.0-SNAPSHOT.jar IterateOverMbox mbox.rlug.txt

Message Din windows ma pot, din LINUX NU ma pot conecta (la ZAPP) Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message Re: RH 8.0 boot floppy Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message Qmail mysql virtualusers +ssl + smtp auth +pop3 Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message Re: Din windows ma pot, din LINUX NU ma pot conecta (la ZAPP) Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message LSTP problem - solved Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Found 5 messages Done in: 108 milis



Source: https://stackoverflow.com/questions/27201605/mime4j-defaultmessagebuilder-fails-to-parse-mbox-content
