How to parse an OFX (Version 1.0.2) file in PHP?

Submitted by 流过昼夜 on 2021-02-07 12:26:22

Question


I have an OFX file downloaded from Citibank. The file has a DTD defined at http://www.ofx.net/DownloadPage/Files/ofx102spec.zip (file OFXBANK.DTD), and the OFX file appears to be valid SGML. I'm trying to parse it with DOMDocument in PHP 5.4.13, but I get several warnings and the file is not parsed. My code is:

$file = "source/ACCT_013.OFX";
$dtd = "source/ofx102spec/OFXBANK.DTD";
$doc = new DomDocument();
$doc->loadHTMLFile($file);
$doc->schemaValidate($dtd);
$dom->validateOnParse = true;

The OFX file starts as:

OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE

<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20130331073401
<LANGUAGE>SPA
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>COP
<BANKACCTFROM> ...

I'm open to installing and using any program on the server (CentOS) that can be called from PHP.

P.S.: This class http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html doesn't work for me.


Answer 1:


Well, first of all, even though XML is a subset of SGML, a valid SGML file need not be a well-formed XML file. XML is stricter and does not use all the features that SGML offers.

As DOMDocument is XML (and not SGML) based, this is not really compatible.

Besides that problem, please see section 2.2, Open Financial Exchange Headers, in Ofexfin1.doc, which explains that

The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header

and further on:

A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.

So locate the first blank line and strip everything up to it. Then load the SGML part into DOMDocument by converting the SGML into XML first:

$source = fopen('file.ofx', 'r');
if (!$source) {
    throw new Exception('Unable to open OFX file.');
}

// skip headers of OFX file
$headers = array();
$charsets = array(
    1252 => 'WINDOWS-1252',
);
while(!feof($source)) {
    $line = trim(fgets($source));
    if ($line === '') {
        break;
    }
    list($header, $value) = explode(':', $line, 2);
    $headers[$header] = $value;
}

$buffer = '';

// dead-cheap SGML to XML conversion
// see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
while(!feof($source)) {

    $line = trim(fgets($source));
    if ($line === '') continue;

    $line = iconv($charsets[$headers['CHARSET']], 'UTF-8', $line);
    if (substr($line, -1, 1) !== '>') {
        list($tag) = explode('>', $line, 2);
        $line .= '</' . substr($tag, 1) . '>';
    }
    $buffer .= $line ."\n";
}

// use DOMDocument with non-standard recover mode
$doc = new DOMDocument();
$doc->recover = true;
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$save = libxml_use_internal_errors(true);
$doc->loadXML($buffer);
libxml_use_internal_errors($save);

echo $doc->saveXML();

This code example then outputs the following (reformatted) XML, which also shows that DOMDocument loaded the data properly:

<?xml version="1.0"?>
<OFX>
  <SIGNONMSGSRSV1>
    <SONRS>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
      </STATUS>
      <DTSERVER>20130331073401</DTSERVER>
      <LANGUAGE>SPA</LANGUAGE>
    </SONRS>
  </SIGNONMSGSRSV1>
  <BANKMSGSRSV1>
    <STMTTRNRS>
      <TRNUID>0</TRNUID>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
      </STATUS>
      <STMTRS><CURDEF>COP</CURDEF><BANKACCTFROM> ...</BANKACCTFROM>
</STMTRS>
    </STMTTRNRS>
  </BANKMSGSRSV1>
</OFX>

I do not know whether this can then be validated against the DTD; it may work. Additionally, this fragile conversion assumes that each element's value sits on the same line as its opening tag and that each line holds only a single element. If the SGML is not written that way, the conversion will break.
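The one-element-per-line limitation noted above could probably be relaxed by closing leaf elements with a regular expression instead of working line by line. The following is only a sketch under a few assumptions: PHP 7 or later, a hypothetical helper named ofxSgmlToXml(), values that contain no literal "<", and entities such as "&" that are already escaped. It has not been checked against the DTD.

```php
<?php
// Hypothetical helper (not part of any library): strip the OFX headers and
// close every leaf element that is not already followed by its own end tag.
function ofxSgmlToXml(string $raw): string
{
    // The SGML part begins with the <OFX> root tag; drop everything before it.
    $start = strpos($raw, '<OFX>');
    if ($start === false) {
        throw new InvalidArgumentException('No <OFX> root tag found.');
    }
    $sgml = substr($raw, $start);

    // <TAG>value  becomes  <TAG>value</TAG>, unless </TAG> already follows.
    // Aggregates such as <STATUS> have no text value and are left untouched.
    $xml = preg_replace(
        '~<([A-Za-z0-9_.]+)>[ \t]*([^<\s][^<]*?)\s*(?=<(?!/\1>)|$)~',
        '<$1>$2</$1>',
        $sgml
    );
    if ($xml === null) {
        throw new RuntimeException('Regular expression failed.');
    }
    return $xml;
}

// Made-up sample with two elements on one line and one element already closed.
$sample = "OFXHEADER:100\nDATA:OFXSGML\n\n"
        . "<OFX><STATUS><CODE>0<SEVERITY>INFO</STATUS>\n"
        . "<DTSERVER>20130331073401</DTSERVER>\n</OFX>\n";

$doc = new DOMDocument();
$doc->loadXML(ofxSgmlToXml($sample));
$xpath = new DOMXPath($doc);
echo $xpath->evaluate('string(//SEVERITY)'), "\n";
```

Because the converted string is already well-formed here, the non-standard recover mode is not needed for this sample; real files may still require it.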




Answer 2:


The simplest way to parse OFX into an array with easy access to all values and transactions:

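function parseOFX($ofx) {
    // Split on "<" so that each chunk looks like "TAG>value".
    $OFXArray = explode("<", $ofx);
    $a = array();
    foreach ($OFXArray as $v) {
        $pair = explode(">", $v, 2);
        if (!isset($pair[1])) {
            continue; // header text before the first tag
        }
        $tag = $pair[0];
        $value = trim($pair[1]);
        // Aggregates and closing tags carry no value of their own.
        if ($value === '') {
            continue;
        }
        if (!isset($a[$tag])) {
            $a[$tag] = $value;
        } elseif (is_array($a[$tag])) {
            $a[$tag][] = $value;
        } else {
            // Second occurrence of a tag: promote the scalar to a list.
            $a[$tag] = array($a[$tag], $value);
        }
    }
    return $a;
}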
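One consequence of the flat array is that repeated tags are collected per tag name, so the fields of one transaction are no longer grouped together. If per-transaction records are needed, a variation like the following might help. It is a sketch only: parseOFXTransactions() is a hypothetical name, the sample data is made up, and it assumes PHP 7 or later and that each transaction is wrapped in <STMTTRN> ... </STMTTRN>, which the truncated file in the question does not show.

```php
<?php
// Hypothetical helper: return one associative array per <STMTTRN> block.
function parseOFXTransactions(string $ofx): array
{
    $transactions = array();
    // Capture everything between <STMTTRN> and the next </STMTTRN>.
    preg_match_all('~<STMTTRN>(.*?)</STMTTRN>~s', $ofx, $blocks);
    foreach ($blocks[1] as $block) {
        $fields = array();
        // A leaf looks like <TAG>value and ends at the next "<" or line break.
        preg_match_all('~<([A-Z0-9.]+)>([^<\r\n]+)~', $block, $leaves, PREG_SET_ORDER);
        foreach ($leaves as $leaf) {
            $value = trim($leaf[2]);
            if ($value !== '') {
                $fields[$leaf[1]] = $value;
            }
        }
        $transactions[] = $fields;
    }
    return $transactions;
}

// Made-up sample data for illustration only.
$sample = "<BANKTRANLIST>\n"
        . "<STMTTRN>\n<TRNTYPE>DEBIT\n<DTPOSTED>20130301\n<TRNAMT>-50.00\n"
        . "<FITID>1\n<NAME>Example store\n</STMTTRN>\n"
        . "<STMTTRN>\n<TRNTYPE>CREDIT\n<DTPOSTED>20130305\n<TRNAMT>120.00\n"
        . "<FITID>2\n<NAME>Example payroll\n</STMTTRN>\n"
        . "</BANKTRANLIST>\n";

$transactions = parseOFXTransactions($sample);
print_r($transactions);
```

Each entry keeps the fields of a single transaction together, for example $transactions[0]['TRNAMT'].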



Answer 3:


I use this:

$source = utf8_encode(file_get_contents('a.ofx'));

//add end tag
$source = preg_replace('#^<([^>]+)>([^\r\n]+)\r?\n#mU', "<$1>$2</$1>\n", $source);

//skip header
$source = substr($source, strpos($source,'<OFX>'));

//convert to array
$xml = simplexml_load_string($source);
$array = json_decode(json_encode($xml),true);

print_r($array);
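Note that utf8_encode() always treats its input as ISO-8859-1 and is deprecated as of PHP 8.2, while the file in the question declares CHARSET:1252. A possible alternative is to read the CHARSET header and convert with iconv(). The sketch below assumes PHP 7 or later; readOfxHeaders() and ofxToUtf8() are hypothetical names, and the header-to-iconv mapping is an assumption that may need extending.

```php
<?php
// Hypothetical helper: read "NAME:value" header lines up to the first blank line.
function readOfxHeaders(string $raw): array
{
    $headers = array();
    foreach (preg_split('~\r\n|\r|\n~', $raw) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '<') {
            break; // a blank line (or the first tag) ends the header block
        }
        if (strpos($line, ':') !== false) {
            list($name, $value) = explode(':', $line, 2);
            $headers[trim($name)] = trim($value);
        }
    }
    return $headers;
}

// Hypothetical helper: convert the whole file to UTF-8 based on its CHARSET header.
function ofxToUtf8(string $raw): string
{
    // Assumed mapping from CHARSET header values to iconv encoding names.
    $charsets = array('1252' => 'CP1252', 'ISO-8859-1' => 'ISO-8859-1');
    $headers = readOfxHeaders($raw);
    $charset = isset($headers['CHARSET']) ? $headers['CHARSET'] : '1252';
    // Fall back to CP1252 when the header value is unknown (an assumption).
    $from = isset($charsets[$charset]) ? $charsets[$charset] : 'CP1252';
    $converted = iconv($from, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException('Charset conversion failed.');
    }
    return $converted;
}

// Made-up sample: 0xE9 is "é" and 0x80 is the euro sign in Windows-1252.
$sample = "OFXHEADER:100\nCHARSET:1252\n\n<OFX><NAME>Caf\xE9 \x80</OFX>\n";
$source = ofxToUtf8($sample);
echo $source;
```

The result could then replace the utf8_encode(file_get_contents('a.ofx')) call before the end tags are added.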


Source: https://stackoverflow.com/questions/15735330/how-to-parse-a-ofx-version-1-0-2-file-in-php
