How do I get a 50MB zip file with a 600MB xml file into a mysql datatable?

狂风中的少年 submitted on 2019-12-18 09:41:53

Question


How do I get a 50MB zip file containing a 600MB XML file (over 300,000 <abc:ABCRecord> elements) into a MySQL table? The XML file has the following structure:

<?xml version='1.0' encoding='UTF-8'?>
<abc:ABCData xmlns:abc="http://www.abc-example.com" xmlns:xyz="http:/www.xyz-example.com">
<abc:ABCHeader>
<abc:ContentDate>2015-08-15T09:03:29.379055+00:00</abc:ContentDate>
<abc:FileContent>PUBLISHED</abc:FileContent>
<abc:RecordCount>310598</abc:RecordCount>
<abc:Extension>
  <xyz:Sources>
    <xyz:Source>
      <xyz:ABC>5967007LIEEXZX4LPK21</xyz:ABC>
      <xyz:Name>Bornheim Register Centre</xyz:Name>
      <xyz:ROCSponsorCountry>NO</xyz:ROCSponsorCountry>
      <xyz:RecordCount>398</xyz:RecordCount>
      <xyz:ContentDate>2015-08-15T05:00:02.952+02:00</xyz:ContentDate>
      <xyz:LastAttemptedDownloadDate>2015-08-15T09:00:01.885686+00:00</xyz:LastAttemptedDownloadDate>
      <xyz:LastSuccessfulDownloadDate>2015-08-15T09:00:02.555222+00:00</xyz:LastSuccessfulDownloadDate>
      <xyz:LastValidDownloadDate>2015-08-15T09:00:02.555222+00:00</xyz:LastValidDownloadDate>
     </xyz:Source>
    </xyz:Sources>
   </abc:Extension>
 </abc:ABCHeader>
<abc:ABCRecords>
 <abc:ABCRecord>
 <abc:ABC>5967007LIEEXZX4LPK21</abc:ABC>
  <abc:Entity>
    <abc:LegalName>REGISTERENHETEN I Bornheim</abc:LegalName>
    <abc:LegalAddress>
      <abc:Line1>Havnegata 48</abc:Line1>
      <abc:City>Bornheim</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>8900</abc:PostalCode>
    </abc:LegalAddress>
    <abc:HeadquartersAddress>
      <abc:Line1>Havnegata 48</abc:Line1>
      <abc:City>Bornheim</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>8900</abc:PostalCode>
    </abc:HeadquartersAddress>
    <abc:BusinessRegisterEntityID register="Enhetsregisteret">974757873</abc:BusinessRegisterEntityID>
    <abc:LegalForm>Organisasjonsledd</abc:LegalForm>
    <abc:EntityStatus>Active</abc:EntityStatus>
  </abc:Entity>
  <abc:Registration>
    <abc:InitialRegistrationDate>2014-06-15T12:03:33.000+02:00</abc:InitialRegistrationDate>
    <abc:LastUpdateDate>2015-06-15T20:45:32.000+02:00</abc:LastUpdateDate>
    <abc:RegistrationStatus>ISSUED</abc:RegistrationStatus>
    <abc:NextRenewalDate>2016-06-15T12:03:33.000+02:00</abc:NextRenewalDate>
    <abc:ManagingLOU>59670054IEEXZX44PK21</abc:ManagingLOU>
  </abc:Registration>
</abc:ABCRecord>
<abc:ABCRecord>
  <abc:ABC>5967007LIE45ZX4MHC90</abc:ABC>
  <abc:Entity>
    <abc:LegalName>SUNNDAL HOSTBANK</abc:LegalName>
    <abc:LegalAddress>
      <abc:Line1>Sunfsalsvegen 15</abc:Line1>
      <abc:City>SUNNDALSPRA</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>6600</abc:PostalCode>
    </abc:LegalAddress>
    <abc:HeadquartersAddress>
      <abc:Line1>Sunndalsvegen 15</abc:Line1>
      <abc:City>SUNNDALSPRA</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>6600</abc:PostalCode>
    </abc:HeadquartersAddress>
    <abc:BusinessRegisterEntityID register="Foretaksregisteret">9373245963</abc:BusinessRegisterEntityID>
    <abc:LegalForm>Hostbank</abc:LegalForm>
    <abc:EntityStatus>Active</abc:EntityStatus>
  </abc:Entity>
  <abc:Registration>
    <abc:InitialRegistrationDate>2014-06-26T15:01:02.000+02:00</abc:InitialRegistrationDate>
    <abc:LastUpdateDate>2015-06-27T15:02:39.000+02:00</abc:LastUpdateDate>
    <abc:RegistrationStatus>ISSUED</abc:RegistrationStatus>
    <abc:NextRenewalDate>2016-06-26T15:01:02.000+02:00</abc:NextRenewalDate>
    <abc:ManagingLOU>5967007LIEEXZX4LPK21</abc:ManagingLOU>
  </abc:Registration>
</abc:ABCRecord>
</abc:ABCRecords>
</abc:ABCData>

What should the MySQL table look like, and how can I accomplish this? The goal is to have all the abc-tagged content in the table. In addition, a new zip file is provided each day via a download link, and the table should be updated from it daily. The zip files are named according to the pattern "20150815-XYZ-concatenated-file.zip". A step-by-step hint would be great. So far I have tried the approach from "Importing XML file with special tags & namespaces <abc:xyz> in mysql", but it is not getting the job done yet.

Based on ThW's explanation below, I have now done the following:

<?php

// open input
$reader = new XMLReader();
$reader->open('./xmlreader.xml');

// open output
$output = fopen('./xmlreader.csv', 'w');
fputcsv($output, ['id', 'name']);

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

// prepare DOM
$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

// look for the first record element
while (
  $reader->read() && 
  (
    $reader->localName !== 'ABCRecord' || 
    $reader->namespaceURI !== $xmlns['a']
  )
) {
  continue;
}

// while you have a record element
while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    // expand record element node
    $node = $reader->expand($dom);
    // fetch data and write it to output
    fputcsv(
      $output, 
      [
        $xpath->evaluate('string(a:ABC)', $node),
        $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
      ]
    );
  }

  // move to the next record sibling
  $reader->next('ABCRecord');
} 

Is this correct? Where do I find the output, and how do I get it into MySQL? Sorry for the rookie questions; this is the first time I am doing this.

$dbHost = "localhost";
$dbUser = "root";
$dbPass = "password";
$dbName = "new_xml_extract";

$dbConn = mysqli_connect($dbHost, $dbUser, $dbPass, $dbName);

$delete = $dbConn->query("TRUNCATE TABLE `test_xml`");

....

$sql = "INSERT INTO `test_xml` (`.....`, `.....`)" . "VALUES ('". $dbConn->real_escape_string($.....) ."', '".$dbConn->real_escape_string($.....)."')";

$result = $dbConn->query($sql);
}

Answer 1:


MySQL does not know your XML structure. While it can import simple, well-formed XML structures directly, you will need to convert more complex structures yourself. You can generate CSV, SQL, or a (supported) XML format.

For large files like this, XMLReader is the best API because it streams through the document instead of loading it into memory as a whole. First create an instance and open the file:

$reader = new XMLReader();
$reader->open('php://stdin');
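
Since the data arrives as a zip archive, the reader can probably be pointed at the archive itself instead of an unpacked copy. The following is a minimal sketch, assuming PHP's zip extension is loaded (it provides the zip:// stream wrapper) and that you know the name of the XML entry inside the archive. The helper dailyZipUri(), the ./downloads directory and the entry name are hypothetical; only the archive naming pattern comes from the question.

```php
<?php
// Hypothetical helper: builds a zip:// URI for the archive of a given day.
// The pattern "YYYYMMDD-XYZ-concatenated-file.zip" is taken from the question;
// the entry name inside the archive is an assumption.
function dailyZipUri(DateTimeInterface $date, string $directory, string $entryName): string
{
    $archive = rtrim($directory, '/') . '/' . $date->format('Ymd') . '-XYZ-concatenated-file.zip';

    // zip://<path to archive>#<entry inside the archive>
    return 'zip://' . $archive . '#' . $entryName;
}

$uri = dailyZipUri(
    new DateTimeImmutable('2015-08-15'),
    './downloads',
    '20150815-XYZ-concatenated-file.xml'
);
echo $uri, PHP_EOL;

// Usage, once the archive has been downloaded:
// $reader = new XMLReader();
// $reader->open($uri);
```

If the zip extension is not available, unpacking the archive first and opening the extracted file works just as well.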

You are using namespaces, so I suggest defining a mapping array for them:

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

It is possible to use the same prefixes/aliases as in the XML file, but you can use your own, too.

Next traverse the XML nodes until you find the first record element node:

while (
  $reader->read() && 
  ($reader->localName !== 'ABCRecord' ||  $reader->namespaceURI !== $xmlns['a'])
) {
  continue;
}

You need to compare the local name (the tag name without the namespace prefix) and the namespace URI. This way your program does not depend on the actual prefixes in the XML file.
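
To illustrate the point about prefixes, here is a small sketch that runs the same search loop over two made-up documents. They use different prefixes (abc and x) for the same namespace URI, and the loop finds the record element in both. The sample XML strings are invented for this illustration.

```php
<?php
// Two tiny documents that bind different prefixes to the same namespace.
$samples = [
    '<abc:ABCData xmlns:abc="http://www.abc-example.com">'
        . '<abc:ABCRecords><abc:ABCRecord/></abc:ABCRecords></abc:ABCData>',
    '<x:ABCData xmlns:x="http://www.abc-example.com">'
        . '<x:ABCRecords><x:ABCRecord/></x:ABCRecords></x:ABCData>',
];

$namespaceURI = 'http://www.abc-example.com';
$found = [];

foreach ($samples as $xml) {
    $reader = new XMLReader();
    $reader->XML($xml);

    // advance until the first ABCRecord element in the expected namespace
    while (
        $reader->read() &&
        ($reader->localName !== 'ABCRecord' || $reader->namespaceURI !== $namespaceURI)
    ) {
        continue;
    }

    // name is the qualified name (with prefix), localName is the part after the prefix
    $found[] = [$reader->name, $reader->localName];
    $reader->close();
}

print_r($found);
```

The qualified names differ between the two documents (abc:ABCRecord and x:ABCRecord), while the local name and the namespace URI are the same.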

After you have found the first node, you can traverse to the next sibling with the same local name:

while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    // read data for the record ...
  }      
  // move to the next record sibling
  $reader->next('ABCRecord');
}
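
The namespace check inside the loop matters because next() is given only a local name. As a sketch of that behaviour, the made-up sample below contains an element named ABCRecord in a second, unrelated namespace (urn:example:other is invented for this illustration); the loop visits it but does not collect it.

```php
<?php
$xml = '<abc:ABCRecords xmlns:abc="http://www.abc-example.com" xmlns:other="urn:example:other">'
    . '<abc:ABCRecord><abc:ABC>A</abc:ABC></abc:ABCRecord>'
    . '<other:ABCRecord><other:ABC>B</other:ABC></other:ABCRecord>'
    . '<abc:ABCRecord><abc:ABC>C</abc:ABC></abc:ABCRecord>'
    . '</abc:ABCRecords>';

$namespaceURI = 'http://www.abc-example.com';
$dom = new DOMDocument();

$reader = new XMLReader();
$reader->XML($xml);

// look for the first record element
while (
    $reader->read() &&
    ($reader->localName !== 'ABCRecord' || $reader->namespaceURI !== $namespaceURI)
) {
    continue;
}

$ids = [];
while ($reader->localName === 'ABCRecord') {
    if ($reader->namespaceURI === $namespaceURI) {
        // the sample records contain nothing but the ABC element,
        // so the text content of the record is the identifier
        $ids[] = $reader->expand($dom)->textContent;
    }
    // skips the current subtree and stops at the next node with this local name
    $reader->next('ABCRecord');
}
$reader->close();

print_r($ids);
```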

You could use XMLReader to read the record data but it is easier with DOM and XPath expressions. XMLReader can expand the current node into a DOM node. So prepare a DOM document, create an XPath object for it and register the namespaces. Expanding a node will load the node and all descendants into memory, but not parent nodes or siblings.

$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    $node = $reader->expand($dom);
    var_dump(
      $xpath->evaluate('string(a:ABC)', $node),
      $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
    );
  }
  $reader->next('ABCRecord');
}

DOMXPath::evaluate() allows you to use XPath expressions to fetch scalar values or node lists from a DOM.
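
Since the goal in the question is to capture more than two fields, here is a short sketch of a few other expression shapes: a nested element, an attribute, a missing element, and a node list. It works on a single record taken from the question's sample, loaded directly into a DOMDocument for brevity; which of these fields you actually need is up to you.

```php
<?php
$xml = <<<'XML'
<abc:ABCRecord xmlns:abc="http://www.abc-example.com">
  <abc:ABC>5967007LIEEXZX4LPK21</abc:ABC>
  <abc:Entity>
    <abc:LegalName>REGISTERENHETEN I Bornheim</abc:LegalName>
    <abc:LegalAddress>
      <abc:Line1>Havnegata 48</abc:Line1>
      <abc:City>Bornheim</abc:City>
    </abc:LegalAddress>
    <abc:BusinessRegisterEntityID register="Enhetsregisteret">974757873</abc:BusinessRegisterEntityID>
  </abc:Entity>
</abc:ABCRecord>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('a', 'http://www.abc-example.com');
$record = $dom->documentElement;

// string() casts the first matching node to a string
$city = $xpath->evaluate('string(a:Entity/a:LegalAddress/a:City)', $record);

// attributes are addressed with @
$register = $xpath->evaluate('string(a:Entity/a:BusinessRegisterEntityID/@register)', $record);

// a missing element yields an empty string instead of an error
$line2 = $xpath->evaluate('string(a:Entity/a:LegalAddress/a:Line2)', $record);

// without a cast, a location path returns a DOMNodeList
$address = [];
foreach ($xpath->evaluate('a:Entity/a:LegalAddress/*', $record) as $child) {
    $address[$child->localName] = $child->textContent;
}

var_dump($city, $register, $line2, $address);
```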

fputcsv() makes it really easy to write the data to a CSV file.
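
The main convenience is that fputcsv() takes care of quoting. A minimal sketch, using an in-memory stream and a made-up name that contains both a quote and a comma, shows that the row survives a round trip through fgetcsv():

```php
<?php
// The name is invented to exercise the quoting rules.
$row = ['5967007LIEEXZX4LPK21', 'Example "North", AS'];

$stream = fopen('php://memory', 'w+');

// delimiter, enclosure and escape character are passed explicitly
fputcsv($stream, $row, ',', '"', '\\');

rewind($stream);
$line = stream_get_contents($stream);   // the raw CSV text as written

rewind($stream);
$parsed = fgetcsv($stream, 0, ',', '"', '\\');
fclose($stream);

echo $line;
var_dump($parsed === $row);
```

Joining the values with commas by hand would break on such a name, which is why the examples here use fputcsv().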

Put together:

// open input
$reader = new XMLReader();
$reader->open('php://stdin');

// open output
$output = fopen('php://stdout', 'w');
fputcsv($output, ['id', 'name']);

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

// prepare DOM
$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

// look for the first record element
while (
  $reader->read() && 
  (
    $reader->localName !== 'ABCRecord' || 
    $reader->namespaceURI !== $xmlns['a']
  )
) {
  continue;
}

// while you have a record element
while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    // expand record element node
    $node = $reader->expand($dom);
    // fetch data and write it to output
    fputcsv(
      $output, 
      [
        $xpath->evaluate('string(a:ABC)', $node),
        $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
      ]
    );
  }

  // move to the next record sibling
  $reader->next('ABCRecord');
} 

Output:

id,name
5967007LIEEXZX4LPK21,"REGISTERENHETEN I Bornheim"
5967007LIE45ZX4MHC90,"SUNNDAL HOSTBANK"
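
To get a CSV file like this into MySQL, one option is LOAD DATA. The following is only a sketch under several assumptions: the table name test_xml and the file name ./xmlreader.csv are taken from the question, the columns id and name mirror the CSV header, the helper buildLoadDataSql() is hypothetical, and LOCAL INFILE has to be allowed on both the server and the client. The connection part is left as comments because it needs a real server.

```php
<?php
// Hypothetical helper: builds a LOAD DATA statement for the two-column CSV.
function buildLoadDataSql(string $csvPath, string $table): string
{
    // addslashes() escapes quotes and backslashes for the SQL string literal
    $path = addslashes($csvPath);

    return "LOAD DATA LOCAL INFILE '{$path}' INTO TABLE `{$table}` "
        . "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        . "LINES TERMINATED BY '\\n' IGNORE 1 LINES (`id`, `name`)";
}

$sql = buildLoadDataSql('./xmlreader.csv', 'test_xml');
echo $sql, PHP_EOL;

// Usage against a real server (credentials are placeholders; the table
// definition is an assumption, e.g. id CHAR(20) PRIMARY KEY, name VARCHAR(255)):
// $db = mysqli_init();
// $db->options(MYSQLI_OPT_LOCAL_INFILE, true);
// $db->real_connect('localhost', 'user', 'password', 'new_xml_extract');
// $db->query('TRUNCATE TABLE `test_xml`');   // drop yesterday's rows
// $db->query($sql);                          // load today's file
```

Truncating and reloading is the simplest way to handle the daily file. If the table has to stay available during the import, loading into a staging table and swapping it in afterwards would be an alternative.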


Source: https://stackoverflow.com/questions/32112985/how-do-i-get-a-50mb-zip-file-with-a-600mb-xml-file-into-a-mysql-datatable
