Parsing curl results as simpleXML and using those to create new XML data

你离开我真会死。 提交于 2019-12-12 03:53:30

问题


I'm pulling data from PubMed as XML and using curl to process those results which I load into another page as SimpleXML. This allows me to grab the information I need (a list of pub IDs) and use that as a variable for ANOTHER pubmed scrape. This one gets the summaries of the specific pub IDs. Here's my first file (the $name will eventually be dynamic):

<?php 
header('Content-type: text/xml');
$name = 'white,theodore';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term='.$name.'[author]&retmode=xml&retmax=50');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);

$output = curl_exec($ch);

print $output;

curl_close($ch);

?>

Which exports XML data that includes (among other things) a list of Pub Ids.

xml output

<eSearchResult>
<Count>45</Count>
<RetMax>45</RetMax>
<RetStart>0</RetStart>
  <IdList>
  <Id>27431223</Id>
  <Id>26234644</Id>
  <Id>25824209</Id>
  <Id>25667269</Id>
  <Id>25646566</Id>
  <Id>25085959</Id>
  <Id>24453983</Id>
  <Id>23908482</Id>
  <Id>23845238</Id>
  <Id>23758576</Id>
  <Id>23606207</Id>
  <Id>23475705</Id>
  <Id>23253612</Id>
  <Id>22951933</Id>
  <Id>22479177</Id>
  <Id>22080454</Id>
  <Id>21977036</Id>
  <Id>21951709</Id>
  <Id>21247460</Id>
  <Id>21145410</Id>
  <Id>21078937</Id>
  <Id>20941354</Id>
  <Id>20737430</Id>
  <Id>20656915</Id>
  <Id>20430817</Id>
  <Id>20161440</Id>
  <Id>19880755</Id>
  <Id>18757808</Id>
  <Id>18675371</Id>
  <Id>18539886</Id>
  <Id>18436555</Id>
  <Id>18404551</Id>
  <Id>18343803</Id>
  <Id>18310042</Id>
  <Id>17951521</Id>
  <Id>17071565</Id>
  <Id>15980350</Id>
  <Id>15766602</Id>
  <Id>15590814</Id>
  <Id>15047513</Id>
  <Id>14653518</Id>
  <Id>12576598</Id>
  <Id>12517831</Id>
  <Id>12019079</Id>
  <Id>11932451</Id>
</IdList>
<TranslationSet>
<Translation>
  <From>white, theodore[author]</From>
  <To>White, Theodore[Full Author Name]</To>
</Translation>
</TranslationSet>
<TranslationStack>
<TermSet>
  <Term>White, Theodore[Full Author Name]</Term>
  <Field>Full Author Name</Field>
  <Count>45</Count>
  <Explode>N</Explode>
</TermSet>
<OP>GROUP</OP>
</TranslationStack>
<QueryTranslation>White, Theodore[Full Author Name] </QueryTranslation>
</eSearchResult>

I then load that into another page so I can use SimpleXML to convert the Pub IDs into a variable. And using that variable, attempt another curl/pubmed request, this one pulls summaries based on those IDs:

<?php
$xml=simplexml_load_file('https://sbs2.umkc.edu/wp-content/themes/SBS_Theme/js/pubMedExport.php','SimpleXMLElement', LIBXML_NOCDATA) or die("Error: Cannot create object");
$idList = $xml->IdList;
foreach($idList->children() as $id) {
$idResult = $id . ",";
//echo $idResult;

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id='.$id.'');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);

$result = curl_exec($ch);
echo $result . "</br></br>";

curl_close($ch);

}
?>

I can get this to export as individual citations but my problem is, I need to still be able to grab ahold of that second set of data so that I can format certain things like Authors and exclude irrelevant data.

full citations

Here's the XML from ONE result.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">
<eSummaryResult>
  <DocSum>
    <Id>27431223</Id>
    <Item Name="PubDate" Type="Date">2016 Oct</Item>
    <Item Name="EPubDate" Type="Date">2016 Sep 23</Item>
    <Item Name="Source" Type="String">Antimicrob Agents Chemother</Item>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Bhattacharya S</Item>
      <Item Name="Author" Type="String">Sobel JD</Item>
      <Item Name="Author" Type="String">White TC</Item>
    </Item>
    <Item Name="LastAuthor" Type="String">White TC</Item>
    <Item Name="Title" Type="String">A Combination Fluorescence Assay Demonstrates Increased Efflux Pump Activity as a Resistance Mechanism in Azole-Resistant Vaginal Candida albicans Isolates.</Item>
    <Item Name="Volume" Type="String">60</Item>
    <Item Name="Issue" Type="String">10</Item>
    <Item Name="Pages" Type="String">5858-66</Item>
    <Item Name="LangList" Type="List">
    <Item Name="Lang" Type="String">English</Item>
    </Item>
    <Item Name="NlmUniqueID" Type="String">0315061</Item>
    <Item Name="ISSN" Type="String">0066-4804</Item>
    <Item Name="ESSN" Type="String">1098-6596</Item>
    <Item Name="PubTypeList" Type="List">
      <Item Name="PubType" Type="String">Journal Article</Item>
    </Item>
    <Item Name="RecordStatus" Type="String">Unknown status</Item>
    <Item Name="PubStatus" Type="String">epublish</Item>
    <Item Name="ArticleIds" Type="List">
      <Item Name="pubmed" Type="String">27431223</Item>
      <Item Name="pii" Type="String">AAC.01252-16</Item>
      <Item Name="doi" Type="String">10.1128/AAC.01252-16</Item>
      <Item Name="pmc" Type="String">PMC5038269</Item>
      <Item Name="rid" Type="String">27431223</Item>
      <Item Name="eid" Type="String">27431223</Item>
      <Item Name="pmcid" Type="String">pmc-id: PMC5038269;embargo-date: 2017/04/01;</Item>
    </Item>
    <Item Name="DOI" Type="String">10.1128/AAC.01252-16</Item>
    <Item Name="History" Type="List">
      <Item Name="received" Type="Date">2016/06/10 00:00</Item>
      <Item Name="accepted" Type="Date">2016/07/12 00:00</Item>
      <Item Name="pmc-release" Type="Date">2017/04/01 00:00</Item>
      <Item Name="entrez" Type="Date">2016/07/20 06:00</Item>
      <Item Name="pubmed" Type="Date">2016/07/20 06:00</Item>
      <Item Name="medline" Type="Date">2016/07/20 06:00</Item>
    </Item>
    <Item Name="References" Type="List"></Item>
    <Item Name="HasAbstract" Type="Integer">1</Item>
    <Item Name="PmcRefCount" Type="Integer">0</Item>
    <Item Name="FullJournalName" Type="String">Antimicrobial agents and chemotherapy</Item>
    <Item Name="ELocationID" Type="String">doi: 10.1128/AAC.01252-16</Item>
    <Item Name="SO" Type="String">2016 Oct;60(10):5858-66</Item>
</DocSum>

</eSummaryResult>
</br></br>

I can't figure out how to grab the items in that second set of data. The source reveals it's still formatted properly but I keep getting "Trying to get property of non-object" errors.

I considered sending these results to yet another file and use SimpleXML to control it, but because I'm parsing the first file and adding another curl on the same page, it doesn't seem to like it when I add the header

Any help would be greatly appreciated!

UPDATE: Thanks to @EatPeanutButter for pointing me in the right direction. By using $cxml=simplexml_load_string($result); instead of $Cxml = new SimpleXMLElement($result); I was not only able to grab the data I needed, but also combine the curls onto a single page as follows.

<?php 
$name = 'white,theodore';
// Return xml data from PubMed based on author search name
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term='.$name.'[author]&retmode=xml&retmax=50');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);

$output = curl_exec($ch);

curl_close($ch);

// Parse the results and concatenate into a string of Publication IDs
$xml=simplexml_load_string($output);
$idList = $xml->IdList;
$ids = "";
foreach($idList->children() as $id) {
    $ids .= $id . ",";
}

// Plug that string of IDs into another PubMed search, this one returning XML data for Publication Summaries
$path = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id='.$ids;

$ch2 = curl_init();
curl_setopt($ch2, CURLOPT_URL, $path);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch2, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch2, CURLOPT_VERBOSE, 0);
curl_setopt($ch2, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch2, CURLOPT_AUTOREFERER, true);
curl_setopt($ch2, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch2, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch2, CURLOPT_FRESH_CONNECT, 1);

$result = curl_exec($ch2);

curl_close($ch2);
// Parse those results and print only what is needed for Citation format
$cxml=simplexml_load_string($result);
foreach($cxml->children() as $docsum) {
  foreach($docsum->children() as $item) {
    foreach($item->children() as $details) {
        if ((string) $details['Name'] === 'Author') {echo $details . "., ";}
    }
    if ((string) $item['Name'] === 'FullJournalName') { echo $item . ". "; }
    if ((string) $item['Name'] === 'Title') { echo "<strong>" . $item . "</strong> "; }
    if ((string) $item['Name'] === 'Volume') { echo "Vol." . $item . ", "; }
    if ((string) $item['Name'] === 'Issue') { echo "Issue" . $item . ". "; }
    if ((string) $item['Name'] === 'PubDate') { echo $item . ". "; }
    foreach($item->children() as $details) {
            if ((string) $details['Name'] === 'PubType') {echo $details . ", ";}
        }
  }
  echo "</br></br>";
}

?>

And now, of course, this has created a new issue which I'm going to post as a follow up question!

来源:https://stackoverflow.com/questions/40984052/parsing-curl-results-as-simplexml-and-using-those-to-create-new-xml-data

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!