How to parse an XML file stored in my Google Drive that comes back as HTML?

Question

I saved on my Google Drive a copy of the XML from this source: http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621 I can parse the source, but I cannot XML-parse the copy, which comes back looking like HTML. I get parsing errors such as: The element type "meta" must be terminated by the matching end-tag "</meta>", or: Element type "a.length" must be followed by either attribute specifications, ">" or "/>". I shared the file at https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing so that you can access it and test my script. I know that I could use CacheService, and that works, but I would like to try this approach to have more control over the buffering.

function xmlParsingXmlStoreOnGoogleDrive(){
 // First, the original XML, which parses correctly.
 var fetched=UrlFetchApp.fetch("http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621")
 var blob=fetched.getBlob();
 var getAs=blob.getAs("text/xml")
 var data=getAs.getDataAsString("UTF-8")
 Logger.log(data.substring(1,350)); // Truncated to keep the log readable. This logs the expected XML:
 /*
    ?xml version="1.0" encoding="utf-8"?>
    <!-- Copyright © 2019 AlloCiné -->
    <movie code="265621" xmlns="http://www.allocine.net/v6/ns/">
    <movieType code="4002">Long-métrage</movieType>
    <originalTitle>Mise à jour sur Google play</originalTitle>
    <title>Mise à jour sur Google play</title>
    <keywords>Portrait of a Lady on Fire </keywords>
 */
 var xmlDocument=XmlService.parse(data);
 var root=xmlDocument.getRootElement();
 var keywords=root.getChild("keywords",root.getNamespace()).getText();
 Logger.log(keywords);  // Logs the expected result: "Portrait of a Lady on Fire "

 // Next, my copy of the original XML, which I cannot parse.
 var fetched=UrlFetchApp.fetch("https://drive.google.com/file/d/1K3-9dHy-h0UoOOY5jYfiSoYPezSi55h1/view?usp=sharing")
 var blob=fetched.getBlob();
 var getAs=blob.getAs("text/xml")
 var data=getAs.getDataAsString("UTF-8")
 Logger.log(data.substring(1,350)); // Truncated to keep the log readable. This logs unexpected HTML:
 /*
   !DOCTYPE html><html><head><meta name="google" content="notranslate"><meta http-equiv="X-UA-Compatible" content="IE=edge;">
   <style>@font-face{font-family:'Roboto';font-style:italic;font-weight:400;src:local('Roboto Italic'),local('Roboto-Italic'),
   url(//fonts.gstatic.com/s/roboto/v18/KFOkCnqEu92Fr1Mu51xIIzc.ttf)format('truetype');}@font-face{font-fam......
 */
 var xmlDocument=XmlService.parse(data); // Fails with the error: Element type "a.length" must be followed by either attribute specifications, ">" or "/>"
 var root=xmlDocument.getRootElement();
 var keywords=root.getChild("keywords",root.getNamespace()).getText();
 Logger.log(keywords);
}

I read in this similar question, "Parse XML file (which is stored on GoogleDrive) with Google app script", that "Unfortunately we can't directly get xml files in the google drive". Is that right, and does it simply mean that my script cannot work?

Answer 1:

You want to retrieve the data from the file on Google Drive and parse as XML data using XmlService.
You want to achieve this using Google Apps Script.

If my understanding is correct, how about this answer?

Modification points:

The file content cannot be retrieved from the endpoint used in var fetched=UrlFetchApp.fetch("https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing"), because that URL returns the HTML page of the Drive file viewer rather than the file itself. If you want to retrieve the file content with UrlFetchApp, please use the endpoint https://drive.google.com/uc?id=16kJ5Nko-waVb8s2T12LaTEKaFY01603n&export=download. This is the webContentLink.
When the file is in your own Google Drive and/or is shared publicly, you can also retrieve the data with DriveApp.getFileById(fileId).getBlob().getDataAsString().
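As an illustration of the first point, the download endpoint can be derived from a sharing URL instead of being typed by hand. This is only a minimal sketch: the helper names extractDriveFileId and buildDownloadUrl are hypothetical, and it assumes the sharing URL has the /file/d/<ID>/ form shown in the question.

```javascript
// Hypothetical helpers, not part of the original answer.

// Extracts the file ID from a Drive sharing URL of the form
// https://drive.google.com/file/d/<ID>/view?...
// Returns null when the URL does not contain that pattern.
function extractDriveFileId(sharingUrl) {
  var match = /\/file\/d\/([^\/?#]+)/.exec(sharingUrl);
  return match ? match[1] : null;
}

// Builds the download endpoint (webContentLink form) for a file ID.
function buildDownloadUrl(fileId) {
  return "https://drive.google.com/uc?id=" + encodeURIComponent(fileId) + "&export=download";
}
```

In Apps Script, the resulting URL would then be passed to UrlFetchApp.fetch() in place of the /view URL.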

Modified script:

For example, when your shared sample file of https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing is used, the script becomes as follows.

Sample script 1:

In this pattern, the file content is retrieved from your shared file with UrlFetchApp.fetch().

var data = UrlFetchApp.fetch("https://drive.google.com/uc?id=16kJ5Nko-waVb8s2T12LaTEKaFY01603n&export=download").getContentText(); // Modified
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // <--- You can see "Portrait of a Lady on Fire" in the log.

In this case, the file is required to be shared publicly. If you want to retrieve the file content without sharing the file, please send the request with an access token.
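The access-token request mentioned above could look roughly like the following. This is a sketch under assumptions, not the answerer's exact method: the helper names are hypothetical, it swaps in the Drive API v3 media endpoint (files/<ID>?alt=media) instead of the uc endpoint, and it assumes the script project already has a Drive scope authorized so that ScriptApp.getOAuthToken() returns a token that can read the file.

```javascript
// Hypothetical helper: builds UrlFetchApp request options that carry an OAuth token.
function buildAuthorizedOptions(accessToken) {
  return {
    method: "get",
    headers: { Authorization: "Bearer " + accessToken },
    muteHttpExceptions: true // return error responses instead of throwing
  };
}

// Apps Script only (uses UrlFetchApp and ScriptApp).
// Assumption: the Drive API v3 media endpoint and an authorized Drive scope.
function fetchPrivateDriveFile(fileId) {
  var url = "https://www.googleapis.com/drive/v3/files/" + encodeURIComponent(fileId) + "?alt=media";
  var response = UrlFetchApp.fetch(url, buildAuthorizedOptions(ScriptApp.getOAuthToken()));
  if (response.getResponseCode() !== 200) {
    throw new Error("Drive request failed with HTTP " + response.getResponseCode());
  }
  return response.getContentText();
}
```

The returned text can be handed to XmlService.parse() exactly as in sample script 1.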

Sample script 2:

In this pattern, the file content is retrieved from your shared file with DriveApp.getFileById().

var fileId = "16kJ5Nko-waVb8s2T12LaTEKaFY01603n"; // Added
var data = DriveApp.getFileById(fileId).getBlob().getDataAsString(); // Added
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // <--- You can see "Portrait of a Lady on Fire" in the log.

16kJ5Nko-waVb8s2T12LaTEKaFY01603n of https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing is the file ID.
In this case, the file is not required to be shared. But the file is required to be in your Google Drive.

References:

Files of Drive API
- webContentLink: A link for downloading the content of the file in a browser using cookie based authentication. In cases where the content is shared publicly, the content can be downloaded without any credentials.
getFileById(id)

If I misunderstood your question and this was not the direction you want, I apologize.

Answer 2:

Wonderful! You are right. Both of your suggestions work. I had just made a mistake elsewhere in my code, and as a result solution 1 no longer works. That is why I am posting a new script to test. This is only for my own learning, because my project is safe thanks to you :)

function storeXmlOnGoogleDriveThenParsIt(url){
  url=url||"http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621"; // to test
  // Save a copy of the fetched XML on my Google Drive, to spare the server from too many requests.
  var bufferedXml=DriveApp.getRootFolder().searchFolders('title = "BufferFiles"').next().createFile("xmlBuffered.xml", UrlFetchApp.fetch(url).getContentText(),MimeType.PLAIN_TEXT);
  var urlBufferedXml=bufferedXml.getUrl();   // URL of the newly buffered file
  var fileId=urlBufferedXml.match(/https:\/\/drive.google.com\/file\/d\/(.*)\/view.*/)[1];


  // Now parse the buffered XML file.
  // [ Your second approach (DriveApp) works perfectly. Thank you very much!
  var data = DriveApp.getFileById(fileId).getBlob().getDataAsString(); 
  var xmlDocument=XmlService.parse(data);                              
  var root=xmlDocument.getRootElement();
  var mynamespace=root.getNamespace();
  var keywords=root.getChild("keywords",root.getNamespace()).getText();
  Logger.log("keywords:"+keywords);                           // ...and parsing succeeds ]


  // [ The first approach (UrlFetchApp) used to work, but it now fails since I changed the line that creates the XML file, and I cannot find my way back to the working code.
  var downloadUrlBufferedXml="https://drive.google.com/uc?id="+fileId+"&export=download";
  var data = UrlFetchApp.fetch(downloadUrlBufferedXml).getContentText(); // This used to work, but the data now comes back as HTML again.
  Logger.log("data"+data.substring(1,350)); // Shows that the data is HTML, not XML.
  var xmlDocument=XmlService.parse(data);  // Fails with an error like: The element type "meta" must be terminated by the matching end-tag "</meta>" ]
  var root=xmlDocument.getRootElement();
  var mynamespace=root.getNamespace();
  var keywords=root.getChild("keywords",root.getNamespace()).getText();
  Logger.log("keywords:"+keywords)
}
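The HTML that comes back in the second half of the script above can be detected before it reaches XmlService.parse(), which turns the confusing "meta" error into a clearer one. A likely cause, in line with the note in Answer 1 that the UrlFetchApp approach needs a publicly shared file, is that the newly created file is not shared, but that is an assumption and not something the posts confirm. The helper names below are hypothetical.

```javascript
// Hypothetical helper, not from the original posts: returns true when the
// text starts like an HTML page rather than an XML document.
function looksLikeHtml(text) {
  var head = String(text)
    .replace(/^\uFEFF/, "")   // drop a leading byte-order mark, if any
    .replace(/^\s+/, "")      // drop leading whitespace
    .substring(0, 100)
    .toLowerCase();
  return head.indexOf("<!doctype html") === 0 || head.indexOf("<html") === 0;
}

// Apps Script only (uses XmlService): parse the text, or fail with a clearer message.
function parseXmlOrExplain(data) {
  if (looksLikeHtml(data)) {
    throw new Error("Received an HTML page instead of XML; check the URL and the file's sharing settings.");
  }
  return XmlService.parse(data);
}
```

With this guard, the second block would stop at a readable error, while the DriveApp block is unaffected because it reads the file content directly.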

Source: https://stackoverflow.com/questions/58279456/how-to-parse-a-xml-file-stored-in-my-google-drive-but-which-stands-out-as-a-html

Tags

google-apps-script

xml-parsing

google-drive-api