Getting the Notepad++ XML parsing error “Extra content at the end of the document” even though there is none

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-10 17:08:33

Question


I am getting the above-mentioned error message when trying to validate my 55 MB XML file in Notepad++. The first error encountered is here (line 1441520 out of 22258651): (screenshot from Notepad++)

I have turned on Show All Characters, and nothing suggests that there are any illegal characters at the end of the line. As the screenshot shows, the only hidden characters are CR+LF.
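To double-check the flagged line outside the editor, a minimal Python 3.9+ sketch like the following reads a single line as raw bytes without loading the whole file, and lists any control bytes that XML 1.0 does not permit (everything below 0x20 except tab, LF and CR). The script and function names are hypothetical, and it assumes the file really is UTF-8, as its declaration says.

```python
from __future__ import annotations

import itertools
import sys

# XML 1.0 allows only TAB, LF and CR among the C0 control bytes (< 0x20).
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def raw_line(path: str, line_no: int) -> bytes:
    """Return the raw bytes of 1-based line `line_no`, including its line ending."""
    with open(path, "rb") as f:
        # islice skips the earlier lines lazily, so the file is never fully loaded.
        for line in itertools.islice(f, line_no - 1, line_no):
            return line
    raise ValueError(f"{path} has fewer than {line_no} lines")


def control_bytes(line: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every C0 control byte that XML 1.0 forbids."""
    return [(i, b) for i, b in enumerate(line) if b < 0x20 and b not in ALLOWED_CONTROL]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python show_line.py file.xml 1441520  (hypothetical script name)
    data = raw_line(sys.argv[1], int(sys.argv[2]))
    print(repr(data))
    print("forbidden control bytes:", control_bytes(data) or "none")
```

Printing the line with repr makes bytes such as \r, \n or \x00 visible as escapes, independently of how any editor renders them.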

EDIT: Below is a copy of the record that causes the parsing error in Notepad++:

<?xml version="1.0" encoding="UTF-8"?>
<Registreringer>
  <Registrering>
    <ID>1697947</ID>
    <LHAnr>316-01</LHAnr>
    <RegId>316-01K1037</RegId>
    <RegType />
    <Signatur>K</Signatur>
    <Regnr>1037</Regnr>
    <srnr />
    <ArkivSkaber />
    <Journalnr />
    <Sted>460872</Sted>
    <sted1>315</sted1>
    <sted2>12</sted2>
    <sted3>0</sted3>
    <UTM />
    <Betegnelse>
Hidden.

Hidden.
</Betegnelse>
    <kat1 />
    <kat2 />
    <kat3 />
    <kat4 />
    <Datering>1804</Datering>
    <DateringNote />
    <Klausul>Almindelige regler</Klausul>
    <Bem />
    <BemEx1 />
    <BemEx2 />
    <IntBem />
    <KortResume>
Hidden

Opmaalt 1804 af Hidden.
</KortResume>
    <SogeOrd />
    <RegDato>25-04-2000 00:00:00</RegDato>
    <RegAf>Hidden</RegAf>
    <GodkDato />
    <Godkendt />
    <Varighed />
    <Fra>1804</Fra>
    <Til>1804</Til>
    <YderAar />
    <Signaturer />
    <IaltBind />
    <IaltPakker />
    <IaltLaeg />
    <Stiftet />
    <Nedlagt />
    <hyldemeter>0,00</hyldemeter>
    <hyldecentimeter />
    <placering />
    <Art>Markkort</Art>
    <Maal>26 x 38</Maal>
    <TeknOpl>
Affoto



</TeknOpl>
    <Fotograf />
    <Materiale />
    <materiale2 />
    <Negativ />
    <FotografNegativ />
    <foto1 />
    <foto2 />
    <Referencenr />
    <Ref>



</Ref>
    <Motiv />
    <Udgaver />
    <Obs />
    <billede />
    <Samlingstype>14</Samlingstype>
    <SkabelonId />
    <Publicering />
    <Materialetype />
    <PkBind>0</PkBind>
    <PkPakker>0</PkPakker>
    <PkLaeg>0</PkLaeg>
    <Henvisning>
      <Id>3592636</Id>
      <LhaNr>316-01</LhaNr>
      <RegId />
      <RegRef>1697947</RegRef>
      <SektionId />
      <Henvisning>Hidden</Henvisning>
      <StedId>460872</StedId>
      <Fra>1804</Fra>
      <Til>1804</Til>
      <DecimalId>1006268</DecimalId>
      <EmneordId>1449984</EmneordId>
      <EmneordLokal>
        <id>1449984</id>
        <LHAnr>316-01</LHAnr>
        <DecimalId>1006268</DecimalId>
        <Decimalklasse>40.164</Decimalklasse>
        <Emneord>Udskiftningskort</Emneord>
        <EmneStikord />
      </EmneordLokal>
      <StedLokal>
        <Id>460872</Id>
        <LhaNr>316-01</LhaNr>
        <StedKode>315-12-00</StedKode>
        <StedTxt>Hidden</StedTxt>
        <Sted1>315</Sted1>
        <Sted2>12</Sted2>
        <Sted3>0</Sted3>
        <GenStedkode />
      </StedLokal>
      <DecimalLokal>
        <ID>1006268</ID>
        <LHAnr>316-01</LHAnr>
        <Decimal>40.164</Decimal>
        <DecimalTxt>Kort</DecimalTxt>
        <CommonDecimal>40.164</CommonDecimal>
        <DecimalLokalStikord>
          <ID>6969206</ID>
          <LHAnr>316-01</LHAnr>
          <Decimal>40.164</Decimal>
          <Stikord>Kort</Stikord>
        </DecimalLokalStikord>
        <DecimalLokalStikord>
          <ID>6969207</ID>
          <LHAnr>316-01</LHAnr>
          <Decimal>40.164</Decimal>
          <Stikord>Matrikelkort</Stikord>
        </DecimalLokalStikord>
      </DecimalLokal>
    </Henvisning>
  </Registrering>
</Registreringer>

When using W3C's validator, I don't get any errors, so I suspect this is a Notepad++-specific issue with long XML files. Running EOL/blank-removal scripts in Notepad++ also corrupts the file. I probably need to use a CLI-based alternative. What do you recommend? @jim-garrison @villapx
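For a command-line check, libxml2's `xmllint --noout file.xml` reports well-formedness errors without echoing the document. Alternatively, here is a minimal sketch using only Python's standard library (the script and function names are hypothetical): `xml.sax.parse` feeds the file to the expat parser in chunks, so memory use stays small even for a 55 MB file, and the exception carries the line and column of the first error. Expat words its messages differently from the one in the question; content after the root element is reported as "junk after document element".

```python
import sys
import xml.sax


def first_error(path: str):
    """Stream-parse `path`; return None if well-formed, else (line, column, message)."""
    try:
        # A bare ContentHandler ignores every event: we only care whether parsing succeeds.
        xml.sax.parse(path, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_xml.py file.xml  (hypothetical script name)
    result = first_error(sys.argv[1])
    if result is None:
        print("well-formed")
    else:
        line, column, message = result
        print(f"line {line}, column {column}: {message}")
        sys.exit(1)
```

The non-zero exit status on failure makes the script easy to call from a batch file or another tool.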


Answer 1:


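That error occurs when the parser finds content after the root element has already been closed. Because an XML document may have only one root element, this usually points to a syntactic issue with your XML tags, such as a stray closing tag that ends the root element too early, or a second top-level element following the first.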
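"Extra content at the end of the document" means the parser reached content after the root element had been closed. One way to check for that in a large file is to record where the root element closes and compare it with the line of the first error. The sketch below (Python standard library only; the class and function names are hypothetical) tracks element depth in a SAX handler and notes the line of the end tag that brings the depth back to zero.

```python
import xml.sax


class RootCloseFinder(xml.sax.ContentHandler):
    """Records the line on which the outermost (root) element is closed."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.root_closed_at = None
        self.locator = None

    def setDocumentLocator(self, locator):
        # The parser hands us a locator that reports the current line number.
        self.locator = locator

    def startElement(self, name, attrs):
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 0 and self.root_closed_at is None:
            self.root_closed_at = self.locator.getLineNumber()


def diagnose(path: str):
    """Return (line where the root closed, (error line, message) or None)."""
    handler = RootCloseFinder()
    try:
        xml.sax.parse(path, handler)
    except xml.sax.SAXParseException as exc:
        return handler.root_closed_at, (exc.getLineNumber(), exc.getMessage())
    return handler.root_closed_at, None
```

If the first value is a line just before the reported error, the root element was closed too early, or another document (for example, a second XML declaration and root) follows it.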

Try pasting the contents of your XML file into a different XML validator, such as W3C's, and see whether you get the same error or (hopefully) a more descriptive one.

To get a better answer, please provide a Minimal, Complete and Verifiable example so we can reproduce your problem.



Source: https://stackoverflow.com/questions/34344522/getting-the-notepad-xml-parsing-error-extra-content-at-the-end-of-the-documen
