Getting the Notepad++ XML parsing error “Extra content at the end of the document” even though there is none

Question

I am getting the above-mentioned error message when trying to validate my 55 MB XML file in Notepad++. The first error encountered is at line 1441520 (out of 22258651): [screenshot from Notepad++]

I have turned on "Show All Characters". Nothing suggests that there are any illegal characters at the end of the line: as the screenshot shows, there are no hidden characters other than CR+LF.
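"Show All Characters" mainly reveals whitespace and line endings, so it may not make every invisible code point obvious (a byte-order mark U+FEFF or a zero-width space, for instance). One way to double-check the reported line is to decode it and list every control or format character other than tab, CR and LF. This is a minimal Python sketch, assuming the file really is UTF-8 as its declaration says; the script name and its arguments are illustrative, not part of any existing tool.

```python
import sys
import unicodedata
from itertools import islice


def suspicious_chars(text):
    """Yield (1-based column, description) for control/format chars other than TAB, CR, LF."""
    for col, ch in enumerate(text, start=1):
        if ch in "\t\r\n":
            continue
        # Cc = control characters (e.g. NUL); Cf = format characters (e.g. U+FEFF, U+200B)
        if unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "<unnamed>")  # control chars have no Unicode name
            yield col, f"U+{ord(ch):04X} {name}"


def scan_line(path, line_no):
    """Return the suspicious characters found on 1-based line `line_no` of a UTF-8 file."""
    # newline="" keeps CR+LF as-is instead of translating it to "\n"
    with open(path, encoding="utf-8", newline="") as f:
        line = next(islice(f, line_no - 1, None), None)  # skip lines without loading the file
    if line is None:
        raise ValueError(f"{path} has fewer than {line_no} lines")
    return list(suspicious_chars(line))


# Hypothetical usage: python scan_line.py file.xml 1441520
if __name__ == "__main__" and len(sys.argv) == 3:
    for col, desc in scan_line(sys.argv[1], int(sys.argv[2])):
        print(f"column {col}: {desc}")
```

An empty result supports the observation that the line itself is clean, which would point at the structure of the document rather than at hidden characters. Note that the Danish letters (æ, ø, å) are ordinary letters and are not flagged.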

EDIT: Below is a copy of the record that causes the parsing error in Notepad++:

<?xml version="1.0" encoding="UTF-8"?>
<Registreringer>
  <Registrering>
    <ID>1697947</ID>
    <LHAnr>316-01</LHAnr>
    <RegId>316-01K1037</RegId>
    <RegType />
    <Signatur>K</Signatur>
    <Regnr>1037</Regnr>
    <srnr />
    <ArkivSkaber />
    <Journalnr />
    <Sted>460872</Sted>
    <sted1>315</sted1>
    <sted2>12</sted2>
    <sted3>0</sted3>
    <UTM />
    <Betegnelse>
Hidden.

Hidden.
</Betegnelse>
    <kat1 />
    <kat2 />
    <kat3 />
    <kat4 />
    <Datering>1804</Datering>
    <DateringNote />
    <Klausul>Almindelige regler</Klausul>
    <Bem />
    <BemEx1 />
    <BemEx2 />
    <IntBem />
    <KortResume>
Hidden

Opmaalt 1804 af Hidden.
</KortResume>
    <SogeOrd />
    <RegDato>25-04-2000 00:00:00</RegDato>
    <RegAf>Hidden</RegAf>
    <GodkDato />
    <Godkendt />
    <Varighed />
    <Fra>1804</Fra>
    <Til>1804</Til>
    <YderAar />
    <Signaturer />
    <IaltBind />
    <IaltPakker />
    <IaltLaeg />
    <Stiftet />
    <Nedlagt />
    <hyldemeter>0,00</hyldemeter>
    <hyldecentimeter />
    <placering />
    <Art>Markkort</Art>
    <Maal>26 x 38</Maal>
    <TeknOpl>
Affoto



</TeknOpl>
    <Fotograf />
    <Materiale />
    <materiale2 />
    <Negativ />
    <FotografNegativ />
    <foto1 />
    <foto2 />
    <Referencenr />
    <Ref>



</Ref>
    <Motiv />
    <Udgaver />
    <Obs />
    <billede />
    <Samlingstype>14</Samlingstype>
    <SkabelonId />
    <Publicering />
    <Materialetype />
    <PkBind>0</PkBind>
    <PkPakker>0</PkPakker>
    <PkLaeg>0</PkLaeg>
    <Henvisning>
      <Id>3592636</Id>
      <LhaNr>316-01</LhaNr>
      <RegId />
      <RegRef>1697947</RegRef>
      <SektionId />
      <Henvisning>Hidden</Henvisning>
      <StedId>460872</StedId>
      <Fra>1804</Fra>
      <Til>1804</Til>
      <DecimalId>1006268</DecimalId>
      <EmneordId>1449984</EmneordId>
      <EmneordLokal>
        <id>1449984</id>
        <LHAnr>316-01</LHAnr>
        <DecimalId>1006268</DecimalId>
        <Decimalklasse>40.164</Decimalklasse>
        <Emneord>Udskiftningskort</Emneord>
        <EmneStikord />
      </EmneordLokal>
      <StedLokal>
        <Id>460872</Id>
        <LhaNr>316-01</LhaNr>
        <StedKode>315-12-00</StedKode>
        <StedTxt>Hidden</StedTxt>
        <Sted1>315</Sted1>
        <Sted2>12</Sted2>
        <Sted3>0</Sted3>
        <GenStedkode />
      </StedLokal>
      <DecimalLokal>
        <ID>1006268</ID>
        <LHAnr>316-01</LHAnr>
        <Decimal>40.164</Decimal>
        <DecimalTxt>Kort</DecimalTxt>
        <CommonDecimal>40.164</CommonDecimal>
        <DecimalLokalStikord>
          <ID>6969206</ID>
          <LHAnr>316-01</LHAnr>
          <Decimal>40.164</Decimal>
          <Stikord>Kort</Stikord>
        </DecimalLokalStikord>
        <DecimalLokalStikord>
          <ID>6969207</ID>
          <LHAnr>316-01</LHAnr>
          <Decimal>40.164</Decimal>
          <Stikord>Matrikelkort</Stikord>
        </DecimalLokalStikord>
      </DecimalLokal>
    </Henvisning>
  </Registrering>
</Registreringer>

When I use the W3C validator, I don't get any errors, so I suspect this is a Notepad++-specific issue with large XML files. Running EOL/blank-line removal scripts in Notepad++ also corrupts the file. I probably need a CLI-based alternative. What do you recommend? @jim-garrison @villapx
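The wording "Extra content at the end of the document" is libxml2's error message, so libxml2's own command-line tool (xmllint --noout --stream file.xml) is a likely way to reproduce what Notepad++ reports outside the editor. As an independent second opinion, the sketch below uses Python's standard-library ElementTree, which is built on expat rather than libxml2, streams the file instead of loading a 55 MB tree at once, and stops at the first well-formedness error. The function name and script usage are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def first_xml_error(path):
    """Stream-parse `path`; return None if well-formed, else (line, column, message)."""
    try:
        # iterparse feeds the file to the parser incrementally
        for _event, elem in ET.iterparse(path, events=("end",)):
            elem.clear()  # drop each element's text/children once parsed to keep memory low
    except ET.ParseError as err:
        line, column = err.position  # expat reports a 1-based line and a 0-based column
        return line, column, str(err)
    return None


# Hypothetical usage: python check_xml.py file.xml
if __name__ == "__main__" and len(sys.argv) == 2:
    result = first_xml_error(sys.argv[1])
    if result is None:
        print("well-formed")
    else:
        print("line {}, column {}: {}".format(*result))
        sys.exit(1)
```

Because expat words its messages differently, the same problem that libxml2 calls "Extra content at the end of the document" shows up here as "junk after document element". If both parsers flag the same line, the file itself is malformed there, rather than Notepad++ misbehaving on a long file.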

Answer 1:

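That error often occurs when there is a syntactic issue with your XML tags, such as a tag that is not properly closed with a matching </tag>, or a space in a tag name. More specifically, the message means that the parser found more markup after the root element had already been closed, for example a second top-level element, or the rest of the file after an end tag that closed the root element earlier than intended.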
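A quick way to see this class of error is to parse a tiny document with one root element and then the same content with two top-level elements. This sketch uses Python's expat-based ElementTree, not the libxml2 parser behind the Notepad++ message, so the wording differs: expat reports the condition as "junk after document element". The helper name parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the parser's error message for `xml_text`, or None if it is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# A single root element: well-formed, prints None.
print(parse_error("<Registreringer><Registrering/></Registreringer>"))
# Two top-level elements: the second one comes after the root has ended,
# so this prints a "junk after document element" message with its position.
print(parse_error("<Registrering/><Registrering/>"))
```

In a large file the same thing happens if a stray end tag closes the root element partway through: everything after that point is "extra content", even though each individual line looks fine.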

Try pasting the contents of your XML file into a different XML validator, such as the W3C's, and see whether you get the same error or (hopefully) a more descriptive one.

To get a better answer, please provide a Minimal, Complete and Verifiable example so we can reproduce your problem.
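With a file of about 22 million lines, one way to start building such an example is to cut out the lines around the reported error position without opening the whole file in an editor. This is a minimal Python sketch, assuming UTF-8 input; the function name, the default radius and the script usage are hypothetical. Keep in mind that the reported line is where the parser gave up, and the actual cause (for example an end tag that closed the root element early) may be further up.

```python
import sys
from itertools import islice


def context_lines(path, line_no, radius=20):
    """Return (line number, text) pairs for lines line_no-radius .. line_no+radius (1-based)."""
    start = max(1, line_no - radius)
    with open(path, encoding="utf-8", newline="") as f:
        # islice skips the leading lines one at a time instead of reading the file into memory
        chunk = islice(f, start - 1, line_no + radius)
        return [(start + i, text.rstrip("\r\n")) for i, text in enumerate(chunk)]


# Hypothetical usage: python context.py file.xml 1441520 [radius]
if __name__ == "__main__" and len(sys.argv) in (3, 4):
    radius = int(sys.argv[3]) if len(sys.argv) == 4 else 20
    for number, text in context_lines(sys.argv[1], int(sys.argv[2]), radius):
        print(f"{number:>9}: {text}")
```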

Source: https://stackoverflow.com/questions/34344522/getting-the-notepad-xml-parsing-error-extra-content-at-the-end-of-the-documen

Tags

xml

xml-parsing

syntax-error

notepad++

large-files