How to convert a text file into a hierarchical arrangement using XSLT 1.0

Posted by 倖福魔咒の on 2019-12-24 11:34:08

Question


I have a text file in which entries are separated by a carriage return and line feed. Each entry holds one heading followed by its tree number, and the two are separated by a semicolon.

Sample Text from File

Body Regions;A01
Anatomic Landmarks;A01.111
Breast;A01.236
Mammary Glands, Human;A01.236.249
Nipples;A01.236.500
Extremities;A01.378
Amputation Stumps;A01.378.100
Lower Extremity;A01.378.610
Buttocks;A01.378.610.100
Foot;A01.378.610.250
Ankle;A01.378.610.250.149
Forefoot, Human;A01.378.610.250.300
Metatarsus;A01.378.610.250.300.480
Toes;A01.378.610.250.300.792
Hallux;A01.378.610.250.300.792.380
Heel;A01.378.610.250.510
Hip;A01.378.610.400
Knee;A01.378.610.450
Leg;A01.378.610.500
Thigh;A01.378.610.750
Upper Extremity;A01.378.800
Arm;A01.378.800.075
Axilla;A01.378.800.090
Elbow;A01.378.800.420
Forearm;A01.378.800.585
Hand;A01.378.800.667
Fingers;A01.378.800.667.430
Thumb;A01.378.800.667.430.705
Metacarpus;A01.378.800.667.572
Wrist;A01.378.800.667.715
Shoulder;A01.378.800.750

UPDATE: Based on the comments saying the input should be XML, here is the converted input file:

<?xml version="1.0"?>
<ROWSET>
    <ROW>
        <label>Body Regions</label>
        <id>A01</id>
    </ROW>
    <ROW>
        <label>Anatomic Landmarks</label>
        <id>A01.111</id>
    </ROW>
    <ROW>
        <label>Breast</label>
        <id>A01.236</id>
    </ROW>
    <ROW>
        <label>Mammary Glands, Human</label>
        <id>A01.236.249</id>
    </ROW>
    <ROW>
        <label>Nipples</label>
        <id>A01.236.500</id>
    </ROW>
    <ROW>
        <label>Extremities</label>
        <id>A01.378</id>
    </ROW>
    <ROW>
        <label>Amputation Stumps</label>
        <id>A01.378.100</id>
    </ROW>
    <ROW>
        <label>Lower Extremity</label>
        <id>A01.378.610</id>
    </ROW>
    <ROW>
        <label>Buttocks</label>
        <id>A01.378.610.100</id>
    </ROW>
    <ROW>
        <label>Foot</label>
        <id>A01.378.610.250</id>
    </ROW>
    <ROW>
        <label>Ankle</label>
        <id>A01.378.610.250.149</id>
    </ROW>
    <ROW>
        <label>Forefoot, Human</label>
        <id>A01.378.610.250.300</id>
    </ROW>
    <ROW>
        <label>Metatarsus</label>
        <id>A01.378.610.250.300.480</id>
    </ROW>
    <ROW>
        <label>Toes</label>
        <id>A01.378.610.250.300.792</id>
    </ROW>
    <ROW>
        <label>Hallux</label>
        <id>A01.378.610.250.300.792.380</id>
    </ROW>
    <ROW>
        <label>Heel</label>
        <id>A01.378.610.250.510</id>
    </ROW>
    <ROW>
        <label>Hip</label>
        <id>A01.378.610.400</id>
    </ROW>
    <ROW>
        <label>Knee</label>
        <id>A01.378.610.450</id>
    </ROW>
    <ROW>
        <label>Leg</label>
        <id>A01.378.610.500</id>
    </ROW>
    <ROW>
        <label>Thigh</label>
        <id>A01.378.610.750</id>
    </ROW>
    <ROW>
        <label>Upper Extremity</label>
        <id>A01.378.800</id>
    </ROW>
    <ROW>
        <label>Arm</label>
        <id>A01.378.800.075</id>
    </ROW>
    <ROW>
        <label>Axilla</label>
        <id>A01.378.800.090</id>
    </ROW>
    <ROW>
        <label>Elbow</label>
        <id>A01.378.800.420</id>
    </ROW>
    <ROW>
        <label>Forearm</label>
        <id>A01.378.800.585</id>
    </ROW>
    <ROW>
        <label>Hand</label>
        <id>A01.378.800.667</id>
    </ROW>
    <ROW>
        <label>Fingers</label>
        <id>A01.378.800.667.430</id>
    </ROW>
    <ROW>
        <label>Thumb</label>
        <id>A01.378.800.667.430.705</id>
    </ROW>
    <ROW>
        <label>Metacarpus</label>
        <id>A01.378.800.667.572</id>
    </ROW>
    <ROW>
        <label>Wrist</label>
        <id>A01.378.800.667.715</id>
    </ROW>
    <ROW>
        <label>Shoulder</label>
        <id>A01.378.800.750</id>
    </ROW>
</ROWSET>

The output XML should look like this:

<node id="MESH" label="NIH Medical Subject Headings">
<isComposedBy>
    <node id="A01" label="Body Regions">
        <isComposedBy>
            <node id="A01.111" label="Anatomic Landmarks"/>
            <node id="A01.236" label="Breast">
                <isComposedBy>
                    <node id="A01.236.249" label="Mammary Glands, Human"/>
                    <node id="A01.236.500" label="Nipples"/>
                </isComposedBy>
            </node>
            <node id="A01.378" label="Extremities">
                <isComposedBy>
                    <node id="A01.378.100" label="Amputation Stumps"/>
                    <node id="A01.378.610" label="Lower Extremity">
                        <isComposedBy>
                            <node id="A01.378.610.100" label="Buttocks"/>
                            <node id="A01.378.610.250" label="Foot">
                                <isComposedBy>
                                    <node id="A01.378.610.250.149" label="Ankle"/>
                                    <node id="A01.378.610.250.300" label="Forefoot, Human">
                                        <isComposedBy>
                                            <node id="A01.378.610.250.300.480" label="Metatarsus"/>
                                            <node id="A01.378.610.250.300.792" label="Toes">
                                                <isComposedBy>
                                                    <node id="A01.378.610.250.300.792.380" label="Hallux"/>
                                                </isComposedBy>
                                            </node>
                                        </isComposedBy>
                                    </node>
                                    <node id="A01.378.610.250.510" label="Heel"/>
                                </isComposedBy>
                            </node>
                            <node id="A01.378.610.400" label="Hip"/>
                            <node id="A01.378.610.450" label="Knee"/>
                            <node id="A01.378.610.500" label="Leg"/>
                            <node id="A01.378.610.750" label="Thigh"/>
                        </isComposedBy>
                    </node>
                    <node id="A01.378.800" label="Upper Extremity">
                        <isComposedBy>
                            <node id="A01.378.800.075" label="Arm"/>
                            <node id="A01.378.800.090" label="Axilla"/>
                            <node id="A01.378.800.420" label="Elbow"/>
                            <node id="A01.378.800.585" label="Forearm"/>
                            <node id="A01.378.800.667" label="Hand">
                                <isComposedBy>
                                    <node id="A01.378.800.667.430" label="Fingers">
                                        <isComposedBy>
                                            <node id="A01.378.800.667.430.705" label="Thumb"/>
                                        </isComposedBy>
                                    </node>
                                    <node id="A01.378.800.667.572" label="Metacarpus"/>
                                    <node id="A01.378.800.667.715" label="Wrist"/>
                                </isComposedBy>
                            </node>
                            <node id="A01.378.800.750" label="Shoulder"/>
                        </isComposedBy>
                    </node>
                </isComposedBy>
            </node>
        </isComposedBy>
    </node>
</isComposedBy>
</node>

In the example above, Thumb (A01.378.800.667.430.705) is part of Fingers (A01.378.800.667.430), which is part of Hand (A01.378.800.667), which is part of Upper Extremity (A01.378.800), which is part of Extremities (A01.378). The root is Body Regions (A01).


Answer 1:


If it can be assumed that each "step" in an id is exactly 3 characters long, so that a row's parent id is its own id minus the last 4 characters (the dot plus the step), then you could do it this way:

XSLT 1.0

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- Index each ROW by its parent's id; a top-level id such as "A01" maps to the empty string -->
<xsl:key name="child" match="ROW" use="substring(id, 1, string-length(id) - 4)" />

<xsl:template match="/ROWSET">
    <node id="MESH" label="NIH Medical Subject Headings">
        <!-- Wrap the top-level rows (ids without a dot) as in the requested output -->
        <isComposedBy>
            <xsl:apply-templates select="ROW[not(contains(id, '.'))]"/>
        </isComposedBy>
    </node>
</xsl:template>

<xsl:template match="ROW">
    <node id="{id}" label="{label}">
        <!-- Rows whose computed parent id equals this row's id -->
        <xsl:variable name="children" select="key('child', id)" />
        <!-- Emit the wrapper only when there is at least one child -->
        <xsl:if test="$children">
            <isComposedBy>
                <xsl:apply-templates select="$children"/>
            </isComposedBy>
        </xsl:if>
    </node>
</xsl:template>

</xsl:stylesheet>

Otherwise it gets more complicated (in XSLT 1.0), and using an EXSLT extension would be really helpful here, if your processor supports it.
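
For illustration only, here is one way the variable-length case could be handled in plain XSLT 1.0 without extensions. This is a sketch, not part of the original answer. It assumes the dot is the only separator and that every parent row is present in the input. Instead of a key, it selects as children those rows whose id starts with the current id plus a dot and has no further dot after that prefix. The variable name "prefix" is made up for this example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/ROWSET">
    <node id="MESH" label="NIH Medical Subject Headings">
        <isComposedBy>
            <!-- Top-level rows: ids that contain no dot -->
            <xsl:apply-templates select="ROW[not(contains(id, '.'))]"/>
        </isComposedBy>
    </node>
</xsl:template>

<xsl:template match="ROW">
    <!-- Hypothetical helper: this row's id followed by the separator -->
    <xsl:variable name="prefix" select="concat(id, '.')"/>
    <!-- Direct children: id begins with the prefix and the remainder
         holds exactly one step (no further dot), whatever its length -->
    <xsl:variable name="children"
        select="../ROW[starts-with(id, $prefix)
                and not(contains(substring-after(id, $prefix), '.'))]"/>
    <node id="{id}" label="{label}">
        <xsl:if test="$children">
            <isComposedBy>
                <xsl:apply-templates select="$children"/>
            </isComposedBy>
        </xsl:if>
    </node>
</xsl:template>

</xsl:stylesheet>
```

The trade-off is that every row scans all of its siblings, so the work grows quadratically with the number of rows. That is fine for a sample of this size but may be slow on a full tree, which is where the keyed version above or an extension-based approach would be preferable.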

Alternatively, as it seems that your input is coming from a database, if you could modify its output to something like:

<ROWSET>
    <ROW>
        <label>Body Regions</label>
        <id>A01</id>
        <parent-id/>
    </ROW>
    <ROW>
        <label>Anatomic Landmarks</label>
        <id>A01.111</id>
        <parent-id>A01</parent-id>
    </ROW>
    <ROW>
        <label>Breast</label>
        <id>A01.236</id>
        <parent-id>A01</parent-id>
    </ROW>
    <ROW>
        <label>Mammary Glands, Human</label>
        <id>A01.236.249</id>
        <parent-id>A01.236</parent-id>
    </ROW>
    ...
    <ROW>
        <label>Shoulder</label>
        <id>A01.378.800.750</id>
        <parent-id>A01.378.800</parent-id>
    </ROW>
</ROWSET>

you would have a much more convenient starting point.
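
As a rough sketch of why this is more convenient (again, not part of the original answer), the key can then use the parent-id element directly, with no string arithmetic and no assumption about step length. It assumes every row carries a parent-id element and that it is empty for root rows, as in the sample above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- Index each ROW by the id of its parent, taken as-is from the input -->
<xsl:key name="child" match="ROW" use="parent-id"/>

<xsl:template match="/ROWSET">
    <node id="MESH" label="NIH Medical Subject Headings">
        <isComposedBy>
            <!-- Root rows: parent-id is empty (or missing) -->
            <xsl:apply-templates select="ROW[not(string(parent-id))]"/>
        </isComposedBy>
    </node>
</xsl:template>

<xsl:template match="ROW">
    <node id="{id}" label="{label}">
        <!-- All rows that name this row as their parent -->
        <xsl:variable name="children" select="key('child', id)"/>
        <xsl:if test="$children">
            <isComposedBy>
                <xsl:apply-templates select="$children"/>
            </isComposedBy>
        </xsl:if>
    </node>
</xsl:template>

</xsl:stylesheet>
```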



Source: https://stackoverflow.com/questions/31364818/how-to-convert-a-text-file-into-a-hiararchical-arrangement-using-xslt-1-0
