Tulosta
XML to CSV (Xml2Csv)
[ Howto Frontpage ]

I have done some XML to CSV conversion. I'm lazy, so liked to use methods, which are generic. No need to edit stylesheets everytime. I hope this helps to convert XML data into CSV or other easier format to use in shell scripts.

I'm XML novice (mostly cut&paste), so if some XML guru like to fix my stylesheets, send email.

Links, which has helped



My solution

Some example XML (example.xml)
<source>
   <sourcefield businessname ="" datatype ="DATE" description ="" fieldnumber ="1"
      fieldproperty ="0" fieldtype ="ELEMITEM" hidden ="NO" 
      keytype ="NOT A KEY" length ="19"
      level ="0" name ="BUSINESS_DATE" nullable ="NULL" 
      occurs ="0" offset ="0" physicallength ="19"
      physicaloffset ="0" picturetext ="" precision ="19" scale ="0" usage_flags =""/>
   <sourcefield businessname ="JOHNSON" datatype ="DATE" 
      description ="SHOEMAKER" fieldnumber ="1"
      fieldproperty ="1" fieldtype ="ELEMITEM" hidden ="NO" 
      keytype ="NOT A KEY" length ="19"
      level ="3" name ="BUSINESS_DATE" nullable ="NULL" occurs ="0" 
      offset ="0" physicallength ="19"
      physicaloffset ="0" picturetext ="" precision ="19" scale ="0" usage_flags ="jj"/>
</source>

Convert XML into elements

Element format is easier to use in shell scripts, awk, ...
Convert into elements (example.xml => example.2.xml)
<?xml version="1.0"?>
<source>
  <sourcefield>
    <businessname/>
    <datatype>DATE</datatype>
    <description/>
    <fieldnumber>1</fieldnumber>
    <fieldproperty>0</fieldproperty>
    <fieldtype>ELEMITEM</fieldtype>
    <hidden>NO</hidden>
    <keytype>NOT A KEY</keytype>
    <length>19</length>
    <level>0</level>
    <name>BUSINESS_DATE</name>
    <nullable>NULL</nullable>
    <occurs>0</occurs>
    <offset>0</offset>
    <physicallength>19</physicallength>
    <physicaloffset>0</physicaloffset>
    <picturetext/>
    <precision>19</precision>
    <scale>0</scale>
    <usage_flags/>
  </sourcefield>
  <sourcefield>
    <businessname>JOHNSON</businessname>
    <datatype>DATE</datatype>
    <description>SHOEMAKER</description>
    <fieldnumber>1</fieldnumber>
    <fieldproperty>1</fieldproperty>
    <fieldtype>ELEMITEM</fieldtype>
    <hidden>NO</hidden>
    <keytype>NOT A KEY</keytype>
    <length>19</length>
    <level>3</level>
    <name>BUSINESS_DATE</name>
    <nullable>NULL</nullable>
    <occurs>0</occurs>
    <offset>0</offset>
    <physicallength>19</physicallength>
    <physicaloffset>0</physicaloffset>
    <picturetext/>
    <precision>19</precision>
    <scale>0</scale>
    <usage_flags>jj</usage_flags>
  </sourcefield>
</source>


Xml convert into elements has done using command: xsltproc xml2elements.xsl example.xml > example.2.xml

xml2elements.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output indent="yes" />

<xsl:strip-space elements="*" />

<xsl:template match="*">
<xsl:copy>
   <xsl:if test="@*">
       <xsl:for-each select="@*">
	  <xsl:element name="{name()}">
	      <xsl:value-of select="." />
	  </xsl:element>
       </xsl:for-each>
   </xsl:if>
   <xsl:apply-templates />
</xsl:copy>
</xsl:template>

</xsl:stylesheet>


XML into CSV

This solution works fine if XML is in element format. Look previous sections howto make conversion.

Convert example.2.xml into csv format.

Convert has done using next command, some examples to use paramaters.

# convert into elements
xsltproc xml2elements.xsl example.xml > example.2.xml
# convert into csv, using default parameters
xsltproc xml2csv.xsl example.2.xml
# convert into csv, without column names
xsltproc -param colnames n xml2csv.xsl example.2.xml
# convert using delimiter | and without colnames
xsltproc -param colnames n -stringparam delim "|" xml2csv.xsl example.2.xml


xml2csv.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="iso-8859-1"/>

<xsl:param name="delim" select="';'" />
<xsl:param name="break" select="'&#xA;'" /><!-- xA = NL, xD = CR -->
<xsl:param name="colnames" select="'y'"/>

<xsl:strip-space elements="*" />


<xsl:template match="/*/child::*">
<!--headerline-->
<xsl:if test="$colnames = 'y'">
  <xsl:if test="position() = 1">
      <xsl:for-each select="child::*">
	     <xsl:value-of select="name()"/>
             <xsl:if test="position() != last()">
	           <xsl:value-of select="$delim"/>
             </xsl:if>
      </xsl:for-each>
      <!-- hardcode version newline -->
	<!--<xsl:text>&#xa;</xsl:text>-->
      <!-- linebreak, nicer -->
      <xsl:value-of select="$break" />
  </xsl:if>
</xsl:if>


<!--dataline-->
<xsl:for-each select="child::*">

<xsl:if test="position() != last()"><xsl:value-of select="normalize-space(.)"/>
	  <xsl:value-of select="$delim" />
</xsl:if>

<xsl:if test="position()  = last()"><xsl:value-of select="normalize-space(.)"/>
	  <xsl:value-of select="$break" />
</xsl:if>

</xsl:for-each>
</xsl:template>

</xsl:stylesheet>


XML into DATA

Next stylesheet make output like:
occurs=0
offset=0
physicallength=19
physicaloffset=0
picturetext=
precision=19
scale=0
usage_flags=
ENDRECORD
occurs=0
offset=0
physicallength=19
physicaloffset=0
picturetext=
precision=19
scale=0
usage_flags=jj
ENDRECORD


Every record end has marked line ENDRECORD.
xml2data.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="iso-8859-1"/>

<xsl:param name="delim" select="'&#xA;'" />
<xsl:param name="break" select="'&#xA;ENDRECORD&#xA;'" />

<xsl:strip-space elements="*" />


<xsl:template match="/*/child::*">

<!--dataline-->
<xsl:for-each select="child::*">

<xsl:value-of select="name()"/><xsl:text>=</xsl:text>
<xsl:if test="position() != last()"><xsl:value-of select="normalize-space(.)"/>
				<xsl:value-of select="$delim" />
</xsl:if>

<xsl:if test="position()  = last()"><xsl:value-of select="normalize-space(.)"/>
				<xsl:value-of select="$break" />
</xsl:if>

</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Done using command:
xsltproc xml2data.xsl example.2.xml


Some methods to clean/flat/... XML file

Here is some useful templates to edit xml before convert to the csv format. Rename, remove, flatting - less hierarchical,...

ex1.xml
<?xml version="1.0"?>
<!--this is comments -->
<rootnode>
 <products author="Jesper">
  <product id="p1">
    <name>Delta</name>
    <price format="xxx">800</price>
    <stock>4</stock>
    <subnode>
            <sub1>1</sub1>
    </subnode>
    <some>
            <some1>value921</some1>
            <some2>value566</some2>
    </some>
    <country>Denmark</country>
  </product>
  <product id="p2">
    <name>Golf</name>
    <price format="xxx">1000</price>
    <stock>5</stock>
    <subnode>
            <sub1>2</sub1>
    </subnode>
    <some>
            <some1>value1</some1>
            <some2>value2</some2>
    </some>
    <country>Germany</country>
  </product>
  <product id="p3">
    <name>Alfa</name>
    <price format="xxx">1200</price>
    <stock>19</stock>
    <subnode>
            <sub1>2</sub1>
    </subnode>
    <country>Germany</country>
  </product>
  <product id="p4">
    <name>Foxtrot</name>
    <price format="xxx">1500</price>
    <stock>5</stock>
    <subnode>
            <sub1>2</sub1>
    </subnode>
    <country>Australia</country>
  </product>
<!-- p5 is a brand new product -->
  <product id="p5">
    <name>Tango</name>
    <price format="xxx">1225</price>
    <stock>3</stock>
    <subnode>
            <sub1>2</sub1>
    </subnode>
    <country>Japan</country>
  </product>
<!-- p6 is a old product -->
  <product id="p6">
    <name>Tango</name>
    <price format="xxx">1225</price>
    <stock>3</stock>
    <subnode>
            <sub1>2</sub1>
    </subnode>
    <country>Japan</country>
  </product>
 </products>
</rootnode>

Here is some templates to edit previous xml. Remove attributes, remove elements, remove parent element, remove child element, ...
remove.xsl

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:output indent="yes" />
   <xsl:strip-space elements="*" />

   <!-- default copy all node -->
   <xsl:template match="@*|node()">
     <xsl:copy>
       <xsl:apply-templates select="@*|node()"/>
     </xsl:copy>
   </xsl:template>

   <!-- template for the document element -->
   <!-- = remove root node -->
   <xsl:template match="/*">
     <xsl:apply-templates select="node()" />
   </xsl:template>

  <!-- remove attributes  <xxx attr="some">value</xxx> -->
  <xsl:template match="@attr"/>
  <xsl:template match="@format"/>
  <xsl:template match="@author"/>


  <!-- remove element  <xxx>value</xxx> -->
  <xsl:template match="xxx"/>
  <xsl:template match="country"/>

  <!-- remove only element childs <xxx><child1>val</child1></xxx> -->
  <xsl:template match="xxx/*"  />

   <!-- move children element value for mother and remove child = more flat, 
           less hierarchical -->
   <xsl:template match="subnode/*">
     <xsl:apply-templates select="node()" />
   </xsl:template>

   <!-- remove some element but leave the child = more flat, less hierarchical -->
   <xsl:template match="some">
     <xsl:apply-templates select="node()" />
   </xsl:template>

   <!-- remove some data based on attribute value -->
   <xsl:template match="product[@id='p6']" />

   <!-- remove comments -->
   <xsl:template match="comment()"/>

   <!-- rename element name=>itemname -->
   <xsl:template match="name">
    <itemname>
      <xsl:apply-templates select="@*|node()"/>
    </itemname>
   </xsl:template>

<!--
<xsl:value-of select="Person/FirstName"/>
-->

<!-- substring from attr.
<xsl:value-of select="substring(/person/@someattr,1,4)"/>
-->

<!--  Like to split parent value and child element
    <Parent>
       <Child><Attr1>Value</Attr1>0
       </Child>
     </Parent>
    change to:
    <ParentChildAttr1>Value</ParentChildAttr1>
    <ParentChild>6</ParentChild>
-->
    <!-- rewrite element hierarchy , make new names -->
    <xsl:template match="Parent">
        <ParentChildAttr1>
                <xsl:value-of select="Child/Attr1" />
        </ParentChildAttr1>

        <ParentChild>
                <!-- aiti elementin arvo, lapset ei kiinnosta -->
                <xsl:value-of select="<Child/text()" />
        </<ParentChild>
   </xsl:template>


<!-- some usable value-of ... -->

     <!-- value of element -->
     <!--<xsl:value-of select="Child" /> -->
     <!--<xsl:apply-templates select="node()" /> -->
     <!--<xsl:apply-templates select="@*" /> -->
     <!--<xsl:value-of select="." /> -->
     <!--<xsl:value-of select="self::node()" />-->
     <!--<xsl:value-of select="self" />-->
     <!--<xsl:value-of select="Child/@*" />-->
     <!--<xsl:value-of select="Child/text()" />-->




</xsl:stylesheet>

Run command

xsltproc remove.xsl ex1.xml

Output is:

<?xml version="1.0"?>
<products>
  <product id="p1">
    <itemname>Delta</name>
    <price>800</price>
    <stock>4</stock>
    <subnode>1</subnode>
  </product>
  <product id="p2">
    <itemname>Golf</name>
    <price>1000</price>
    <stock>5</stock>
    <subnode>2</subnode>
    <some1>value1</some1>
    <some2>value2</some2>
  </product>
  <product id="p3">
    <itemname>Alfa</name>
    <price>1200</price>
    <stock>19</stock>
    <subnode>2</subnode>
  </product>
  <product id="p4">
    <itemname>Foxtrot</name>
    <price>1500</price>
    <stock>5</stock>
    <subnode>2</subnode>
  </product>
  <product id="p5">
    <itemname>Tango</name>
    <price>1225</price>
    <stock>3</stock>
    <subnode>2</subnode>
  </product>
</products>

And all together. Basic data is ex1.xml and you need csv.

# remove some elements and attributes
cmd > xsltproc remove.xsl ex1.xml > ex1.b.xml
# change attributes => elements
cmd > xsltproc xml2elements.xsl ex1.b.xml > ex1.e.xml
# to csv
cmd > xsltproc xml2csv.xsl ex1.e.xml
# result csv
id;itemname;price;stock;subnode;some1;some2
p1;Delta;800;4;1;value1;value2
p2;Golf;1000;5;999;val911;val912
p3;Alfa;1200;19;2
p4;Foxtrot;1500;5;2
p5;Tango;1225;3;2