Large XML files can slow your program down, so it sometimes makes sense to split them or rearrange their contents to make querying faster. This post covers how to split a large XML file on Windows, and I will use a cool tool written by Michel Rodriguez.
XML-Twig is an open source Perl library that works on large XML files. Notable tools include
- XML Grep
- XML merge
- XML pretty printer
- XML spell check
- XML split
Our focus is to build this library on Windows so that we can use it from the Windows command prompt.
Download XML-Twig
Get this library from the CPAN website. Here is the URL: http://search.cpan.org/~mirod/XML-Twig-3.48/Twig.pm
Download the XML-Twig-3.48.tar.gz file and extract it under c:\devtools
Let's leave it there for the time being and install the prerequisites.
Prerequisites
There are a couple of prerequisites that you will want to install.
- Perl. This library is written in Perl, so you need Perl installed on your system. ActiveState Perl is what I sometimes use, and it is a good option on Windows computers. Download the 32-bit or 64-bit build for your processor architecture from the URL below
http://www.activestate.com/activeperl/downloads
I installed it under c:\perl64
- Visual Studio Express 2010, for the tool called “nmake”
I’ve got Visual Studio 2012, but you can get “Express 2013 for Windows Desktop” from the URL below
http://www.visualstudio.com/en-US/products/visual-studio-express-vs#2010-Visual-CPP
- Now that perl is available from the command prompt, you may also want to run this command
ppm install MinGW64
The above will install dmake as a Perl module too, in case you want to use that. In this article, though, I will use nmake.
That’s it for the prerequisites. I hope you have installed both of the above so that we can move on to the next step.
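Before moving on, it may help to confirm that both prerequisites are actually reachable from the command prompt. The following is only a minimal sketch in Python, which is an assumption on my part and not something XML-Twig needs; the function name find_tools is hypothetical. It simply looks up each tool on PATH and reports what it finds.

```python
import shutil


def find_tools(names):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # perl and nmake are the two prerequisites used in this article
    for tool, path in find_tools(["perl", "nmake"]).items():
        print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

If nmake shows up as not found, running the check from a Visual Studio command prompt, or adding the folder that contains nmake.exe to PATH, is the usual remedy.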
Making
OK, so let's start actually installing XML-Twig as a set of tools.
- Open a command prompt as Administrator and change directory to c:\devtools
cd c:\devtools
- Run the command below to install all the tools mentioned at the start of this article, e.g. xml_merge, xml_split etc.
perl Makefile.PL -y
- Please note that, as per the documentation, there are separate dependencies you will want to install
XML::Twig needs XML::Parser (and the expat library) installed
Modules that can enhance XML::Twig are:
Scalar::Util or WeakRef
  to avoid memory leaks
Encode or Text::Iconv or Unicode::Map8 and Unicode::Strings
  to do encoding conversions
Tie::IxHash
  to use the keep_atts_order option
XML::XPathEngine
  to use XML::Twig::XPath
LWP
  to use parseurl
HTML::Entities
  to use the html_encode filter
HTML::TreeBuilder
  to process HTML instead of XML
- (Optionally) If you only want to install xml_split, just execute the command below
perl Makefile.PL
- Final steps
nmake.exe
nmake.exe test
nmake.exe install
nmake is generally found here: “C:\Program Files (x86)\Microsoft Visual Studio xx.x\VC\bin\nmake.exe”
Once you have completed the final step, you will see that files such as xml_split.bat are now available under the site\bin folder of wherever you installed Perl. As I mentioned, I installed it under c:\perl64, so I find those commands under c:\perl64\site\bin
If you include this path in the PATH environment variable, you will be able to just run something like
c:\> xml_split -h
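To make concrete what "splitting" an XML file means, here is a small illustrative sketch in Python. It is not how xml_split is implemented and it is no replacement for it; the function name split_top_level and the sample document are made up for this example. It cuts a document at its first-level children and returns each one as its own piece of XML. Note that it loads the whole document into memory, so it only shows the idea on a tiny input.

```python
import xml.etree.ElementTree as ET


def split_top_level(xml_text):
    """Return each first-level child of the root as a separate XML string."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        child.tail = None  # drop any whitespace that followed the element
        parts.append(ET.tostring(child, encoding="unicode"))
    return parts


# Hypothetical sample document, only for illustration
SAMPLE = "<library><book id='1'>A</book><book id='2'>B</book></library>"

if __name__ == "__main__":
    for number, part in enumerate(split_top_level(SAMPLE), start=1):
        print(f"part {number}: {part}")
```

For real files, use xml_split itself; its options are listed by the -h switch shown above.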
I hope this works out for you too. If you have a question, leave a comment and someone from the Perl community will be able to help you.
Regards,