I need to find and combine information in some huge XML files. Parsing a whole file into a tree with doc <- xmlInternalTreeParse(file.name, useInternalNodes=TRUE, trim=TRUE) makes my 16 GB computer start swapping to disk before it finishes, so I have followed the good instructions at http://www.omegahat.org/RSXML/Overview.html.
Adding to the example from there, this is more or less what my file looks like:
<?xml version="1.0" ?>
<TABLE>
<SCHOOL>
<NAME> School1 </NAME>
<GRADES>
<STUDENT> Fred </STUDENT>
<TEST1> 66 </TEST1>
<TEST2> 80 </TEST2>
<FINAL> 70 </FINAL>
</GRADES>
<TEAMS>
<SOCCER> SoccerTeam1 </SOCCER>
<HOCKEY> HockeyTeam1 </HOCKEY>
</TEAMS>
</SCHOOL>
<SCHOOL>
<NAME> School2 </NAME>
<GRADES>
<STUDENT> Wilma </STUDENT>
<TEST1> 97 </TEST1>
<TEST2> 91 </TEST2>
<FINAL> 98 </FINAL>
</GRADES>
<TEAMS>
<SOCCER> SoccerTeam2 </SOCCER>
</TEAMS>
</SCHOOL>
</TABLE>
For every school that has a hockey team, I need to list its students together with the team name. For the example above, the desired output is "Fred", "HockeyTeam1", "School1". The real files have thousands of "schools", "hockey teams" and "players".
How can I use xmlEventParse to parse the files and extract this information? I tried extracting all text fields from the files, but after hours of waiting there was still no output. Note: the real files are more deeply nested than this, so it is not enough to descend a fixed number of levels to find the information.
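To make the goal concrete, here is a minimal sketch of one possible SAX-style approach with xmlEventParse, not a definitive solution. It assumes the tag names from the example above (SCHOOL, NAME, STUDENT, HOCKEY), that NAME occurs only once per SCHOOL, and that one hockey team per school is enough. The helper makeHockeyHandlers and its getResults accessor are hypothetical names, not part of the XML package. Because the handlers match on tag names rather than depth, extra nesting levels should not matter.

```r
library(XML)

# Hypothetical helper (not part of the XML package): builds SAX handlers that
# hold only the school currently being read, never the whole tree.
makeHockeyHandlers <- function() {
  rows     <- list()          # one data frame per school with a hockey team
  buf      <- ""              # text collected for the element being read
  school   <- NA_character_
  students <- character()
  hockey   <- character()

  list(
    startElement = function(name, attrs, ...) {
      buf <<- ""
      if (name == "SCHOOL") {            # new school: reset per-school state
        school   <<- NA_character_
        students <<- character()
        hockey   <<- character()
      }
    },
    # The parser may deliver one text node in several chunks, so append.
    text = function(content, ...) {
      buf <<- paste0(buf, content)
    },
    endElement = function(name, ...) {
      value <- trimws(buf)
      buf <<- ""
      if (name == "NAME") {              # assumption: NAME is the school name
        school <<- value
      } else if (name == "STUDENT") {
        students <<- c(students, value)
      } else if (name == "HOCKEY") {
        hockey <<- c(hockey, value)
      } else if (name == "SCHOOL" && length(hockey) > 0 && length(students) > 0) {
        # Keep this school only if it has a hockey team (first team is used).
        rows[[length(rows) + 1]] <<- data.frame(
          student = students, team = hockey[1], school = school,
          stringsAsFactors = FALSE)
      }
    },
    getResults = function() do.call(rbind, rows)
  )
}

# Small in-memory demo based on the example file; for a real file, pass
# file.name as the first argument and drop asText = TRUE.
xml.text <- "<?xml version='1.0' ?>
<TABLE>
<SCHOOL><NAME> School1 </NAME>
<GRADES><STUDENT> Fred </STUDENT><TEST1> 66 </TEST1></GRADES>
<TEAMS><SOCCER> SoccerTeam1 </SOCCER><HOCKEY> HockeyTeam1 </HOCKEY></TEAMS>
</SCHOOL>
<SCHOOL><NAME> School2 </NAME>
<GRADES><STUDENT> Wilma </STUDENT><TEST1> 97 </TEST1></GRADES>
<TEAMS><SOCCER> SoccerTeam2 </SOCCER></TEAMS>
</SCHOOL>
</TABLE>"

h <- makeHockeyHandlers()
# Pass only the three event handlers; getResults is kept outside the parser.
xmlEventParse(xml.text, asText = TRUE,
              handlers = h[c("startElement", "text", "endElement")])
res <- h$getResults()
print(res)
```

The closure keeps only one school's worth of data at a time, which is the point of event parsing for files too large for xmlInternalTreeParse. If NAME also appears deeper inside a SCHOOL in the real files, the handlers would need to track the element path as well.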