ruby - How to parse multiple XML files
I am trying to parse multiple XML files with Nokogiri. They are in the following format:
<?xml version="1.0" encoding="utf-8"?>
<crdoc>[congressional record volume<volume>141</volume>, number<number>213</number>(<weekday>sunday</weekday>,<month>december</month> <day>31</day>,<year>1995</year>)] [<chamber>senate</chamber>] [page<pages>s19323</pages>]<congress>104</congress>
  <session>1</session>
  <document_title>unanimous-consent request--house message on s. 1508</document_title>
  <speaker name="mr. daschle">mr. daschle</speaker>.
  <speaking name="mr. daschle">mr. president, said on floor yesterday afternoon, , repeat afternoon. know distinguished majority leader wants agreement as do, , not hold him responsible fact not able overcome impasse. commend him efforts @ trying again today.</speaking>
  <speaking name="mr. daschle">let me try 1 other option. have been unable agree continuing resolution have put federal employees work pay. have been unable agree agreed last friday, 22d of december, have @ least sent them offices without pay. perhaps can try this.</speaking>
  <speaking name="mr. daschle">i ask unanimous consent senate proceed message house on s. 1508, senate concur in house amendment substitute amendment includes text of senator dole's back-to-work bill, , house-passed expedited procedures shall take effect if budget agreement not cut medicare more necessary ensure solvency of medicare part trust fund and, second, not raise taxes on working americans, not cut funding education or environmental enforcement, , maintains individual health guarantee under medicaid and, third, provides tax reductions in budget agreement go americans making under $100,000; motion concur agreed to, , motion reconsider laid upon table.</speaking>
  <speaker name="the acting president pro tempore">the acting president pro tempore</speaker>.
  <speaking name="the acting president pro tempore">is there objection?</speaking>
  <speaker name="mr. dole">mr. dole</speaker>.
  <speaking name="mr. dole">mr. president, want few words. object.</speaking>
  <speaking name="mr. dole">we working on lot of these things in our meetings @ white house, have both been number of hours. think have made progress. long way solution yet.</speaking>
  <speaking name="mr. dole">i think of things listed democratic leader areas of concern in meetings have had. , meetings start again on tuesday. seems me not appropriate proceed under terms, , therefore object.</speaking>
  <speaker name="the acting president pro tempore">the acting president pro tempore</speaker>.
  <speaking name="the acting president pro tempore">objection heard.</speaking>
</crdoc>
The code I am using came from a previous question, and it has worked a treat so far. However, the format of the XML files has changed, which left the code unusable. The code I have is this:
# `doc` is the parsed input document; `year` is not defined in this
# snippet and is assumed to be set elsewhere.
doc.xpath("//speech/speaking/@name").map(&:text).uniq.each do |name|
  speaker = Nokogiri::XML('<root/>')
  doc.xpath('//speech').each do |speech|
    speech_node = Nokogiri::XML('<speech/>')
    # Move every child of this speech that carries the speaker's name.
    speech.xpath("*[@name='#{name}']").each do |speaking|
      speech_node.root.add_child(speaking)
    end
    speaker.root.add_child(speech_node.root) unless speech_node.root.children.empty?
  end
  File.open("test/" + name + "-" + year + ".xml", 'a+') do |f|
    f.write speaker.root.children
  end
end
I would like to create a new XML file for each speaker, and in each new XML file have everything that speaker said. The code needs to be able to cycle through the various XML files in a directory and place each speech in the appropriate speaker file. I was thinking this could be accomplished with a find -exec command.
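As an alternative to find -exec, the directory walk can be done from Ruby itself with Dir.glob. The following is only a minimal sketch: the helper name each_xml_file is hypothetical, and it assumes the input files end in .xml and may sit in subdirectories.

```ruby
# Hypothetical helper (not from the original post): yields the path of
# every .xml file under +dir+, including subdirectories, in sorted order.
# Returns an Enumerator when called without a block.
def each_xml_file(dir)
  return enum_for(:each_xml_file, dir) unless block_given?

  Dir.glob(File.join(dir, '**', '*.xml')).sort.each do |path|
    yield path if File.file?(path)
  end
end
```

It could be called as each_xml_file('records') { |path| puts path }, where 'records' stands in for whatever directory holds the input files.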
Ultimately, the code should:
- Create an XML file named after the speaker and the year, e.g., mr. boehner_2011.xml
- The XML file should hold all of that speaker's speeches for the year.
- The XML file should have crdoc as its root node.
Since you don't have a <speech> element anymore, you need to remove it from the code:
# `doc` is the parsed input document; `year` is not defined in this
# snippet and is assumed to be set elsewhere.
doc.xpath("//speaking/@name").map(&:text).uniq.each do |name|
  speaker = Nokogiri::XML('<root/>')
  # <crdoc> now plays the role the <speech> element used to play.
  doc.xpath('//crdoc').each do |speech|
    speech_node = Nokogiri::XML('<speech/>')
    speech.xpath("*[@name='#{name}']").each do |speaking|
      speech_node.root.add_child(speaking)
    end
    speaker.root.add_child(speech_node.root) unless speech_node.root.children.empty?
  end
  File.open("test/" + name + "-" + year + ".xml", 'a+') do |f|
    f.write speaker.root.children
  end
end