ruby - How to parse multiple XML files


I am trying to parse multiple XML files with Nokogiri. They are in the following format:

<?xml version="1.0" encoding="utf-8"?> <crdoc>[congressional record volume<volume>141</volume>, number<number>213</number>(<weekday>sunday</weekday>,<month>december</month>   <day>31</day>,<year>1995</year>)] [<chamber>senate</chamber>] [page<pages>s19323</pages>]<congress>104</congress>   <session>1</session>   <document_title>unanimous-consent request--house message on s. 1508</document_title>   <speaker name="mr. daschle">mr. daschle</speaker>.<speaking name="mr. daschle">mr. president, said on floor yesterday  afternoon, , repeat afternoon. know  distinguished majority leader wants agreement as do, ,  not hold him responsible fact not  able overcome impasse. commend him efforts @ trying  again today.</speaking>   <speaking name="mr. daschle">let me try 1 other option. have been unable agree  continuing resolution have put federal employees  work pay. have been unable agree  agreed last friday, 22d of december, have @ least  sent them offices without pay. perhaps can try this.</speaking>   <speaking name="mr. daschle">i ask unanimous consent senate proceed message  house on s. 1508, senate concur in house amendment  substitute amendment includes text of senator dole's  back-to-work bill, , house-passed expedited procedures shall take  effect if budget agreement not cut medicare more  necessary ensure solvency of medicare part trust fund and,  second, not raise taxes on working americans, not cut funding  education or environmental enforcement, , maintains  individual health guarantee under medicaid and, third, provides  tax reductions in budget agreement go americans making  under $100,000; motion concur agreed to, , motion  reconsider laid upon table.</speaking>   <speaker name="the acting president pro tempore">the acting president pro tempore</speaker>.<speaking name="the acting president pro tempore">is there objection?</speaking>   <speaker name="mr. dole">mr. dole</speaker>.<speaking name="mr. dole">mr. president, want few words.  
object.</speaking>   <speaking name="mr. dole">we working on lot of these things in our meetings @ white  house, have both been number of hours. think have  made progress. long way solution yet.</speaking>   <speaking name="mr. dole">i think of things listed democratic leader areas  of concern in meetings have had. , meetings start  again on tuesday. seems me not appropriate  proceed under terms, , therefore object.</speaking>   <speaker name="the acting president pro tempore">the acting president pro tempore</speaker>.<speaking name="the acting president pro tempore">objection heard.</speaking> </crdoc> 

The code I am using came from a previous question and has worked a treat so far. However, the format of the XML files has changed, which has left the code unusable. The code I have is this:

doc.xpath("//speech/speaking/@name").map(&:text).uniq.each do |name|
  speaker = Nokogiri::XML('<root/>')
  doc.xpath('//speech').each do |speech|
    speech_node = Nokogiri::XML('<speech/>')
    speech.xpath("*[@name='#{name}']").each do |speaking|
      speech_node.root.add_child(speaking)
    end
    speaker.root.add_child(speech_node.root) unless speech_node.root.children.empty?
  end
  File.open("test/" + name + "-" + year + ".xml", 'a+') do |f|
    f.write speaker.root.children
  end
end

I would like to create a new XML file for each speaker, and in each new XML file have what they said. The code needs to be able to cycle through the various XML files in a directory and place each speech in the appropriate speaker file. I was thinking this could be accomplished with a find -exec command.
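As an alternative to find -exec, the directory walk can stay inside Ruby. This is a minimal sketch, assuming the files sit directly in one directory and end in .xml; the helper name xml_files_in and the directory name 'records' are hypothetical.

```ruby
# Return every .xml file directly inside dir, sorted so runs are repeatable.
# (Hypothetical helper; a pattern like '**/*.xml' would also descend into
# subdirectories.)
def xml_files_in(dir)
  Dir.glob(File.join(dir, '*.xml')).sort
end

# 'records' is a placeholder directory name. If it does not exist,
# Dir.glob simply returns an empty list and nothing is printed.
xml_files_in('records').each do |path|
  puts path
end
```

Each path yielded this way can then be read and handed to Nokogiri::XML, so the per-speaker logic runs once per input file.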

Ultimately, the code should:

  1. Create an XML file named after the speaker's name and the year, e.g., mr. boehner_2011.xml
  2. The XML file should hold all of that speaker's speeches for the year.
  3. The XML file should have crdoc as its root node.

Since I don't have the <speech> element anymore, I need to remove it from the code:

doc.xpath("//speaking/@name").map(&:text).uniq.each do |name|
  speaker = Nokogiri::XML('<root/>')
  doc.xpath('//crdoc').each do |speech|
    speech_node = Nokogiri::XML('<speech/>')
    speech.xpath("*[@name='#{name}']").each do |speaking|
      speech_node.root.add_child(speaking)
    end
    speaker.root.add_child(speech_node.root) unless speech_node.root.children.empty?
  end
  File.open("test/" + name + "-" + year + ".xml", 'a+') do |f|
    f.write speaker.root.children
  end
end
