beautifulsoup - Extracting data from a web page using BS4 in Python
I am trying to extract data from this site: http://www.afl.com.au/fixture
I want a dictionary with the date as the key and the "preview" links as the values in a list, like this:
dict = {"saturday, june 07": ["preview url-1", "preview url-2", "preview url-3", "preview url-4"]}
Please help me with it. I have used the code below:
    # table_for_players, ldatelist and gdict are defined elsewhere in the script
    def extractdata():
        ldateinfomatchcase = False
        # ldateinfomatchcase = []
        global gdict
        # first pass: collect the date headers
        for row in table_for_players.find_all("tr"):
            for ldaterowindex in row.find_all("th", {"colspan": "4"}):
                ldatelist.append(ldaterowindex.text)
        print(ldatelist)
        # second pass: for each date, rescan the table for its preview links
        for index in ldatelist:
            # print(index)
            lpreviewlinklist = []
            for row in table_for_players.find_all("tr"):
                for ldaterowindex in row.find_all("th", {"colspan": "4"}):
                    if ldaterowindex.text == index:
                        ldateinfomatchcase = True
                    else:
                        ldateinfomatchcase = False
                if ldateinfomatchcase:
                    for linforowindex in row.find_all("td", {"class": "info"}):
                        for link in linforowindex.find_all("a", {"class": "preview"}):
                            lpreviewlinklist.append("http://www.afl.com.au/" + link.get('href'))
            print(lpreviewlinklist)
            gdict[index] = lpreviewlinklist
My main aim is to get the names of the players playing in each match, for the home team and the away team, organised by date in a data structure.
I prefer using CSS selectors. Select the first table, then the rows in its tbody
for ease of processing; the rows are 'grouped' by tr th
rows. From there you can select the next siblings that don't contain th
headers, and scan these for preview links:
    # soup is the BeautifulSoup object for the fixture page
    previews = {}
    table = soup.select('table.fixture')[0]
    for group_header in table.select('tbody tr th'):
        date = group_header.string
        for next_sibling in group_header.parent.find_next_siblings('tr'):
            if next_sibling.th:
                # found the next group, end the scan
                break
            for preview in next_sibling.select('a.preview'):
                previews.setdefault(date, []).append(
                    "http://www.afl.com.au" + preview.get('href'))
This builds a dictionary of lists; for the current version of the page it produces:
    {u'monday, june 09': ['http://www.afl.com.au/match-centre/2014/12/melb-v-coll'],
     u'sunday, june 08': ['http://www.afl.com.au/match-centre/2014/12/gcfc-v-syd',
                          'http://www.afl.com.au/match-centre/2014/12/fre-v-adel',
                          'http://www.afl.com.au/match-centre/2014/12/nmfc-v-rich']}
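For anyone who wants to try the grouping approach without hitting the live site, here is a minimal sketch that runs against a small inline HTML sample. The markup in SAMPLE_HTML is a simplified guess at the fixture table, not the real page, and the previews_by_date helper name is made up for this example. It also swaps the string concatenation for urljoin, which is my own choice and only matters if an href is missing its leading slash. It assumes Python 3 with bs4 installed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.afl.com.au"

# Hypothetical, heavily simplified stand-in for the fixture table markup.
SAMPLE_HTML = """
<table class="fixture">
  <tbody>
    <tr><th colspan="4">Sunday, June 08</th></tr>
    <tr><td class="info"><a class="preview" href="/match-centre/2014/12/gcfc-v-syd">Preview</a></td></tr>
    <tr><td class="info"><a class="preview" href="/match-centre/2014/12/fre-v-adel">Preview</a></td></tr>
    <tr><th colspan="4">Monday, June 09</th></tr>
    <tr><td class="info"><a class="preview" href="/match-centre/2014/12/melb-v-coll">Preview</a></td></tr>
  </tbody>
</table>
"""


def previews_by_date(html, base_url=BASE_URL):
    """Map each date header to the preview URLs in the rows that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select("table.fixture")[0]
    previews = {}
    for group_header in table.select("tbody tr th"):
        date = group_header.get_text(strip=True)
        # Walk the rows after this header until the next header row appears.
        for row in group_header.parent.find_next_siblings("tr"):
            if row.th:
                break
            for link in row.select("a.preview"):
                previews.setdefault(date, []).append(
                    urljoin(base_url, link.get("href")))
    return previews


result = previews_by_date(SAMPLE_HTML)
for date, urls in result.items():
    print(date, urls)
```

To use it on the real page you would pass the downloaded HTML in place of SAMPLE_HTML; whether the selectors still match depends on the site's current markup.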