python - Run-length encoding gives the same number to all repeating values
I am building a compressor for short strings that mixes different compression algorithms. RLE is one of them, and it is giving me a problem.
The script I have is the following, although it is pretty incomplete at the moment:
    # -*- coding: utf-8 -*-
    import re

    dictionary = {'hello': '\§', 'world': '\°', 'the': '\@', 'for': '\]'}
    a_test_string = 'hello******** to the world****!'

    def compress(string, dictionary):
        pattern = re.compile('|'.join(dictionary.keys()))
        result = pattern.sub(lambda value: dictionary[value.group()], string)

        '''
        Here I should implement a snippet to check that characters
        beginning with "\" won't be replaced and screw up the result.
        '''

        for character in string:
            occurrence = string.count(character * 2)
            there_is_more_than_one_occurrence = occurrence > 1
            if there_is_more_than_one_occurrence:
                second_regex_pass_for_multiple_occurrences = re.sub('\*\*\*+', '/' + character + str(occurrence), result)
                result = second_regex_pass_for_multiple_occurrences

        print 'original string:', string
        print 'compressed string:', result
        print 'original size:', len(string)
        print 'compressed size:', len(result)

    compress(a_test_string, dictionary)
When I run the function I get this:

    original string: hello******** to the world****!
    compressed string: \§/*6 to \@ \°/*6!
    original size: 31
    compressed size: 20
But I should be getting:

    original string: hello******** to the world****!
    compressed string: \§/*8 to \@ \°/*4!
    original size: 31
    compressed size: 20
What am I doing wrong here, so that both runs get 6 as the count of repeating chars?
I'm not going to try to understand everything you're doing, but to debug a method like this, add "print" statements inside the loop or use the Python debugger and see what's happening. Try running each of the calls on its own and see what's being returned.
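As a rough illustration of that kind of check, the sketch below calls `str.count` directly on a string with the same two runs of asterisks as in the question. It is written for Python 3, and the variable name `s` is only for illustration.

```python
# Sketch: inspect what str.count returns for the doubled character.
s = 'hello******** world****!'

# count() tallies non-overlapping matches over the WHOLE string,
# so the run of 8 asterisks contributes 4 and the run of 4 contributes 2.
pairs = s.count('**')
singles = s.count('*')

print(pairs)    # 6
print(singles)  # 12
```

Neither value says anything about an individual run, which is why every run ends up labelled with the same number.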
I think the main problem is that "string.count" returns the count for the entire string when it checks for 2 *s. The first time, it sees 12 (or technically 6 patterns of **). When the loop checks the next set of *s, it is still checking the entire string. Hope that helps.
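One possible way around this, sketched here under assumptions rather than taken from the original script, is to let the regex match each run separately and compute the length from the match itself. The function name `rle_runs` and the `min_run` parameter are hypothetical; the `/<char><count>` output format and the threshold of three follow the question's `\*\*\*+` pattern. The sketch is written for Python 3.

```python
import re

def rle_runs(text, min_run=3):
    """Replace each run of at least min_run identical characters
    with '/<char><run length>', measuring every run on its own."""
    # (.) captures one character; \1{n,} requires n or more repeats of it.
    pattern = re.compile(r'(.)\1{%d,}' % (min_run - 1))
    # len(m.group(0)) is the length of this particular run only.
    return pattern.sub(lambda m: '/' + m.group(1) + str(len(m.group(0))), text)

print(rle_runs('hello******** world****!'))  # hello/*8 world/*4!
```

Because the count comes from `len(m.group(0))`, it no longer depends on how many times the character appears elsewhere in the string. Note that input already containing `/` or digits would make the output ambiguous to decode, so some escaping would likely still be needed.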