python - Run-Length encoding gives the same number to all repeating values

- April 15, 2015

I am building a compressor for short strings that mixes different compression algorithms. RLE is one of them, and it is giving me a problem.

The script I have is the following, although it is pretty incomplete at the moment:

    # -*- coding: utf-8 -*-

    import re

    dictionary = {'hello':'\§', 'world':'\°', 'the': '\@', 'for': '\]'}
    a_test_string = 'hello******** world****!'

    def compress(string, dictionary):
        pattern = re.compile( '|'.join(dictionary.keys() ))
        result = pattern.sub(lambda value: dictionary[value.group() ], string)

        '''
        Here I should implement a snippet to check that characters beginning
        with "\" won't be replaced and screw up the result.
        '''

        for character in string:
            occurrence = string.count(character*2)
            there_is_more_than_one_occurrence = occurrence > 1

            if there_is_more_than_one_occurrence:
                second_regex_pass_for_multiple_occurrences = re.sub('\*\*\*+', '/'+character+str(occurrence), result)
                result = second_regex_pass_for_multiple_occurrences

        print 'original string:', string
        print 'compressed string:', result
        print 'original size:', len(string)
        print 'compressed size:', len(result)

    compress(a_test_string, dictionary)

When I run the function, I get this:

    original string: hello******** world****!
    compressed string: \§/*6 \@ \°/*6!
    original size: 31
    compressed size: 20

But I should be getting:

    original string: hello******** world****!
    compressed string: \§/*8 \@ \°/*4!
    original size: 31
    compressed size: 20

What am I doing wrong here, such that both counts of repeating characters come out as 6?

I'm not going to try to understand everything you're doing, but a good way to debug this is to add "print" statements inside the loop, or use the Python debugger, and see what's happening. Try running each of these calls on its own and see what's being returned.
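As a rough illustration of that debugging step, the sketch below runs the "string.count" call from the loop on its own. It assumes the test string exactly as posted in the question; the variable names are made up for this example, and the print calls are written so they behave the same under Python 2 and Python 3.

```python
# Hypothetical stand-alone check of the call used inside the loop.
test_string = 'hello******** world****!'

# str.count scans the WHOLE string for non-overlapping matches,
# so it has no notion of where one run of '*' ends and the next begins.
pairs = test_string.count('*' * 2)   # number of non-overlapping '**'
singles = test_string.count('*')     # total number of '*' characters

print(pairs)
print(singles)
```

Both numbers describe the string as a whole, which is why they cannot tell the first run of asterisks apart from the second.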

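I think the main problem is that "string.count" returns the count for the entire string, so when it checks for 2 *s the first time, it sees 12 of them (or technically 6 patterns of **). When the loop checks the next set of *s, it is still checking the entire string, and "re.sub" then replaces every run with that same number. Hope this helps.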
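One possible way around this, offered only as a sketch and not as the asker's intended design, is to let the regular expression find each run and measure that run's own length in a replacement callback. The function name rle_runs and the parameters min_run and marker are hypothetical; the '/' marker and the threshold of three repeats are taken from the question's code. The dictionary substitution step is left out here.

```python
import re

def rle_runs(text, min_run=3, marker='/'):
    """Replace every run of min_run or more identical characters
    with marker + character + run length (hypothetical helper)."""
    # (.) captures one character; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r'(.)\1{%d,}' % (min_run - 1))

    def encode(match):
        # len(match.group(0)) is the length of THIS run only,
        # not a count over the whole string.
        return marker + match.group(1) + str(len(match.group(0)))

    return pattern.sub(encode, text)

print(rle_runs('hello******** world****!'))
```

With the question's test string, the first run is encoded with 8 and the second with 4. Note that this sketch does not handle input that already contains digits or the marker character, which would make decoding ambiguous.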
