python - Why are Pickle files in Pickle protocol 4 twice as large as those in protocol 3 without having any gains in speed?


I was testing Python 3.4 and noticed that the pickle module has a new protocol. Therefore, I benchmarked the two protocols.

    import pickle

    def test1():
        # Write one million separate pickles with protocol 3, then read them back.
        with open("pickle3", "wb") as pickle3:
            for i in range(1000000):
                pickle.dump(i, pickle3, 3)
        with open("pickle3", "rb") as pickle3:
            for i in range(1000000):
                pickle.load(pickle3)

    def test2():
        # Same workload with protocol 4.
        with open("pickle4", "wb") as pickle4:
            for i in range(1000000):
                pickle.dump(i, pickle4, 4)
        with open("pickle4", "rb") as pickle4:
            for i in range(1000000):
                pickle.load(pickle4)

test1 mark: 2000007 function calls in 6.473 seconds

test2 mark: 2000007 function calls in 6.740 seconds

Protocol 4 is slightly slower than protocol 3. That kind of difference can be ignored. However, the disk usage is very different.

pickle3 uses 7,868,672 bytes.

pickle4 uses 16,868,672 bytes.
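The two byte counts can be checked without writing any files. The following is a minimal sketch, assuming that the size of each file equals the sum of the lengths of the individual pickles; `total_size` is a hypothetical helper name, not part of the pickle module. The reported difference is 9,000,000 bytes, which is 9 bytes for each of the one million `dump` calls. As far as I know, newer CPython releases omit part of that overhead for very small payloads, so the protocol 4 total may come out slightly lower than the figure above.

```python
import pickle


def total_size(protocol, n=1_000_000):
    """Sum the lengths of n independent pickles of the ints 0..n-1."""
    return sum(len(pickle.dumps(i, protocol)) for i in range(n))


size3 = total_size(3)
size4 = total_size(4)

# Difference reported in the post: 16,868,672 - 7,868,672 bytes.
reported_difference = 16_868_672 - 7_868_672

print("protocol 3:", size3)
print("protocol 4:", size4)
print("measured difference:", size4 - size3)
print("reported difference:", reported_difference)
```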

That cannot be without reason, so I kept digging. After reading PEP 3154, I roughly understand the protocol.

For the tuple (1, 2, 3, 4, 5, 6, 7) with protocol 3:

        0: \x80 PROTO      3
        2: (    MARK
        3: K        BININT1    1
        5: K        BININT1    2
        7: K        BININT1    3
        9: K        BININT1    4
       11: K        BININT1    5
       13: K        BININT1    6
       15: K        BININT1    7
       17: t        TUPLE      (MARK at 2)
       18: q    BINPUT     0
       20: .    STOP

For the tuple (1, 2, 3, 4, 5, 6, 7) with protocol 4:

        0: \x80 PROTO      4
        2: \x95 FRAME      18
       11: (    MARK
       12: K        BININT1    1
       14: K        BININT1    2
       16: K        BININT1    3
       18: K        BININT1    4
       20: K        BININT1    5
       22: K        BININT1    6
       24: K        BININT1    7
       26: t        TUPLE      (MARK at 11)
       27: \x94 MEMOIZE
       28: .    STOP
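Listings like the two above can be produced with the standard-library `pickletools` module. This is a small sketch; `disassemble` is a hypothetical helper that only captures the output of `pickletools.dis` in a string.

```python
import io
import pickle
import pickletools

data = (1, 2, 3, 4, 5, 6, 7)


def disassemble(obj, protocol):
    """Return the pickletools.dis listing of obj pickled with the given protocol."""
    out = io.StringIO()
    pickletools.dis(pickle.dumps(obj, protocol), out=out)
    return out.getvalue()


listing3 = disassemble(data, 3)
listing4 = disassemble(data, 4)

print(listing3)
print(listing4)
```

The last position in each listing is the STOP opcode, so the protocol 3 pickle is 21 bytes long and the protocol 4 pickle is 29 bytes long.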

The protocol 3 unpickler cannot know the length of the data until it reads position 17.

For protocol 4, the FRAME opcode at position 2 is a header that states the length of the data that follows (18 bytes).
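The header can be read by hand. The sketch below assumes the layout visible in the protocol 4 listing: a 2-byte PROTO prefix, then the 1-byte FRAME opcode followed by an 8-byte little-endian unsigned length, which is why the next opcode sits at position 11. The variable names are chosen for illustration only.

```python
import pickle
import struct

payload = pickle.dumps((1, 2, 3, 4, 5, 6, 7), protocol=4)

# Bytes 0-1: PROTO opcode (0x80) and the protocol number.
proto_opcode = payload[0:1]
proto_number = payload[1]

# Byte 2: FRAME opcode (0x95). Bytes 3-10: frame length as an
# 8-byte little-endian unsigned integer.
frame_opcode = payload[2:3]
(frame_length,) = struct.unpack("<Q", payload[3:11])

# Everything after the 11-byte prefix belongs to the frame.
bytes_after_header = len(payload) - 11
header_size = 1 + 8  # FRAME opcode plus its length field

print(proto_opcode, proto_number)
print(frame_opcode, frame_length, bytes_after_header, header_size)
```

The 9-byte header is written once per frame, so one million tiny pickles carry one million headers.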

However, I still do not understand why we pay this price (almost double the disk usage in an extreme case) when the speed is the same or potentially slower.

You are pickling ints. There is no gain in knowing the size of the structure in advance for such a simple datatype. For more complex structures, knowing the frame size is a huge gain in processing speed. Besides, protocol 4 lifts many restrictions on 64-bit systems.
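To make the point about bare ints concrete: the framing cost is paid per frame, not per value. The sketch below is an illustration that is not part of the original benchmark. It pickles the same million ints with a single `dumps` call on a list, so the frame headers are spread over a large payload instead of being attached to every int.

```python
import pickle

numbers = list(range(1_000_000))

# One pickle holding all the values, instead of one pickle per value.
one_dump3 = pickle.dumps(numbers, protocol=3)
one_dump4 = pickle.dumps(numbers, protocol=4)

print("protocol 3:", len(one_dump3))
print("protocol 4:", len(one_dump4))
print("difference:", len(one_dump4) - len(one_dump3))

# Both pickles restore the same list.
restored3 = pickle.loads(one_dump3)
restored4 = pickle.loads(one_dump4)
```

With this layout the two protocols should produce pickles of nearly the same size, which suggests that the doubling in the question comes from calling `dump` once per int.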

