python - Why are Pickle files in Pickle protocol 4 twice as large as those in protocol 3 without having any gains in speed?
I was testing Python 3.4 and noticed that the pickle module has a new protocol. Therefore, I benchmarked the two protocols.
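    import pickle

    def test1():
        # Write one million separate pickles with protocol 3, then read them back.
        pickle3 = open("pickle3", "wb")
        for i in range(1000000):
            pickle.dump(i, pickle3, 3)
        pickle3.close()
        pickle3 = open("pickle3", "rb")
        for i in range(1000000):
            pickle.load(pickle3)

    def test2():
        # Same benchmark with protocol 4.
        pickle4 = open("pickle4", "wb")
        for i in range(1000000):
            pickle.dump(i, pickle4, 4)
        pickle4.close()  # the original closed pickle3 here, which is undefined in test2
        pickle4 = open("pickle4", "rb")
        for i in range(1000000):
            pickle.load(pickle4)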
test1 mark: 2000007 function calls in 6.473 seconds
test2 mark: 2000007 function calls in 6.740 seconds
Protocol 4 is slightly slower than protocol 3. That kind of difference can be ignored. However, the hard disk usage is different.
pickle3 uses 7,868,672 bytes.
pickle4 uses 16,868,672 bytes.
I could see no reason for that, so I kept digging. After reading PEP 3154, I roughly understand the protocol.
For the tuple (1, 2, 3, 4, 5, 6, 7) under protocol 3:

    0: \x80 PROTO      3
    2: (    MARK
    3: K        BININT1    1
    5: K        BININT1    2
    7: K        BININT1    3
    9: K        BININT1    4
   11: K        BININT1    5
   13: K        BININT1    6
   15: K        BININT1    7
   17: t        TUPLE      (MARK at 2)
   18: q    BINPUT     0
   20: .    STOP
For the tuple (1, 2, 3, 4, 5, 6, 7) under protocol 4:

    0: \x80 PROTO      4
    2: \x95 FRAME      18
   11: (    MARK
   12: K        BININT1    1
   14: K        BININT1    2
   16: K        BININT1    3
   18: K        BININT1    4
   20: K        BININT1    5
   22: K        BININT1    6
   24: K        BININT1    7
   26: t        TUPLE      (MARK at 11)
   27: \x94 MEMOIZE
   28: .    STOP
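Listings like the two above can be regenerated with the standard pickletools module. The following is a minimal sketch; the helper name disassemble is only illustrative, and the exact column layout of the output may differ between Python versions.

```python
import io
import pickle
import pickletools

def disassemble(obj, protocol):
    """Return the pickletools disassembly of obj pickled with the given protocol."""
    out = io.StringIO()
    pickletools.dis(pickle.dumps(obj, protocol=protocol), out=out)
    return out.getvalue()

sample = (1, 2, 3, 4, 5, 6, 7)
print(disassemble(sample, 3))  # no FRAME opcode
print(disassemble(sample, 4))  # FRAME opcode right after PROTO
```

Comparing the two outputs side by side shows that the only structural additions in protocol 4 are the FRAME header and the one-byte MEMOIZE in place of BINPUT.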
The protocol 3 unpickler cannot know the length of the data until it reaches the TUPLE opcode at position 17.
In protocol 4, the FRAME opcode at position 2 acts as a header that states the length of the frame (18 bytes) up front.
However, I still don't understand why we pay this price (almost double the hard disk usage in an extreme situation) when the speed is the same or potentially slower.
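The two file sizes reported above can be reproduced by simple arithmetic on opcode sizes. This sketch assumes that every pickle.dump call under protocol 4 writes its own frame header (a 1-byte FRAME opcode plus an 8-byte length), which is what the Python 3.4 numbers in the question imply; newer interpreters may skip the frame for very small payloads, so the protocol 4 total can differ there. The function name protocol3_size is illustrative.

```python
N = 1000000
FRAME_HEADER = 1 + 8  # FRAME opcode plus an 8-byte little-endian length

def protocol3_size(i):
    """Bytes used by one protocol 3 pickle of a non-negative int below 2**31."""
    if i < 256:
        payload = 2   # BININT1: opcode + 1 byte
    elif i < 65536:
        payload = 3   # BININT2: opcode + 2 bytes
    else:
        payload = 5   # BININT: opcode + 4 bytes
    return 2 + payload + 1  # PROTO (2 bytes) + int + STOP (1 byte)

total3 = sum(protocol3_size(i) for i in range(N))
total4 = total3 + N * FRAME_HEADER  # assumption: one frame per dump call

print(total3)  # 7868672
print(total4)  # 16868672
```

The extra 9,000,000 bytes are exactly one 9-byte frame header per integer, which is why the overhead looks so large when each pickle holds only a few bytes of data.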
You are pickling ints. There is no gain in knowing the size of the structure in advance for such a simple datatype. For more complex structures, knowing the frame size can be a huge gain in processing speed. Besides, protocol 4 lifts many restrictions on 64-bit systems.
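To illustrate the point about data shape, here is a small sketch that pickles the integers as one list in a single dumps call instead of one call per integer. It compares sizes only and is not a speed benchmark; the list length of 100,000 is an arbitrary choice.

```python
import pickle

data = list(range(100000))

# One dumps call for the whole list: frame headers appear roughly once
# per 64 KiB of output instead of once per integer.
size3 = len(pickle.dumps(data, protocol=3))
size4 = len(pickle.dumps(data, protocol=4))

print(size3, size4)
print("relative difference: {:.3%}".format((size4 - size3) / size3))
```

With the data grouped this way, the framing overhead becomes a negligible fraction of the output, so the doubling seen in the question is specific to writing a very large number of tiny pickles.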