python - Very slow MongoDB count in big database -


i have database collection in large amount of documents (several millions). in database have (amongst others) fields _violationtype (int) , _duration (int). count amount of documents have _violationtype of 15 or less , _duration of 10 or less. execute following python script:

#!/usr/bin/env python import pymongo import timeit   client = pymongo.mongoclient('localhost', 27017) database = client['bgp_route_leaks']  collection = database['valleys']  collection.ensure_index('_violationtype', unique=false) collection.ensure_index('_duration', unique=false)  start = timeit.default_timer()  cursor = collection.find({'$and': [{'_violationtype': {'$lt': 16}}, {'_duration': {'$lt': 10}}]}, {'_duration': 1, '_id': 0})  print('explain: {}'.format(cursor.explain())) print('count: {}'.format(cursor.count())) print('time: {}'.format(timeit.default_timer() - start)) 

this prints out:

explain: {u'nyields': 4, u'nscannedallplans': 6244545, u'allplans': [{u'cursor': u'btreecursor _violationtype_1', u'indexbounds': {u'_violationtype': [[-1.7976931348623157e+308, 16]]}, u'nscannedobjects': 124, u'nscanned': 124, u'n': 34}, {u'cursor': u'btreecursor _duration_1', u'indexbounds': {u'_duration': [[-1.7976931348623157e+308, 10]]}, u'nscannedobjects': 6244298, u'nscanned': 6244298, u'n': 5678070}, {u'cursor': u'basiccursor', u'indexbounds': {}, u'nscannedobjects': 123, u'nscanned': 123, u'n': 36}], u'millis': 30815, u'nchunkskips': 0, u'server': u'area51:27017', u'n': 5678107, u'cursor': u'btreecursor _duration_1', u'scanandorder': false, u'indexbounds': {u'_duration': [[-1.7976931348623157e+308, 10]]}, u'nscannedobjectsallplans': 6244545, u'ismultikey': false, u'indexonly': true, u'nscanned': 6244298, u'nscannedobjects': 6244298} count: 5678107 time: 52.4030768871 

while running executed db.currentop() in window, returned

{         "inprog" : [                 {                         "opid" : 15,                         "active" : true,                         "secs_running" : 4,                         "op" : "query",                         "ns" : "bgp_route_leaks.valleys",                         "query" : {                                 "$query" : {                                         "$and" : [                                                 {                                                         "_violationtype" : {                                                                 "$lt" : 16                                                         }                                                 },                                                 {                                                         "_duration" : {                                                                 "$lt" : 10                                                         }                                                 }                                         ]                                 },                                 "$explain" : true                         },                         "client" : "127.0.0.1:46819",                         "desc" : "conn1",                         "threadid" : "0x7fd69b31c700",                         "connectionid" : 1,                         "locks" : {                                 "^" : "r",                                 "^bgp_route_leaks" : "r"                         },                         "waitingforlock" : false,                         "numyields" : 5,                         "lockstats" : {                                 "timelockedmicros" : {                                         "r" : numberlong(8816104),                                         "w" : numberlong(0)                                 },                                 "timeacquiringmicros" : {                                         "r" : numberlong(4408723),                                         "w" : numberlong(0)                                 }                         }                 }         ] } 

now have read common source slow mongodb queries missing indexes. however, have ensured indexes both _violationtype , _duration , explain tells me u'indexonly': true. read numa architecture slow things down , should start service via command

sudo numactl --interleave=all /usr/bin/mongod --dbpath=/var/lib/mongodb (/proc/sys/vm/zone_reclaim_mode set 0) 

which know have done, still takes minute count , longer others, wondering make query faster.

running

db.runcommand({compact: 'bgp_route_leaks'}) 

in mongo shell has been tried no luck.

any suggestions on how counts faster?

the mongodb version 2.4.9.

if @ explain output see query using _violationtype scanning 124 objects , the query using _duration scanning 6244545 objects.

although mongodb 2.6+ can use index intersection, compound index faster.

you need create compound index on fields:

collection.create_index([("_violationtype", ascending),("_duration", ascending)]); 

edit

in versions 2.4 mongodb performance improved (jira-1752).

also, it's worth noticing explain command displaying details query not count command.

unfortunately, can't use explain on count command, there ticket opened issue.

to measure performance of count command should remove explain test. also, need repeat query multiple times (100x, 1000x...) , take average correct value.


Comments

Popular posts from this blog

C# random value from dictionary and tuple -

cgi - How do I interpret URLs without extension as files rather than missing directories in nginx? -

.htaccess - htaccess convert request to clean url and add slash at the end of the url -