Graph databases utilizing locality -
dag = directed acyclic graph; roots = vertices without incoming edges.
i have dag larger available ram, need disk-based graph database work it.
my dag shallow: have billions of roots nodes, each node dozens of nodes reachable.
it not connected: majority of nodes have 1 incoming edge. couple of root nodes reachable subgraphs have few nodes in common.
so dag can thought of large number of small trees, few of intersect.
i need perform following queries on dag in bulk numbers: given root node, nodes reachable it.
it can thought batch query: given few thousands of root nodes, return nodes reachable there.
as far know there algorithms improve disk storage locality graphs. 3 examples are:
- http://ceur-ws.org/vol-733/paper_pacher.pdf
- http://www.cs.ox.ac.uk/dan.olteanu/papers/g-store.pdf
- http://graphlab.org/files/osdi2012-kyrola-blelloch-guestrin.pdf
it seems there older generation graph databases don't utilize graph locality. example popular neo4j graph database:
http://www.ibm.com/developerworks/library/os-giraph/
neo4j relies on data access methods graphs without considering data locality, , processing of graphs entails random data access. large graphs cannot stored in memory, random disk access becomes performance bottleneck.
my question is: there graph databases suited workload?
support win64 , possibility work database else java plus.
from task doesn't seem need graph database. can use external-memory programming library, such stxxl. first perform topological sort on graph (in edge format). sequentially scan until finish "root nodes". i/o complexity bounded topological sort. don't need topo sort, need identify root nodes. can done join edge table , node table, linear time.
Comments
Post a Comment