Graph databases utilizing locality -


dag = directed acyclic graph; roots = vertices without incoming edges.

i have dag larger available ram, need disk-based graph database work it.

my dag shallow: have billions of roots nodes, each node dozens of nodes reachable.

it not connected: majority of nodes have 1 incoming edge. couple of root nodes reachable subgraphs have few nodes in common.

so dag can thought of large number of small trees, few of intersect.

i need perform following queries on dag in bulk numbers: given root node, nodes reachable it.

it can thought batch query: given few thousands of root nodes, return nodes reachable there.

as far know there algorithms improve disk storage locality graphs. 3 examples are:

it seems there older generation graph databases don't utilize graph locality. example popular neo4j graph database:

http://www.ibm.com/developerworks/library/os-giraph/

neo4j relies on data access methods graphs without considering data locality, , processing of graphs entails random data access. large graphs cannot stored in memory, random disk access becomes performance bottleneck.

my question is: there graph databases suited workload?

support win64 , possibility work database else java plus.

from task doesn't seem need graph database. can use external-memory programming library, such stxxl. first perform topological sort on graph (in edge format). sequentially scan until finish "root nodes". i/o complexity bounded topological sort. don't need topo sort, need identify root nodes. can done join edge table , node table, linear time.


Comments

Popular posts from this blog

C# random value from dictionary and tuple -

cgi - How do I interpret URLs without extension as files rather than missing directories in nginx? -

.htaccess - htaccess convert request to clean url and add slash at the end of the url -