Visualizing the ZODB with graphviz
While digging around in the ZEXP export code, I realized that it wouldn’t be too hard to modify it to dump a representation of a ZODB in graphviz .dot format. Here’s a Zope external method I devised to do that:
# Generic ZODB walker and graphviz exporter
####################################################################
#
# Copyright (c) 2003 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
####################################################################
import logging
import cPickle, cStringIO
from ZODB.utils import u64

logger = logging.getLogger('ZODB.ExportImport')


def get_reference_dumper(refs):
    # Returns a callback which will be called whenever a reference is found.
    def dump_reference(oid, roid):
        refs.append('%s -> %s\n' % (u64(oid), u64(roid)))
    return dump_reference
def export_graphviz(self):
    """
    Walks a ZODB database and dumps the object graph in graphviz .dot format.
    """
    context = self
    f = open('plone.dot', 'w')
    f.write('digraph plone {\n')
    refs = []
    reference_dumper = get_reference_dumper(refs)
    for oid, p in walk_database(context, reference_callback=reference_dumper):
        # Walk to all the objects in the database and examine their references.
        # Whenever a reference is found, it will be recorded via the
        # reference_dumper. Whenever a new object is found, it will be yielded
        # to this loop.
        # Read the module and class from the pickle bytestream without actually
        # loading the object.
        module, klass = p.split('\n')[:2]
        module = module[2:]
        f.write('%s [label="%s.%s"]\n' % (u64(oid), module, klass))
    for ref in refs:
        f.write(ref)
    f.write('}\n')
    f.close()
def walk_database(context, reference_callback=None):
    # Get the object ID and database connection of the starting object.
    base_oid = context._p_oid
    conn = context._p_jar
    # oids is used to keep track of found oids that need to be visited.
    # done_oids is used to keep track of which oids have already been visited.
    oids = [base_oid]
    done_oids = {}
    while oids:
        # loop while references remain to objects we haven't exported yet
        oid = oids.pop(0)
        if oid in done_oids:
            continue
        done_oids[oid] = True
        try:
            # fetch the pickle
            p, serial = conn._storage.load(oid, conn._version)
        except Exception:
            logger.debug("broken reference for oid %s", repr(oid),
                         exc_info=True)
        else:
            # If the Unpickler's persistent_load attribute is set to a list,
            # then that list will be populated with the references found in
            # the pickle when noload is called, without actually loading the
            # object.
            refs = []
            u = cPickle.Unpickler(cStringIO.StringIO(p))
            u.persistent_load = refs
            # noload must be called once for each pickle in the record
            # (ZODB writes two: the class metadata, then the object state)
            u.noload()
            u.noload()
            # loop through the references found on this object
            for ref in refs:
                # reset so an unsupported reference can't reuse a stale oid
                roid = None
                # look for the various reference types supported by the ZODB
                # (see the docs in ZODB/serialize.py for details)
                if isinstance(ref, tuple):
                    roid = ref[0]
                elif isinstance(ref, str):
                    roid = ref
                else:
                    try:
                        ref_type, args = ref
                    except ValueError:
                        # weakref - not supported
                        continue
                    else:
                        if ref_type in ('m', 'n'):
                            # cross-database ref - not supported
                            continue
                if roid:
                    # record this reference
                    if reference_callback:
                        reference_callback(oid, roid)
                    # add the referenced object to the list of objects we need
                    # to visit
                    oids.append(roid)
            # yield the oid and pickle
            yield oid, p
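To see the shape of the walk and of the resulting .dot text without a running Zope instance, here is a rough, self-contained sketch of the same idea on a plain dictionary. Everything in it is an assumption made for illustration: `walk_graph`, `to_dot`, the toy `graph`, and the `labels` are hypothetical names, and the integer keys merely stand in for the `u64(oid)` values the external method writes.

```python
from collections import deque


def walk_graph(graph, root):
    """Yield each node reachable from root exactly once, breadth-first."""
    pending = deque([root])
    done = set()
    while pending:
        node = pending.popleft()
        if node in done:
            continue
        done.add(node)
        # Queue everything this node refers to; a missing key means no references.
        pending.extend(graph.get(node, ()))
        yield node


def to_dot(graph, labels, root, name="toy"):
    """Return graphviz digraph text: node lines first, then edge lines."""
    lines = ["digraph %s {" % name]
    edges = []
    for node in walk_graph(graph, root):
        lines.append('%s [label="%s"]' % (node, labels.get(node, node)))
        for target in graph.get(node, ()):
            edges.append("%s -> %s" % (node, target))
    lines.extend(edges)
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical object graph; the cycle 2 -> 0 shows why visited nodes are tracked.
graph = {0: [1, 2], 1: [2], 2: [0]}
labels = {0: "app.Root", 1: "app.Folder", 2: "app.Tool"}
dot_text = to_dot(graph, labels, 0)
print(dot_text)
```

As in the external method, nodes are emitted as they are discovered and the edges are collected and written at the end, so a cycle in the graph cannot cause an endless loop or a duplicated node.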
And after running this on a fresh Plone site, sending the result through dot and loading it in zgrviewer, here’s the result:

The site root is toward the upper right; most of the graph is persistent tools and such rather than actual content, since there is minimal content in a fresh Plone installation. That hairy mess on the left is the mimetype registry. Any resemblance to the shape of the BFG logo is entirely coincidental.
I’m not really sure what sort of useful information one might be able to get using this sort of technique, but I’m sure there are some possibilities, so please let me know if you have ideas or if you modify this to do something cool.
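One modest possibility, sketched here rather than tested against a real site: tally how many objects of each class appear in the exported file. The regular expression assumes node lines look exactly like the ones the external method writes (`<number> [label="<module>.<class>"]`); `count_classes` and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches node lines such as: 42 [label="OFS.Folder.Folder"]
NODE_LINE = re.compile(r'^\s*(\d+) \[label="([^"]*)"\]\s*$')


def count_classes(dot_text):
    """Tally how many nodes carry each label in the exported .dot text."""
    counts = Counter()
    for line in dot_text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Made-up sample in the same shape as the exported file.
sample = """digraph plone {
0 [label="OFS.Application.Application"]
1 [label="OFS.Folder.Folder"]
2 [label="OFS.Folder.Folder"]
0 -> 1
0 -> 2
}
"""
for label, count in count_classes(sample).most_common():
    print("%5d  %s" % (count, label))
```

Running the same function over the contents of plone.dot would give a quick census of which classes dominate the database.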
I want to try this on a site that has real data in it, but at the moment I’m waiting for the latest Xcode to download so that I can build the newest graphviz. It includes sfdp, a layout engine that is supposed to be better at handling really big graphs.
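Switching layout engines is mostly a matter of calling a different graphviz binary with the same arguments. A small sketch, assuming the graphviz executables are on the PATH; `layout_command` and `render` are hypothetical helper names, and SVG is just one possible output format:

```python
import subprocess


def layout_command(dot_path, engine="dot", fmt="svg"):
    """Build the graphviz command line for the chosen layout engine."""
    out_path = dot_path.rsplit(".", 1)[0] + "." + fmt
    return [engine, "-T" + fmt, dot_path, "-o", out_path]


def render(dot_path, engine="dot", fmt="svg"):
    """Run the layout engine; assumes the graphviz binaries are installed."""
    subprocess.check_call(layout_command(dot_path, engine, fmt))


# Example: the command that would lay out plone.dot with sfdp.
print(" ".join(layout_command("plone.dot", engine="sfdp")))
```

Calling `render("plone.dot", engine="sfdp")` would then write plone.svg next to the input file.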




