The Fifth DIMACS Challenge -- Dictionary Tests
This page describes the set of core experiments for participants in the
Dictionary part of DIMACS Challenge 5. We expect you will
perform more extensive experiments when addressing your research questions,
and we encourage you to run further experiments to track down
interesting questions arising from those listed here.
Send Cathy a note if you have questions or need help.
- The test driver. To run the tests, you need a small driver program
that invokes procedures according to commands presented in an input file.
- The specifications document that describes the driver and
the input file formats is available in PostScript, or as a LaTeX
version together with a required macro file.
- If you are implementing in C, a driver program is available from
DIMACS as a compressed tar file. The documentation explains how to
combine the driver with your dictionary implementation.
- The DIMACS driver uses compile-time switches according to whether the
ky_type is an integer or a string. It is currently set to STRING_KEY,
with ky_type set to str16. You will have to change this
to run experiments with longer string keys or with integer keys. Check
the documentation for help.
- With the compile-time switch #define DETAIL 1, the
driver prints a trace of its procedure invocations.
Each output line shows the command and its parameters,
followed by the value returned by the procedure
(the two parts are separated by a colon).
Be sure to run some tests with output traces to check
the correctness of the implementation and the interface.
Remove the #define DETAIL line to turn off this output.
- General testing procedures. Below we describe
several test problems. In addition to running these problems for
each implementation you are testing, you should do the following.
- Find out whether your implementation's performance is sensitive
to the size of the info part in each item.
If it is, you should run two sets of tests, using
in_type = str4 and in_type = str128. If it isn't,
just use the smaller in_type. (You will have to
change the driver for this too).
- Choose a few combinatorial
performance measures that are most closely associated with running
times (like number of nodes touched). Instrument
your data structure to report those counts for each test problem.
- Also report user runtimes for each problem. Since you don't want to
include the cost of counting in your runtimes, you should
recompile without the combinatorial counting code. Use the
same input files (or same random seed) to get identical runs.
- When reporting runtimes, you may run the driver and data structure
with all output suppressed. To make running times meaningful, you should
also report the machine type, the operating system and version number,
the compiler and version number, and the optimization level. We strongly
urge you to get in touch with other participants
to swap implementations for tests on different machines.
- For the middle problem size in each Random test,
run the demo dictionary implementation that is provided with the
DIMACS driver program,
and report the runtimes. We hope this will maximize comparability
across different environments.
- You are welcome to run additional random trials to improve your
estimate of mean performance. For each quantity measured, state
the number of random trials, the mean, the min,
and the max values observed.
- Object Reference Traces (20 files).
Darko Stefanovic has contributed 20
files of dictionary operation traces. These come from data
references in object-oriented code and are in DIMACS format. The keys
are big integers. Use these files as inputs when you
run the tests with
your dictionary driver (with switches set for integer keys).
- Library Call Numbers (25 files).
Craig Silverstein has contributed dictionary trace files containing
dictionary lookup and insert operations. The keys are long strings
corresponding to library call numbers. Use these files as inputs
when you run the tests with your
dictionary driver (with ky_type set to str36).
- Random Numeric Key Tests (6 problems x 3 problem sizes).
These tests use a generator of random dictionary operations.
The generator takes several parameters to structure the operation
sequence. Keys are random integers with a decimal timestamp added
to produce unique keys; you can treat them as strings of length 16.
The generator was written by Marc Brooks with modifications by C. McGeoch.
- Generic English Key Tests
(12 problems). Here the operation sequences are the same as for the
6 Random Numeric Key problems, but the keys are English words taken
from two text files. Keys are strings at most 20 characters long.
- Other Tests. Participants are expected to perform
more extensive tests than those listed here. You can use the problem
generators with other parameter settings, or you can invent your own problems.
You can perform more random trials with these problems, to improve
the estimate of mean behavior.
- We would especially like to hear from someone who has a problem
generator for persistent dictionaries. Persistent dictionaries should
be much faster than others when item names are used and the
dli command replaces dlk. See the problem
specifications document for more about persistence.
- We also have available some
Wordcount Problems (6 files) that are not required
for the core tests. These are dictionary traces from a program that
counts unique words in English text.
There is one lookup for every word in the text and one insert the first
time each word appears. Keys are strings at most 20 characters long.
File sizes range from about 550 to 93200 operations.
This page is maintained by
C.C. McGeoch, who can be reached at
ccm@cs.amherst.edu