The Fifth DIMACS Challenge -- Dictionary Tests
This page describes the set of core experiments for participants in the
Dictionary part of DIMACS Challenge 5. We expect you will
perform more extensive experiments when addressing your research questions,
and we encourage you to run further experiments to track down
interesting questions arising from those listed here.
Send Cathy a note if you have questions or need help.
- The test driver. To run the tests, you need a small driver program
that invokes procedures according to commands presented in an input file.
- The specifications document that describes the driver and
the input file formats is available in PostScript, or as a LaTeX
version together with a required macro file.
- If you are implementing in C, a driver program is available from
DIMACS as a compressed tar file. The documentation explains how to
combine the driver with your dictionary implementation.
- The DIMACS driver uses compile-time switches according to whether the
ky_type is an integer or a string. It is currently set to STRING_KEY,
with ky_type set to str16. You will have to change this
to run experiments with longer string keys or with integer keys. Check
the documentation for help.
- With the compile-time switch #define DETAIL 1, the
driver prints a trace of its procedure invocations.
Each output line shows the command and its parameters,
followed by the value returned by the procedure
(the two parts are separated by a colon).
Be sure to run some tests with output traces to check
the correctness of the implementation and the interface.
Remove the #define DETAIL line to turn off this output.
- General testing procedures. Below we describe
several test problems. In addition to running these problems for
each implementation you are testing, you should do the following.
- Find out whether your implementation's performance is sensitive
to the size of the info part in each item.
If it is, you should run two sets of tests, using
in_type = str4 and in_type = str128. If it isn't,
just use the smaller in_type. (You will have to
change the driver for this too).
- Choose a few combinatorial
performance measures that are most closely associated with running
times (like number of nodes touched). Instrument
your data structure to report those counts for each test problem.
- Also report user runtimes for each problem. Since you don't want to
include the cost of counting in your runtimes, you should
recompile without the combinatorial counting code. Use the
same input files (or same random seed) to get identical runs.
- When reporting runtimes, you may run the driver and data structure
with all output suppressed. To make running times meaningful, you should
also report the machine type, the operating system and version number,
the compiler and version number, and the optimization level. We strongly
urge you to get in touch with other participants
to swap implementations for tests on different machines.
- For the middle problem size in each Random test,
run the demo dictionary implementation that is provided with the
DIMACS driver program,
and report the runtimes. We hope this will maximize comparability
across different environments.
- You are welcome to run additional random trials to improve your
estimate of mean performance. For each quantity measured, state
the number of random trials, the mean, the min,
and the max values observed.
- Object Reference Traces (20 files).
Darko Stefanovic has contributed 20
files of dictionary operation traces. These come from data
references in object-oriented code and are in DIMACS format. The keys
are big integers. Use these files as inputs when you
run the tests with
your dictionary driver (with switches set for integer keys).
- Library Call Numbers (25 files).
Craig Silverstein has contributed dictionary trace files containing
dictionary lookup and insert operations. The keys are long strings
corresponding to library call numbers. Use these files as inputs
when you run the tests with your
dictionary driver (with ky_type set to str36).
- Random Numeric Key Tests (6 problems x 3 problem sizes).
These tests use a generator of random dictionary operations.
The generator takes several parameters to structure the operation
sequence. Keys are random integers with a decimal timestamp added
to produce unique keys; you can treat them as strings of length 16.
The generator was written by Marc Brooks with modifications by C. McGeoch.
- Generic English Key Tests
(12 problems). Here the operation sequences are the same as for the
6 Random Numeric Key problems, but the keys are English words taken
from two text files. Keys are strings at most 20 characters long.
- Other Tests. Participants are expected to perform
more extensive tests than those listed here. You can use the problem
generators with other parameter settings, or you can invent your own problems.
You can perform more random trials with these problems, to improve
the estimate of mean behavior.
- We would especially like to hear from someone who has a problem
generator for persistent dictionaries. Persistent dictionaries should
be much faster than others when item names are used and the
dli command replaces dlk. See the problem
specifications document for more about persistence.
- We also have available some
Wordcount Problems (6 files) that are not required
for the core tests. These are dictionary traces from a program that
counts unique words in English text.
There is one lookup for every word in the text and one insert the first
time each word appears. Keys are strings at most 20 characters long.
File sizes range from about 550 to 93200 operations.
This page is maintained by
C.C. McGeoch, who can be reached at
ccm@cs.amherst.edu