makes it possible to perform comparison-based verification
even in live environments.
The paper details the design and implementation of an
NFSv3 Tee. To illustrate the use of a file server Tee,
we present the results of using our NFSv3 Tee to compare
several popular production NFS servers, including
FreeBSD, a Network Appliance box, and two versions
of Linux. A variety of differences are identified, including
some discrepancies that would affect correctness for
some clients. We also describe experiences using our
NFSv3 Tee to debug a prototype NFS server.
The remainder of this paper is organized as follows. Section
2 puts comparison-based server verification in context
and discusses what it can be used for. Section 3 discusses
how a file server Tee works. Section 4 describes
the design and implementation of our NFSv3 Tee. Section
5 evaluates our NFSv3 Tee and presents results of
several case studies using it. Section 6 discusses additional
issues and features of comparison-based file server
verification. Section 7 discusses related work.
Distributed computing based on the client-server model
is commonplace. Generally speaking, this model consists
of clients sending RPC requests to servers and receiving
responses after the server finishes the requested
action. For most file servers, for example, system calls
map roughly to RPC requests, supporting actions like file
creation and deletion, data reads and writes, and fetching
of directory entry listings.
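As a rough illustration of this request/response pattern, the sketch below models a server that applies NFS-style operations (CREATE, WRITE, READ, READDIR) to in-memory state. The procedure names echo NFSv3 operations, but the encoding, state model, and reply format are hypothetical simplifications, not real NFS.

```python
# Toy sketch of the RPC request/response pattern described above.
# Procedure names mirror NFSv3-style operations, but the request
# encoding and server logic here are illustrative, not real NFS.

def handle_request(state: dict, request: dict) -> dict:
    """A minimal 'server': apply one RPC request to in-memory state."""
    proc, args = request["proc"], request["args"]
    if proc == "CREATE":
        state[args["name"]] = b""
        return {"status": "OK"}
    if proc == "REMOVE":
        state.pop(args["name"], None)
        return {"status": "OK"}
    if proc == "WRITE":
        state[args["name"]] = args["data"]
        return {"status": "OK"}
    if proc == "READ":
        if args["name"] not in state:
            return {"status": "NOENT"}
        return {"status": "OK", "data": state[args["name"]]}
    if proc == "READDIR":
        return {"status": "OK", "entries": sorted(state)}
    return {"status": "NOTSUPP"}

# A client's system calls map roughly onto such requests:
server_state: dict = {}
handle_request(server_state, {"proc": "CREATE", "args": {"name": "f"}})
handle_request(server_state, {"proc": "WRITE", "args": {"name": "f", "data": b"hi"}})
reply = handle_request(server_state, {"proc": "READ", "args": {"name": "f"}})
print(reply["data"])  # b'hi'
```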
Developing functional servers can be fairly straightforward,
given the variety of RPC packages available and
the maturity of the field. Fully debugging them, however,
can be tricky. While the server interface is usually
codified in a specification, there are often aspects
that are insufficiently formalized and thus open to interpretation.
Different client or server implementors may
interpret them differently, creating a variety of de facto
standards to be supported (by servers or clients).
There are two common testing strategies for servers. The
first, based on RPC-level test suites, exercises each individual
RPC request and verifies proper responses in specific
situations. For each test case, the test scaffolding
sets server state as needed, sends the RPC request, and
compares the response to the expected value. Verifying
that the RPC request did the right thing may involve
additional server state checking via follow-up RPC requests.
After each test case, any residual server state
is cleaned up. Constructing exhaustive RPC test suites
is a painstaking task, but it is a necessary step if serious
robustness is desired. One challenge with such test
suites, as with almost all testing, is balancing coverage
with development effort and test completion time. Another
challenge, related to specification vagueness, is accuracy:
the test suite implementor interprets the specification,
but may not do so the same way as others.
The second testing strategy is to experiment with applications
and benchmarks executing on one or more client
implementations.2 This complements RPC-level testing
by exercising the server with specific clients, ensuring
that those clients work well with the server when executing
at least some important workloads; thus, it helps
with the accuracy issue mentioned above. On the other
hand, it usually offers much less coverage than RPC-level
testing. It also does not ensure that the server will
work with clients that were not tested.
2.1 Comparison-based verification
Comparison-based verification complements these testing
approaches. It does not eliminate the coverage problem,
but it can help with the accuracy issue by conforming
to someone else’s interpretation of the specification.
It can help with the coverage issue, somewhat, by exposing
problem “types” that recur across RPCs and should
be addressed en masse.
Comparison-based verification consists of comparing the
server being tested to a “gold standard,” a reference
server whose implementation is believed to work correctly.
Specifically, the state of the server under test (SUT) is
set up to match that of the reference server, and then each
RPC request is duplicated so that the two servers' responses
to each request can be compared.
If the server states
are synchronized properly and the reference server is correct,
differences in responses indicate potential problems with the SUT.
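The duplicate-and-compare step might be sketched as below. State synchronization and the response normalization a real tee needs (fields such as file IDs can legitimately differ between independent servers) are reduced to a single hypothetical `ignore_fields` filter; all names here are illustrative.

```python
# Sketch of a tee's compare loop: each client request is sent to both
# the reference server and the server under test (SUT), and the two
# responses are compared. Prior state synchronization and realistic
# response normalization are elided.

def tee_and_compare(requests, reference, sut, ignore_fields=("fileid",)):
    """Yield (request, ref_reply, sut_reply) for every mismatch."""
    for req in requests:
        ref_reply = reference(req)
        sut_reply = sut(req)
        # Drop fields expected to differ between independent servers.
        strip = lambda r: {k: v for k, v in r.items() if k not in ignore_fields}
        if strip(ref_reply) != strip(sut_reply):
            yield req, ref_reply, sut_reply

# Example: a SUT that wrongly reports success when removing a missing file.
reference = lambda req: {"status": "NOENT"} if req == "REMOVE missing" else {"status": "OK"}
sut = lambda req: {"status": "OK"}

mismatches = list(tee_and_compare(["CREATE f", "REMOVE missing"], reference, sut))
print(mismatches)  # [('REMOVE missing', {'status': 'NOENT'}, {'status': 'OK'})]
```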
Comparison-based verification can help server development
in four ways: debugging client-perceived problems,
achieving bug compatibility with existing server implementations,
testing in live environments, and isolating
1. Debugging: With benchmark-based testing, in particular,
bugs exhibit themselves as situations where the
benchmark fails to complete successfully. When this
happens, significant effort is often needed to determine
exactly what server response(s) caused the client to
fail. For example, single-stepping through client actions
might be used, but this is time-consuming and may alter
client behavior enough that the problem no longer arises.
Another approach is to sniff network packets and interpret
the exchanges between client and server to identify
the last interactions before problems arise. Then, one
2 Research prototypes are almost always tested only in this way.