Figure 6: Effect of prioritized synchronization ordering on speed of convergence. The graph on the left illustrates the base case, with no synchronization ordering enhancements. The graph on the right illustrates the benefit of prioritized synchronization ordering. Although the overall speed with which the entire file system is synchronized does not increase (in fact, it goes down a bit due to contention on the SUT), the percentage of comparable responses quickly grows to a large value.

To evaluate synchronization, we ran an OpenSSH compile (the compile phase of the ssh-build benchmark used by Seltzer et al. [12]) on a client that had mounted the reference server through the Tee. The compilation process was started immediately after starting the plugin. Both the reference server and the SUT had the same hardware configuration and ran the same version of Linux. No other workloads were active during the experiment. The OpenSSH source code shared a mount point with approximately 25,000 other files spread across many directories. The sum of the file sizes was 568MB.

To facilitate our synchronization evaluation, we instrumented the Tee to periodically write internal counters to a file. This mechanism provides us with two point-in-time values: the number of objects that are in a synchronized state and the total number of objects discovered thus far. It also provides us with two periodic values (counts within a particular interval): the number of requests enqueued for duplication to the SUT and the number of requests received by the plugin from the relay. These values allow us to compute two useful quantities. The first is the ratio of requests enqueued for duplication to requests received, expressed as a moving average; this ratio serves as a measure of the proportion of operations that were comparable in each time period.
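As a concrete illustration of this first quantity, the moving-average comparability ratio could be computed as sketched below. This is a minimal sketch; the class name, counter names, and window size are our own illustrative choices, not the Tee's actual instrumentation.

```python
from collections import deque

class ComparabilityTracker:
    """Tracks the fraction of relayed requests that were comparable
    (i.e., enqueued for duplication to the SUT), as a moving average
    over the last `window` measurement intervals."""

    def __init__(self, window=5):
        # Each sample is an (enqueued, received) pair for one interval;
        # the deque automatically drops samples older than the window.
        self.samples = deque(maxlen=window)

    def record_interval(self, enqueued, received):
        self.samples.append((enqueued, received))

    def moving_average(self):
        # Ratio of totals within the window; 0.0 before any traffic.
        enq = sum(e for e, _ in self.samples)
        rcv = sum(r for _, r in self.samples)
        return enq / rcv if rcv else 0.0
```

Summing counts over the window before dividing (rather than averaging per-interval ratios) weights each interval by its request volume, which matches how a ratio of two periodic counters behaves.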
The second is the ratio of synchronized objects to the total number of objects in the file system; this value measures how far the synchronization process has progressed through the file system as a whole.

Figure 6 shows how both ratios grow over time for two Tee instances: one (on the left) without the synchronization ordering enhancements and one (on the right) with them. Although synchronization of the entire file system requires over 90 seconds, prioritized synchronization ordering quickly enables a high rate of comparable responses. Ten seconds into the experiment, almost all requests produced comparable responses with the enhancements. Without the enhancements, a high rate of comparable responses is not reached until about 40 seconds after the plugin was started. The rapid increase observed in the unoptimized case at that time can be attributed to the synchronization module reaching the OpenSSH source code directory during its traversal of the directory tree.

The other noteworthy difference between the unordered case and the ordered case is the time required to synchronize the entire file system. Without prioritized synchronization ordering, it took approximately 90 seconds; with it, more than 100 seconds. This difference occurs because the prioritized ordering allows more requests to be compared sooner (and thus duplicated to the SUT), creating contention for SUT resources between synchronization-related requests and client requests. The variation in the rate at which objects are synchronized is caused by a combination of variation in object size and variation in client workload (which contends with synchronization for the reference server).

6 Discussion

This section discusses several additional topics related to when comparison-based server verification is useful.
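Before turning to these topics, note that the prioritized synchronization ordering evaluated above could be realized with a simple two-level priority queue in which objects touched by client requests jump ahead of objects found only by the background tree walk. The paper does not describe the Tee's internal data structures, so the sketch below is purely hypothetical.

```python
import heapq
import itertools

class SyncQueue:
    """Hypothetical synchronization work queue: objects named by recent
    client requests (CLIENT priority) are synchronized before objects
    discovered only by the background directory-tree walk (WALK)."""
    CLIENT, WALK = 0, 1  # lower value = synchronized sooner

    def __init__(self):
        self.heap = []
        self.tick = itertools.count()  # FIFO tie-break within a priority
        self.queued = set()            # objects still awaiting sync

    def add(self, obj, priority):
        if obj not in self.queued:
            self.queued.add(obj)
            heapq.heappush(self.heap, (priority, next(self.tick), obj))
        elif priority == self.CLIENT:
            # Promote an already-queued object: push a higher-priority
            # entry; the stale lower-priority entry is skipped on pop.
            heapq.heappush(self.heap, (priority, next(self.tick), obj))

    def pop(self):
        while self.heap:
            _, _, obj = heapq.heappop(self.heap)
            if obj in self.queued:
                self.queued.discard(obj)
                return obj  # highest-priority object still pending
        return None
```

Under this scheme the active working set (e.g., the OpenSSH source directory) is synchronized almost immediately, which is consistent with the rapid rise of the comparable-response ratio in the ordered case.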
Debugging FS client code: Although its primary raison d'être is file server testing, comparison-based FS verification can also be used for diagnosing problems with client implementations. Based on prior experiences, we believe the best example of this is when a client is observed to work with some server implementations and not others (e.g., a new version of a file server). Detailed insight can be obtained by comparing server responses to the request sequences with which there is trouble, allowing one to zero in on what unexpected server behavior the client needs to cope with.

Holes created by non-comparable responses: Comparison-based testing is not sufficient on its own. Although it exposes and clarifies some differences, it cannot effectively compare responses in certain situations, as described in Section 4. Most notably, concurrent writes to the same data block are one such situation: the Tee cannot be sure which write was last and, therefore, cannot easily compare responses to subsequent reads of that block. Note, however, that most concurrency situations can be tested.

More stateful protocols: Our file server Tee works for NFS version 3, which is a stateless protocol. The fact that no server state about clients is involved simplifies Tee construction and allows a quick ramp-up of the percentage of comparable operations. Although we have not built one, we believe that few aspects would change significantly in a file server Tee for more stateful protocols, such as CIFS, NFS version 4, and AFS [5]. The most notable change is that the Tee must create duplicate state on the SUT and include callbacks in the set of "responses" compared; callbacks are, after all, external actions taken by servers, usually in response to client requests. A consequence of the need to track and duplicate state is that comparisons cannot begin until both synchronization completes and the plugin portion of the Tee observes the beginnings of client sessions with the server.
This will reduce the speed at which the percentage of comparable operations grows.

7 Related work

On-line comparison has a long history in computer fault-tolerance [14]. Usually, it is used as a voting mechanism for determining the right result in the face of problems with a subset of instances. For example, the triple modular redundancy concept consists of running multiple instances of a component in parallel and comparing their results; this approach has been used mainly in highly critical domains where the dominant fault type is hardware problems. Fault-tolerant consistency protocols (e.g., Paxos [11]) for distributed systems use similar voting approaches.

With software, deterministic programs will produce the same answers given the same inputs, so one accrues little benefit from voting among multiple instances of the same implementation. With multiple implementations of the same service, on the other hand, benefits can accrue. This is generally referred to as N-version programming [2]. Although some argue that N-version programming does not assist fault-tolerance much [8, 9], we view comparison-based verification as a useful application of the basic concept of comparing one implementation's results to those produced by an independent implementation.

One similar use of inter-implementation comparison is found in the Ballista-based study of POSIX OS robustness [10]. Ballista [3] is a tool that exercises POSIX interfaces with various erroneous arguments and evaluates how an OS implementation copes. In many cases, DeVale et al. found that inconsistent return codes were used by different implementations, which clearly creates portability challenges for robustness-sensitive applications.

Use of a server Tee applies the proxy concept [13] to allow transparent comparison of a developmental server to a reference server. Many others have applied the proxy concept for other means.
In the file system domain, specifically, some examples include Slice [1], Z-force [17], Cuckoo [7], and Anypoint [16]. These all interpose on client-server NFS activity to provide clustering benefits, such as replication and load balancing, to unmodified clients. Most of them demonstrate that such interposing can be done with minimal performance impact, supporting our belief that the slowdown of our Tee's relaying could be eliminated with engineering effort.

8 Summary

Comparison-based server verification can be a useful addition to the server testing toolbox. By comparing a SUT to a reference server, one can isolate RPC interactions that the SUT services differently. If the reference server is considered correct, these discrepancies are potential bugs needing exploration. Our prototype NFSv3 Tee demonstrates the feasibility of comparison-based server verification, and our use of it to debug a prototype server and to discover interesting discrepancies among production NFS servers illustrates its usefulness.

Acknowledgements

We thank Raja Sambasivan and Mike Abd-El-Malek for help with experiments. We thank the reviewers, including Vivek Pai (our shepherd), for constructive feedback that improved the presentation. We thank the members and companies of the PDL Consortium (including EMC, Engenio, Hewlett-Packard, HGST, Hitachi, IBM, Intel, Microsoft, Network Appliance, Oracle, Panasas, Seagate, Sun, and Veritas) for their interest, insights, feedback, and support. This material is based on research sponsored in part by the National Science Foundation, via grant #CNS-0326453, by the Air Force Research Laboratory, under agreement number F49620–01–1–0433, and by the Army Research Office, under agreement number DAAD19–02–1–0389.

References

[1] D. C. Anderson, J. S. Chase, and A. M. Vahdat. Interposed request routing for scalable network storage. Symposium on Operating Systems Design and Implementation (San Diego, CA, 22–25 October 2000).

[2] L. Chen and A. Avizienis. N-version programming: a fault tolerance approach to reliability of software operation. International Symposium on Fault-Tolerant Computer Systems, pages 3–9, 1978.

[3] J. P. DeVale, P. J. Koopman, and D. J. Guttendorf. The Ballista software robustness testing service. Testing Computer Software Conference (Bethesda, MD, 14–18 June 1999).

[4] G. R. Ganger, J. D. Strunk, and A. J. Klosterman. Self-* Storage: brick-based storage with automated administration. Technical Report CMU–CS–03–178, Carnegie Mellon University, August 2003.

[5] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, and M. J. West. Scale and performance in a distributed file system. ACM Transactions on Computer Systems, 6(1):51–81, February 1988.

[6] J. Katcher. PostMark: a new file system benchmark. Technical Report TR3022, Network Appliance, October 1997.

[7] A. J. Klosterman and G. Ganger. Cuckoo: layered clustering for NFS. Technical Report CMU–CS–02–183, Carnegie Mellon University, October 2002.

[8] J. C. Knight and N. G. Leveson. A reply to the criticisms of the Knight & Leveson experiment. ACM SIGSOFT Software Engineering Notes, 15(1):24–35, January 1990.

[9] J. C. Knight and N. G. Leveson. An experimental evaluation of the assumptions of independence in multiversion programming. IEEE Transactions on Software Engineering, 12(1):96–109, March 1986.

[10] P. Koopman and J. DeVale. Comparing the robustness of POSIX operating systems. International Symposium on Fault-Tolerant Computer Systems (Madison, WI, 15–18 June 1999).

[11] L. Lamport. Paxos made simple. ACM SIGACT News, 32(4):18–25, December 2001.

[12] M. I. Seltzer, G. R. Ganger, M. K. McKusick, K. A. Smith, C. A. N. Soules, and C. A. Stein. Journaling versus soft updates: asynchronous meta-data protection in file systems. USENIX Annual Technical Conference (San Diego, CA, 18–23 June 2000), pages 71–84.

[13] M. Shapiro. Structure and encapsulation in distributed systems: the proxy principle. International Conference on Distributed Computing Systems (Cambridge, MA), pages 198–204. IEEE Computer Society Press, May 1986.

[14] D. P. Siewiorek and R. S. Swarz. Reliable Computer Systems: Design and Evaluation. Digital Press, second edition, 1992.

[15] SPEC SFS97_R1 V3.0 benchmark. Standard Performance Evaluation Corporation, August 2004. http://www.specbench.org/sfs97r1/.

[16] K. G. Yocum, D. C. Anderson, J. S. Chase, and A. M. Vahdat. Anypoint: extensible transport switching on the edge. USENIX Symposium on Internet Technologies and Systems (Seattle, WA, 26–28 March 2003).

[17] Z-force, Inc., 2004. http://www.zforce.com.