Figure 6: Effect of prioritized synchronization ordering on speed of convergence. The graph on the left illustrates the base case, with no synchronization ordering enhancements. The graph on the right illustrates the benefit of prioritized synchronization ordering. Although the overall speed with which the entire file system is synchronized does not increase (in fact, it goes down a bit due to contention on the SUT), the percentage of comparable responses quickly grows to a large value.

To evaluate synchronization, we ran an OpenSSH compile (the compile phase of the ssh-build benchmark used by Seltzer et al. [12]) on a client that had mounted the reference server through the Tee. The compilation process was started immediately after starting the plugin. Both the reference server and the SUT had the same hardware configuration and ran the same version of Linux. No other workloads were active during the experiment. The OpenSSH source code shared a mount point with approximately 25,000 other files spread across many directories. The sum of the file sizes was 568MB.

To facilitate our synchronization evaluation, we instrumented the Tee to periodically write internal counters to a file. This mechanism provides us with two point-in-time values: the number of objects that are in a synchronized state and the total number of objects discovered thus far. It also provides us with two periodic values (counts within a particular interval): the number of requests enqueued for duplication to the SUT and the number of requests received by the plugin from the relay. These values allow us to compute two useful quantities. The first is the ratio of requests enqueued for duplication to requests received, expressed as a moving average; this ratio serves as a measure of the proportion of operations that were comparable in each time period.
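As a concrete illustration of this first quantity, the moving-average comparability ratio could be computed as sketched below. This is a minimal sketch; the class name, counter names, and window size are our own illustrative choices, not the Tee's actual instrumentation.

```python
from collections import deque

class ComparabilityTracker:
    """Tracks the fraction of relayed requests that were comparable
    (i.e., enqueued for duplication to the SUT), as a moving average
    over the last `window` measurement intervals."""

    def __init__(self, window=5):
        # Each sample is an (enqueued, received) pair for one interval;
        # the deque automatically drops samples older than the window.
        self.samples = deque(maxlen=window)

    def record_interval(self, enqueued, received):
        self.samples.append((enqueued, received))

    def moving_average(self):
        # Ratio of totals within the window; 0.0 before any traffic.
        enq = sum(e for e, _ in self.samples)
        rcv = sum(r for _, r in self.samples)
        return enq / rcv if rcv else 0.0
```

Summing counts over the window before dividing (rather than averaging per-interval ratios) weights each interval by its request volume, which matches how a ratio of two periodic counters behaves.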
The second is the ratio of synchronized objects to the total number of objects in the file system; this value measures how far the synchronization process has progressed through the file system as a whole.

Figure 6 shows how both ratios grow over time for two Tee instances: one (on the left) without the synchronization ordering enhancements and one (on the right) with them. Although synchronization of the entire file system requires over 90 seconds, prioritized synchronization ordering quickly enables a high rate of comparable responses. Ten seconds into the experiment, almost all requests produced comparable responses with the enhancements. Without the enhancements, a high rate of comparable responses is not reached until about 40 seconds after the plugin was started. The rapid increase observed in the unoptimized case at that time can be attributed to the synchronization module reaching the OpenSSH source code directory during its traversal of the directory tree.

The other noteworthy difference between the unordered case and the ordered case is the time required to synchronize the entire file system. Without prioritized synchronization ordering, it took approximately 90 seconds; with it, more than 100 seconds. This difference occurs because the prioritized ordering allows more requests to be compared sooner (and thus duplicated to the SUT), creating contention for SUT resources between synchronization-related requests and client requests. The variation in the rate at which objects are synchronized is caused by a combination of variation in object size and variation in client workload (which contends with synchronization for the reference server).

6 Discussion

This section discusses several additional topics related to when comparison-based server verification is useful.
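Before turning to these topics, note that the prioritized synchronization ordering evaluated above could be realized with a simple two-level priority queue in which objects touched by client requests jump ahead of objects found only by the background tree walk. The paper does not describe the Tee's internal data structures, so the sketch below is purely hypothetical.

```python
import heapq
import itertools

class SyncQueue:
    """Hypothetical synchronization work queue: objects named by recent
    client requests (CLIENT priority) are synchronized before objects
    discovered only by the background directory-tree walk (WALK)."""
    CLIENT, WALK = 0, 1  # lower value = synchronized sooner

    def __init__(self):
        self.heap = []
        self.tick = itertools.count()  # FIFO tie-break within a priority
        self.queued = set()            # objects still awaiting sync

    def add(self, obj, priority):
        if obj not in self.queued:
            self.queued.add(obj)
            heapq.heappush(self.heap, (priority, next(self.tick), obj))
        elif priority == self.CLIENT:
            # Promote an already-queued object: push a higher-priority
            # entry; the stale lower-priority entry is skipped on pop.
            heapq.heappush(self.heap, (priority, next(self.tick), obj))

    def pop(self):
        while self.heap:
            _, _, obj = heapq.heappop(self.heap)
            if obj in self.queued:
                self.queued.discard(obj)
                return obj  # highest-priority object still pending
        return None
```

Under this scheme the active working set (e.g., the OpenSSH source directory) is synchronized almost immediately, which is consistent with the rapid rise of the comparable-response ratio in the ordered case.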
Debugging FS client code: Although its primary raison d'être is file server testing, comparison-based FS verification can also be used for diagnosing problems with client implementations. Based on prior experiences, we believe the best example of this is when a client is observed to work with some server implementations and not others (e.g., a new version of a file server). Detailed insight can be obtained by comparing server responses to the request sequences with which there is trouble, allowing one to zero in on what unexpected server behavior the client needs to cope with.

Holes created by non-comparable responses: Comparison-based testing is not sufficient on its own. Although it exposes and clarifies some differences, it cannot effectively compare responses in certain situations, as described in Section 4. Most notably, concurrent writes to the same data block are one such situation: the Tee cannot be sure which write was last and, therefore, cannot easily compare responses to subsequent reads of that block. Note, however, that most concurrency situations can be tested.

More stateful protocols: Our file server Tee works for NFS version 3, which is a stateless protocol. The fact that no server state about clients is involved simplifies Tee construction and allows a quick ramp-up of the percentage of comparable operations. Although we have not built one, we believe that few aspects would change significantly in a file server Tee for more stateful protocols, such as CIFS, NFS version 4, and AFS [5]. The most notable change is that the Tee must create duplicate state on the SUT and include callbacks in the set of "responses" compared; callbacks are, after all, external actions taken by servers, usually in response to client requests. A consequence of the need to track and duplicate state is that comparisons cannot begin until both synchronization completes and the plugin portion of the Tee observes the beginnings of client sessions with the server.
This will reduce the speed at which the percentage of comparable operations grows.

7 Related work

On-line comparison has a long history in computer fault-tolerance [14]. Usually, it is used as a voting mechanism for determining the right result in the face of problems with a subset of instances. For example, the triple modular redundancy concept consists of running multiple instances of a component in parallel and comparing their results; this approach has been used mainly in highly critical domains where the dominant fault type is hardware problems. Fault-tolerant consistency protocols (e.g., Paxos [11]) for distributed systems use similar voting approaches.

With software, deterministic programs will produce the same answers given the same inputs, so one accrues little benefit from voting among multiple instances of the same implementation. With multiple implementations of the same service, on the other hand, benefits can accrue. This is generally referred to as N-version programming [2]. Although some argue that N-version programming does not assist fault-tolerance much [8, 9], we view comparison-based verification as a useful application of the basic concept of comparing one implementation's results to those produced by an independent implementation.

One similar use of inter-implementation comparison is found in the Ballista-based study of POSIX OS robustness [10]. Ballista [3] is a tool that exercises POSIX interfaces with various erroneous arguments and evaluates how an OS implementation copes. In many cases, DeVale et al. found that inconsistent return codes were used by different implementations, which clearly creates portability challenges for robustness-sensitive applications.

Use of a server Tee applies the proxy concept [13] to allow transparent comparison of a developmental server to a reference server. Many others have applied the proxy concept for other means.
In the file system domain, specifically, some examples include Slice [1], Z-force [17], Cuckoo [7], and Anypoint [16]. These all interpose on client-server NFS activity to provide clustering benefits, such as replication and load balancing, to unmodified clients. Most of them demonstrate that such interposing can be done with minimal performance impact, supporting our belief that the slowdown of our Tee's relaying could be eliminated with engineering effort.

8 Summary

Comparison-based server verification can be a useful addition to the server testing toolbox. By comparing a SUT to a reference server, one can isolate RPC interactions that the SUT services differently. If the reference server is considered correct, these discrepancies are potential bugs needing exploration. Our prototype NFSv3 Tee demonstrates the feasibility of comparison-based server verification, and our use of it to debug a prototype server and to discover interesting discrepancies among production NFS servers illustrates its usefulness.

Acknowledgements

We thank Raja Sambasivan and Mike Abd-El-Malek for help with experiments. We thank the reviewers, including Vivek Pai (our shepherd), for constructive feedback that improved the presentation. We thank the members and companies of the PDL Consortium (including EMC, Engenio, Hewlett-Packard, HGST, Hitachi, IBM, Intel, Microsoft, Network Appliance, Oracle, Panasas, Seagate, Sun, and Veritas) for their interest, insights, feedback, and support. This material is based on research sponsored in part by the National Science Foundation, via grant #CNS-0326453, by the Air Force Research Laboratory, under agreement number F49620–01–1–0433, and by the Army Research Office, under agreement number DAAD19–02–1–0389.

References

[1] D. C. Anderson, J. S. Chase, and A. M. Vahdat. Interposed request routing for scalable network storage. Symposium on Operating Systems Design and Implementation (San Diego, CA, 22–25 October 2000).

[2] L. Chen and A. Avizienis. N-version programming: a fault tolerance approach to reliability of software operation. International Symposium on Fault-Tolerant Computer Systems, pages 3–9, 1978.

[3] J. P. DeVale, P. J. Koopman, and D. J. Guttendorf. The Ballista software robustness testing service. Testing Computer Software Conference (Bethesda, MD, 14–18 June 1999).

[4] G. R. Ganger, J. D. Strunk, and A. J. Klosterman. Self-* Storage: brick-based storage with automated administration. Technical Report CMU–CS–03–178, Carnegie Mellon University, August 2003.

[5] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, and M. J. West. Scale and performance in a distributed file system. ACM Transactions on Computer Systems, 6(1):51–81, February 1988.

[6] J. Katcher. PostMark: a new file system benchmark. Technical Report TR3022, Network Appliance, October 1997.

[7] A. J. Klosterman and G. Ganger. Cuckoo: layered clustering for NFS. Technical Report CMU–CS–02–183, Carnegie Mellon University, October 2002.

[8] J. C. Knight and N. G. Leveson. A reply to the criticisms of the Knight & Leveson experiment. ACM SIGSOFT Software Engineering Notes, 15(1):24–35, January 1990.

[9] J. C. Knight and N. G. Leveson. An experimental evaluation of the assumptions of independence in multiversion programming. IEEE Transactions on Software Engineering, 12(1):96–109, March 1986.

[10] P. Koopman and J. DeVale. Comparing the robustness of POSIX operating systems. International Symposium on Fault-Tolerant Computer Systems (Madison, WI, 15–18 June 1999).

[11] L. Lamport. Paxos made simple. ACM SIGACT News, 32(4):18–25, December 2001.

[12] M. I. Seltzer, G. R. Ganger, M. K. McKusick, K. A. Smith, C. A. N. Soules, and C. A. Stein. Journaling versus soft updates: asynchronous meta-data protection in file systems. USENIX Annual Technical Conference (San Diego, CA, 18–23 June 2000), pages 71–84.

[13] M. Shapiro. Structure and encapsulation in distributed systems: the proxy principle. International Conference on Distributed Computing Systems (Cambridge, MA), pages 198–204. IEEE Computer Society Press, May 1986.

[14] D. P. Siewiorek and R. S. Swarz. Reliable Computer Systems: Design and Evaluation. Digital Press, second edition, 1992.

[15] SPEC SFS97_R1 V3.0 benchmark. Standard Performance Evaluation Corporation, August 2004. http://www.specbench.org/sfs97r1/.

[16] K. G. Yocum, D. C. Anderson, J. S. Chase, and A. M. Vahdat. Anypoint: extensible transport switching on the edge. USENIX Symposium on Internet Technologies and Systems (Seattle, WA, 26–28 March 2003).

[17] Z-force, Inc., 2004. http://www.zforce.com.