CARMEN:
A General-Purpose Computing Facility for
Scientific Research Using OSCAR

B. Becker (a), M. J. N. Horner (b)

(a) UCT-CERN Research Centre, Department of Physics, University of Cape Town, Rondebosch 7701, Cape Town, South Africa
(b) Currently based at Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, USA
Abstract

Many fields of scientific research today require large amounts of computing power
in order to simulate or analyse large data sets. The UCT-CERN Research Centre
has used OSCAR to build a general-purpose computing facility in order to fulfill its
commitment to the ALICE experiment at CERN. The system, dubbed CARMEN 1,
has been found to be general enough to become an established tool for a diverse
group of scientists, ranging from ultra-relativistic heavy-ion physics to trauma-unit
medical imaging.
1 Introduction
The UCT-CERN Research Centre 2 at the University of Cape Town, South Africa,
is involved in aspects of ultra-relativistic heavy-ion nuclear physics which range
from theoretical physics to software engineering and data analysis for current and
future heavy-ion collider experiments. The purpose of this field of research is
to explore Quantum Chromodynamics (QCD), the fundamental theory of strong
interactions which governs nuclear behaviour, in a novel regime of extremely high
temperature and pressure similar to the conditions thought to have prevailed at
the very earliest stages of the universe. It is predicted that in such a regime the
fundamental constituents of nuclear matter, which are normally bound inside stable
nuclei, become free and form a deconfined state, the Quark-Gluon Plasma (QGP).
The combined knowledge gained from current experiments and studies of the interactions
of all particles with various materials has allowed the community to build
detailed simulations of the expected physics environments in future experiments as
well as simulate and test the design of future detectors. Particle physics and related
1 Cluster for African Research in Massive Energetic Nuclei
2 UCT-CERN Research Centre home page at http://hep.phy.uct.ac.za
Preprint submitted to OSCAR symposium at HPC04, Winnipeg 2 November 2004
fields involve many random processes and are thus well suited for Monte-Carlo simulations.
A number of standard packages exist which are used to predict the number
and spectra of particles created in heavy-ion collisions and to model the response
of various detectors to these particles. One of the most popular, HIJING 3, predicts
10^4 - 10^5 particles produced per simulated collision of PbPb nuclei (hereafter
referred to as “events”) at 5.5 TeV 4.
Usually, in order for a Monte-Carlo study not to be limited by statistical noise, one
needs of the order of 10^4 such events. Typical simulations of expected physics and
detector response produce data files of about 1 GB. Required data samples thus
reach sizes in the TB range, and for systematic studies of the model dependence
of predictions this increases by 2-3 orders of magnitude. In addition, the ALICE
experiment 5 at the LHC is designed to record an average of about 1-2 PB of data per
ALICE year 6 . To ensure that efficiencies are calculated properly and to eliminate
false signals, the real data set must be complemented by a simulated sample of
similar size. Of course, the data then has to be processed and analysed, which
requires enormous computing power. It is for this reason that the ALICE Data
Grid is being implemented, to provide distributed storage and computing
for the ALICE experiment.
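A back-of-the-envelope calculation makes these volumes concrete. The sketch below uses only the figures quoted above (10^4 events, ~1 GB per event, a factor of 10^2-10^3 for systematics); the arithmetic is illustrative, not taken from the original study:

```python
# Rough estimate of the simulated data volumes quoted in the text.

events_needed = 1e4        # events needed so statistics do not limit the study
gb_per_event_set = 1.0     # ~1 GB of simulation output per event

base_sample_tb = events_needed * gb_per_event_set / 1024
print(f"base MC sample: ~{base_sample_tb:.0f} TB")              # ~10 TB

# Systematic studies of model dependence scale this by 2-3 orders of
# magnitude, approaching the 1-2 PB/year that ALICE itself will record.
systematics_pb = base_sample_tb * 100 / 1024
print(f"with systematics (x100): ~{systematics_pb:.1f} PB")     # ~1.0 PB
```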
It was decided in 2002 that a computing centre would have to be built in order to
facilitate the work at the Centre. This facility had two goals. The primary, long-term
goal was to become a Tier2 7 centre of the ALICE Data Grid. A secondary goal was
to provide the users at the UCT-CERN Research Centre with a general computing
facility in order to perform simulations of heavy-ion collisions and of theoretical
models of such events [17,4,5].
After a brief investigation into the field of high-performance computing, a Linux-cluster
setup was decided upon. The reasons for this choice were primarily:
• the high performance-to-cost ratio, using COTS 8 components and open-source
software, given the fairly restricted funding sources;
• the existence of a set of tools already developed in the market of cluster computing;
• the potential for customization of the system, given the open-source nature of
most tools; and
• the potential for using the final system as an instructive teaching environment for
students.
3 HIJING: Heavy Ion Jet INteraction Generator
4 nominal values quoted for Large Hadron Collider energies
5 ALICE: A Large Ion Collider Experiment - the dedicated heavy-ion experiment
at the Large Hadron Collider, CERN
6 The LHC will only collide heavy ions for about 1 month of the year, with the rest of
the time dedicated to other species
7 The ALICE Data Grid is composed of hierarchical Tiers. Tier0 is CERN itself, where
the data is stored initially. Tier1 sites are major computing centres where copies of all the
data are stored, while Tier2 sites are universities and smaller labs around the world, where
processing power is concentrated.
8 Commodity Off The Shelf
Due to our close collaboration with CERN9 , which officially uses Red Hat Linux,
we decided on this distribution of the Linux operating system.
2 Design Considerations and Requirements
Apart from financial constraints, the design of the computing centre was founded
on the basis of our primary and secondary goals, which translated into a need for a
specific solution as well as a more general one.
1 The need for a specific solution
The work done at the UCT-CERN Research Centre for the ALICE experiment
had to be co-ordinated by the ALICE Data Grid and its computing environment,
AliEn. This required a specific solution, which translated roughly into the following
requirements:
• NFS-mounted, writable /home directories;
• large storage capacity, of the order of a TB;
• writable disks near to the CPUs 10 (i.e. local storage as well as network-attached
storage); and
• a reasonably fast network connection on the cluster subnet and a stable network
connection of the server to the outside world.
2 The need for a general solution
In order to satisfy the diverse needs of the users at the Research Centre, the design
of the computing facility had to provide a scalable, flexible and cheap solution with
a simple interface to interactive use, as well as batch-queue systems. The possibility
of collaborative use with other research groups in the university also had to be left
open, which put a constraint on the level of optimization possible.
The above design considerations led us to settle on the OSCAR toolkit, since it
provided ready-made solutions to the problem of cluster creation and maintenance,
with tools for semi-automated server configuration, node imaging, networking,
administration[10] and monitoring, and interfaces to an array of distributed-memory[12]
and batch-queue applications.
9 Centre Européen pour la Recherche Nucléaire - the European Centre for Nuclear Physics
10 Typical disk I/O rates of MC simulations are about 100 MB/min per event, which
translates to a required bandwidth of 25 MB/s when all nodes are accessing NFS-mounted
volumes. This easily saturates a 10/100 Mbps subnet, hence the requirement to write
data to a local disk instead of an NFS-mounted one.
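The bandwidth argument in the footnote above can be checked in two lines. The sketch below compares the aggregate write rate quoted in the text with the best case a single fast-Ethernet server link can carry (the 8:1 bit-to-byte conversion is the only assumption added):

```python
# Why MC output is written to node-local disks rather than NFS volumes.

link_mbps = 100                  # 10/100 Ethernet uplink, in megabits per second
link_mb_per_s = link_mbps / 8    # best-case capacity in megabytes per second

aggregate_write_mb_per_s = 25    # aggregate MC write rate quoted in the text

print(f"NFS link capacity : {link_mb_per_s:.1f} MB/s")    # 12.5 MB/s
print("saturated" if aggregate_write_mb_per_s > link_mb_per_s else "ok")
```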
3 System Setup

1 Physical and Network Setup
The 20 nodes 11 are stacked on a 5-level rack, capable of holding 6 boxes per level plus 3
switches. Ethernet cables routed through the frame of the rack connect the nodes
to the server via a fast Ethernet switch capable of handling a total of 26
connections: 24 10/100 Mbps ports and 2 gigabit ports
to the backplane, which are currently unused.
The /home directories on the server are all NFS-mounted on the nodes, as is
a 500 GB partition on the RAID array. The latter acts as redundant mass storage
to which data generated on the nodes is copied and stored. This
external hardware RAID system is permanently mounted and in semi-permanent
use, which causes frequent disk failures 12. There are plans in the near future to
extend the mass storage to include a distributed, network-attached, fault-tolerant
storage system, “clusterRAID”.
2 Data Storage Capacity
In order to satisfy our requirement of large storage space, a SCSI RAID controller
with a capacity of 8 IDE disks was used as mass storage. The controller currently houses
100 GB disks, giving a usable storage space of 650 GB. This could be upgraded
easily by using higher-capacity disks. Provision was made for local data storage on
a /data partition on each node, which is used for staged-in files, temporary writing during long
jobs (auto-saving) and distributed data storage. The distributed disk space is about
1.4 TB, which along with the RAID brings our total storage space to about 2 TB 13.
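Totalling the figures quoted above confirms the ~2 TB number (a trivial check, using only the capacities stated in the text):

```python
# Sum of the storage capacities quoted in the text.
raid_gb = 650            # usable space on the hardware RAID array
distributed_tb = 1.4     # aggregate node-local /data space

total_tb = raid_gb / 1024 + distributed_tb
print(f"total storage: ~{total_tb:.1f} TB")   # ~2.0 TB
```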
4 Load-Balancing and Queuing
Simply stringing 20 computers together and calling it a cluster does not solve any
particular problem; the challenge lies in using the machines efficiently. For this purpose, the
batch-queue system OpenPBS comes shipped with OSCAR, as well as an open-source
scheduler, MAUI. We chose, however, to take advantage of PBSPro’s educational
license and fee waiver. This version of PBS is stable, scales well to very many
nodes, and allows many scheduler optimisations. Integration of PBSPro with other
OSCAR software was simple and painless.
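To illustrate how users typically submit work to such a batch-queue system, the sketch below generates a minimal PBS job script. The job name, executable, and resource requests are hypothetical, not taken from the cluster's actual configuration; only the `#PBS` directive syntax is standard PBS:

```python
# Hypothetical generator for a simple PBS job script.
# All names and paths below are illustrative.

def make_pbs_script(job_name, executable, n_events, walltime="12:00:00"):
    """Return the text of a minimal single-node PBS job script."""
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N {job_name}",            # job name shown in the queue
        f"#PBS -l walltime={walltime}",   # requested wall-clock limit
        "#PBS -l nodes=1",                # one node per MC job
        "# write output to the node-local /data partition, not NFS",
        "cd /data/$USER",
        f"{executable} --events {n_events}",
    ])

print(make_pbs_script("hijing_pbpb", "./simulate", 500))
```

In practice each of a large batch of independent Monte-Carlo jobs would get its own script, submitted with `qsub` and farmed out across the nodes by the scheduler.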
11 A full description of the nodes is available elsewhere.
12 It is not clear yet whether the disk failures are due to a batch of faulty disks, an incompatible
SCSI driver, or a faulty controller. The problem is under investigation.
13 This is the current situation; future upgrades to clusterRAID are not discussed yet.
5 Discussion - Applications of CARMEN
1 ROOT-based Code
Much of our work is high-statistics Monte-Carlo heavy-ion event generation, detector
simulation and analysis. The basis of most of the toolkits used for this work is an
object-oriented analysis framework developed at CERN, ROOT. This framework
is fast becoming the favorite tool amongst a diverse group of physicists, and many
experiment-specific toolkits have been built around it 14, in particular [21,16]. ROOT
provides a parallel analysis facility which integrates very well into a Linux-cluster
environment.
ROOT also provides common interfaces to different Monte-Carlo generators, for both
physics collisions and particle interactions in materials. Once the particle spectra
have been determined or predicted, these can be fed into fully detailed simulations of
the detectors, provided sufficient parametrisations of particle interactions in materials exist.
This allows complete design studies to be done at vastly reduced cost compared
to a full-blown prototype, and is not restricted to heavy-ion physics.
2 ALICE dimuon High-Level Trigger Prototyping
As part of our commitment to the ALICE experiment, the UCT-CERN Research
Centre is responsible for implementing the High-Level Trigger for the
dimuon arm (dHLT). This is a software and hardware system which interfaces with
the front-end electronics of the ALICE dimuon arm detector in order to calculate
on the fly whether or not an interesting signal has been detected in a particular event,
so as to reduce the data-to-tape cost of the experiment. It implements a
data transport framework developed at the University of Heidelberg and integrates
with a trigger algorithm which is implemented partly on FPGAs and partly
on the CPUs of cluster nodes. OSCAR has allowed us to quickly and easily build the
properly networked cluster needed to build the dHLT prototype.
3 LODOX Monte-Carlo Simulation
The first large-scale, non-heavy-ion Monte-Carlo study done on CARMEN was for the
Low Dosage X-Ray system (LODOX) 15. Using GEANT, a model of the imaging process of
this high-resolution, low-dosage scanning X-ray machine for use in medical trauma
units is being constructed. The purpose of an initial study in this project was to
predict the image quality for a given detector and source configuration. The study
was completed in about 2 months, with the user having only minimal knowledge
of Linux and scheduling systems. A similar study with a prototype machine would
have taken at least 4 months, and the costs would have been prohibitive.
14 See http://root.cern.ch/root/ExApplications.html
15 A joint project between LODOX and the Faculty of Biomedical Engineering at UCT
4 Other Applications
A system like CARMEN, built completely on open source software, lends itself
naturally to the widest possible use by researchers. Apart from the large projects
described above, some smaller projects have made use of our system. These are
mainly in the form of locally compiled custom code, usually Monte-Carlo based.
Most recently, a full study of core-softened fluid dynamics was completed in less
than 1000 hours of CPU time, including compilation and setup 16. A similar-sized
study would have taken of the order of 2 months on a single CPU. This one case
demonstrates the flexibility and usefulness of a system like CARMEN.
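The turnaround gain behind this comparison is simple to quantify. The sketch below assumes, for illustration only, that the jobs parallelise cleanly across all 20 nodes; only the 1000 CPU-hour figure comes from the text:

```python
# Turnaround estimate for the ~1000 CPU-hour fluid-dynamics study.

cpu_hours = 1000
n_nodes = 20      # assumption: all CARMEN nodes used, perfect parallelism

wall_hours_cluster = cpu_hours / n_nodes
days_single_cpu = cpu_hours / 24

print(f"cluster wall time : ~{wall_hours_cluster:.0f} hours")   # ~50 hours
print(f"single CPU        : ~{days_single_cpu:.0f} days")       # ~42 days
```

The ~42 days on a single CPU is consistent with the "order of 2 months" quoted above, once setup and contention are included.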
6 Conclusion

With the availability of OSCAR and a detailed simulation and analysis framework,
we have been able to quickly produce a computing solution to meet the needs of
our Centre. In addition to providing our own computing solution, it has drawn
together a diverse community of scientists and has become the proof-of-concept for a
much larger initiative to build a 1000-node computing farm in the near future, the
Centre for High Performance Computing in South Africa.
Acknowledgements

During the time that this work was done, the authors were supported by a
National Research Foundation grantholder-linked bursary. The cluster was built
with hardware bought with a UCT University Research Council Grant and a
grant from the National Research Foundation.
16 Again, the user started from the bare minimum of knowledge of PBS and Linux.

References

[1] ALICE Experiment Technical Proposal: CERN/LHCC-95-71, LHCC/P3.
[2] T. Matsui and H. Satz, Phys. Lett. B 178, 416 (1986).
[3] European Centre for Nuclear Research: http://cern.ch
[4] M. Gyulassy and X.-N. Wang, Comput. Phys. Commun. 83, 307 (1994); X.-N. Wang and M. Gyulassy, Phys. Rev. D 44, 3501 (1991).
[5] T. Sjöstrand et al., Comput. Phys. Commun. 135, 238 (2001).
[6] T. Naughton et al., Proceedings of “Commodity, High Performance Cluster Computing Technologies and Applications”, Orlando (FL), USA (2002); oscar.sourceforge.net
[7] AliEn (Alice Environment) system home page: http://alien.cern.ch
[8] System Installation Suite: http://sisuite.org/
[9] M. Brim, R. Flanery, A. Geist and S. Scott, “Parallel and Distributed Computing Practices”, DAPSYS Special Edition (2002).
[10] Ganglia home page: http://ganglia.sourceforge.net
[11] http://www.lam-mpi.org/, http://www-unix.mcs.anl.gov/mpi/mpich/
[12] http://www.openpbs.org/, http://www.supercluster.org
[13] A. Wiebalck, P. T. Breuer, V. Lindenstruth and T. M. Steinbeck, “Fault-tolerant Distributed Mass Storage for LHC Computing”, in Proceedings of the Third IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), 12-15 May 2003, Tokyo, Japan, pp. 266-273.
[14] T. M. Steinbeck, “A Modular and Fault-Tolerant Data Transport Framework”.
[15] THERMUS: thesis of Spencer Wheaton (firstname.lastname@example.org).
[16] Red Hat Linux distribution home page: http://www.redhat.com
[17] R. Brun and F. Rademakers, Nucl. Instrum. Meth. A 389, 81 (1997).
[18] M. Ballantijn, G. Roland, R. Brun and F. Rademakers, “The PROOF Distributed Parallel Analysis Framework based on ROOT”, presented at “Computing in High Energy and Nuclear Physics” (CHEP03), La Jolla, CA, USA, March 2003.
[19] PHOBOS Experiment computing home page: http://www.phobos.bnl.gov/Phat/
[20] GEANT Detector Simulation and Description Tool: CERN Program Library Long Writeup W5013.
[21] P. Mausbach and H.-O. May, Fluid Phase Equilibria 213 (2003).