
This site presents a technology for the distribution of software in a computational grid. It makes use of a special feature of the ARC middleware, which allows grid jobs to specify Runtime Environments (REs) as part of their job description.
Before this development, installing software on the grid and keeping it up to date was cumbersome because a large part of the work had to be done manually. This put a heavy load on the grid sites' administrators and often left resources unusable because the required software was missing.
The Janitor presented here improves this situation by installing REs automatically and dynamically. Automatic means that the installation process runs without human supervision. Dynamic means that the installation may be started when a job is submitted to a compute element. In this case the deployed RE is removed again if it has not been used for some time.
To implement Dynamic REs, a new service was introduced that is referred to as the Janitor. The information about the REs is stored in a database called the Catalog. We decided on using RDF as the framework for the Catalog. The Janitor itself is written in Perl using only two non-standard packages: Log::Log4perl and RDF::Redland. The package RDF::Redland is used for parsing the Catalog. Log::Log4perl is optionally used for logging.
This technology can be added to an existing ARC installation with no risk, and it can help administrators install ARC runtime environments right from the first minute. Only the dynamic part, in which the Grid Manager initiates the commands automatically, requires patching a few lines that have not yet found their way into the main distribution of the ARC middleware.
The interface of the Janitor is simple. There are four commands, which are meant to be executed by the Grid Manager:
To remove unused REs the command "Janitor.pl sweep" is used.
To integrate with the Grid Infosystem, the Janitor has an additional interface. The Grid Infosystem retrieves its information by executing some Perl scripts. As the Janitor is also written in Perl, the integration is done with the help of a small Perl module called RuntimeEnvironments.
The current implementation of the Janitor is targeted at the small site case, i.e. it uses a shared directory for deploying software. Support for other environments will be added. Feedback on this is welcome.
The Catalog describes in RDF how to deploy REs. It contains three different types of entities. These are called MetaPackage, Package, and BaseSystem.
The MetaPackage entities describe the REs. In the system currently used within the NorduGrid community, an RE consists of an ID, version information and an informal description. A link to a web page describing how to install the particular RE manually is also provided. In the Catalog this information is stored in MetaPackage entities. Additionally, the MetaPackage entities have associated Package entities, which describe how to deploy the RE automatically.
The BaseSystem entities are used to distinguish the different operating systems of the worker nodes. The only mandatory value they have is their name. If virtualisation is used, the BaseSystem may also have a URL value, which describes where to find a minimal image of this kind of base system. In the non-virtualised case this node is only used as a hook.
The Package entities describe how to deploy some MetaPackage on some BaseSystem. Each MetaPackage links to possibly multiple Packages, each of which in turn links to exactly one BaseSystem.
There are several subtypes of Packages. The subtype implicitly describes how to install a Package. One of these subtypes is the TarPackage. It has a URL describing where to find a tar file containing the needed software. Another subtype is the DebianPackage. This subtype has no URL attribute but a list of Debian packages, which must be installed to provide the requested RE. Furthermore, Package nodes may depend on each other and be installed implicitly.
As the Catalog is in RDF, the most comfortable way to edit it is with Protégé. For the users of a grid site it is translated into HTML. The HTML page for grid.inb.uni-luebeck.de is available here. Please note that REs which are either forbidden by the admin or not installable are not listed. The RDF file itself is available here.
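The relationships between the three entity types can be sketched as plain data classes. This is only an illustration of the structure described above, not the RDF schema of the Catalog: all class, field and function names below (for example image_url, depends_on and packages_for) are assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaseSystem:
    """Operating system of the worker nodes; only the name is mandatory."""
    name: str
    image_url: Optional[str] = None  # assumed field: only set when virtualisation is used


@dataclass
class Package:
    """Describes how to deploy a MetaPackage on exactly one BaseSystem."""
    base_system: BaseSystem
    depends_on: List["Package"] = field(default_factory=list)


@dataclass
class TarPackage(Package):
    """Package subtype installed from a tar file found at a URL."""
    url: str = ""


@dataclass
class DebianPackage(Package):
    """Package subtype installed from a list of Debian packages (no URL)."""
    debian_packages: List[str] = field(default_factory=list)


@dataclass
class MetaPackage:
    """Describes one RE and links to the Packages that can provide it."""
    identifier: str
    version: str
    description: str
    homepage: str
    packages: List[Package] = field(default_factory=list)


def packages_for(meta: MetaPackage, base_system_name: str) -> List[Package]:
    """Return the Packages of a MetaPackage that target the named BaseSystem."""
    return [p for p in meta.packages if p.base_system.name == base_system_name]
```

With such a model, finding a way to deploy an RE on a given site amounts to calling packages_for with the name of the site's base system and picking one of the returned Packages.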
To use the Janitor some changes to the arc.conf are necessary. An example is given below.
[janitor]
logconf="/opt/janitor/work/log.conf"
registrationdir="/opt/janitor/work/reg"
installationdir="/grid/runtime/janitor"
downloaddir="/opt/janitor/work/download"
jobexpirytime="604800"
catalogrefresh="90"
uid="janitor"
gid="janitor"
allow_base="*tar*"
allow_rte="*"
deny_rte="*/JAVA/*"

[janitor/nordugrid]
catalog="/opt/janitor/catalog/knowarc.rdf"
source="http://dre.knowarc.eu/knowarc.rdf"

[janitor] is the main section. It sets the location of the Log4perl configuration file, the registration directory and the installation directory, as well as the uid and gid to use. The section [janitor/nordugrid] describes where to find the Catalog named "nordugrid". Multiple Catalog sections are supported.
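To illustrate how the main section and the Catalog sections relate, the following sketch reads a shortened version of the example configuration with Python's configparser. It assumes the fragment can be treated as an INI-style file with quoted values; the Janitor itself is written in Perl and reads arc.conf in its own way. The names SAMPLE, parse_janitor_config and catalogs are hypothetical.

```python
import configparser

# Shortened copy of the example configuration, used only for illustration.
SAMPLE = """\
[janitor]
registrationdir="/opt/janitor/work/reg"
installationdir="/grid/runtime/janitor"
uid="janitor"
gid="janitor"

[janitor/nordugrid]
catalog="/opt/janitor/catalog/knowarc.rdf"
source="http://dre.knowarc.eu/knowarc.rdf"
"""


def parse_janitor_config(text):
    """Return {section: {key: value}} for [janitor] and all [janitor/...] sections."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        if section == "janitor" or section.startswith("janitor/"):
            # Values are written with surrounding double quotes; strip them.
            result[section] = {k: v.strip('"') for k, v in parser[section].items()}
    return result


def catalogs(config):
    """Map each Catalog name (the part after 'janitor/') to its settings."""
    return {
        section.split("/", 1)[1]: values
        for section, values in config.items()
        if section.startswith("janitor/")
    }
```

Calling catalogs(parse_janitor_config(SAMPLE)) yields one entry named "nordugrid"; a second section such as [janitor/other] would simply add a second entry.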
After startup, all entries of the Catalog are disabled. The allow_* entries are used to enable the desired ones. Afterwards, the deny_* entries are evaluated to forbid some of the enabled entries again.
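The order of evaluation (everything disabled, then allow, then deny) can be sketched as follows. The sketch assumes that the patterns are shell-style wildcards matched against the entry names, which the example values suggest but which is not confirmed here; the function name enabled_entries and the RE names in the usage note are made up for illustration.

```python
from fnmatch import fnmatchcase


def enabled_entries(names, allow_patterns, deny_patterns):
    """Apply the allow patterns first, then remove whatever a deny pattern matches."""
    # Everything starts disabled: only names matching an allow pattern get enabled.
    allowed = [n for n in names if any(fnmatchcase(n, p) for p in allow_patterns)]
    # Deny patterns are evaluated afterwards and override the allow patterns.
    return [n for n in allowed if not any(fnmatchcase(n, p) for p in deny_patterns)]
```

For example, with allow_rte="*" and deny_rte="*/JAVA/*", an RE named "ENV/JAVA/SDK-1.4" would be filtered out, while "APPS/BIO/EXAMPLE-1.0" would stay enabled. With no allow pattern at all, nothing is enabled.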
Currently the Janitor only supports tar packages. Such a package contains two directories: data/ and control/. To install a package, the data/ directory is extracted and the script control/install is executed. The file control/runtime is used as a template for the runtime script. Before removal, control/remove is executed.
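The installation steps for such a tar package could look roughly like the sketch below. It is not the Janitor's Perl implementation: the function name install_tar_package, the separate work directory for control/, running the install script with the installation directory as working directory, and treating the script as optional are all assumptions made for this example.

```python
import subprocess
import tarfile
from pathlib import Path


def _under(member_name, top):
    """True if an archive member is the directory `top` or lies below it."""
    return member_name == top or member_name.startswith(top + "/")


def install_tar_package(archive, install_dir, work_dir):
    """Extract data/ and control/, run control/install, return the runtime template path."""
    install_dir = Path(install_dir).resolve()
    work_dir = Path(work_dir).resolve()
    install_dir.mkdir(parents=True, exist_ok=True)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Use the safer "data" extraction filter where this Python version offers it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    with tarfile.open(archive) as tar:
        members = tar.getmembers()
        data = [m for m in members if _under(m.name, "data")]
        control = [m for m in members if _under(m.name, "control")]
        tar.extractall(install_dir, members=data, **extra)   # the software itself
        tar.extractall(work_dir, members=control, **extra)   # install/remove/runtime

    install_script = work_dir / "control" / "install"
    if install_script.is_file():
        # Assumption: the script is executable and runs inside the installation directory.
        subprocess.run([str(install_script)], cwd=str(install_dir), check=True)

    # control/runtime serves as the template for the runtime script.
    return work_dir / "control" / "runtime"
```

Removal would mirror this: run control/remove from the work directory first, then delete the extracted files.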