RAMData Project

This page and the following documents tell the history of the RAMData project and how it came to life. Put in those words this may sound boring to death, but we will put here documents such as:
  • The project as submitted to the DOE.
  • The Chemistry Day poster (PR, things have changed since).
  • Companies we have contacted for this project. Tips on what (who) to avoid and what to get. This will be based on our personal experience.
    • RAID technology. What to get ?
    • How to store/archive data efficiently. Our approach and why we chose it.
    • Software we really like and would advise you to get.
  • Final quote pricing.
    • RAID array.
    • Tape Library.
    • Digital Server 4000.
    • Linux Racks.
  • Details on the actual setup (it does not hurt to see the actual installation scripts):
    • Facility architecture.
    • Security issues.
    • Software layout.
    • Freeware layout.
    • Disk storage
    • Cloning an AFS tree (the RHIC/Phenix problem).
  • Pieces we really like.
  • Future plans and what one can also do.

Details on the actual setup

Facility architecture

Security issues

Software layout

Freeware layout

Cloning an AFS tree

For the RHIC/Phenix experiment, one of our goals is to have a copy of the relevant AFS tree containing the Phenix software. This cannot be done automatically if one has to change login scripts and path references by hand. In addition, one does not want to copy files indiscriminately: copying the entire tree over the internet is overkill and takes a long time (up to 12 hours in practice). Finally, AFS has a number of special directory names, such as @sys, which would disappear if copied to a local disk.

To address those problems, we developed a set of simple scripts which run automatically every weekend and update our local copy of the AFS tree, but only where the original file has been modified. The soft-links are re-created every time, the @sys references are taken care of automatically, and so on.
On our facility, we decided to copy the AFS trees into our /Ram partition. The path modifications are therefore really simple: what was /afs/... before becomes /Ram/afs/... now, and each Linux box in our facility has its /opt/phenix pointing to the local area.
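The path rule above can be illustrated with a short Python sketch. This is not taken from our scripts (which are csh and Perl); the function name to_local is hypothetical, and only the /afs and /Ram/afs roots come from the text.

```python
from pathlib import PurePosixPath

AFS_ROOT = PurePosixPath("/afs")
LOCAL_ROOT = PurePosixPath("/Ram/afs")  # local clone root described in the text


def to_local(afs_path: str) -> str:
    """Map an /afs/... path to its /Ram/afs/... counterpart (hypothetical helper)."""
    path = PurePosixPath(afs_path)
    try:
        relative = path.relative_to(AFS_ROOT)
    except ValueError:
        # Not under /afs: leave the path untouched.
        return afs_path
    return str(LOCAL_ROOT / relative)
```

Matching on whole path components (rather than on a string prefix) avoids rewriting unrelated paths that merely start with the characters "/afs".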
Every Saturday and Sunday, at 00:01, the following cron job is executed:

1 0 * * 0,6 /Users/jlauret/afslog/cpafs.csh
You do not need to follow the same approach as the cpafs.csh script, but let us describe what we have there:
  • The script is executed in a separate directory and the log files are kept there, sorted into dated directories.
  • The script is run under an account which locally has the same uid/gid as on the RCF.
  • Several tree branches are copied/updated onto our local disk. Those branches are defined in the variable zones. Please note that you currently do not need this much to run the Phenix software locally (the PHENIX_LIB area alone should, in principle, suffice). But what we are doing also gives us a complete clone of the environment setup on the RCF.
  • The cron job takes advantage of a more general command called TreeCopy, which is the core of the file and AFS tree copy. This script reports all operations to the standard output, which is then redirected to a log file. The log file can be large (you can also redirect to /dev/null if you want) but might be the only way to trace back what has been done, as we will illustrate later.
  • The /afs/rhic/phenix/login zone is treated separately. After the copy, the ConvertPHScript script is executed on each updated file. We sort out the updated files using the TCsummary.pl script, but you can easily emulate this with a one-line awk, sed or whatever else.

For TreeCopy to consider that a file has to be updated, at least one of the following conditions must be true:

  • The original file's time stamp is more recent than the local file's time stamp.
  • The file sizes differ.
Those tests should be sufficient to consistently update any tree.
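The two tests above can be written in a few lines. The following Python sketch is only an illustration of the rule, not the TreeCopy code itself; the name needs_update is hypothetical, and treating a missing local file as "needs update" is our assumption.

```python
import os


def needs_update(source: str, local: str) -> bool:
    """Return True when the local copy should be refreshed from the source."""
    if not os.path.exists(local):
        # Assumption: a file that does not exist locally is always copied.
        return True
    src = os.stat(source)
    dst = os.stat(local)
    # Either condition alone is enough to trigger an update.
    return src.st_mtime > dst.st_mtime or src.st_size != dst.st_size
```

Note that a plain copy gives the local file a time stamp later than the original, so an unchanged file is not copied again on the next run.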

The TreeCopy script also skips many trees, defined in the Perl associative array %EXCLUDE. Any directory with one of those names is skipped and not imported. This limits the number of files to transfer but is not the greatest way to implement exclusion, for two reasons:
First, some directories in (let's say) i386_redhat61 are cross-linked to equivalent trees in other operating systems' sub-directories. Those cross-links usually point to directories containing operating-system-independent files (such as pixmap files etc.) but, unfortunately, they are not necessarily consistent, i.e. they do not always point to directories within the same operating system tree, that OS acting as a reference place-holder. Our current list takes this into account.
Second, this is not suitable for a dynamic expansion of directories in AFS. To illustrate this, take for example the oodb tree: for the moment we have to copy some of its branches (because of the cross-linked include directories mentioned above and their single place-holder), but this may break in the future if the place-holder changes. The "ideal" solution would be to copy the files and eliminate the soft-links, but this is not realistic because of the amount of data which would be duplicated. Another approach (which we will implement in the future) would be to track directory references while processing the tree and attempt to bring in all required reference directories in a later pass.
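Exclusion by directory name, as described above, can be sketched as follows in Python. This is an illustration under assumptions, not our Perl code: the EXCLUDE set is a hypothetical stand-in for %EXCLUDE, and the names in it are examples only, not our actual list.

```python
import os

# Hypothetical stand-in for the Perl %EXCLUDE array; these names are
# illustrative only and are not the list used by TreeCopy.
EXCLUDE = {"sun4x_56", "alpha_dux40"}


def walk_included(root, exclude=EXCLUDE):
    """Yield (directory, filenames) pairs, skipping excluded directory names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        yield dirpath, sorted(filenames)
```

Because the match is on the directory name alone, a directory is skipped wherever it appears in the tree, which is exactly why cross-links into an excluded tree end up pointing nowhere locally.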

The TreeCopy script should work for you as well as it works for us (although improvements can be made, as described), even though you will see some warning messages such as [fail] Link creation .... The reason is that the AFS tree is currently quite screwed up: some files are soft-linked to themselves while others are soft-linked to nowhere. Other cases are file names containing characters such as ', spaces, combinations like `@ etc. You will see it all. In those cases, TreeCopy will attempt a copy (and will succeed) but will automatically fail on self soft-link creation (files soft-linked to themselves). This minimizes mess duplication.

TreeCopy also makes a second attempt to copy/update files whenever a problem occurred the first time. This was implemented to handle diverse problems, including AFS timeouts, files being updated at BNL while the copy was in progress, etc. The final failure list is written to a file named /tmp/TreeCopy.*. TreeCopy cannot eliminate all problems, however. An example of this output (29-Jul-2000) is:

Command: cp -f '/afs/rhic/phenix/PHENIX_ONCS/postmdc/online_distribution/bin/Linux.i686/daq_start.sh'
'/Ram/afs/rhic/phenix/PHENIX_ONCS/postmdc/online_distribution/bin/Linux.i686/daq_start.sh'
Error  : No such file or directory
The reason is self explanatory :
% ls -l /afs/rhic/phenix/PHENIX_ONCS/postmdc/online_distribution/bin/Linux.i686/daq_start.sh
lrwxr-xr-x   1 1980     system        84 Feb 12  1999
/afs/rhic/phenix/PHENIX_ONCS/postmdc/online_distribution/bin/Linux.i686/daq_start.sh@ ->
/afs/rhic/phenix/PHENIX_ONCS/postmdc/online_distribution/bin/Linux.i686/daq_start.sh
This file is soft-linked to itself.
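The two-pass behaviour and the final failure list described above could be sketched like this in Python. It is an assumption-laden illustration, not the TreeCopy implementation: copy_with_retry is a hypothetical name, shutil.copy2 stands in for the cp -f command, and the failures are returned rather than written to /tmp/TreeCopy.*.

```python
import shutil


def copy_with_retry(pairs, copy=shutil.copy2):
    """Copy (source, destination) pairs, retry failures once, return what still fails."""

    def one_pass(items):
        failed = []
        for src, dst in items:
            try:
                copy(src, dst)
            except OSError as err:
                # Keep the error text so it can be reported, as in the example above.
                failed.append((src, dst, str(err)))
        return failed

    first = one_pass(pairs)
    # Second pass: only the files that failed the first time are retried.
    return one_pass([(src, dst) for src, dst, _ in first])
```

Transient problems (a timeout, a file being rewritten during the copy) usually disappear on the second pass; permanent ones, such as the self-linked file above, remain in the returned list.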

TreeCopy re-creates all soft-links each time it runs in order to keep both the files and the links fully up to date. Note that all soft-links are created with absolute paths; relative paths are converted to absolute ones.
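As an illustration of the link handling described above, here is a minimal Python sketch that turns a link target into an absolute path and detects a link pointing to itself. The function names are hypothetical and this is not the TreeCopy code.

```python
import os


def absolute_target(link_path: str) -> str:
    """Return the link's target as a normalized absolute path, without following it."""
    target = os.readlink(link_path)
    if not os.path.isabs(target):
        # A relative target is interpreted from the directory holding the link.
        target = os.path.join(os.path.dirname(os.path.abspath(link_path)), target)
    return os.path.normpath(target)


def is_self_link(link_path: str) -> bool:
    """True when the link points to its own path (such a link can never be resolved)."""
    return absolute_target(link_path) == os.path.normpath(os.path.abspath(link_path))
```

One caveat of this sketch: os.path.normpath removes ".." components purely textually, so the result can differ from the real location when an intermediate directory is itself a soft-link.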

All of our scripts depend on knowing which "string" to substitute for the @sys path. This string can usually be passed as an argument, has a default value, and can be set up via the content of the file /Ram/conf/LinuxV (Linux version place-holder). For us, the content of that file is simply i386_redhat61, i.e. every link to @sys is replaced by an equivalent i386_redhat61 link. Within this scheme, the cpafs.csh script is NOT bound to run on a Linux box. It can be executed anywhere, under any operating system. On our facility, we have it running on our Digital Unix server.
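The @sys substitution can be sketched as follows in Python. The file name /Ram/conf/LinuxV and the value i386_redhat61 come from the text; the function names are hypothetical, and the order of precedence (argument first, then the file, then the default) is our assumption for this sketch, not a description of the real scripts.

```python
DEFAULT_SYSNAME = "i386_redhat61"  # value used on our facility
CONF_FILE = "/Ram/conf/LinuxV"     # Linux version place-holder file


def read_sysname(conf_file=CONF_FILE, default=DEFAULT_SYSNAME):
    """Read the substitution string from the configuration file, else use the default."""
    try:
        with open(conf_file) as handle:
            value = handle.read().strip()
    except OSError:
        return default
    return value or default


def resolve_sysname(argument=None, conf_file=CONF_FILE):
    """Assumed precedence: explicit argument, then configuration file, then default."""
    return argument or read_sysname(conf_file)


def replace_sys(path, sysname):
    """Replace every path component that is exactly @sys by the given string."""
    return "/".join(sysname if part == "@sys" else part for part in path.split("/"))
```

For example, replace_sys("/afs/rhic/@sys/opt/phenix", "i386_redhat61") gives /afs/rhic/i386_redhat61/opt/phenix. Since nothing here asks the local machine for its own system name, the substitution works the same on any operating system.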

Final notes:

  • We first brought the data over to our facility by importing a tar/gzipped archive of the different trees or branches. While an update takes only a few minutes, a complete copy takes several hours (or even days) depending on the size of the tree you want to bring over. YOU DON'T WANT TO DO THAT!!! First create an archive on the RCF, gzip it (to save internet transfer time), bring it over and unpack it on your facility, then run the update script to ensure you are in sync with the RCF.
    As a guideline, we performed a partial tree reconstruction on 29-Jul-2000. In the table below, the first value is the full cloning time and the second value is the update time.
    Branch                        Full clone         Update
    /afs/rhic/phenix/PHENIX_ONCS  23 min             2 min
    /afs/rhic/phenix/PHENIX_BP    0.33 min           0.08 min
    /afs/rhic/phenix/PHENIX_LIB   2 hours, 15 min    7 min
    /afs/rhic/phenix/PHENIX_CVS   45 min             3 min
    /afs/rhic/phenix/build        -                  -
    /afs/rhic/i386_redhat61       -                  2 min
    /afs/rhic/asis                -                  2 hours
    Although the above numbers depend on how good your AFS connection is, you can see that the difference between an update and a full cloning is striking. The worst section is, of course, /afs/rhic/asis, which already takes hours in update mode (I will never try to do a full cloning of this branch myself). You may even consider eliminating it from the zones variable of the cpafs.csh script. Since our goal is to clone not only the libraries but the environment as well, we keep it for the moment. Note that the times are automatically measured by cpafs.csh and written to a file named all.log.
  • The main problem with a tree update is that if files are removed in AFS, you will still have a copy of those files in your local tree. In particular, conflicts may appear with source code and/or relocated include files, depending on access precedence. The implication is that we rely on the original file layout being in reasonably good standing, as well as on good information about changes. For example, if one leaf of our tree is being reshaped at BNL, we delete or move it locally prior to the tree update. A future release of the scripts distributed here will implement this feature.
  • What about /cern ??
    On our site, we soft-link /cern to our local copy /Ram/afs/rhic/asis/@sys/cern/ via a different procedure not listed here (where @sys is automatically resolved according to the same /Ram/conf/LinuxV scheme). This has to be done only once and is included in our main kernel and system update procedure (one procedure does it all) BUT it is not mandatory. You can also have the cernlib installed on each machine within your facility and/or copy it to a different area as you see fit.
  • What about /opt/phenix ??
    This is soft-linked to /Ram/afs/rhic/@sys/opt/phenix/ and updated as above.
  • The phenix_login.csh is updated every time using ConvertPHScript in cpafs.csh and copied into /Ram/afs/rhic/phenix/login. This is not the best way to do it and will be improved later. All Phenix users on our facility execute the copy in this directory.
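Regarding the removed-files problem mentioned in the notes above: until the scripts handle it, one possible way to spot the leftovers is to compare the two trees and list what exists only locally. The Python sketch below is not part of our scripts; the function names are hypothetical, it only reports files (it deletes nothing), and it ignores the exclusion list and the @sys substitution.

```python
import os


def relative_files(root):
    """Collect the paths of all files under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def stale_files(source_root, local_root):
    """List files present in the local copy but no longer in the source tree."""
    return sorted(relative_files(local_root) - relative_files(source_root))
```

A call such as stale_files("/afs/rhic/phenix/PHENIX_LIB", "/Ram/afs/rhic/phenix/PHENIX_LIB") would give a list to review by hand before deleting or moving anything.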

If you have comments and/or suggestions on how to improve our scheme, we would surely appreciate it. If we forgot to mention some details, please, let us know.