Eazy-Photoz EasyBuild Script and Test Cases

"EAZY is a photometric redshift code designed to produce high-quality redshifts for situations where complete spectroscopic calibration samples are not available", which makes it a pretty excellent project. However, it has a few issues that illustrate the typical problems that arise when developers aren't thinking in terms of operations. I like this software: it carries out a valuable scientific task with ease, and it will be awesome in an HPC environment, especially when run as a job array. However, the developers have assumed a single-user, single-machine environment and haven't been very flexible with their design. Hopefully, these notes will make life easier for others in a similar situation.

1. There are no releases
It is perfectly acceptable to have a continuous process of code improvement that is saved as commits to a repository. It is, in fact, very good practice to do so. However, it makes life a lot easier for people if one picks a point in time and tags a release version. Why? Because when software changes, results can change. Following the instructions provided (download the latest repository, make, and run) may mean that researchers get different results using the same software, on the same machine, with the same datasets, because the software has changed and the researchers don't realise it. This is not conducive to the basic principles of science. Releasing versions makes a world of difference in that regard. The suggested process of updating the checked-out repository is not particularly helpful, as it is important that researchers be able to check against prior versions as well.

As a workaround for this issue, the following EasyBuild script is offered. It is far from ideal: the version number is a date, based on the last commit to the repository, and the source tarball has to be created manually. But it is better than the alternative, as it specifies a version (of sorts), a compiler, and the dependencies used. The EasyBuild script will generate an Lmod environment module.

easyblock = 'MakeCp'
name = 'eazy-photoz'
version = '20201002'

homepage = 'https://github.com/gbrammer/eazy-photoz/'
description = """EAZY is a photometric redshift code designed to produce high-quality redshifts for situations where complete spectroscopic calibration samples are not available."""

toolchain = {'name': 'GCC', 'version': '8.3.0'}

# source tarball needs to be created manually,
# because of lack of proper releases
# git clone https://github.com/gbrammer/eazy-photoz
# tar cfvz eazy-photoz-.tar.gz eazy-photoz

builddependencies = [('binutils', '2.32')]

files_to_copy = ["*"]

start_dir = 'src'

sanity_check_paths = {
        'files': ["eazy"],
        'dirs': [""]
}

moduleclass = 'phys'
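
The manual tarball step mentioned in the comments above could be scripted along the following lines. This is only a sketch under stated assumptions: the helper name make_snapshot_tarball is hypothetical, and so is the naming scheme, which assumes the tarball is called name-version.tar.gz with the version taken from the date of the last commit in a local clone.

```shell
#!/bin/bash
# Sketch: build a date-versioned source tarball from a local clone.
# Assumption: the tarball is named <name>-<version>.tar.gz, where <version>
# is the date (YYYYMMDD) of the last commit in the clone.

make_snapshot_tarball() {
    local repo_dir="$1"        # path to an existing clone
    local out_dir="${2:-.}"    # directory that receives the tarball
    local name version
    name="$(basename "${repo_dir}")"
    # Committer date of the most recent commit, formatted as YYYYMMDD.
    version="$(git -C "${repo_dir}" log -1 --format=%cd --date=format:%Y%m%d)" || return 1
    # Archive the working tree, leaving out the git metadata.
    tar --exclude='.git' -czf "${out_dir}/${name}-${version}.tar.gz" \
        -C "$(dirname "${repo_dir}")" "${name}" || return 1
    # Print the tarball name so the caller can record the version used.
    echo "${name}-${version}.tar.gz"
}

# Usage (hypothetical paths):
#   git clone https://github.com/gbrammer/eazy-photoz
#   make_snapshot_tarball ./eazy-photoz .
```

The date printed in the tarball name is the value that would then go into the version field of the EasyBuild script, so that the module name records which snapshot was built.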

2. Hard-coded Symlinks
It is typical in many environments to have an installation in a significantly different location from the source code. In multi-user environments it is pretty normal to separate access to the code, the binaries, and user datasets. In this case the hard-coded symlinks used in EAZY will cause problems, as the expected files will not be present.

To work around this, I have copied the symlinked files FILTER.RES.latest and FILTER.RES.latest.info from the filters directory. The source specifies the following symlinks in the inputs directory.

FILTER.RES.latest -> ../filters/FILTER.RES.latest
FILTER.RES.latest.info -> ../filters/FILTER.RES.latest.info
templates -> ../templates
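
One way to produce a self-contained copy of the inputs directory is to let cp follow the symlinks while copying. The following is a minimal sketch, not the exact procedure used here; the function name stage_inputs and the destination directory are hypothetical.

```shell
#!/bin/bash
# Sketch: copy the inputs directory into a standalone test-case directory,
# replacing each symlink with a real copy of whatever it points to.

stage_inputs() {
    local src_inputs="$1"    # the inputs directory of the source tree
    local dest="$2"          # hypothetical test-case directory
    mkdir -p "${dest}" || return 1
    # -R copies recursively; -L follows symlinks, so FILTER.RES.latest,
    # FILTER.RES.latest.info and templates arrive as ordinary files and
    # directories rather than as links into the source tree.
    cp -RL "${src_inputs}/." "${dest}/"
}

# Usage (hypothetical paths):
#   stage_inputs ./eazy-photoz/inputs ./eazy-test/inputs
```

Because the copies no longer point back into the source tree, the test case keeps working when the source, the binaries, and the user data live in separate locations.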

A test case is created with the inputs directory and the symlinked files included as real files. The suggested example, using the HDF-N catalog of Fernandez-Soto et al. (1999), will work with the following Slurm script. The names of modules may vary according to the system.

#!/bin/bash
#SBATCH --job-name=eazy-photoz-test.slurm
#SBATCH --ntasks=1
#SBATCH -t 0:15:00

# Load the environment module
module purge
module load eazy-photoz/20201002

cd inputs
mkdir OUTPUT
eazy # generates param file
eazy -p zphot.param.default
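
Since the software suits job arrays, the same test could be extended along these lines. This is a sketch only: the array range and the parameter file naming scheme zphot.param.ID are hypothetical, and each parameter file is assumed to point at its own catalog and output directory so that tasks do not overwrite one another.

```shell
#!/bin/bash
#SBATCH --job-name=eazy-photoz-array
#SBATCH --ntasks=1
#SBATCH -t 0:15:00
#SBATCH --array=1-4

# Map an array task ID to a parameter file. The naming scheme
# zphot.param.<ID> is hypothetical.
param_for_task() {
    echo "zphot.param.$1"
}

# Only run the job body under Slurm, where SLURM_ARRAY_TASK_ID is set.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    module purge
    module load eazy-photoz/20201002

    cd inputs || exit 1
    mkdir -p OUTPUT
    # Each array task runs eazy against its own parameter file.
    eazy -p "$(param_for_task "${SLURM_ARRAY_TASK_ID}")"
fi
```

Slurm sets SLURM_ARRAY_TASK_ID to a different value for each task in the array, so one submission covers all of the parameter files.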