3. User guide

This user guide is meant for researchers who would like to structure their data analyses and run them on the REANA cloud.

3.1. Reusable analyses

The revalidation, reinterpretation and reuse of research data analyses require access not only to the original experimental datasets and the analysis software, but also to the operating system environment and the computational workflow steps that the researcher used to produce the original scientific results.

3.2. Four questions

REANA helps to make research analyses reusable by providing a structure that helps to answer the “Four Questions”:

  1. What is your input data?
    • input files
    • input parameters
    • live database calls
  2. What is your environment?
    • operating systems
    • software packages and libraries
    • CPU and memory resources
  3. Which code analyses it?
    • analysis frameworks
    • custom analysis code
    • Jupyter notebooks
  4. Which steps did you take?
    • simple shell commands
    • complex computational workflows
    • local or remote workflow step execution

3.3. Structure your analysis

It is advised to structure your research data analysis repository into “inputs”, “code”, “environments” and “workflow” directories, following the model of the Four Questions:

$ ls .
code/mycode.py
docs/mynotes.txt
inputs/mydata.csv
environments/mypython/Dockerfile
workflow/myworkflow.cwl
outputs/
reana.yaml

The reana.yaml describing this structure looks as follows:

version: 0.2.0
code:
  files:
    - code/mycode.py
inputs:
  files:
    - inputs/mydata.csv
  parameters:
    myparameter: myvalue
environments:
  - type: docker
    image: johndoe/mypython:1.0
workflow:
  type: cwl
  file: workflow/myworkflow.cwl
outputs:
  files:
    - outputs/myplot.png

Note that this structure is entirely optional; you can simply store everything in the same working directory. See the Examples section for some real-life inspiration.
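
The example layout can be scaffolded with a few shell commands. The following is a minimal sketch, not an official REANA tool: the project directory name myanalysis is an assumption made for illustration, and the files are the placeholders from the listing in this section, created empty.

```shell
# Hypothetical project directory name; adjust to your own analysis.
project=myanalysis

# Create the directories from the example listing.
mkdir -p "$project"/code "$project"/docs "$project"/inputs \
         "$project"/environments/mypython "$project"/workflow "$project"/outputs

# Create empty placeholder files to be filled in later.
touch "$project"/code/mycode.py \
      "$project"/docs/mynotes.txt \
      "$project"/inputs/mydata.csv \
      "$project"/environments/mypython/Dockerfile \
      "$project"/workflow/myworkflow.cwl \
      "$project"/reana.yaml

# List the resulting files, relative to the project directory.
(cd "$project" && find . -type f | sort)
```

The outputs directory is created empty, since its contents are produced by the workflow run.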

3.4. Use REANA client

REANA comes with a convenience reana-client script that you can install using pip, for example:

$ # install reana-client
$ mkvirtualenv reana-client -p /usr/bin/python2.7
$ pip install reana-client

You can run reana-client --help to obtain help.

There are several convenient environment variables you can set when working with reana-client:

  • REANA_SERVER_URL specifies the REANA cloud instance that the client should connect to. For example:
$ export REANA_SERVER_URL=http://reana.cern.ch
  • REANA_WORKON specifies a concrete workflow run for the given analysis, as an alternative to passing --workflow name in commands. For example:
$ export REANA_WORKON=myanalysis.17
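
Because the client reads these variables from the environment, it can be useful to verify that they are set before running any commands. The following is a minimal sketch under stated assumptions: the helper name require_env is hypothetical and is not part of reana-client.

```shell
# Hypothetical helper (not part of reana-client): check that each named
# environment variable is set and non-empty, and report the missing ones.
require_env() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_env REANA_SERVER_URL && reana-client --help` runs the client only when the server URL has been exported.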

The typical usage scenario of reana-client goes as follows:

$ # create new workflow
$ export REANA_WORKON=$(reana-client workflow create)
$ # upload runtime code and inputs
$ reana-client code upload ./code/*
$ reana-client inputs upload ./inputs/*
$ # start workflow and check progress
$ reana-client workflow start
$ reana-client workflow status
$ # download outputs
$ reana-client outputs list
$ reana-client outputs download myplot.png
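
The same sequence can be collected into a shell function so that a failing step stops the run. This is a minimal sketch that uses only the commands shown in the listing; the function name run_analysis is hypothetical, and it assumes that REANA_SERVER_URL is already exported and that the code and inputs directories exist in the current directory. Like the listing, it runs the steps back to back and does not wait for the workflow to finish.

```shell
# Hypothetical wrapper around the reana-client commands listed above.
run_analysis() {
    # Create a new workflow and remember it for the following commands.
    # Assignment and export are separate so a failed create is detected.
    REANA_WORKON=$(reana-client workflow create) || return 1
    export REANA_WORKON

    # Upload runtime code and inputs.
    reana-client code upload ./code/* || return 1
    reana-client inputs upload ./inputs/* || return 1

    # Start the workflow and check progress.
    reana-client workflow start || return 1
    reana-client workflow status || return 1

    # List and download outputs.
    reana-client outputs list || return 1
    reana-client outputs download myplot.png
}
```

Writing `export REANA_WORKON=$(...)` on a single line would hide the exit status of the command substitution, which is why the sketch splits it into two steps.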

For more information, please see REANA-Client’s Getting started guide.

3.5. Examples

This section lists several REANA-compatible research data analysis examples that illustrate how a typical research data analysis can be packaged in a REANA-compatible manner to facilitate its future reuse.

3.6. Next steps

For more information, you can explore the REANA-Client documentation.