Using Rucio to find ProtoDUNE files at CERN

Steven Timm edited this page Feb 7, 2024 · 17 revisions

Thanks to Steven Timm

The Rucio Documentation: http://rucio.cern.ch/documentation/

Introduction: During the ProtoDUNE II run we will attempt to hold all current data on disk at CERN in the Rucio Storage Element (RSE) DUNE_CERN_EOS. This is similar to what was done in ProtoDUNE I, except that the directory structure has changed and now includes hex hashes in the directory paths. Rucio is the way forward, and we want people to get used to using it.

As of this writing (January 2024), not all DUNE users have read access to Rucio. These instructions will not work until you contact Steven Timm [email protected] and ask him to add you to Rucio.

First, set up the DUNE software on lxplus7 (or any other RHEL7 machine that has CVMFS).

(Note: at the moment these instructions only work on lxplus7 (RHEL7/CC7 machines), because kx509 is only available from CVMFS on that platform.)

source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh

Setting up larsoft UPS area... /cvmfs/larsoft.opensciencegrid.org
Setting up DUNE UPS area... /cvmfs/dune.opensciencegrid.org/products/dune/

Then set up Python 3.9, which is required for rucio v33_3

setup python v3_9_15

(Note: if you already have dunesw set up, this is the Python that comes with it. It is needed because the system Python on RHEL7 machines is too old to work with Rucio.)

Then set up the rucio and kx509 packages

setup rucio
setup kx509
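The setup steps above can be collected into a single shell function. This is only a sketch: the function name setup_rucio_env and the DUNE_SETUP override variable are hypothetical and not part of the DUNE software, and `setup` is the UPS command that becomes available once the DUNE setup script has been sourced.

```shell
# Hypothetical helper collecting the setup steps from this page.
# DUNE_SETUP is an assumed override variable; by default it points to the
# setup script used above.
setup_rucio_env() {
    dune_setup="${DUNE_SETUP:-/cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh}"
    if [ ! -r "$dune_setup" ]; then
        echo "Cannot read $dune_setup (is cvmfs mounted?)" >&2
        return 1
    fi
    # Sourcing the script defines the UPS 'setup' command used below.
    . "$dune_setup" || return 1
    setup python v3_9_15 || return 1
    setup rucio || return 1
    setup kx509
}
```

Calling setup_rucio_env in an interactive session on lxplus7 should then leave the same environment as typing the commands one by one.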

Next, get your x509 grid proxy.

You can get a grid proxy by running kx509 with a valid Kerberos ticket.

At CERN this must be done on lxplus7, using the following commands:

kdestroy
kinit <username>@FNAL.GOV
kx509

(It is not necessary to run voms-proxy-init unless you want to upload files to Rucio, which regular users do not have the rights to do.)

Define your Rucio account name; it should be your FNAL account name
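Because kx509 depends on a valid Kerberos ticket, it can help to check for one first. The sketch below uses `klist -s`, which prints nothing and returns a non-zero status when no valid ticket is found; the function name get_grid_proxy is hypothetical.

```shell
# Hypothetical helper: only run kx509 when a valid Kerberos ticket exists.
get_grid_proxy() {
    # 'klist -s' is silent and exits non-zero if there is no valid ticket.
    if ! klist -s; then
        echo "No valid Kerberos ticket; run: kinit <username>@FNAL.GOV" >&2
        return 1
    fi
    kx509
}
```

If the function reports a missing ticket, run kinit as shown above and try again.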

export RUCIO_ACCOUNT=<username>

Check to see if you can authenticate against the Rucio server

rucio whoami

created_at : 2021-10-19T20:01:12
account    : benjamin
status     : ACTIVE
email      : None
deleted_at : None
updated_at : 2021-10-19T20:01:12
account_type : USER
suspended_at : None
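In a script, the same check can be reduced to looking at the status field. The sketch below assumes the `field : value` layout shown in the output above; the function name whoami_status is hypothetical.

```shell
# Hypothetical helper: read 'rucio whoami' output on stdin and print the
# value of the 'status' field.
whoami_status() {
    # Split each line on the colon and any spaces around it.
    awk -F' *: *' '$1 == "status" { print $2 }'
}
```

For example, `rucio whoami | whoami_status` should print ACTIVE for a working account such as the one shown above.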

How to find the locations of a file in EOS if you know the name of the file already:

In Rucio/MetaCat, files are specified by scope:filename. For ProtoDUNE II the four scopes that matter are vd-coldbox, hd-coldbox, vd-protodune, and hd-protodune.

So, for instance, to find the location of the file np02vdcoldbox_raw_run023784_0000_dataflow0_datawriter_0_20240118T140023.hdf5

you would run

rucio list-file-replicas vd-coldbox:np02vdcoldbox_raw_run023784_0000_dataflow0_datawriter_0_20240118T140023.hdf5

+------------+------------------------------------------------------------------------------+------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vd-coldbox | np02vdcoldbox_raw_run023784_0000_dataflow0_datawriter_0_20240118T140023.hdf5 | 13.390 MB  | 308b18ba  | DUNE_CERN_EOS: root://eospublic.cern.ch:1094//eos/experiment/neutplatform/protodune/dune/vd-coldbox/13/71/np02vdcoldbox_raw_run023784_0000_dataflow0_datawriter_0_20240118T140023.hdf5 |
+------------+------------------------------------------------------------------------------+------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

You can access the file either through the root:// URL or directly on EOSPUBLIC. Once the file has been replicated to Fermilab, a Fermilab URL will show up as well.
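To use the root:// URL in a script, it can be pulled out of the table with a small filter. This is a sketch that assumes the table layout shown above; the function name replica_urls is hypothetical.

```shell
# Hypothetical helper: read 'rucio list-file-replicas' output on stdin and
# print each root:// URL on its own line.
replica_urls() {
    # A URL runs from 'root://' up to the next space or table border.
    grep -o 'root://[^ |]*'
}
```

For example, `rucio list-file-replicas vd-coldbox:<filename> | replica_urls` should print one URL per replica.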

Eventually these will also be visible in SAM but that is going to take some work.

Example: Check which RSEs hold a dataset, using the scope and dataset name found from a MetaCat query

For example, for dc4-vd-coldbox-bottom:dc4-vd-coldbox-bottom_307151901

rucio list-dataset-replicas --deep dc4-vd-coldbox-bottom:dc4-vd-coldbox-bottom_307151901

DATASET: dc4-vd-coldbox-bottom:dc4-vd-coldbox-bottom_307151901
+-------------------------+---------+---------+
| RSE                     |   FOUND |   TOTAL |
|-------------------------+---------+---------|
| PRAGUE                  |      60 |      60 |
| DUNE_ES_PIC             |      60 |      60 |
| DUNE_CERN_EOS           |      60 |      60 |
| DUNE_US_FNAL_DISK_STAGE |      60 |      60 |
| MANCHESTER              |      60 |      60 |
| DUNE_US_BNL_SDCC        |      60 |      60 |
+-------------------------+---------+---------+
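A dataset is fully available at an RSE when FOUND equals TOTAL. The sketch below picks those RSEs out of the table, assuming the three-column layout shown above; the function name complete_rses is hypothetical.

```shell
# Hypothetical helper: read 'rucio list-dataset-replicas --deep' output on
# stdin and print the RSEs where FOUND equals TOTAL.
complete_rses() {
    awk -F'|' '
        NF >= 4 {
            rse = $2; found = $3; total = $4
            gsub(/ /, "", rse); gsub(/ /, "", found); gsub(/ /, "", total)
            # Skip the header and border rows: keep only numeric counts.
            if (found ~ /^[0-9]+$/ && total ~ /^[0-9]+$/ && found + 0 == total + 0)
                print rse
        }'
}
```

For example, `rucio list-dataset-replicas --deep <scope>:<dataset> | complete_rses` lists only the sites holding every file of the dataset.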

Example: List the datasets at a given RSE, for example DUNE_ES_PIC

rucio list-datasets-rse DUNE_ES_PIC
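The list for a whole RSE can be long, so it may be useful to keep only one scope. The sketch below assumes the command prints one scope:name entry per line, which is not shown on this page; the function name datasets_in_scope is hypothetical.

```shell
# Hypothetical helper: read 'rucio list-datasets-rse' output on stdin and keep
# only the lines that start with the scope given as the first argument.
# Assumes one scope:name entry per line.
datasets_in_scope() {
    # index() == 1 means the line begins with "<scope>:" (literal match).
    awk -v prefix="$1:" 'index($0, prefix) == 1'
}
```

For example, `rucio list-datasets-rse DUNE_ES_PIC | datasets_in_scope vd-coldbox` would keep only the vd-coldbox datasets.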