-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add docs and script to fetch rucio dataset files
- Loading branch information
1 parent
eef00cd
commit 37c18fe
Showing
2 changed files
with
50 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,16 @@ | ||
# Pull dataset from Rucio data lake | ||
# Interact with Rucio dataset files | ||
|
||
The following script assumes that all the files within a DID are present in a RSE (Rucio Storage Element), and that this RSE is accessible locally. | ||
- A DID is compose of a scope plus a dataset names in the `SCOPE:DataSet` format. | ||
- If the files are not present in the RSE, replicate the dataset on the desired RSE before running the script. | ||
|
||
Run the following bash script | ||
|
||
```bash | ||
> ./rucio_dataset_files.sh <SCOPE:DataSet> <output_file> <output_symlink_dir> | ||
|
||
# Example | ||
> ./rucio_dataset_files.sh calorimeter:training_data_hdf5 calorimeter_files.txt calorimeter_symlink_dir | ||
> cat calorimeter_files.txt | ||
> ls -l calorimeter_symlink_dir | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
#/bin/bash | ||
# | ||
# G. Guerrieri & E. Garcia (CERN) - Jun 2024 | ||
# | ||
# This script runs only on VEGA | ||
# | ||
# Usage - on a terminal run | ||
# > ./rucio_dataset_files.sh <SCOPE:DataSet> <output_file> <output_symlink_dir> | ||
|
||
set -e | ||
|
||
ds=$1 | ||
name=$2 | ||
location=$3 | ||
|
||
pw=`pwd -P` | ||
|
||
if [[ -f "${name}" ]]; then rm ${name}.txt; fi | ||
touch ${name}.txt | ||
|
||
if [ -d "${location}" ]; then echo -e "Directory exists. Exiting\n${pw}/${location}" ; exit 1 ; fi | ||
mkdir $location | ||
|
||
for file in `rucio list-file-replicas --rse VEGA-DCACHE $ds | awk '{ print $12 }' | sed 's|https://dcache.sling.si:2880|/dcache/sling.si|g'` | ||
do | ||
if [[ $file == "|" ]]; then continue; fi | ||
fileReduced=`basename $file` | ||
echo linking $fileReduced "..." | ||
link=$location/${ds/:/.}.$fileReduced | ||
ln -s $file $link | ||
echo ${pw}/$link >> ${name}.txt | ||
done | ||
|
||
chmod -R 777 $3 |