A command-line tool for managing data in Globus Search and for transferring the corresponding data to and from a Globus Endpoint.
Globus Pilot has been retired as a method of cataloguing records in Globus Search. Most search record keeping is now automated through Globus Flows: https://gladier.readthedocs.io/en/latest/gladier_tools/publish/publishv2.html
Pilot requires Python 3.6+. You can install it with:
pip install globus-pilot
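Before installing, a quick sanity check that your interpreter meets the 3.6+ requirement (a minimal sketch; run it with the same Python that pip will install into):

```python
import sys

# globus-pilot requires Python 3.6 or newer
assert sys.version_info >= (3, 6), "Python 3.6+ is required for globus-pilot"
print("Python version OK:", sys.version.split()[0])
```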
See the Read the Docs page for more installation options.
For a full walkthrough, see the User Guide. Administrators can also view the Admin Guide.
A quick walkthrough is below.
First, login using Globus:
pilot login
Set your Search Index:
pilot index set <myindex>
Then choose your project. Use pilot project info for details on any listed project:
pilot project
pilot project set <myproject>
Use the list command to get a high-level overview of the data:
pilot list
For more detail about a specific search record, use the describe command:
pilot describe dose_response/rescaled_combined_single_drug_growth
You can also download the data associated with the search record:
pilot download dose_response/rescaled_combined_single_drug_growth
To add more data to the collection, use the upload command. It uploads the data and creates a record in Globus Search to track it.
touch my_data.tsv
pilot upload my_data.tsv test_dir --dry-run --verbose -j my_metadata.json
The '--dry-run --verbose' flags are optional but handy for testing. '-j my_metadata.json' supplies any extra metadata the pilot tool can't determine automatically. Here is an example of the metadata:
{
    "title": "Drug Identifiers",
    "description": "Drug identifiers, including InChIKey, SMILES, and PubChem.",
    "data_type": "Drug Response",
    "dataframe_type": "List",
    "source": [
        "InChIKey",
        "SMILES",
        "PubChem"
    ]
}
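Rather than writing the JSON file by hand, you can generate and validate it with a short script (a sketch; the field values mirror the example above and should be adjusted for your own data):

```python
import json

# Example metadata for 'pilot upload ... -j my_metadata.json'.
# Values are copied from the example above; replace them with your own.
metadata = {
    "title": "Drug Identifiers",
    "description": "Drug identifiers, including InChIKey, SMILES, and PubChem.",
    "data_type": "Drug Response",
    "dataframe_type": "List",
    "source": ["InChIKey", "SMILES", "PubChem"],
}

with open("my_metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)

# Sanity check: the file round-trips as valid JSON
with open("my_metadata.json") as f:
    assert json.load(f) == metadata
```

Writing the file this way catches malformed JSON (a stray comma, an unclosed bracket) before pilot ever sees it.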
To run the test suite, ensure the packages in test-requirements.txt are installed, then run:
pytest
And for coverage:
pytest --cov pilot