Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for login with user/password to access restricted dataset #303

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gmaze
Copy link

@gmaze gmaze commented Apr 25, 2023

Hi !
I recently needed to access a protected dataset on Ifremer server and I had to implement a solution for argopy (work in progress in here euroargodev/argopy#256)

I figured that could be useful for erddapy as well.
At this point, the only scenario that we can easily handle is for an erddap server using a simple login form with user/password. All other login solutions available to erddap are 2 or 3 legs authentication mechanisms.

In this use case (an erddap server using a simple login form with user/password), what we need to access a protected dataset is to make the get request during a httpx session where the user has been logged-in, using the online form.

I gave it a try (using a test server here at Ifremer) and modified the core.url._urlopen method to post user/password information before sending the get request.
user/password are retrieved from environment variables "ERDDAP_USERNAME" and "ERDDAP_PASSWORD".
It seems to work !

What do you think ?

allow for login user/password
@gmaze gmaze marked this pull request as draft April 25, 2023 14:56
@gmaze gmaze changed the title Allow for login with user/password to access restricted Allow for login with user/password to access restricted dataset Apr 25, 2023
@ocefpaf ocefpaf marked this pull request as ready for review April 25, 2023 15:22
@ocefpaf ocefpaf marked this pull request as draft April 25, 2023 15:32
@ocefpaf
Copy link
Member

ocefpaf commented Apr 25, 2023

@gmaze let us know when you are done here. I can take over to fix the small details when you do.

@gmaze
Copy link
Author

gmaze commented Apr 26, 2023

@ocefpaf I think we need to discuss here about the most appropriate design for this in erddapy. Sorry for the long post.

Access to protected dataset is not the typical use case, so I don't want to detriment performances with a systematic login attempt. But the fact that the ERDDAP class does not handle http requests directly, but only builds url, and delegate to interfaces data fetching task, makes this new feature difficult to implement.

They are several possibilities:

  1. Optional arguments at the ERDDAP instantiation level:
e = ERDDAP(server="https://gliders.ioos.us/erddap", login={"user": "john_doe", "password": "please"})

user/password are then carried by the e object and could be forwarded internally to the core.url._urlopen method via the requests_kwargs of all interfaces. When a 1st fetch attempt raises a 401 HTTPerror, then login is triggered. But this would require to modify ALL the interfaces ...

  1. Optional arguments at the interfaces level:
e = ERDDAP(server="https://gliders.ioos.us/erddap")
# [...]
df = e.to_pandas(login={"user": "john_doe", "password": "please"})

the login argument could be passed to core.interfaces.to_pandas with the requests_kwargs, and followed to core.url.urlopen and core.url._urlopen where, if detected, login will be attempted, otherwise avoided (by default). This would also require to modify ALL the interfaces.

  1. Optional arguments at the ERDDAP instantiation level and new attribute to use a single http session:
e = ERDDAP(server="https://gliders.ioos.us/erddap")
e = ERDDAP(server="https://gliders.ioos.us/erddap", login={"user": "john_doe", "password": "please"})

In both cases, when the instance is created, a new httpx Client instance is also created and attached to the e object. This Client (which acts as a http "session") will then be passed internally to interfaces down to the core.url._urlopen. If the login argument is detected, then furthermore try to connect the user at ERDDAP instantiation. This design would also require to modify ALL the interfaces (but only internally, not for users), but would also modify the overall erddapy design that makes clear separation of the ERDDAP url builder from url requests.


All these possibilities require quite some work.
There is no perfect solution because, in practice, access level is controlled on the erddap server for each dataset ID, and erddapy does not have properties for each dataset.

From the user perspective, I think solutions 1 or 3 are the most appropriate, because users would have to provide login details only once.
From the erddapy perspective, solution 2 is the one the most inline with the current design of limiting data fetching details to interfaces.

@ocefpaf
Copy link
Member

ocefpaf commented Apr 26, 2023

From the user perspective, I think solutions 1 or 3 are the most appropriate, because users would have to provide login details only once.

Agreed,

From the erddapy perspective, solution 2 is the one the most inline with the current design of limiting data fetching details to interfaces.

For the current code? Yes. But we do want to improve upon that. With that said. I know that folks want to improve that upstream, making scripting access to password protected ERDDAP servers a bit easier. That means whatever we do here may change soon. Making choosing the easiest implementation a bit appealing and let future us worry about a more permanent solution.

What do you think?

Ping @abkfenris here who usually have a clearer view than me of the best way around these issues.

@gmaze
Copy link
Author

gmaze commented Apr 27, 2023

I know that folks want to improve that upstream, making scripting access to password protected ERDDAP servers a bit easier. That means whatever we do here may change soon.

Ok, so may be a simple solution like the one proposed in this PR could make the trick until this changes upstream...

In this case, the key missing is a mechanism to determine if we should try to login or not, in order to avoid performance degradation for the vast majority of unprotected dataset

I'll try to figure something out

@abkfenris
Copy link
Contributor

abkfenris commented Apr 27, 2023

I personally don't like the idea of having fixed environment variables for erddapy. ERDDAP servers can be thought of as more of a federation, rather than a centralized service like AWS. As such it's unlike that a user will end up having the same username/password combination on multiple servers. By pulling the environment variables for auth, it makes it hard to change credentials per server. I think it's also likely that this implementation will try to leak credentials to other servers.

In the work towards refactoring (#228), I sketched out an ERDDAPConnection class that could be overridden to provide different auth methods which could be set at a server or dataset level.

@ocefpaf
Copy link
Member

ocefpaf commented Apr 27, 2023

As such it's unlike that a user will end up having the same username/password combination on multiple servers. By pulling the environment variables for auth, it makes it hard to change credentials per server. I think it's also likely that this implementation will try to leak credentials to other servers.

I guess that we are biased b/c we hit only 1 password protected server most of the time. However, you are correct.
We can implement just a user/pass directly in the function and let the user decide how they'll hide it with env vars or something else.

sketched out an ERDDAPConnection class that could be overridden to provide different auth methods which could be set at a server or dataset level.

While that is a better way I'm afraid of investing time on it and then ERDDAP implements something that would render it obsolete. We should touch base with ERDDAP devs upstream develop a roadmap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants