Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to log into server with wrong credentials times out instead of returning an error #192

Open
XANi opened this issue Nov 11, 2023 · 11 comments

Comments

@XANi
Copy link
Contributor

XANi commented Nov 11, 2023

when trying to connect to server

	conn, err := autopaho.NewConnection(t.mqttCtx, t.mqttCfg)
	if err != nil {
		return err
	}
	connectCtx, _ := context.WithTimeout(t.mqttCtx, time.Second*30)
	if err = conn.AwaitConnection(connectCtx); err != nil {
		return fmt.Errorf("error connecting to mq: %w", err)
	}

and password is wrong (or unset, btw why it doesn't take password from url.URL automatically but need separate call?) this will time out instead of returning authorization error. I've snooped on tcpdump and server is returning code 135 not authorized to the client but client just hangs till timeout.

Version: v0.12.0

@MattBrittan
Copy link
Contributor

MattBrittan commented Nov 11, 2023

This is by design; autopaho will just keep attempting to connect until you stop it (by cancelling the context or calling Disconnect). AwaitConnection is just a utility function to allow you to easily detect when/if the connection comes up. AwaitConnection was mainly added to simplify examples; I would not expect to be used all that often in production code (it's generally better to use the OnConnectionUp callback because the connection may drop/be re-established at any time).

The aim of autopaho is to establish/maintain a connection in whatever way it can, so it will retry until you stop it (if you want to manage the connection yourself then use paho). In many production environments there is little you can do when there is a connection issue (perhaps report it to some logging server) and retrying constantly is the right approach. We don't want to make assumptions about what action should be taken if there is an error (even in your case this issue could be caused by an issue on the server so retrying may eventually lead to a connection).

If you want to handle such errors then add an OnConnectError handler and take whatever action is needed to handle the issue.

btw why it doesn't take password from url.URL automatically but need separate call?

Partially because it can be confusing; for example when establishing a web-socket connection should the username/password be passed when establishing the http connection (e.g. Authorization header), in the MQTT connect packet, or both?.

@XANi
Copy link
Contributor Author

XANi commented Nov 12, 2023

This is by design; autopaho will just keep attempting to connect until you stop it (by cancelling the context or calling Disconnect). AwaitConnection is just a utility function to allow you to easily detect when/if the connection comes up.

Okay, I get it now. I was just looking for simple way to error out on first non-network-related error when the app was starting as "your credentials is wrong" is almost always non-recoverable/config error in my use case, thanks for the info.

btw why it doesn't take password from url.URL automatically but need separate call?

Partially because it can be confusing; for example when establishing a web-socket connection should the username/password be passed when establishing the http connection (e.g. Authorization header), in the MQTT connect packet, or both?.

Okay but Websockets already have option for customizing requests with WebsocketConfig and add required headers, and it's the only protocol that requires such double authentication.

It also blocks from having 2 different configs, say if someone wanted to migrate from old to new server with different creds.

@MattBrittan
Copy link
Contributor

I'm open to PR's on this but am keen to avoid handling lots of options (don't want this to get as complicated as the V3 client).

One approach might be to pass the URL to ConnectPacketBuilder; this would enable you to set the username/password pretty simply but also allow the flexibility to do other stuff too (for instance set Authentication Method based on the URL). If doing this we may as well add ClientConfig too (making the signature closer to AttemptConnection).

@XANi
Copy link
Contributor Author

XANi commented Nov 13, 2023

Maybe I'd describe how I hit it.

I was trying to use library (v0.12.0) via examples and I haven't noticed any for using username and password, nor any docs about it.

So I looked at what config struct exported, saw no user/password whatsoever and thought "well, it wants url.URL, that parses user and password from it just fine, surely the lib just uses it? After all different URLS might need different auth". So it "worked" on my test setup (I forgot that server I used for testing had enabled guest users) and didn't on actual target server so I digged thru the source code and found out how to do it "properly".

4ee8c92 fixed that discoverability issue (at least for me), I just think it's a bit ugly to have to throw

	if cfg.MQTTURL[0].User != nil && len(cfg.MQTTURL[0].User.Username()) > 0 {
		mqttTr.mqttCfg.ConnectUsername = cfg.MQTTURL[0].User.Username()
		p, _ := cfg.MQTTURL[0].User.Password()
		mqttTr.mqttCfg.ConnectPassword = []byte(p)
	}

boilerplate every time I want to just use URL alone. And that is obviously wrong for the rare case multiple URLs would require different credentials.

I guess the question is whether that kind of feature is desired.

If the assumption is "multiple server support is there just to support a cluster of identical machines", not for using it as crutch during infrastructure migration to server with different config then it isn't worth changing.

But if "you put URL of old server as first, new server as second, to allow seamless migration for clients when the old one is down" is desired use case, then the "Server" should be separate struct with URL + credentials + TLS config so there is no global server config values, just per server one.... and that makes config uglier for majority of cases where user doesn't need more than one server or they differ only in URL.

One approach might be to pass the URL to ConnectPacketBuilder; this would enable you to set the username/password pretty simply but also allow the flexibility to do other stuff too (for instance set Authentication Method based on the URL). If doing this we may as well add ClientConfig too (making the signature closer to AttemptConnection).

That sounds like good way to leave a way to do it without messing the config too much; I'll give it a go and throw a PR when I find a moment.

@MattBrittan
Copy link
Contributor

I'm open to retrieving the username from the URL; however, as I said, this can get confusing (see this workaround in the v3 client). As such neither solution is perfect...

Given the need to introduce a bit of a hack for the username/password within the URL to work with websockets I think my preference is for the modification to ConnectPacketBuilder I suggested (however I'm open to being convinced that another way is better!).

But if "you put URL of old server as first, new server as second, to allow seamless migration for clients when the old one is down"

The issue with this is that when using QOS1+ this can result in lost messages (because the sessions between the two servers will not be in sync). I was even considering only supporting a single URL (to try and avoid users making mistakes that lead to lost messages). Another alternative might be to have a NextServer() *url.URL type callback that the library calls to get the next server (for advanced users, with appropriate warnings added etc).

@hspaay
Copy link

hspaay commented Nov 17, 2023

Interesting discussion, Just ran into the connection issue as well.

The aim of autopaho is to establish/maintain a connection in whatever way it can, so it will retry until you stop it (if you want to manage the connection yourself then use paho). In many production environments there is little you can do when there is a connection issue (perhaps report it to some logging server) and retrying constantly is the right approach. We don't want to make assumptions about what action should be taken if there is an error (even in your case this issue could be caused by an issue on the server so retrying may eventually lead to a connection).

If you want to handle such errors then add an OnConnectError handler and take whatever action is needed to handle the issue.

Hmm, bad authentication is not really a connection error though, as the connection was successful.
As such there is no need to retry the connection as it was working in the first place.
A second point is that bad authentication is not something you want to retry as this is an application level problem and can only be resolved by the application.

I love the simplicity argument but in this case every single user of paho will now have to hook into OnError and implement a workaround for a use-case that is over-simplified.

I hope this makes a case for detecting authentication errors on connect and return an error instead of retrying until the cows come home.

@MattBrittan
Copy link
Contributor

bad authentication is not something you want to retry as this is an application level problem and can only be resolved by the application.

This is one perspective; however you are assuming that the issue is client side. A server side issue is also possible (and potentially has a much larger impact); for example:

  • You are adding a user to the Mosquitto passwd file and delete a line in error; this de-authenticates as significant number of clients
  • AWS/Azure/Hive etc has a temporary auth issue and briefly denies all incoming auth requests.

In my case this has the potential to knock hundreds of clients offline; some of these cannot be accessed remotely (e.g. behind NAT) so, without retry, would need to be physically visited to re-establish connectivity. Hence I feel that the safer approach, for a production system, is to retry (so I, for one, am not one of the "every single user of paho" that has to implement a work around :-) ).

There are also a range of other issues with returning on an auth (or any other error) including:

  • NewConnection is async; it may take days for a connection to come up (not uncommon with dodgy network connections) so blocking that long does not seem appropriate.
  • Multiple servers are supported; auto failing on one does not mean that it will fail on the second.
  • PublishViaQueue allows messages to be published before a connection is established (this would break if NewConnection blocked). This functionality is important to me (I want to be able to call NewConnection and start sending messages leaving everything related to connection management, message queuing etc to the library).
  • In a significant number of deployments the auth will be via TLS client certs, so should a failure there also abort NewConnection? (current design is that NewConnection should only return an error if there is a fatal issue with the configuration which means there is no chance a connection will ever be established; e.g. no servers specified).

Hopefully this provides an explanation of my reasoning for the design decisions; however I should also note that part of the reason I created autopaho was to meet my use-case (lots or remote units with dodgy connections and a requirement that no data is lost); so the design will be influenced by that (and also by the V3 client, paho async c client, paho python etc). Autopaho will not be suitable for all use-cases (I've tried to keep it reasonably flexible but there are limits to that).

If this is an issue that multiple users are encountering then we should definitely improve the documentation (this needs work anyway!).

@XANi
Copy link
Contributor Author

XANi commented Nov 17, 2023

I think there are 2 cases to consider there:

  1. Auth failed immediately on first connection
  2. Auth failed sometime during application working

From my experience the case 2 is what you described, it might be just temporary server misconfiguration but case 1. happens almost exclusively during testing&setup phase and is user error OR it is something that should be communicated to user rather than just retried in the background. So it should be pretty easy for user to configure.

I think it might be sensible to give ability to easily split off "network errors" (mostly network timeouts), from "authentication errors" (we got connected to server but it returned error).

One way would be via documented error types (currently it's pretty random, bad auth will return *autopaho.ConnackError, wrong host will return *fmt.wrapError); other way would be splitting off hook (there is already OnClientError and OnConnectError so OnServerError would be natural), but that's a bit more work for setup. Whether TLS errors should be considered connection or server errors is open question but I'd veer on side on it being ServerError.

So ability to do either

mqttCfg.OnServerError = func(err error) {
        if err.ErrorCode == paho.ReasonUnauthorized || e.ErrorCode == paho.ReasonBadMethod {
            log.Error("wrong username or password")
            authFail++
        } else {
            log.Errorf("Server refused connection: %s",err)
        }
        if authFail  > 100 { botherUser() }
}

or

mqttCfg.OnError = func(err error) {
    switch e := err.(type) {
        case *autopaho.ConnectionError: log.Warnf("connection error: %s",e)
        case *autopaho.ServerError: 
            if e.ErrorCode == paho.ReasonUnauthorized || e.ErrorCode == paho.ReasonBadMethod{
                log.Error("wrong username or password")
                authFail++
            } else {
                log.Errorf("Server refused connection: %s",err)
            }
            if authFail  > 100{ botherUser() }
        default: log.Warnf("error[%T]: %s",e,e)
}

I do think second one is a bit cleaner as there is only one hook for errors and "simple" app might just log it, while more complex needs can be handled via types

@MattBrittan
Copy link
Contributor

case 1. happens almost exclusively during testing&setup phase and is user error OR it is something that should be communicated to user rather than just retried in the background

This was really the intention with OnError; at a minimum the errors should be logged so they can be dealt with, but the library should not dictate how this logging occurs (because that's application specific). The challenge here is having a solution that is workable for most users. The issue with making decisions about errors in the library is that the impact of errors is situational (e.g. a DNS lookup failure could be because the wrong URL has been specified, or because a remote solar powered device has just reached sufficient charge to startup up and the cell connection is not yet up).

I do think second one is a bit cleaner

I would vote for this option. As you say, in many cases the error callback will just be used to log errors so having a single callback simplifies users code (and with error types users can differentiate the errors should they need to).

currently it's pretty random, bad auth will return *autopaho.ConnackError, wrong host will return *fmt.wrapError

This kind of inconsistency tends to creep into open source software; originally everything was just a wrapped error; then a user submitted a PR to add ConnackError. I'd be happy to see a PR cleaning this up (but would ask that Unwrap() error be implemented so that code currently relying on errors.Is continues to run).

@hspaay
Copy link

hspaay commented Nov 18, 2023

Thank you for elaborating @MattBrittan and @XANi. The distinction between an initial connection and successive connection problems is a good one.
If the intent is that OnError provides the hook to deal with an unauthorized case, then it should provide the proper information to do so. @XANi is bang on with his example.
Now I understand the issue better I'll see if XANI's approach can solve it, assuming there is sufficient information in the error.

@MattBrittan
Copy link
Contributor

assuming there is sufficient information in the error.

As per this PR you can use a type assertion to check if the error is a *ConnackError and, if it is, access the ReasonCode/test.

XANi added a commit to XANi/paho.golang that referenced this issue Nov 19, 2023
As per discussion in eclipse-paho#192 I've added URL as a parameter, and also an example on how to use it to extract password from URL
XANi added a commit to XANi/paho.golang that referenced this issue Nov 19, 2023
As per discussion in eclipse-paho#192 I've added URL as a parameter, and also an example on how to use it to extract password from URL
XANi added a commit to XANi/paho.golang that referenced this issue Nov 19, 2023
As per discussion in eclipse-paho#192 I've added URL as a parameter, and also an example on how to use it to extract password from URL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants