Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MDEV-31585] Stop trusting or relying on client identifying information sent prior to the TLS handshake #2684

Open
wants to merge 2 commits into
base: 10.4
Choose a base branch
from

Conversation

dlenski
Copy link
Contributor

@dlenski dlenski commented Jul 3, 2023

  • The Jira issue number for this PR is: MDEV-31585

Description

The server has heretofore improperly mishandled—and TRUSTED—information sent
in the plaintext login request packet sent prior to the TLS handshake.

As a result of this, the client is forced to send excessive and
exploitable identifying information in the pre-TLS-handshake plaintext login
packet. That client-side vulnerability is CONC-654.

This modifies the server to stop relying on any of the information in the
pre-TLS-handshake plaintext login packet EXCEPT for the single bit that
tells it that a TLS handshake will follow. It furthermore adds a
capability bit to the server greeting packet, which informs the client that
it is safe to send a bare-bones dummy packet containing ONLY the instruction
that a TLS handshake will follow:

/* This capability is set if:
 *
 * - The CLIENT knows how to send a truncated 2-byte SSLRequest
 *   packet, containing no information other than the CLIENT_SSL flag
 *   which is necessary to trigger the TLS handshake, and to send its
 *   complete capability flags and other identifying information after
 *   the TLS handshake.
 * - The SERVER knows how to receive this truncated 2-byte SSLRequest
 *   packet, and to receive the client's complete capability bits
 *   after the TLS handshake.
 *
 */
#define CLIENT_CAN_SSL_V2    (1ULL << 37)

Because the client cannot safely send the SSL_V2 SSLRequest packet unless
the server has advertised support for it in its (plaintext) Server Greeting
packet, an active MITM could strip the CLIENT_CAN_SSL_V2 bit from that
Server Greeting packet. This downgrade attack will force the client to
continue exhibiting the CONC-654 vulnerability. The server is also modified
to detect this case and abort the connection; this won't fix the one-time
client information leakage of the CONC-654 vulnerability, but it is intended
to discourage the MITM attack by making it highly visible.

How can this PR be tested?

  • TODO: automated MTR tests to verify backwards compatibility with older clients
  • TODO: automated tests to verify active MITM detection
  • Build MariaDB Connector/C based on [CONC-654] Stop leaking client identifying information to the server before the TLS handshake mariadb-corporation/mariadb-connector-c#227 (the corresponding client-side fix)
    1. Observe from Wireshark/tcpdump that the pre-TLS-handshake login request packet now contains nothing except for the CLIENT_SSL bit, but the connection continues to work correctly.
      image
    2. Observe that the updated server continues to allow connections from older clients, without the CONC-654 fix, that do send excessive information in the pre-TLS-handshake packet.
    3. Observe that the updated client continues to connect correctly to older servers, without the MDEV-31585 fix, that require it to send excessive information in the pre-TLS-handhsake packet.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.

  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

    This is a bugfix and it is based on 10.2 10.4, but in fact the bug goes back much further in MariaDB's history; as far as I can tell, there has never been a version of MariaDB that supports TLS and does not have this bug.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@CLAassistant
Copy link

CLAassistant commented Jul 3, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines -13754 to -13269
if (pkt_len < 32)
return packet_error;
client_capabilities|= ((ulong) uint2korr(net->read_pos+2)) << 16;
if (!(client_capabilities & CLIENT_MYSQL))
{
// it is client with mariadb extensions
ulonglong ext_client_capabilities=
(((ulonglong)uint4korr(net->read_pos + 28)) << 32);
client_capabilities|= ext_client_capabilities;
}
}

/* Disable those bits which are not supported by the client. */
compile_time_assert(sizeof(thd->client_capabilities) >= 8);
thd->client_capabilities&= client_capabilities;
Copy link
Contributor Author

@dlenski dlenski Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff makes this look more complicated than it really is.

All that is really changing is that these few lines move from before the TLS handshake to after the TLS handshake.

These line tells the server to read capability bits sent by the client and to trust them: the server alters its view of how the connection should work (thd->client_capabilities&= ) based on them.

@dlenski dlenski force-pushed the MDEV-31585 branch 2 times, most recently from 7550af8 to 84aa88a Compare July 3, 2023 18:55
@dlenski dlenski changed the base branch from 11.2 to 10.2 July 3, 2023 18:55
@dlenski
Copy link
Contributor Author

dlenski commented Jul 3, 2023

I've rebased this on top of 10.2, since it's a bugfix. This required an additional cherry-pick commit, e54c24c, to add some support code (vio_type_name and safe_vio_type_name functions) which had never been added to 10.2.

Merging this correctly into all versions ranging from 10.2-11.2 is non-trivial.

The version of this branch targeted at 11.2 is available at: https://github.com/dlenski/mariadb-server/tree/dlenski-github/MDEV-31585_targeting_11.2

@vuvova
Copy link
Member

vuvova commented Jul 4, 2023

Looks rather risky to me. The server was ignoring capability flags from the after-ssl client packet for >20 years. It is totally possible that some third-party clients don't even send legitimate flags in that packet.

The part with MARIADB_CLIENT_CAN_SEND_DUMMY_HANDSHAKE_PACKET is safe, of course, I'll comment more in mariadb-corporation/mariadb-connector-c#227

@dlenski
Copy link
Contributor Author

dlenski commented Jul 4, 2023

@vuvova wrote:

Looks rather risky to me. The server was ignoring capability flags from the after-ssl client packet for >20 years. It is totally possible that some third-party clients don't even send legitimate flags in that packet.

Indeed, I would consider that highly probable…

It's essentially a consequence of Hyrum's Law: "with enough users, all the observable behaviors of a system are relied on by at least one user".

And indeed, it is an observable behavior of MariaDB servers that they do not pay attention to to the client capability flags sent after TLS handshake. That is precisely what this PR is addressing.

But so what 🤷‍♂️? To the extent that the client/server protocol is documented, the protocol documentation clearly states that the post-handshake packet SHOULD INCLUDE the capability flags, so any client that doesn't do this is not conforming to the documented protocol.

Unless you think that the server must continue indefinitely to support buggy and insecure behaviors by default in the name of backwards compatibility, with an unknown universe of client implementations? That basically means these vulnerabilities will never get fixed.

A reasonable approach, followed by much other software, would be to aggressively fix the TLS vulnerabilities in both the client and server in a backwards-incompatible way… but to allow the client and/or server to "opt in" to the old/buggy/insecure behavior (--backwards-compatible-insecure-ssl) in order to continue using it while the incompatibilities are worked out.

@dlenski
Copy link
Contributor Author

dlenski commented Jul 4, 2023

A reasonable approach, followed by much other software, would be to aggressively fix the TLS vulnerabilities in both the client and server in a backwards-incompatible way… but to allow the client and/or server to "opt in" to the old/buggy/insecure behavior (--backwards-compatible-insecure-ssl) in order to continue using it while the incompatibilities are worked out.

I've already implemented this.

See:

  • dlenski/mariadb-connector-c@force_client_to_opt_in_to_old_insecure_TLS:
    • Built on top of this PR
    • Adds the backwards-compatible-insecure-ssl option to Connector/C
    • Client will not downgrade itself to CONC-648 or CONC-654 vulnerabilities unless specified
  • dlenski@allow_client_to_opt_in_to_old_insecure_TLS
    • Adds the --backwards-compatible-insecure-ssl option to the mariadb client.
    • Trying to connect to a server without the MDEV-31585 fix will fail:
      $ client/mariadb --host 127.0.0.1  --port=3306 --ssl
      ERROR 2026 (HY000): Client requires secure TLS/SSL, but server only supports old, buggy, insecure TLS/SSL. Specify backwards-compatible-insecure-ssl to override.
    • Adding --backwards-compatible-insecure-ssl allows it to succeed, with prominent warnings:
      $ client/mariadb --host 127.0.0.1 --port=3306 --ssl
      WARNING: Server only supports old, buggy, insecure TLS/SSL, and client has
        opted to use this (backwards-compatible-insecure-ssl set)
      ...
    • The same option will allow the client to continue accepting pre-(TLS handshake) error packets:
      $ client/mariadb --host 127.0.0.1 --port=3306 --ssl
      WARNING: Received error packet sent by server before completion of TLS
         handshake. This is insecure, but we allow it because client is configured
         with backwards-compatible-insecure-ssl.
      ...
      

@vuvova
Copy link
Member

vuvova commented Jul 6, 2023

@vuvova wrote:

Looks rather risky to me. The server was ignoring capability flags from the after-ssl client packet for >20 years. It is totally possible that some third-party clients don't even send legitimate flags in that packet.

Indeed, I would consider that highly probable…
Unless you think that the server must continue indefinitely to support buggy and insecure behaviors by default in the name of backwards compatibility, with an unknown universe of client implementations?

That's pretty much what I'd suggest. But don't forget, after your mariadb-corporation/mariadb-connector-c#227 this new mode will be enabled by default, so only old third-party clients will still have the issue and it'll be their vulnerability, not MariaDB's — we'll update all our connectors by the next release.

@dlenski
Copy link
Contributor Author

dlenski commented Jul 7, 2023

@vuvova, I've added an additional commit 056d083 which will make the server:

  1. Detect the case where an active MITM is downgrading the client by stripping the CLIENT_CAN_SEND_DUMMY_HANDSHAKE_PACKET bit from our server greeting packet.
  2. Abort the connection if this happens.

This won't prevent an active MITM attacker from stripping the bit to gather information on the client, but it will cause a connection MITM'ed in this way to promptly fail, which should discourage attackers from using it.

server/sql/sql_acl.cc

Lines 12793 to 12811 in 056d083

if (pre_tls_client_packet_wants_ssl
&& post_tls_client_packet_indicates_dummy_support
&& !pre_tls_client_packet_is_dummy)
{
/* 1. We told the client in our server greeting that we support the pre-TLS client packet containing only the TLS/SSL flag.
* [Our server greeting packet is sent in the clear; no guarantee that it is delivered unmodified to the client.]
* 2. The client told us in its pre-TLS packet that it wants to use SSL.
* 3. The client told us in its post-TLS packet that it knows how to send the pre-TLS client packet containing only the TLS/SSL flag.
* [We received this information via TLS; it is authentic.]
* 4. Nevertheless, the client DID NOT SEND a pre-TLS client packet containing only the TLS/SSL flag.
*
* The only way this can happen is if the client is being downgraded by an active MITM attacker which
* disables the CLIENT_CAN_SEND_DUMMY_HANDSHAKE_PACKET bit in our server greeting packet.
*/
sql_print_warning("Aborting connection %lld because it is being actively MITM'ed to downgrade TLS security (attacker "
"is stripping the CLIENT_CAN_SEND_DUMMY_HANDSHAKE_PACKET bit from our server capabilities)",
thd->thread_id);
DBUG_RETURN(packet_error);
}

@markus456
Copy link
Contributor

Could you document any changes to the MariaDB protocol on the MariaDB KB when the pull request gets accepted? This makes it easier to refer to the protocol without having to dig through the server source code.

@dlenski
Copy link
Contributor Author

dlenski commented Jul 11, 2023

@markus456 wrote:

Could you document any changes to the MariaDB protocol on the MariaDB KB when the pull request gets accepted?

Yes, I'm definitely planning on it.

There's a structural problem with the MariaDB KB, though, which is that it's just a big unversioned wiki. There's no consistent way to indicate substantial differences between individual versions. Compare this with, say, the MySQL docs which at least have separate 5.7 and 8.0 docs:
image

@LinuxJedi
Copy link
Contributor

The CI won't run on this because it is against 10.2, which is way past EOL. If you make this against 10.4, it should run.

This additional logging is intended to demonstrate exactly when/where/how
the switch from a plaintext TCP/IP socket to a TLS/SSL-wrapped socket
occurs.

The short summary is that the TLS-ification of the connection is completely
entangled with the authentication process:

- The mechanism for ensuring confidentiality and authenticity of the
  client/server communications at the TRANSPORT-layer is TLS (which
  literally stands for Transport Layer Security)
- The APPLICATION-layer authentication mechanisms involve traffic exchanged
  above the transport layer, and support multiple plugins, such as the
  default "native" authentication plugin using usernames and passwords.
- The transport-layer security mechanism is logically distinct from the
  application-layer authentication mechanism, but in the MariaDB server
  codebase these are thoroughly entangled and interdependent.

This is a network-mislayering design problem with significant consequences
for code complexity and flexibility.  It leads to several potential and
actual security vulnerabilities whereby information is improperly
transmitted and accepted in plaintext prior to the TLS handshake.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
@dlenski dlenski changed the base branch from 10.2 to 10.4 September 14, 2023 23:57
@dlenski
Copy link
Contributor Author

dlenski commented Sep 15, 2023

@LinuxJedi wrote:

The CI won't run on this because it is against 10.2, which is way past EOL. If you make this against 10.4, it should run.

Done, rebased on 10.4.

Are you actually planning to merge this and mariadb-corporation/mariadb-connector-c#227? The Connector/C PR has received absolutely no feedback from its maintainer.

@LinuxJedi
Copy link
Contributor

@LinuxJedi wrote:

The CI won't run on this because it is against 10.2, which is way past EOL. If you make this against 10.4, it should run.

Done, rebased on 10.4.

Are you actually planning to merge this and mariadb-corporation/mariadb-connector-c#227? The Connector/C PR has received absolutely no feedback from its maintainer.

This one I would like to see included in some form, it won't be my final decision to make. But I don't see the intention being difficult to argue. That being said, the submodule update is breaking tests and I suspect this would either need to go or be done differently.

The Connector/C one I have zero influence over. This is a decision for @9E0R9 to make. I agree that "something" needs to be done. I have not looked at the current state of the protocol on the MySQL side to see whether this is the correct way of resolving the problem. The spare bit appears to be contentious.

In an ideal world that would be a separate port that does a TLS handshake first, but I can not see a way of doing that in a MySQL compatible way. At which point it might as well be part of a new protocol implementation.

@dlenski
Copy link
Contributor Author

dlenski commented Sep 15, 2023

@LinuxJedi wrote:

the submodule update is breaking tests and I suspect this would either need to go or be done differently.

What does "be done differently" mean?

There is absolutely nothing that I as an internal contributor can do to remedy this situation.

  • The PR correctly identifies the commit (mariadb-corporation/mariadb-connector-c@23cbf7c) to which the libmariadb submodule needs to be updated in order for the PR to function as advertised.
  • Nevertheless, the submodule update seems to breaks Buildbot's brain. It doesn't seem to know how to check out and build correctly using the submodule.
  • If you or another of the upstream devs can check this out and build it locally, you'll see that it works as advertised in the “How can this PR be tested?” section of the description.

The Connector/C one I have zero influence over. This is a decision for @9E0R9 to make. I agree that "something" needs to be done. I have not looked at the current state of the protocol on the MySQL side to see whether this is the correct way of resolving the problem.

The substantial flaws in the protocol appears to have been ignored for many years, which is why we're in this situation. The server and client implementation are tightly entangled, in terms of how they implement the {MySQL,MariaDB}-over-TLS protocol.

How can either side possibly make a decision about how to proceed him if the developers aren't actively communicating around such issues?

The spare bit appears to be contentious.

I don't know what this means.

Are you referring to the discussion (starting with mariadb-corporation/mariadb-connector-c#227 (comment)) about exactly where we should place the "SSL_V2" capability bit?

Or something else?

In an ideal world that would be a separate port that does a TLS handshake first, but I can not see a way of doing that in a MySQL compatible way. At which point it might as well be part of a new protocol implementation.

Indeed, that might be ideal, but it is also not necessary.

There is a ton of low-hanging fruit in terms of poor design of the {MySQL,MariaDB}-over-TLS protocol, and poor implementation.

All of this can be simplified, improved, and made more secure without switching to an entirely separate TCP port.

@LinuxJedi
Copy link
Contributor

@LinuxJedi wrote:

the submodule update is breaking tests and I suspect this would either need to go or be done differently.

What does "be done differently" mean?

There is absolutely nothing that I as an internal contributor can do to remedy this situation.

Sorry, I should have expanded on that. I meant that it appears the submodule is pushing Connector/C quite far ahead of its current position, to a different major version. Which is not something we normally do in an older release. Usually, such changes would be in the major version of Connector/C paired with this MariaDB version.

* The PR _correctly identifies_ the commit ([mariadb-corporation/mariadb-connector-c@23cbf7c](https://github.com/mariadb-corporation/mariadb-connector-c/commit/23cbf7cc72ba56c93a99b6462a6774b939f8918f)) to which the `libmariadb` submodule needs to be updated in order for the PR to function as advertised.

Is it not possible to make the server rely on post-TLS handshake without a Connector/C change?

* Nevertheless, the submodule update seems to breaks Buildbot's brain. It doesn't seem to know how to check out and build correctly using the submodule.

It breaks in buildbot because it is switching major versions which have different error messages.

* If you or another of the upstream devs can check this out and build it locally, you'll see that it works as advertised in the “How can this PR be tested?” section of the description.

I suspect it would still break the regression suite.

The Connector/C one I have zero influence over. This is a decision for @9E0R9 to make. I agree that "something" needs to be done. I have not looked at the current state of the protocol on the MySQL side to see whether this is the correct way of resolving the problem.

The substantial flaws in the protocol appears to have been ignored for many years, which is why we're in this situation. The server and client implementation are tightly entangled, in terms of how they implement the {MySQL,MariaDB}-over-TLS protocol.

How can either side possibly make a decision about how to proceed him if the developers aren't actively communicating around such issues?

I know this situation is frustrating, and things in the background are actively changing to improve this situation. The Unconference in particular will help with trying to put better processes in place.

The spare bit appears to be contentious.

I don't know what this means.

Are you referring to the discussion (starting with mariadb-corporation/mariadb-connector-c#227 (comment)) about exactly where we should place the "SSL_V2" capability bit?

Yes, this.

In an ideal world that would be a separate port that does a TLS handshake first, but I can not see a way of doing that in a MySQL compatible way. At which point it might as well be part of a new protocol implementation.

Indeed, that might be ideal, but it is also not necessary.

There is a ton of low-hanging fruit in terms of poor design of the {MySQL,MariaDB}-over-TLS protocol, and poor implementation.

All of this can be simplified, improved, and made more secure without switching to an entirely separate TCP port.

I agree, but we also have to be careful not to break compatibility with MySQL applications and connectors. This is why we are particularly cautious with protocol changes.

…on sent prior to the TLS handshake

The server has heretofore improperly mishandled—and TRUSTED—information sent
in the plaintext login request packet sent prior to the TLS handshake.

As a result of this, the client is *forced* to send excessive and
exploitable identifying information in the pre-TLS-handshake plaintext login
packet. That client-side vulnerability is CONC-654.

This modifies the server to stop relying on any of the information in the
pre-TLS-handshake plaintext login packet EXCEPT for the single bit that
tells it that a TLS handshake will follow.  It furthermore adds a
capability bit to the server greeting packet, which informs the client that
it is safe to send a bare-bones dummy packet containing ONLY the instruction
that a TLS handshake will follow:

    /* This capability is set if:
     *
     * - The CLIENT knows how to send a truncated 2-byte SSLRequest
     *   packet, containing no information other than the CLIENT_SSL flag
     *   which is necessary to trigger the TLS handshake, and to send its
     *   complete capability flags and other identifying information after
     *   the TLS handshake.
     * - The SERVER knows how to receive this truncated 2-byte SSLRequest
     *   packet, and to receive the client's complete capability bits
     *   after the TLS handshake.
     *
     */
    #define CLIENT_CAN_SSL_V2    (1ULL << 37)

Because the client cannot safely send the SSL_V2 SSLRequest packet unless
the server has advertised support for it in its (plaintext) Server Greeting
packet, an active MITM could strip the CLIENT_CAN_SSL_V2 bit from that
Server Greeting packet.  This downgrade attack will force the client to
continue exhibiting the CONC-654 vulnerability.  The server is also modified
to detect this case and abort the connection; this won't fix the one-time
client information leakage of the CONC-654 vulnerability, but it is intended
to discourage the MITM attack by making it highly visible.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
dlenski referenced this pull request Sep 18, 2023
…iners

Often there are small "nitpicky" changes that need to be done to a PR in order to
get it merged. To speed up the merge process, we will ask contributors
if they are ok with the reviewer making those changes.

Other fixups:
Improve PR template rendering in browser's textarea field. Line wrapping at 80
characters provides a bad user experience within the browser, which
handles word wrapping separately. Thus, prefer long lines in this
markdown file.

Remove the "backwards compatibility section". While the contributor
should ideally care about the impact of their patch on the server's
backwards compatibility, this is an advanced topic that is better
presented in a separate document.
@dlenski
Copy link
Contributor Author

dlenski commented Oct 3, 2023

Is it not possible to make the server rely on post-TLS handshake without a Connector/C change?

Indeed, it is not possible to do so in a way that would be guaranteed to be backwards-compatible with existing clients.

Personally, I think the value of such bug-for-bug backwards-compatibility is very low, but if the project insists on maintaining it, then there's very little wiggle room here. 🤷‍♂️

  • If you or another of the upstream devs can check this out and build it locally, you'll see that it works as advertised in the “How can this PR be tested?” section of the description.

I suspect it would still break the regression suite.

What is the regression suite? How could I run or check this?

@markus456
Copy link
Contributor

I'll also comment the same thing here that I commented on the MariaDB Zulip. Instead of reserving a bit in the MariaDB extended capabilities, you could reserve the first byte of the filler space for this capability. This way the patch could be applied to all MariaDB and MySQL compatible databases without forcing them to adapt the MariaDB extended capability bits. This would allow you to offer the patch to all implementations which would make it easier to adopt into use.

@dlenski
Copy link
Contributor Author

dlenski commented Oct 6, 2023

So far I have proposed…

  • Using one of the extended MariaDB capability bits (#define CLIENT_CAN_SSL_V2 (1ULL << 37))
    • Criticized because MySQL doesn't use these capability bits (although there is no reason they couldn't start, as @bgrainger from MySQL.NET pointed out).
  • Using (#define CLIENT_CAN_SSL_V2 (1ULL << 28))

@markus456 wrote:

Instead of reserving a bit in the MariaDB extended capabilities, you could reserve the first byte of the filler space for this capability.

That would be fine with me but… do you think that Oracle/MySQL is more likely to adopt a protocol change that uses a bit in the "unused" area than they are to agree with using a bit in the area that MariaDB currently uses for its extended capabilities?

From the perspective of Oracle/MySQL servers and clients, the "unused" bits and the "MariaDB extended capabilities" are all unused bits 🥲

image

@dlenski
Copy link
Contributor Author

dlenski commented Oct 6, 2023

Based on the discussion on Zulip, @markus456 thinks this will have a better chance of adoption by MySQL if we pick a new part of the unused bits and declare that to be something like the "cross-implementation extended capabilities".

I'm all for it. 👍

I assume it makes sense to start using the bits at the beginning, rather than the end, of the space that's currently unused by any implementation.

@LinuxJedi @vuvova, are you on board with proceeding in this direction?

@LinuxJedi
Copy link
Contributor

Based on the discussion on Zulip, @markus456 thinks this will have a better chance of adoption by MySQL if we pick a new part of the unused bits and declare that to be something like the "cross-implementation extended capabilities".

I'm all for it. 👍

I assume it makes sense to start using the bits at the beginning, rather than the end, of the space that's currently unused by any implementation.

@LinuxJedi @vuvova, are you on board with proceeding in this direction?

I am a little cautious that it is possible for MySQL to use those bits for anything in the future and could in-theory break other connectors from talking to us. I'm wondering if this is something we could somehow work together with the MySQL community on, to ensure that whatever change is made will be supported both sides in the future.

My personal ideal solution (which @markus456 mentioned) is to have a MARIADBS port that does an immediate initial TLS handshake. Then a vast majority of problems immediately go away. This will require adoption over time by other connectors. But with some upstream community work for things such as PHP, as long as the rest of the protocol doesn't deviate too much, I think this is doable.

@LinuxJedi
Copy link
Contributor

Alternatively, if enough of the community group together, we can build a protocol definition working group (suggested by @grooverdan)

@markus456
Copy link
Contributor

markus456 commented Oct 7, 2023

From what I've seen, one of the major arguments against the changes in pull request has been that it doesn't improve the security enough to be worth the effort of getting this change accepted in all connectors and server implementations. As for whether the changes are "big enough" is not something that can be logically proven and the discussion around it is, in my opinion, going in circles.

I think that nobody can deny the fact that a separate TLS port is more secure and would immediately improve the security situation. It also improves the performance of the connection creation by removing the redundant SSLRequest packet. Performance is always something that people care about even if security is ranked above it. In the TLS port case, you get the best of both worlds: better security with less data sent over the wire.

Something which has not been discussed is the need for documenting how the protocol is used and what sorts of combinations of packets might be sent. This pull request introduces a new packet type that requires its own documentation and it changes the "state machine" required to implement the MariaDB/MySQL protocol. As someone who works on a project that will eventually have to implement both the client and the server side implementations of this, simplicity of the protocol extensions is a factor that affects us the most. The more complex a packet is to parse, the more open it is for errors and misinterpretations of the protocol. The TLS-first approach, on the other hand, simplifies the protocol by eventually removing the need for the SSLRequest packet. The TLS-first approach also simplifies the server-side implementation compared to this pull request by removing the need to check if the capabilities have been downgraded.

The big problem with the TLS port is the need to add a new option flag for all client programs and any connector that implements it. The practical effort of having to then configure two ports for nearly all installations is something that cannot be ignored. This is a heroic amount of effort from the community but it allows all MariaDB/MySQL compatible client and server programs to implement it rather trivially.

The only other approach is a new network protocol but I believe that this is an idea that's not going to be picked up simply because it requires several competing server implementations to implement a new protocol in unison and a large change on the client side for all connectors. The current protocol also offers no backwards compatible way of downgrading to an earlier protocol which requires that the client side sends some form of a "I want the new protocol" packet to the server's handshake response that further slows down and complicates the implementation. This is kind of what this pull request implements by adding a new packet to the network protocol.

@dlenski also pointed out after I asked on Zulip about the importance of the server capabilities and the version string being "leaked" in the handshake packet: neither the current suggested protocol change nor the TLS port approach solve this problem as the server still sends the exact version in the first packet. The TLS port would make it encrypted so it's not broadcast over the network but it would still be trivial to find out if one is actively scanning for ports. I think that @dlenski 's suggestion to "freeze" the version to something like MariaDB 11.0 so that the exact version is only visible post-authentication would be a good thing to do. However, there already exists a way to set the version string to any arbitrary value so this isn't really a practical problem.

Like @LinuxJedi mentioned (and @grooverdan suggested), perhaps it would be smart to first reach out to the other involved parties (Oracle, community connectors, hosting providers etc.) and see if a modified version of this patch (the "shared capability bits" thing) is acceptable for them. In addition, asking how willing they'd be to adapt to a separate TLS port would be a good idea.

@vuvova
Copy link
Member

vuvova commented Oct 8, 2023

To clarify the comment

The big problem with the TLS port is the need to add a new option flag for all client programs and any connector that implements it.

A separate port, while, indeed, "nobody can deny the fact that a separate TLS port is more secure", would mean

  • new server option --tls-port (for example)
  • new server opton --extra-tls-port, because the server can listen on a second port, --extra-port
  • new server option to disable non-tls port, --port=0 won't do, it's already taken, so --disable-non-tls-port, for example (may be simply --disable-port ? it'll mean we'll also have --disable-tls-port and when both are disabled it'll mean --skip-networking)
  • let's say we don't want a different bind-address on tls port, so this won't need a new option like --tls-bind-address
  • new client option --tls-port. For all connectors and clients.
  • client likely won't need more options, --ssl --ssl-verify-server-cert would mean to connect to the tls port. --ssl --skip-ssl-verify-server-cert would mean connect to tls port and, if failed, connect to the non-tls port. Could be slow (so, discouraging the old fallback behavior, which is a nice side effect)
  • specifying a port on the command line would be less convenient, one cannot use mariadb --port 1234, one would need to know whether to specify --tls-port or --port. Which means knowing whether ssl is enabled in the config files. (why the user should need to know that?)

So, I suspect that this approach, while, indeed, undeniably more secure, has a rather slim chance of being adopted by other connectors and clients. Not unless it'll provide distinct benefits clearly outweighing the efforts of implementing (and explaining/documenting) it. Reaching out first, as @markus456 suggested, could be a sensible way forward.

@svaroqui
Copy link

svaroqui commented Oct 9, 2023

To give an humble opinion on this an extra port that only response to TLS from the first handshake looks better for any layer 4 proxies like nginx and proxysql that can route TCP traffic base on the Certificate SNI extension and this could really help adoption in the cloud and with security pentests

@vuvova vuvova self-assigned this Oct 23, 2023
@vaintroub
Copy link
Member

To clarify the comment

The big problem with the TLS port is the need to add a new option flag for all client programs and any connector that implements it.

A separate port, while, indeed, "nobody can deny the fact that a separate TLS port is more secure", would mean

  • new server option --tls-port (for example)
  • new server opton --extra-tls-port, because the server can listen on a second port, --extra-port
  • new server option to disable non-tls port, --port=0 won't do, it's already taken, so --disable-non-tls-port, for example (may be simply --disable-port ? it'll mean we'll also have --disable-tls-port and when both are disabled it'll mean --skip-networking)
  • let's say we don't want a different bind-address on tls port, so this won't need a new option like --tls-bind-address
  • new client option --tls-port. For all connectors and clients.
  • client likely won't need more options, --ssl --ssl-verify-server-cert would mean to connect to the tls port. --ssl --skip-ssl-verify-server-cert would mean connect to tls port and, if failed, connect to the non-tls port. Could be slow (so, discouraging the old fallback behavior, which is a nice side effect)
  • specifying a port on the command line would be less convenient, one cannot use mariadb --port 1234, one would need to know whether to specify --tls-port or --port. Which means knowing whether ssl is enabled in the config files. (why the user should need to know that?)

So, I suspect that this approach, while, indeed, undeniably more secure, has a rather slim chance of being adopted by other connectors and clients. Not unless it'll provide distinct benefits clearly outweighing the efforts of implementing (and explaining/documenting) it. Reaching out first, as @markus456 suggested, could be a sensible way forward.

While separate SSL port would require a couple of server options, there is no substantial effort for the client. Client only has to specify if he connects to SSL port, that's one bit of information. Many popular connectors have an idea of "connection string,URL, or DNS. and adding useSSLPort=true to that is not a huge deal. Doing SSL handshake before first packet, based on that bit, is not that huge deal either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants