Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements hints for Postgresql #530

Merged
merged 2 commits into from
Jun 23, 2023
Merged

Conversation

hissssst
Copy link
Contributor

@hissssst hissssst commented Jun 22, 2023

Overview

Implements FROM statement hints for postgresql

Use-case

This is useful for TABLESAMPLE hint in postgresql
https://www.postgresql.org/docs/current/sql-select.html#SQL-FROM

For example:

from x in Item, hints: ["TABLESAMPLE system_rows(1)"]

Fixes well-known problem with getting random rows from the table using Ecto
https://elixirforum.com/t/ecto-add-raw-sql-at-end-of-query-or-rather-choose-row-at-random/3386


If this change requires any tests, I am happy to implement them, just show me how.

@josevalim
Copy link
Member

You can write a unit test at test/ecto/adapters/postgres_test.exs that test we generate the proper SQL. :)

@hissssst
Copy link
Contributor Author

@josevalim done

@v0idpwn
Copy link
Member

v0idpwn commented Jun 22, 2023

I like the idea of supporting TABLESAMPLE, but this isn't a hint...

@greg-rychlewski
Copy link
Member

It feels a bit off to me too because Postgres pretty famously doesn't use query hints https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion.

The only alternative I think is to add a new option like :sample. I think any generic option to add text behind FROM would conflict with the hints in the other databases. Or we just accept the minor abuse of the term hint.

@hissssst
Copy link
Contributor Author

@v0idpwn @greg-rychlewski other adapters here implement samples through this hint system. It is even in the documentation

https://hexdocs.pm/ecto/Ecto.Query.html#join/5-hints

@greg-rychlewski
Copy link
Member

Ah ok, then I guess that's that :).

@josevalim josevalim merged commit bd4c332 into elixir-ecto:master Jun 23, 2023
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@bbhoss
Copy link

bbhoss commented Jul 28, 2023

Glad this is finally supported but using the name “hints” because it’s used elsewhere even when it’s not the right term seems like a poor choice. It will likely be hard to find this functionality with documentation, would you ever expect the hints option to be used for tablesample?

@greg-rychlewski
Copy link
Member

@josevalim would you object if I implement this as :sample in ecto instead? I think the point above about discoverability is fair. It could be dead simple just a compile time string so we are not fighting against all the variability across dbs

@bbhoss
Copy link

bbhoss commented Jul 30, 2023

If naming can be improved I think it’s worth trying to facilitate FOR SYSTEM_TIME as well which I think the hints feature makes possible in addition to TABLESAMPLE. SQL Server supports this feature along with the popular MySQL fork mariadb.

@greg-rychlewski
Copy link
Member

@josevalim btw we actually have a bigger issue here. I was looking up how hints are implemented and the tuple allows dynamic hints. Probably the adapter is expected to handle those properly but here it's just being passed through. So we can have SQL injection:

one = ~s(1; drop table "schema")

Schema
 |> from(hints: ["TABLESAMPLE SYSTEM": one])
 |> select([r], r.x)
 |> plan()
 |> all()

# "SELECT s0.\"x\" FROM \"schema\" AS s0 TABLESAMPLE SYSTEM 1; drop table \"schema\""

So we have to at least rollback the tuple version. I think it's worth rolling back the whole thing tbh and thinking how to properly implement static/dynamic samples + possibly the system time thing mentioned above (but i never used it so can't say too much about it atm).

@hissssst
Copy link
Contributor Author

Yeah, but TABLESAMPLE SYSTEM 1 is an invalid expression. SQL escaping can be called on this one to prevent injections. I've just copy-pasted the code from Clickhouse, so I think that new PR is required, not a rollback

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Jul 30, 2023

Let's see what Jose says because I completely agree with the comment from @bbhoss. Also not too sure we want to get into the SQL escaping game if it can be implemented in a better way.

@hissssst
Copy link
Contributor Author

Yeah, I understand that it makes sense, but static string is practically useless in majority of cases, Dynamic string will have the problem of injections too.

The best solution I can see is to implement it with ^ and be able to define hints with Query.API. For example hints: [tablesample(system(^n))] translates to

TABLESAMPLE SYSTEM($1)

@josevalim
Copy link
Member

I don't think calling it :sample will scale. Each database driver has its own names and, if we add one option per database, we will have really large structs.

We could pick another name but it has to be generic and then, would it actually help with discovery? If it won't, isn't it better to stick with "hints", which already works and at least makes sense for some of them?

Regarding the SQL part, I am pretty sure it was intentional, otherwise the uses are very limited. Maybe literal can help but we should make it very clear in the docs that you should not accept parameters as your interpolated hints.

@hissssst
Copy link
Contributor Author

hissssst commented Jul 30, 2023

@josevalim what about safe DSL for hints like one present for select, where, etc. ? It will be scalable and safe for parameters and anything like this

@josevalim
Copy link
Member

It doesn't scale because there is no standardization whatsoever. Then Ecto will have to know about specific behaviour of each adapter out there. That's why we just allow a string to be given. If we are worried about the interpolation, then we can require unsafe(^...) when used in hints. If not unsafe, then we just allow numbers or atoms or something.

@hissssst
Copy link
Contributor Author

Why doesn't it scale? Developers will write their own fragment(...) for each use-case and database they want. I am not an expert in all databases, but I am have a feeling that each one of them is okay with arguments like $1 in their hints.

@josevalim
Copy link
Member

Oh, I understood what you meant by safe DSL now, you mean fragment. That could work, but I don't think databases support $1 in hints.

@josevalim
Copy link
Member

The literal in query has the same issue, so I guess we need to call it unsafe_literal instead. We can then reuse it here.

@greg-rychlewski
Copy link
Member

The reason why I was partial to :sample is because TABLESAMPLE is part of the SQL Standard. So it wouldn't be a kind of weird database specific thing. Some might have slightly different syntax but they would all reference sampling in some way. And also the action being performed is a sample of the table so searching for sample in the docs would probably be one of the first things.

@greg-rychlewski
Copy link
Member

It would be similar to SQL Server using not having limit imo. The name is not exactly the same but it's still clear it's used for the same purpose.

@josevalim
Copy link
Member

I see. I think we could have a proposal for sampling then. But we most likely want to close the gaps introduced by hints and literals as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants