Add support for named and explicit indexes #7481

Open
wants to merge 1 commit into
base: charlie/multi-sources
from

Conversation

charliermarsh
Member

@charliermarsh charliermarsh commented Sep 18, 2024

Summary

This PR adds a first-class API for defining registry indexes, beyond our existing --index-url and --extra-index-url setup.

Specifically, you now define indexes like so in a uv.toml or pyproject.toml file:

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"

You can also provide indexes via --index and UV_INDEX, and override the default index with --default-index and UV_DEFAULT_INDEX.

Index priority

Indexes are prioritized in the order in which they're defined, such that the first-defined index has highest priority.

Indexes are also inherited from parent configuration (e.g., the user-level uv.toml), but are placed after any indexes in the current project, matching our semantics for other array-based configuration values.

You can mix --index and --default-index with the legacy --index-url and --extra-index-url settings; the latter two are merely treated as unnamed [[tool.uv.index]] entries.
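
For anyone reviewing the semantics, here is a minimal sketch of the ordering described above. It is not the code in this PR: the `Index` struct and `merge_indexes` function are hypothetical names used only for illustration, and the sketch assumes that priority is simply list position.

```rust
/// A simplified index entry (hypothetical; the real type carries more fields).
#[derive(Debug, Clone, PartialEq)]
struct Index {
    name: Option<String>,
    url: String,
}

/// Build the prioritized list: earlier entries win.
/// Project-level indexes come first, followed by indexes inherited from
/// parent configuration (e.g., the user-level `uv.toml`).
fn merge_indexes(project: Vec<Index>, parent: Vec<Index>) -> Vec<Index> {
    project.into_iter().chain(parent).collect()
}
```

Under this sketch, an index defined in the project's `pyproject.toml` is always consulted before one inherited from the user-level configuration.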

Index pinning

If an index includes a name (which is optional), it can then be referenced via tool.uv.sources:

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"

[tool.uv.sources]
torch = { index = "pytorch" }

If an index is marked as explicit = true, it can only be used via such references, and will never be searched implicitly:

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch" }

Indexes defined outside of the current project (e.g., in the user-level uv.toml) cannot be explicitly selected.

(As of now, we only support using a single index for a given tool.uv.sources definition.)
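
As a rough illustration of how pinning and `explicit = true` interact, the sketch below selects the indexes that would be searched for a given package. The names (`Index`, `candidate_indexes`) and the `sources` map from package name to index name are assumptions made for this example, not the PR's actual types. The sketch also assumes that a pinned package is looked up only on its named index.

```rust
use std::collections::HashMap;

/// A simplified index entry (hypothetical; not the type used in this PR).
#[derive(Debug)]
struct Index {
    name: Option<String>,
    url: String,
    explicit: bool,
}

/// Return the indexes to search for `package`, in priority order.
/// `sources` maps a package name to the name of the index it is pinned to.
fn candidate_indexes<'a>(
    package: &str,
    indexes: &'a [Index],
    sources: &HashMap<String, String>,
) -> Vec<&'a Index> {
    match sources.get(package) {
        // Pinned: use only the highest-priority index with the referenced name.
        Some(pinned) => indexes
            .iter()
            .filter(|index| index.name.as_deref() == Some(pinned.as_str()))
            .take(1)
            .collect(),
        // Not pinned: search every index that is not marked explicit.
        None => indexes.iter().filter(|index| !index.explicit).collect(),
    }
}
```

With the configuration above, `torch` would resolve only against the `pytorch` index, while any other package would skip it entirely.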

Default index

By default, we include PyPI as the default index. This remains true even if the user defines a [[tool.uv.index]] -- PyPI is still used as a fallback. You can mark an index as default = true to (1) disable the use of PyPI, and (2) bump it to the bottom of the prioritized list, such that it's used only if a package does not exist on a prior index:

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
default = true
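
The fallback behavior can be sketched as follows. This is an illustration under stated assumptions rather than the implementation: `Index`, `effective_indexes`, and `PYPI_URL` are hypothetical names, and the sketch assumes that if several indexes are marked as default, only the first one is kept.

```rust
/// Assumed URL for the PyPI simple index.
const PYPI_URL: &str = "https://pypi.org/simple";

/// A simplified index entry (hypothetical; not the type used in this PR).
#[derive(Debug, Clone, PartialEq)]
struct Index {
    name: Option<String>,
    url: String,
    is_default: bool,
}

/// Compute the prioritized list of indexes, with the default index last.
fn effective_indexes(indexes: &[Index]) -> Vec<Index> {
    // Non-default indexes keep their relative order and come first.
    let mut ordered: Vec<Index> = indexes
        .iter()
        .filter(|index| !index.is_default)
        .cloned()
        .collect();
    // The first index marked as default replaces PyPI as the fallback;
    // if there is none, PyPI is appended instead.
    let fallback = indexes
        .iter()
        .find(|index| index.is_default)
        .cloned()
        .unwrap_or_else(|| Index {
            name: None,
            url: PYPI_URL.to_string(),
            is_default: true,
        });
    ordered.push(fallback);
    ordered
}
```

In this sketch, a configuration with no `default = true` entry always ends with PyPI, while marking an index as default both removes PyPI and moves that index to the end of the list.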

Name reuse

If a name is reused, the higher-priority index with that name is used, while the lower-priority indexes are ignored entirely.

For example, given:

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"

[[tool.uv.index]]
name = "pytorch"
url = "https://test.pypi.org/simple"

The https://test.pypi.org/simple index would be ignored entirely, since it's lower-priority than https://download.pytorch.org/whl/cu121 but shares the same name.
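
A minimal sketch of this rule, assuming the list is already in priority order. The `Index` struct and `dedupe_by_name` function are hypothetical names for illustration, and the sketch assumes unnamed indexes are never dropped.

```rust
use std::collections::HashSet;

/// A simplified index entry (hypothetical; not the type used in this PR).
#[derive(Debug, Clone)]
struct Index {
    name: Option<String>,
    url: String,
}

/// Keep the first (highest-priority) index for each name and drop later
/// indexes that reuse it. Unnamed indexes are always kept.
fn dedupe_by_name(indexes: Vec<Index>) -> Vec<Index> {
    let mut seen = HashSet::new();
    indexes
        .into_iter()
        .filter(|index| match &index.name {
            // `insert` returns false if the name was already present.
            Some(name) => seen.insert(name.clone()),
            None => true,
        })
        .collect()
}
```

Applied to the two `pytorch` entries above, only the `https://download.pytorch.org/whl/cu121` entry survives.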

Closes #171.

Future work

  • Users should be able to provide authentication for named indexes via environment variables.
  • uv add should automatically write --index entries to the pyproject.toml file.
  • Users should be able to provide multiple indexes for a given package, stratified by platform:
[tool.uv.sources]
torch = [
  { index = "cpu", markers = "sys_platform == 'darwin'" },
  { index = "gpu", markers = "sys_platform != 'darwin'" },
]
  • Users should be able to specify a proxy URL for a given index, to avoid writing user-specific URLs to a lockfile:
[[tool.uv.index]]
name = "test"
url = "https://private.org/simple"
proxy = "http://<omitted>/pypi/simple"

@charliermarsh charliermarsh added the enhancement (New feature or request) label Sep 18, 2024
@charliermarsh
Member Author

This still needs docs (beyond the inline documentation), but I'd like to align on the semantics described in the PR summary first.

// /// structured as a flat list of distributions (e.g., `--find-links`). In both cases, indexes
// /// can point to either local or remote resources.
// #[serde(default)]
// pub r#type: IndexKind,
Member Author

I left this in for now but can remove before merging. Eventually I want this to support --find-links.

@charliermarsh charliermarsh force-pushed the charlie/index-api branch 4 times, most recently from 98c909f to e6f9d9a Compare September 18, 2024 02:14
@zanieb
Member

zanieb commented Sep 18, 2024

> Any indexes defined via [[tool.uv.index]] take priority over any indexes defined via --index-url or --extra-index-url.
>
> (This is problematic, since as of now, there's no way to provide a [[tool.uv.index]] on the command line, so you can never override in-file configuration with command-line configuration.)

Yeah this seems wrong. Why is it this way?

Do we need an explicit `explicit` tag or can we just use the presence of a name? I guess the name is useful for providing credentials externally so... I guess the `explicit` tag is important.

Do we set the explicit tag during a uv add --index-url <url> <pkg> operation?

How does index pinning work for transitive dependencies?

How can we teach the PyPI fallback behavior? Are there any bad user experiences where we could suggest adding default = true to their index definition?

@charliermarsh
Member Author

charliermarsh commented Sep 18, 2024

> Yeah this seems wrong. Why is it this way?

Well... we could just invert the priority, if that's what you mean (such that the new index stuff comes last, and the legacy arguments come first). Then the CLI arguments would work as expected. As-is, though, we don't have a way to differentiate between indexes passed on the command line (with --index-url or --extra-index-url) and indexes defined in a file via index-url and extra-index-url... So, like, we can't have index-url from the uv.toml be lower priority than tool.uv.index, but --index-url be higher priority. Alternatively, we could add a new command-line argument (--index?) for these tool.uv.index sources, which would also solve the problem?

> Do we need an explicit `explicit` tag or can we just use the presence of a name? I guess the name is useful for providing credentials externally so... I guess the `explicit` tag is important.

Yeah roughly this.

> Do we set the explicit tag during a uv add --index-url operation?

We don't as of this PR... We can though. I think we probably should? I guess by default we add the index, and make it explicit for that package? The unfortunate thing is that we need a name for the index in that case (as we discussed on Discord). Alternatively, we could just add a tool.uv.index, and not make it explicit.

> How does index pinning work for transitive dependencies?

It doesn't have any effect on transitive dependencies. I don't know that it can or should, honestly, because transitive dependencies can be required from multiple different first-party dependencies that could come from different indexes. I think this would be extremely hard to implement correctly and could lead to confusing behavior.

@inflation

I'd prefer to make explicit = true the default behavior, since the first footgun people hit after adding extra-index-url is that packages in the pytorch channel take precedence over PyPI.

@smphhh

smphhh commented Sep 18, 2024

> How does index pinning work for transitive dependencies?

> It doesn't have any effect on transitive dependencies. I don't know that it can or should, honestly, because transitive dependencies can be required from multiple different first-party dependencies that could come from different indexes. I think this would be extremely hard to implement correctly and could lead to confusing behavior.

Did you consider allowing packages to be pinned to indexes using glob patterns, as mentioned in #171 (comment)? AFAIK that is the only solution for pinning transitive dependencies that is straightforward enough but still useful for a set of real use cases.

@charliermarsh
Member Author

> Did you consider allowing pinning packages to indexes using glob-patterns as mentioned in #171 (comment)? AFAIK that is the only solution for pinning transitive dependencies that is straightforward enough but still useful for a set of real use-cases.

There's nothing stopping us from supporting that in addition to the schema described above. It strikes me as somewhat backwards but I understand why it's useful. My only concern is that we're complicating the schema and creating two ways to assign a package to an index.

@corleyma

@charliermarsh I know it's painful/stupid/messy, but the regex assignment is still really useful for handling transitive dependencies in corporate environments. I don't think there are easy alternatives, given Python's default assumption that all indexes are equivalent.

@zanieb
Member

zanieb commented Sep 20, 2024

I wonder if we should have a separate schema for pinning transitive packages to a defined index? It could allow globs as well. I'm not sure what all the trade-offs are.

@charliermarsh
Member Author

@zanieb -- That's a good call. Instead of putting this on the index schema, we could have a separate table for assigning packages to named indexes.

@charliermarsh
Member Author

I'll probably be pushing to this branch a lot over the next few days, so you may want to unsubscribe if the notifications are annoying!

@charliermarsh charliermarsh force-pushed the charlie/index-api branch 7 times, most recently from 3208ab0 to 518f3b3 Compare September 27, 2024 01:39
@charliermarsh
Member Author

Ok @zanieb -- I think this is ready for review. It now includes docs too.

@j178
Contributor

j178 commented Sep 27, 2024

While playing with this, I found an issue where the lockfile is always ignored due to a missing remote index:

DEBUG Ignoring existing lockfile due to missing remote index: `idna` `3.4` from `https://download.pytorch.org/whl/cu121`

A reproduction:

$ cat pyproject.toml
[project]
name = "foo"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "idna>=3.4",
]

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
idna = { index = "pytorch" }

$ cargo run -- lock
$ cargo run -- lock

It seems that explicit indexes are not included in locations.indexes(), so remotes does not contain them:

// Collect the set of available indexes (both `--index-url` and `--find-links` entries).
let remotes = indexes.map(|locations| {
    locations
        .indexes()
        .filter_map(|index_url| match index_url {
            IndexUrl::Pypi(_) | IndexUrl::Url(_) => {
                Some(UrlString::from(index_url.redacted()))
            }
            IndexUrl::Path(_) => None,
        })
        .chain(
            locations
                .flat_index()
                .filter_map(|index_url| match index_url {
                    FlatIndexLocation::Url(_) => {
                        Some(UrlString::from(index_url.redacted()))
                    }
                    FlatIndexLocation::Path(_) => None,
                }),
        )
        .collect::<BTreeSet<_>>()
});
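
To show the suspected failure mode in isolation, here is a small self-contained sketch. It is not uv's code: `Index`, `remotes`, and `lockfile_is_usable` are hypothetical names, and the check is reduced to "every locked package's index URL must appear in the set of available remotes".

```rust
use std::collections::BTreeSet;

/// A simplified index entry (hypothetical; not the type used in uv).
struct Index {
    url: String,
    explicit: bool,
}

/// Collect the URLs of the available indexes. With `include_explicit`
/// set to false, this mirrors the suspected bug: explicit indexes are
/// left out of the set.
fn remotes(indexes: &[Index], include_explicit: bool) -> BTreeSet<String> {
    indexes
        .iter()
        .filter(|index| include_explicit || !index.explicit)
        .map(|index| index.url.clone())
        .collect()
}

/// A lockfile is reusable only if every locked index URL is still available.
fn lockfile_is_usable(locked_urls: &[&str], available: &BTreeSet<String>) -> bool {
    locked_urls.iter().all(|url| available.contains(*url))
}
```

In this sketch, a package locked against an explicit index fails the check whenever explicit indexes are excluded from the set, which would match the "missing remote index" message on every run.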

@charliermarsh
Member Author

Thanks, good catch.

@charliermarsh charliermarsh force-pushed the charlie/index-api branch 5 times, most recently from e331d72 to 2a8d89e Compare September 27, 2024 17:28
@charliermarsh charliermarsh force-pushed the charlie/index-api branch 2 times, most recently from e5c0b1e to f5aa096 Compare September 28, 2024 23:06
@charliermarsh charliermarsh changed the base branch from main to charlie/multi-sources September 28, 2024 23:10

Successfully merging this pull request may close these issues.

Add support for pinning a package to a specific index
6 participants