Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixing mlflow logging to Databricks workspace file paths with /Shared/ prefix #3410

Merged
merged 6 commits into from
Jun 18, 2024

Conversation

JackZ-db
Copy link
Contributor

@JackZ-db JackZ-db commented Jun 18, 2024

In mlflow logging, users that wanted to log into the "/Shared/" file directory had an extra slash in their modified experiment names, leading to Mlflow not recognizing existing experiments, trying to create them, and then claiming they already exist.

This PR accounts for that edge case and removes the extra slash added.

Prior to this PR, there was a bug where users who directly wanted to mlflow-log to the /Shared/ workspace would be able to do so after the first run of an experiment name, but would not be able to for following runs of the same experiment name. We found that an extra '/' was prepended to the experiment name (yielding an experiment name of //Shared/... after '/' + os.path.join('Users', databricks_username, experiment_name)), leading to get_experiment_by_name not finding the existing experiment, and create_experiment claiming that it already exists (which it does). There are probably ways in which get_experiment_name and create_experiment name handle file directories differently, also contributing to this bug.

In the original code prior to this PR, inputting an experiment name starting with /Shared/ would throw this error: mlflow.exceptions.RestException: RESOURCE_ALREADY_EXISTS: Node named ___ already exists

Now, mlflow logger correctly logs to the /Shared/... folder path as an absolute path.

Before:

  • /Shared/... -> /Users/Shared/...
  • /Users/... -> /Users/...
  • path -> /Users/path

After:

  • /Shared/... -> /Shared/... (no change)
  • /Users/... -> /Users/...
  • path -> /Users/path

@JackZ-db JackZ-db requested a review from a team as a code owner June 18, 2024 04:32
@mvpatel2000
Copy link
Contributor

Im a little confused, can you give examples of initial user specified arg and transformed arg before and after this PR as an example?

@JackZ-db
Copy link
Contributor Author

Im a little confused, can you give examples of initial user specified arg and transformed arg before and after this PR as an example?

@mvpatel2000
Prior to this PR, there was a bug where users who directly wanted to mlflow-log to the /Shared/ workspace would be able to do so after the first run of an experiment name, but would not be able to for following runs of the same experiment name. We found that an extra '/' was prepended to the experiment name (yielding an experiment name of '//Shared/...' after '/' + os.path.join('Users', databricks_username, experiment_name)), leading to get_experiment_by_name not finding the existing experiment, and create_experiment claiming that it already exists (which it does). There are probably ways in which get_experiment_name and create_experiment name handle file directories differently, also contributing to this bug.

In the original code prior to this PR, inputting an experiment name starting with '/Shared/' would throw this error: mlflow.exceptions.RestException: RESOURCE_ALREADY_EXISTS: Node named ___ already exists

Now, mlflow logger correctly logs to the '/Shared/...' folder path as an absolute path. As mentioned above, there are probably some choices we want to make surrounding how we can most intelligently interpret absolute vs relative paths, but this serves as a functioning bandaid solution, because there are only '/Users/' and '/Shared/' under workspaces.

@JackZ-db JackZ-db changed the title fixing os file paths with /Shared/ prefix fixing mlflow logging to Databricks workspace file paths with /Shared/ prefix Jun 18, 2024
Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move the skip on Shared to be where Users is checked

composer/loggers/mlflow_logger.py Outdated Show resolved Hide resolved
Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JackZ-db JackZ-db merged commit 152e528 into mosaicml:dev Jun 18, 2024
17 checks passed
mvpatel2000 added a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024
…/ prefix (mosaicml#3410)

* fixing os file path with /Shared/ prefix

* lstrip '/' from experiment name if not '/Shared/' or '/Users/'

Co-authored-by: Mihir Patel <[email protected]>

* doesnt modify experiment name if it has '/Shared/' as a prefix

* fix formatting

* lint

---------

Co-authored-by: Mihir Patel <[email protected]>
mvpatel2000 added a commit that referenced this pull request Jul 21, 2024
…/ prefix (#3410)

* fixing os file path with /Shared/ prefix

* lstrip '/' from experiment name if not '/Shared/' or '/Users/'

Co-authored-by: Mihir Patel <[email protected]>

* doesnt modify experiment name if it has '/Shared/' as a prefix

* fix formatting

* lint

---------

Co-authored-by: Mihir Patel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants