Feature Request: databricks sync to target on DBFS (similar to databricks fs cp) #1619
Comments
@sgerloff Thanks for opening the issue. Could you tell us a bit about the use case behind the feature request?
At my company we use a tool implemented in Python that syncs our local code to Databricks for testing and development: https://github.com/getyourguide/db-rocket . This feature seems to be the only blocker to migrating to a databricks-cli native solution for syncing local files with DBFS. In addition, I guess any use case that uses
I wouldn't mind contributing this feature myself. It seems like a very straightforward implementation, as all the pieces are already in place.
@sgerloff Why did you choose to use DBFS and not the workspace file system or UC volumes?
@andrewnester The main reason is the use case for Python modules. To enable some amount of local development, we write code in our local IDE. Then we copy the Python code to DBFS and perform an import with
Followed by
Note the "-e" flag for pip install. This makes it possible to resync the code without reinstalling it, making iterations between local development and the notebook quicker. The issue with the workspace is that it is not possible to do
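As I understand the workflow, it can be sketched roughly as below. This is a hedged sketch: the paths and module name are hypothetical, and the CLI call only fires if a databricks CLI is installed and a host is configured, so the snippet is safe to run anywhere.

```python
import os
import shutil
import subprocess

# Hypothetical local package and DBFS target -- adjust to your layout.
SRC_DIR = "./my_module"
DBFS_DIR = "dbfs:/tmp/dev/my_module"  # visible as /dbfs/tmp/dev/my_module on the cluster

# Step 1: copy the source tree to DBFS (the part db-rocket automates).
cmd = ["databricks", "fs", "cp", "--recursive", "--overwrite", SRC_DIR, DBFS_DIR]

# Crude guard so the sketch runs safely without a configured CLI
# (named profiles also work; this check is just for the sketch).
if shutil.which("databricks") and os.environ.get("DATABRICKS_HOST"):
    subprocess.run(cmd, check=True)

# Step 2 (run once, in a notebook cell on the cluster):
#   %pip install -e /dbfs/tmp/dev/my_module
# Because of -e, repeating step 1 updates the module in place: a module
# reload picks up the new code without re-running pip install.
```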
@sgerloff Have you considered using DABs? The docs: https://docs.databricks.com/en/dev-tools/bundles/python-wheel.html
@shreyas-goenka I would need to investigate this more. It does have the sync and watch options that we would need. But at first glance it does seem to be intended for something else (pipelines and jobs), instead of just pushing code to a volume that can be easily accessed from a Databricks notebook. In fact, it does seem like a lot of overhead compared to either our in-house solution db-rocket, or a simple bash script using
Is there any concern allowing the
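Such a script-based workaround can also be sketched in Python. The paths are hypothetical; only the change-detection helpers are exercised locally, and the upload is again guarded behind the CLI being present and configured.

```python
import hashlib
import os
import shutil
import subprocess

def snapshot(root):
    """Map each file's path (relative to root) to a hash of its contents."""
    state = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                state[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return state

def changed_files(old, new):
    """Files that are new or whose contents changed since the last snapshot."""
    return sorted(p for p, digest in new.items() if old.get(p) != digest)

def push(src_root, dbfs_root, rel_paths):
    """Upload only the changed files, one `databricks fs cp` per file."""
    if not (shutil.which("databricks") and os.environ.get("DATABRICKS_HOST")):
        return  # CLI not installed/configured; skip in this sketch
    for rel in rel_paths:
        src = os.path.join(src_root, rel)
        dst = f"{dbfs_root}/{rel.replace(os.sep, '/')}"
        subprocess.run(["databricks", "fs", "cp", "--overwrite", src, dst], check=True)
```

Wrapping snapshot/push in a polling loop with time.sleep would approximate the watch behavior the sync command provides.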
@sgerloff Yes, the concern I have is that it's another file system that we'll have to maintain for the
Note: Databricks Asset Bundles (available under the
@shreyas-goenka Reading up on bundles (which, by the way, seem very cool), it does seem to suffer from the same issue: With the
And calling
It does suffer from the same issue, where in the sync module:
Line 102 in 37b9df9
(Note: I am pretty sure that building the wheel and syncing that would not solve my issue, as it would again not allow for the
@sgerloff With DABs, we do not support writing files / wheel files to DBFS. We recently introduced support for uploading wheel files to UC volumes, which should work. #1591 Note: In general we recommend using UC volumes instead of DBFS because of better security and governance.
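If wheels on UC volumes fit the workflow, the upload step can be sketched with the plain CLI rather than the bundle mechanism from #1591. The catalog, schema, volume, and wheel name below are placeholders, and it is an assumption of this sketch that the volume path is addressed under dbfs:/Volumes/ in your CLI version.

```python
import os
import shutil
import subprocess

# Placeholder wheel and UC volume path -- catalog/schema/volume are hypothetical.
WHEEL = "dist/my_pkg-0.1.0-py3-none-any.whl"
VOLUME_DIR = "dbfs:/Volumes/main/dev/artifacts/"

cmd = ["databricks", "fs", "cp", "--overwrite", WHEEL, VOLUME_DIR]

# Guarded so the sketch is safe to run without a configured CLI.
if shutil.which("databricks") and os.environ.get("DATABRICKS_HOST"):
    subprocess.run(cmd, check=True)
```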
@andrewnester How do we recommend users work around this problem today? Is it by incrementing the wheel version number?
@andrewnester The requirements for my use case are:
I think using DABs for just syncing files might be overkill. Generally we recommend using UC Volumes or the Workspace file system (WSFS) for uploading your files, because they have better permission control compared to DBFS. That being said, I anticipate that it is unlikely we will add DBFS support for
You can sync it to WSFS with
This can be done with
You can use
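For the WSFS route specifically, the suggested sync can be sketched as below; the destination user path is hypothetical, and --watch keeps re-syncing as local files change.

```python
import os
import shutil
import subprocess

# Hypothetical workspace destination -- replace with your own user directory.
cmd = ["databricks", "sync", "--watch", "./my_module",
       "/Users/someone@example.com/my_module"]

# Guarded so the sketch is safe to run without a configured CLI.
if shutil.which("databricks") and os.environ.get("DATABRICKS_HOST"):
    subprocess.run(cmd, check=True)
```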
@andrewnester The WSFS is indeed accessible, but it is incompatible with the
It looks like the feature request does not fit your design goals. Would it be feasible to add a new command to the
Could you please clarify this? What error do you get?
As to the feature itself, we'll have an internal discussion on it and will come back to you with any updates.
@andrewnester The error is somewhat obfuscated, but it's easy to reproduce:
You should get the error
Two ways to fix this:
I hope this helps pinpoint the problem.
Thanks for the detailed description. I will pass it on to the team owning this functionality and will update this ticket with any new information. In the meantime, relying on
Describe the issue
The command
databricks sync
is limited to target directories in the Workspace. This may not be the desired target location for all syncs. I propose handling targets on DBFS similar to how
databricks fs cp
does, extending the range of applications for the sync command.
Steps to reproduce the behavior
databricks sync <local_dir> dbfs:<dbfs dir>
Expected Behavior
Sync files and directories to specified location on DBFS
Actual Behavior
Complains about the path, due to the use of the Workspace filer.
OS and CLI version
All
Is this a regression?
No
Debug Logs
Honestly, this is more like a feature request...