Tracking issues of opendalfs 0.1 release #6

Open
2 of 19 tasks
Xuanwo opened this issue Jul 26, 2024 · 11 comments

Comments

@Xuanwo
Collaborator

Xuanwo commented Jul 26, 2024

This issue tracks the progress of the opendalfs 0.1 release. You are welcome to join the development by leaving your comments here.

  • In the 0.1 release, we might not cover all supported services. We will provide at least memory, fs, s3, azblob, gcs, and oss.
  • In the 0.1 release, we will only have a blocking API. An async API is planned.

Tasks

  • Figure out all fsspec APIs that need to be implemented (maybe refer to s3fs and ossfs)
  • Implement OpendalFileSystem APIs
    • fsid
    • mkdir
    • mkdirs
    • rmdir
    • ls
    • info
    • rm_file
    • _open
    • created
    • modified
  • Implement OpendalBufferedFile APIs
    • _upload_chunk
    • _initiate_upload
    • _fetch_range
  • Figure out how to perform releases
    • I expect to have a package for every different service.
  • Figure out how to run tests against different services
  • Add docs for each service.
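
To make the shape of the blocking API in the task list more concrete, here is a minimal sketch of a toy in-memory filesystem that mirrors a few of the method names above (mkdir, ls, info, rm_file, rmdir). It is only an illustration: `ToyMemoryFileSystem` and its `write_bytes` helper are hypothetical names, the class does not inherit from fsspec's base class, and none of it is opendalfs code.

```python
from datetime import datetime, timezone


class ToyMemoryFileSystem:
    """Hypothetical stand-in for illustration; NOT fsspec and NOT opendalfs."""

    def __init__(self):
        self._dirs = {""}   # known directory paths; "" is the root
        self._files = {}    # path -> (content bytes, modified time)

    @staticmethod
    def _norm(path):
        # Paths are stored without leading or trailing slashes.
        return path.strip("/")

    def mkdir(self, path, create_parents=True):
        path = self._norm(path)
        parent = path.rpartition("/")[0]
        if parent not in self._dirs:
            if not create_parents:
                raise FileNotFoundError(parent)
            self.mkdir(parent, create_parents=True)
        self._dirs.add(path)

    def write_bytes(self, path, data):
        # Hypothetical helper so the example has something to list.
        path = self._norm(path)
        if path.rpartition("/")[0] not in self._dirs:
            raise FileNotFoundError(path)
        self._files[path] = (data, datetime.now(timezone.utc))

    def info(self, path):
        path = self._norm(path)
        if path in self._files:
            data, modified = self._files[path]
            return {"name": path, "type": "file",
                    "size": len(data), "modified": modified}
        if path in self._dirs:
            return {"name": path, "type": "directory", "size": 0}
        raise FileNotFoundError(path)

    def ls(self, path, detail=False):
        path = self._norm(path)
        if path not in self._dirs:
            raise FileNotFoundError(path)
        # Direct children are entries whose parent equals the listed path.
        children = sorted(
            p for p in (self._dirs | set(self._files))
            if p and p.rpartition("/")[0] == path
        )
        return [self.info(p) for p in children] if detail else children

    def rm_file(self, path):
        path = self._norm(path)
        if path not in self._files:
            raise FileNotFoundError(path)
        del self._files[path]

    def rmdir(self, path):
        path = self._norm(path)
        if not path or path not in self._dirs:
            raise FileNotFoundError(path)
        if self.ls(path):
            raise OSError("directory not empty: " + path)
        self._dirs.remove(path)
```

A real implementation would delegate each of these calls to an OpenDAL operator instead of a dict; the sketch only shows which behaviours each method has to cover.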
@Xuanwo
Collaborator Author

Xuanwo commented Jul 26, 2024

cc @wey-gu and @BeautyyuYanli, are there any features you'd like included in our initial release?

@Xuanwo Xuanwo pinned this issue Jul 26, 2024
@BeautyyuYanli
Collaborator

I don't think we need a package for every service, since opendal ships everything in one package.

@BeautyyuYanli
Collaborator

Some methods are already implemented in the abstract class. We can implement the others through opendal-python or directly through a Rust binding. Maybe the package will not depend on opendal-python but will instead become a new Python binding.

@Xuanwo
Collaborator Author

Xuanwo commented Jul 27, 2024

I don't think we need a package for every service, since opendal ships everything in one package.

Hi, first of all, thanks for joining the discussion.

Please allow me to provide some context before going deeper:

  • The OpenDAL Rust core has all services in a single crate (the Rust term for a package). Users can enable only the services they need, rather than importing unnecessary ones.
  • The Python binding for OpenDAL has opted to include as many services as possible. However, this decision has not been well received by the community because it results in a very large Python package. Users have to install the entire package even if they only need access to S3.
  • To further complicate matters, some services require additional dynamic libraries to function. For instance, sqlite requires libsqlite, while hdfs needs both libhdfs and libjvm. We must decide whether to discontinue support for these services or require all users to install the necessary libraries.

In opendalfs, I plan to separate various services into distinct packages, allowing users to selectively install the services they need, such as with pip install opendalfs[s3, azblob]. The implementation details are still being researched; however, I personally believe this is the better approach.
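
As a rough illustration of how selective installation could look at runtime, the sketch below checks which per-service modules are importable and raises a helpful error otherwise. The module names in `SERVICE_MODULES` (such as `opendalfs_s3`) and both function names are hypothetical; the actual package layout is still being researched, as noted above.

```python
import importlib.util

# Hypothetical mapping from service name to the module its package would ship.
SERVICE_MODULES = {
    "memory": "opendalfs_memory",
    "fs": "opendalfs_fs",
    "s3": "opendalfs_s3",
    "azblob": "opendalfs_azblob",
}


def available_services(candidates=SERVICE_MODULES):
    """Return the names of services whose module can be imported."""
    return sorted(
        name
        for name, module in candidates.items()
        if importlib.util.find_spec(module) is not None
    )


def require_service(name, candidates=SERVICE_MODULES):
    """Return the module name for a service, or explain how to install it."""
    module = candidates.get(name)
    if module is None:
        raise ValueError(f"unknown service: {name!r}")
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"service {name!r} is not installed; "
            f"try: pip install 'opendalfs[{name}]'"
        )
    return module
```

With this kind of check, a user who installed only the s3 extra would get a clear message when asking for azblob, instead of a failure deep inside a binding.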

Maybe the package will not depend on opendal-python but will instead become a new Python binding.

I believe we can build directly on the OpenDAL Rust core, which lets us align with fsspec's behavior without an additional abstraction layer.

@Xuanwo
Collaborator Author

Xuanwo commented Jul 27, 2024

I have established the project layout. Adding a service should be as easy as adding a simple config: #11

@Xuanwo
Collaborator Author

Xuanwo commented Jul 28, 2024

@BeautyyuYanli, I have updated the list of APIs that we need to implement and added placeholders for them. You are welcome to take a look.

@martindurant
Member

I would like to add to your list:

  • decide on async or blocking API (this might already be done)
  • pick a protocol string, and think about how users should express the inner protocol. For example, "opendal://gdrive/shared_folder/path/file" might be a reasonable way to refer to a path which is to be handled by DAL, but resides in gdrive (or s3, or ...). Or you could introduce separate protocols for each, like "dal_gdrive", or you might want to "register" your implementation to overwrite the known protocols in fsspec, such that "gdrive" refers to DAL's implementation for the rest of the session.
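
As a sketch of the first option, a path such as "opendal://gdrive/shared_folder/path/file" could be split into the inner service and the remaining path with the standard library. The function name `split_opendal_url` and the choice to treat the URL host as the service name are assumptions made for illustration, not a decided design.

```python
from urllib.parse import urlsplit


def split_opendal_url(url, protocol="opendal"):
    """Split 'opendal://<service>/<path>' into (service, path).

    Hypothetical helper: it assumes the host part of the URL names the
    inner service (gdrive, s3, ...) and the rest is the path inside it.
    """
    parts = urlsplit(url)
    if parts.scheme != protocol:
        raise ValueError(f"expected a {protocol}:// URL, got {url!r}")
    if not parts.netloc:
        raise ValueError(f"missing service name in {url!r}")
    return parts.netloc, parts.path.lstrip("/")


service, path = split_opendal_url("opendal://gdrive/shared_folder/path/file")
print(service, path)
```

The other two options (a separate protocol per service, or overriding the protocols fsspec already knows) would skip this parsing step and instead bind one protocol name to one service.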

@dongshunyao

Hello! I would like to contribute to this project.

I have experience with Python and C++, but I am not yet familiar with Rust. I am very willing to learn it for this project.

I am still a beginner in open source. @Xuanwo Could you please assign me some simple and basic tasks to familiarize me with the whole process? I could get started and learn from them.

Thanks to @wey-gu for the guidance!

@Xuanwo
Collaborator Author

Xuanwo commented Sep 24, 2024

Hello! I would like to contribute to this project.

I have experience with Python and C++, but I am not yet familiar with Rust. I am very willing to learn it for this project.

I am still a beginner in open source. @Xuanwo Could you please assign me some simple and basic tasks to familiarize me with the whole process? I could get started and learn from them.

Thanks to @wey-gu for the guidance!

Hi, @dongshunyao, nice to meet you! I think we can start with implementing info or mkdir like we do for ls.

@dongshunyao

Hello! I would like to contribute to this project.
I have experience with Python and C++, but I am not yet familiar with Rust. I am very willing to learn it for this project.
I am still a beginner in open source. @Xuanwo Could you please assign me some simple and basic tasks to familiarize me with the whole process? I could get started and learn from them.
Thanks to @wey-gu for the guidance!

Hi, @dongshunyao, nice to meet you! I think we can start with implementing info or mkdir like we do for ls.

Thank you! I prefer to implement mkdir first. Should I write it in Rust like fs.rs#L18 and in Python like fs.py#L39, and complete the corresponding tests?

@Xuanwo
Collaborator Author

Xuanwo commented Sep 24, 2024

Thank you! I prefer to implement mkdir first. Should I write it in Rust like fs.rs#L18 and in Python like fs.py#L39, and complete the corresponding tests?

Exactly!
