Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improved project bootstrapping #538

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

janbuchar
Copy link
Collaborator

@janbuchar janbuchar commented Sep 20, 2024

This adds a unified crawler template. The original playwright and beautifulsoup templates are kept for compatibility with older versions of the CLI.

The user is now prompted for package manager type (pip, poetry), crawler type, start URL and whether or not Apify integration should be set up.

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label Sep 20, 2024
@github-actions github-actions bot added this to the 98th sprint - Tooling team milestone Sep 20, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Sep 20, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

@janbuchar
Copy link
Collaborator Author

janbuchar commented Sep 20, 2024

okay, this could use some testing now... plus there's a bunch of stuff to consider

  • the pip thing is probably not very ergonomic and I'm not sure how to approach it well... should we create a .venv for users? should we have a requirements.txt and use it as a lockfile?
  • does this work on windows?
  • isn't the creation dialog too long now? maybe some of the options don't need the prompt and just CLI flags are enough
  • should we include http client selection?
  • we should probably scream and shout if poetry is not installed... or we could just disable the option in that case

@janbuchar janbuchar marked this pull request as ready for review September 20, 2024 12:33
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just a few comments

@@ -0,0 +1,12 @@
{
"project_name": "crawlee-python-beautifulsoup-project",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll probably just change it to my-crawler or something.

@@ -0,0 +1,37 @@
# {{cookiecutter.project_name}}

Project skeleton generated by Crawlee (Beautifulsoup template).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it always BS?

src/crawlee/_cli.py Show resolved Hide resolved
@janbuchar
Copy link
Collaborator Author

Thanks for the review @vdusek! Do you have anything to add regarding those points for consideration?

@vdusek
Copy link
Collaborator

vdusek commented Sep 25, 2024

the pip thing is probably not very ergonomic and I'm not sure how to approach it well... should we create a .venv for users? should we have a requirements.txt and use it as a lockfile?

That is generally how pip works. I wouldn't add additional overhead here. Simply dumping the dependencies into requirements.txt should be enough. Managing the environment and installation should be left to the user - that's the standard approach with pip.

isn't the creation dialog too long now? maybe some of the options don't need the prompt and just CLI flags are enough

IMO the current setup is fine. Also, all options have default values.

should we include http client selection?

That would be a nice feature, but I wouldn't prioritize it at this moment. We have a guide covering that, which should be sufficient for now.

we should probably scream and shout if poetry is not installed... or we could just disable the option in that case

I would suggest throwing an error. If a user wants to use Poetry, the option should be conditionally on Poetry being installed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is copied, but I suggest using Hadolint and improving it according to their suggestions.

We should also pin (or at least limit to < 2) the Poetry (mostly because of python-poetry/poetry#3332).


## Usage

{% if cookiecutter.package_manager == 'poetry' %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this leads to 2 empty lines

## Usage

{% if cookiecutter.package_manager == 'poetry' %}
To get started, ensure you have [Poetry](https://python-poetry.org/), a package and dependency management system, installed on your machine. We recommend installing it with the following command:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we wrap it to 120 line width?

Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Pip versions does not work

@janbuchar
Copy link
Collaborator Author

the pip thing is probably not very ergonomic and I'm not sure how to approach it well... should we create a .venv for users? should we have a requirements.txt and use it as a lockfile?

That is generally how pip works. I wouldn't add additional overhead here. Simply dumping the dependencies into requirements.txt should be enough. Managing the environment and installation should be left to the user - that's the standard approach with pip.

I generally agree. But how would you ensure that pipx run crawlee create ... installs the dependencies in a venv? Often people will want to keep the virtualenv in the project directory, which doesn't get created until you run crawlee create.

should we include http client selection?

That would be a nice feature, but I wouldn't prioritize it at this moment. We have a guide covering that, which should be sufficient for now.

It's a 15-minute adventure, the only problem is that it'd make the dialog longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
2 participants