After you click the "go" button, the app crawls all the URLs in the given webpage and displays to the customer a complete list of all the URLs the crawler found, including nested links. So essentially what you need to do is implement a simple web crawler in the form of a Django web app.
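The crawl described above can be sketched as a breadth-first traversal. This is only a minimal illustration, not the project's actual implementation: the `crawl` function, its `get_links` callback (anything that returns the links found on a page), and the `max_pages` limit are hypothetical names introduced for the example.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],  # hypothetical callback: page URL -> hrefs on that page
    max_pages: int = 100,                       # hypothetical safety limit on visited pages
) -> list[str]:
    """Breadth-first crawl; returns every URL discovered, in discovery order."""
    start_url = urldefrag(start_url).url
    seen = {start_url}
    found: list[str] = []
    queue = deque([start_url])
    visited = 0
    while queue and visited < max_pages:
        page = queue.popleft()
        visited += 1
        for href in get_links(page):
            # Resolve relative links against the current page and drop "#fragment" parts.
            url = urldefrag(urljoin(page, href)).url
            # Skip non-HTTP schemes (mailto:, javascript:, ...) and already-seen URLs.
            if url.startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                found.append(url)
                queue.append(url)  # nested links are crawled later
    return found
```

Passing the link extractor in as a callback keeps the traversal independent of how pages are downloaded and parsed, so it can be exercised with an in-memory dictionary instead of real HTTP requests.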
The take-home part does not need to be a “production ready” solution; however, we encourage you to write clean and readable code, as you would in real life.
- Django + DRF + django-celery-results + PostgreSQL
- Celery + RabbitMQ + PostgreSQL + Flower
- `make init`: initialize the project for development (copy .env.copy to .env)
- `make migrate`: perform migrations inside the django docker container
- `make create-superuser`: create a superuser in the django docker container
- `make help`: show this help
- The CSRF token is sent to the client using the `ensure_csrf_cookie` decorator, because by default Django sets the cookie only if you added `{% csrf_token %}` to the HTML file.
- celery_worker needs the django library because Celery has fixups for Django and checks that Django is installed.
- `CELERY_RESULT_BACKEND_URL` has the `db+` prefix, because Celery has the `BACKEND_ALIASES` (celery.app.backends) dictionary (key: database type, value: class).
- To solve problems with psycopg2, you need to install additional packages:
  `RUN apk update && apk add gcc libpq-dev musl-dev`
- By default Celery creates two tables: celery_taskmeta and celery_tasksetmeta
- django-celery-results creates 3 tables:
  - django_celery_results_taskresult
  - django_celery_results_groupresult
  - django_celery_results_chordcounter
- Without `CELERY_RESULT_EXTENDED = True`, Celery saves only task_id, status, result (in zipped format), and date_done to the database.
- Without `CELERY_TASK_TRACK_STARTED = True`, Celery doesn't add a row with the task status to the database until the task has finished.
- The Django fixup for Celery is registered through the `BUILTIN_FIXUPS` variable.
- Celery has no way to save the task in the result backend before sending it to the worker: link1 and link2
- To log all queries in postgres, run it with: `"postgres", "-c", "log_statement=all"`
- Regex is faster than BeautifulSoup: proof.
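As an illustration of the regex approach mentioned in the last note, here is a minimal sketch of link extraction with the standard `re` module. It is not necessarily the pattern the project uses: `HREF_RE` and `extract_links` are hypothetical names, and the pattern is a heuristic that only handles quoted `href` attributes in `<a>` tags, not a full HTML parser.

```python
import re
from urllib.parse import urljoin

# Heuristic (assumption): capture the quoted href value of <a ...> tags, case-insensitively.
HREF_RE = re.compile(r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute links found in `html`, de-duplicated, in document order."""
    absolute = (urljoin(base_url, href.strip()) for href in HREF_RE.findall(html))
    # dict.fromkeys keeps the first occurrence of each URL and preserves order.
    return list(dict.fromkeys(absolute))
```

The trade-off is robustness: a regex like this skips unquoted attributes and can be confused by unusual markup, which a real parser would handle.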