SDK v2 development (#26)
Object model updates:

* Task/Batch/Project models updated
* A new method as_dict() introduced to access object as a dict
* New ways to retrieve the list of tasks/batches:
get_tasks and get_batches are the new generator methods for bulk retrieval

API:

* Isolated API access into a different class
* Enabled HTTP retry for certain error codes
* Improved error handling by differentiating exception types
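
The retry behaviour mentioned above could look roughly like the sketch below. It is not the SDK's implementation: the set of retryable status codes, the `call_with_retry` helper, and the fake transport are all assumptions made for illustration, since the commit message does not list which error codes are retried.

```python
import time
from typing import Callable, Tuple

# Assumed set of retryable HTTP status codes; the commit does not name the real ones.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_attempts:
            break
        # Exponential backoff: wait longer after each failed attempt.
        time.sleep(backoff_seconds * 2 ** (attempt - 1))
    return status, body


# Hypothetical transport that fails twice, then succeeds.
responses = iter([(503, "unavailable"), (429, "slow down"), (200, "ok")])
result = call_with_retry(lambda: next(responses))
print(result)
```

A client error such as 400 is returned immediately in this sketch, because retrying a malformed request would not change the outcome.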

Infra improvements:

* Enabled type hinting across the package
* New code standards applied via Pylint, flake8 and black
* Integrated pre-commit for a better/consistent developer experience
* publish.sh introduced for an automated publish to PyPI
* New pytest test cases are added

Documentation

* New Migration guide for v2
* New Developer Guide (how to set up the repo environment and configure pre-commit)
* Updated deployment and publishing guide
* Updated README for v2
* Made README available on PyPI
fatihkurtoglu committed Apr 5, 2021
1 parent 139e521 commit 22827ba
Showing 21 changed files with 1,753 additions and 584 deletions.
32 changes: 26 additions & 6 deletions .gitignore
@@ -1,8 +1,28 @@
*.pyc
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
/build/
/dist/
/*.egg-info
.tox
.cache
/.vscode/
*.egg
*.eggs
*.egg-info/
MANIFEST

# For Visual Studio Code
.vscode/

# Mac
.DS_Store
/build/

# Unit test / coverage reports
.[nt]ox/
htmlcov/
.coverage
.coverage.*
.*cache
nosetests.xml
coverage.xml
*.cover
41 changes: 41 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,41 @@
default_language_version:
  python: python3.6
default_stages: [commit]

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-added-large-files
      - id: check-yaml
      - id: check-case-conflict
  - repo: https://github.com/pycqa/isort
    rev: 5.8.0
    hooks:
      - id: isort
        name: isort
        args: ["--profile", "black"]
  - repo: https://github.com/psf/black
    rev: 20.8b1
    hooks:
      - id: black
  - repo: https://gitlab.com/pycqa/flake8
    rev: 3.8.4
    hooks:
      - id: flake8
  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: python
        types: [python]
        files: scaleapi/
        additional_dependencies:
          - 'pylint>=2.7.4'
          - 'requests>=2.25.0'
          - 'urllib3>=1.26.0'
          - 'pytest>=6.2.2'
        language_version: python3.6
8 changes: 8 additions & 0 deletions .pylintrc
@@ -0,0 +1,8 @@
[MASTER]
disable=
    missing-module-docstring,
    too-few-public-methods,
    too-many-locals,
    too-many-arguments,
    too-many-instance-attributes,
    invalid-name,
5 changes: 0 additions & 5 deletions MANIFEST

This file was deleted.

181 changes: 121 additions & 60 deletions README.rst
@@ -1,6 +1,22 @@
=====================
*********************
Scale AI | Python SDK
=====================
*********************

If you use earlier versions of the SDK, please refer to `v1.0.4 documentation <https://github.com/scaleapi/scaleapi-python-client/blob/release-1.0.4/README.rst>`_.

If you are migrating from earlier versions to v2, please refer to `Migration Guide to v2 <https://github.com/scaleapi/scaleapi-python-client/blob/master/docs/migration_guide.md>`_.

|pic1| |pic2| |pic3|

.. |pic1| image:: https://pepy.tech/badge/scaleapi/month
:alt: Downloads
:target: https://pepy.tech/project/scaleapi
.. |pic2| image:: https://img.shields.io/pypi/pyversions/scaleapi.svg
:alt: Supported Versions
:target: https://pypi.org/project/scaleapi
.. |pic3| image:: https://img.shields.io/github/contributors/scaleapi/scaleapi-python-client.svg
:alt: Contributors
:target: https://github.com/scaleapi/scaleapi-python-client/graphs/contributors

Installation
____________
@@ -9,8 +25,6 @@ ____________
$ pip install --upgrade scaleapi
Note: We strongly suggest using `scaleapi` with Python version 2.7.9 or greater due to SSL issues with prior versions.

Usage
_____

@@ -23,11 +37,11 @@ Tasks
_____

Most of these methods will return a `scaleapi.Task` object, which will contain information
about the json response (task_id, status, etc.).
about the json response (task_id, status, params, response, etc.).

Any parameter available in `Scale's API documentation`__ can be passed as an argument option with the corresponding type.

__ https://docs.scale.com/reference#task-object
__ https://docs.scale.com/reference#tasks-object-overview

The following endpoints for tasks are available:

@@ -38,15 +52,18 @@ This method can be used for any Scale supported task type using the following fo

.. code-block:: python
client.create_{{Task Type}}_task(...)
client.create_task(TaskType, ...task parameters...)
Passing in the applicable values into the function definition. The applicable fields and further information for each task type can be found in `Scale's API documentation`__.

__ https://docs.scale.com/reference#general-image-annotation
__ https://docs.scale.com/reference

.. code-block:: python
client.create_imageannotation_task(
from scaleapi.tasks import TaskType
client.create_task(
TaskType.ImageAnnotation,
project = 'test_project',
callback_url = "http://www.example.com/callback",
instruction= "Draw a box around each baby cow and big cow.",
@@ -61,51 +78,65 @@ __ https://docs.scale.com/reference#general-image-annotation
}
)
Retrieve task
^^^^^^^^^^^^^
Retrieve a task
^^^^^^^^^^^^^^^

Retrieve a task given its id. Check out `Scale's API documentation`__ for more information.

__ https://docs.scale.com/reference#retrieve-tasks

.. code-block :: python
task = client.fetch_task('asdfasdfasdfasdfasdfasdf')
print(task.status) // Task status ('pending', 'completed', 'error', 'canceled')
print(task.response) // If task is complete
task = client.get_task('30553edd0b6a93f8f05f0fee')
print(task.status) # Task status ('pending', 'completed', 'error', 'canceled')
print(task.response) # If task is complete
List Tasks
^^^^^^^^^^

Retrieve a list of tasks, with optional filter by start and end date/time. Paginated with `next_token`. The return value is a `scaleapi.Tasklist`, which acts as a list, but also has fields for the total number of tasks, the limit and offset, and whether or not there's more. Check out `Scale's API documentation`__ for more information.
Retrieve a list of `Task` objects, with filters for: ``project_name``, ``batch_name``, ``type``, ``status``,
``review_status``, ``unique_id``, ``completed_after``, ``completed_before``, ``updated_after``, ``updated_before``,
``created_after``, ``created_before`` and ``tags``.

``get_tasks()`` is a **generator** method and yields ``Task`` objects.

`A generator is a special type of function that returns an iterable you can loop over like a list.
Unlike lists, however, generators do not store their contents in memory,
which lets you process a large number of objects without increasing memory usage.`

If you iterate through the tasks and process them only once, a generator is the most efficient approach.
However, if you need to process the tasks multiple times, you can wrap the generator in a ``list(...)``
call, which loads them into memory and returns a list of Tasks.

Check out `Scale's API documentation`__ for more information.

__ https://docs.scale.com/reference#list-multiple-tasks
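
The single-pass behaviour of a generator is easy to overlook, so here is a minimal sketch of it. It does not call the SDK: `fake_get_tasks` is a hypothetical stand-in for ``client.get_tasks()`` that yields plain strings instead of ``Task`` objects.

```python
from typing import Iterator


def fake_get_tasks(count: int) -> Iterator[str]:
    """Hypothetical stand-in for client.get_tasks(); yields task ids lazily."""
    for i in range(count):
        yield f"task_{i}"


tasks = fake_get_tasks(3)

# The first pass consumes the generator item by item.
first_pass = [task_id for task_id in tasks]

# A second pass over the same generator finds nothing left.
second_pass = list(tasks)

# To reuse the results, materialize a fresh generator into a list up front.
task_list = list(fake_get_tasks(3))

print(first_pass, second_pass, len(task_list))
```

In other words, either loop over the generator once or convert it to a list before the first loop. Doing both on the same generator object leaves the list empty.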

.. code-block :: python
next_token = None;
counter = 0
all_tasks =[]
while True:
tasks = client.tasks(
start_time = "2020-09-08",
end_time = "2021-01-01",
customer_review_status = "accepted",
next_token = next_token,
)
for task in tasks:
counter += 1
print('Downloading Task %s | %s' % (counter, task.task_id))
all_tasks.append(task.__dict__['param_dict'])
next_token = tasks.next_token
if next_token is None:
break
print(all_tasks)
from scaleapi.tasks import TaskReviewStatus, TaskStatus
tasks = client.get_tasks(
    project_name = "My Project",
    created_after = "2020-09-08",
    completed_before = "2021-04-01",
    status = TaskStatus.Completed,
    review_status = TaskReviewStatus.Accepted
)
# Iterating through the generator
for task in tasks:
    # Download task or do something!
    print(task.task_id)
# Alternative to the loop above (a generator can be consumed only once):
# retrieve the results as a Task list
task_list = list(tasks)
print(f"{len(task_list)} tasks retrieved")
Cancel Task
^^^^^^^^^^^

Cancel a task given its id if work has not started on the task (task status is `Queued` in the UI). Check out `Scale's API documentation`__ for more information.
Cancel a task given its id if work has not started on the task (task status is ``Queued`` in the UI). Check out `Scale's API documentation`__ for more information.

__ https://docs.scale.com/reference#cancel-task

@@ -153,8 +184,13 @@ __ https://docs.scale.com/reference#batch-status
client.batch_status(batch_name = 'batch_name_01_07_2021')
Retrieve Batch
^^^^^^^^^^^^^^
# Alternative via Batch.get_status()
batch = client.get_batch('batch_name_01_07_2021')
batch.get_status() # Refreshes tasks_{status} attributes of Batch
print(batch.tasks_pending, batch.tasks_completed)
Retrieve A Batch
^^^^^^^^^^^^^^^^

Retrieve a single Batch. Check out `Scale's API documentation`__ for more information.

@@ -167,27 +203,37 @@ __ https://docs.scale.com/reference#batch-retrieval
List Batches
^^^^^^^^^^^^

Retrieve a list of Batches. Check out `Scale's API documentation`__ for more information.
Retrieve a list of Batches. Optional parameters are ``project_name``, ``batch_status``, ``created_after`` and ``created_before``.

``get_batches()`` is a **generator** method and yields ``Batch`` objects.

`A generator is a special type of function that returns an iterable you can loop over like a list.
Unlike lists, however, generators do not store their contents in memory,
which lets you process a large number of objects without increasing memory usage.`

When wrapped in a ``list(...)`` call, it loads the Batches into memory and returns them as a list.

Check out `Scale's API documentation`__ for more information.

__ https://docs.scale.com/reference#batch-list
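
The removed example paged through results manually with ``next_token``, whereas the new one iterates over a generator. One way a generator can hide that pagination is sketched below. This is an assumption about the general technique, not the SDK's actual code: `FAKE_PAGES`, `fetch_page`, `iter_batches`, and the ``docs`` key are hypothetical names.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical in-memory "API": maps a page token to (items, next_token).
FAKE_PAGES: Dict[Optional[str], Tuple[List[str], Optional[str]]] = {
    None: (["batch_a", "batch_b"], "page2"),
    "page2": (["batch_c"], None),
}


def fetch_page(next_token: Optional[str]) -> dict:
    """Stand-in for one paginated HTTP request."""
    items, token = FAKE_PAGES[next_token]
    return {"docs": items, "next_token": token}


def iter_batches() -> Iterator[str]:
    """Yield items one by one, requesting the next page only when needed."""
    next_token = None
    while True:
        page = fetch_page(next_token)
        yield from page["docs"]
        next_token = page["next_token"]
        if next_token is None:
            break


names = list(iter_batches())
print(names)
```

The caller never sees the token: each page is fetched lazily as the loop reaches the end of the previous one, so only one page is held in memory at a time.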

.. code-block :: python
next_token = None;
from scaleapi.batches import BatchStatus
batches = client.get_batches(
batch_status=BatchStatus.Completed,
created_after = "2020-09-08"
)
counter = 0
all_batchs =[]
while True:
batches = client.list_batches(
status = "completed"
)
for batch in batches:
counter += 1
print('Downloading Batch %s | %s | %s' % (counter, batch.name, batch.param_dict['status']))
all_batchs.append(batch.__dict__['param_dict'])
next_token = batches.next_token
if next_token is None:
break
print(all_batchs)
for batch in batches:
    counter += 1
    print(f'Downloading batch {counter} | {batch.name} | {batch.project}')
# Alternative to the loop above (a generator can be consumed only once):
# access the results as a Batch list
batch_list = list(batches)
print(f"{len(batch_list)} batches retrieved")
Projects
________
@@ -221,7 +267,7 @@ __ https://docs.scale.com/reference#project-retrieval
List Projects
^^^^^^^^^^^^^

This function does not take any arguments. Retrieve a list of every Project.
This function does not take any arguments. Retrieve a list of every Project.
Check out `Scale's API documentation`__ for more information.

__ https://docs.scale.com/reference#batch-list
@@ -232,7 +278,7 @@ __ https://docs.scale.com/reference#batch-list
projects = client.projects()
for project in projects:
counter += 1
print('Downloading project %s | %s | %s' % (counter, project['name'], project['type']))
print(f'Downloading project {counter} | {project.name} | {project.type}')
Update Project
^^^^^^^^^^^^^^
@@ -245,23 +291,38 @@ __ https://docs.scale.com/reference#project-update-parameters
data = client.update_project(
project_name='test_project',
pathc = false,
patch = False,
instruction='update: Please label all the stuff',
)
Error handling
______________

If something went wrong while making API calls, then exceptions will be raised automatically
as a `scaleapi.ScaleException` or `scaleapi.ScaleInvalidRequest` runtime error. For example:
as a `ScaleException` parent type or one of its child exceptions:

- ``ScaleInvalidRequest``: 400 - Bad Request -- The request was unacceptable, often due to missing a required parameter.
- ``ScaleUnauthorized``: 401 - Unauthorized -- No valid API key provided.
- ``ScaleNotEnabled``: 402 - Not enabled -- Please contact Scale support before creating this type of task.
- ``ScaleResourceNotFound``: 404 - Not Found -- The requested resource doesn't exist.
- ``ScaleDuplicateTask``: 409 - Conflict -- The provided idempotency key or unique_id is already in use for a different request.
- ``ScaleTooManyRequests``: 429 - Too Many Requests -- Too many requests hit the API too quickly.
- ``ScaleInternalError``: 500 - Internal Server Error -- We had a problem with our server. Try again later.
- ``ScaleTimeoutError``: 504 - Server Timeout Error -- Try again later.

Check out `Scale's API documentation <https://docs.scale.com/reference#errors>`_ for more details.
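
Because every exception listed above derives from ``ScaleException``, a handler can catch a specific child first and fall back to the parent. The sketch below illustrates that pattern with locally defined stand-in classes. They borrow the names from the list but are not imported from ``scaleapi``, and the constructor signature is an assumption.

```python
class ScaleException(Exception):
    """Stand-in parent exception carrying an HTTP status code and message."""

    def __init__(self, message: str, code: int) -> None:
        super().__init__(message)
        self.message = message
        self.code = code


class ScaleInvalidRequest(ScaleException):
    """Stand-in for the 400 child exception."""


class ScaleTooManyRequests(ScaleException):
    """Stand-in for the 429 child exception."""


def describe(error: ScaleException) -> str:
    """Raise the given error and report which handler caught it."""
    try:
        raise error
    except ScaleTooManyRequests:
        # Most specific handler first: rate limiting is worth retrying.
        return "rate limited, retry later"
    except ScaleException as err:
        # The parent handler catches every other child exception.
        return f"failed with {err.code}: {err.message}"


print(describe(ScaleTooManyRequests("Too many requests", 429)))
print(describe(ScaleInvalidRequest("missing param X", 400)))
```

Ordering matters here: if the ``ScaleException`` handler came first, it would also catch ``ScaleTooManyRequests`` and the specific branch would never run.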

For example:

.. code-block:: python
try
client.create_categorization_task('Some parameters are missing.')
except scaleapi.ValidationError as e:
print(e.code) # 400
print(e.message) # missing param X
from scaleapi.exceptions import ScaleException
from scaleapi.tasks import TaskType
try:
    client.create_task(TaskType.TextCollection, attachment='Some parameters are missing.')
except ScaleException as err:
    print(err.code)  # 400
    print(err.message)  # Parameter is invalid, reason: "attachments" is required
Troubleshooting
_______________
6 changes: 6 additions & 0 deletions docs/dev_requirements.txt
@@ -0,0 +1,6 @@
black>=19.10b0
flake8>=3.8.4
pre-commit==2.11.1
isort>=5.7.0
pytest>=6.2.2
pylint>=2.7.2