Scheduler rate limit functionality #1611

jpbruinsslot · 2023-08-14T13:21:25Z

Changes

This PR adds rate-limiting capabilities to the scheduler for tasks. Boefjes that are subject to rate limiting (defined in the boefje manifest) are tracked by an in-memory rate limiter that uses the limits package. When a task is subject to a rate limit it gets the DELAYED status and is postponed until it is allowed again. A thread checks for delayed tasks and puts them on the queue when they are ready.

This expects some changes in the katalogus, mainly exposing a grouping/namespacing on which to rate limit, and a parseable rate limit.

Issue link

Closes #1317

Processes

For a boefje scheduler, when a task gets created it is checked whether the boefje is rate limited or not:

flowchart TD
    A[Task Creation] --> B{is_task_rate_limited}
    B -->|Yes| C[set Task to status <br>DELAYED]
    B ---->|No| E[End]

Checking if a task is rate limited:

graph TD
    A[Start] --> B{rate_limit is None?}
    B -- Yes --> C[Return False]
    B -- No --> D{Try to parse rate_limit}
    D --> E{parsed_rate_limit is None?}
    E -- Yes --> F[Raise ValueError]
    E -- No --> G[Lock rate_limiter]
    G --> H[Get identifier_template]
    H --> I[Render identifier from template]
    I --> J[Test rate_limiter with identifier]
    J --> K{can_consume?}
    K -- No --> L[Return True]
    K -- Yes --> M{hit?}
    M -- Yes --> N[Hit rate_limiter]
    N --> O[Return False]
    M -- No --> O
    D --> P[Caught ValueError]
    P --> Q[Log warning]
    Q --> R[Raise exc]
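
The check in the flowchart above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code from this PR: the PR relies on the limits package, while the sketch swaps in a small hypothetical fixed-window counter (`FixedWindowLimiter`) and a hypothetical `parse_rate_limit` helper so that it runs with the standard library only. The identifier is assumed to be rendered already, and the "amount/unit" string format is an assumption as well.

```python
import re
import threading
import time
from dataclasses import dataclass
from typing import Optional

_UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


@dataclass(frozen=True)
class ParsedLimit:
    amount: int   # allowed hits per window
    seconds: int  # window length in seconds


def parse_rate_limit(text: str) -> ParsedLimit:
    """Hypothetical parser for strings such as '2/minute'."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(second|minute|hour|day)\s*", text)
    if match is None:
        raise ValueError(f"unparseable rate limit: {text!r}")
    return ParsedLimit(int(match.group(1)), _UNITS[match.group(2)])


class FixedWindowLimiter:
    """Stand-in for a real limiter; old windows are never evicted here."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = {}

    def _key(self, limit: ParsedLimit, identifier: str):
        window = int(self._clock() // limit.seconds)
        return (identifier, limit.seconds, window)

    def test(self, limit: ParsedLimit, identifier: str) -> bool:
        """True when one more hit still fits in the current window."""
        return self._hits.get(self._key(limit, identifier), 0) < limit.amount

    def hit(self, limit: ParsedLimit, identifier: str) -> None:
        key = self._key(limit, identifier)
        self._hits[key] = self._hits.get(key, 0) + 1


_lock = threading.Lock()
_limiter = FixedWindowLimiter()


def is_task_rate_limited(
    rate_limit: Optional[str],
    identifier: str,
    hit: bool = True,
    limiter: FixedWindowLimiter = _limiter,
) -> bool:
    if rate_limit is None:
        return False  # boefje has no rate limit configured
    parsed = parse_rate_limit(rate_limit)  # ValueError propagates to the caller
    with _lock:  # test and hit must happen atomically
        if not limiter.test(parsed, identifier):
            return True  # cannot consume: the task is rate limited
        if hit:
            limiter.hit(parsed, identifier)
        return False
```

Passing `hit=False` only tests the limiter without consuming from it, which mirrors the `hit?` branch in the flowchart.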

Within the boefje scheduler a new process (push_tasks_for_delayed_tasks) is running next to:

check for scan profile changes
check for enabled boefjes
check for rescheduling (random ooi endpoint octopoes)

The push_tasks_for_delayed_tasks method gets all tasks that have the status DELAYED and checks, for each of them, whether the task can be pushed onto the queue.
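
One pass of that delayed-task loop could look roughly like the following. It is a sketch only: the `Task` and `TaskStatus` types, the `QUEUED` status, the `deque` used as a queue and the `is_rate_limited` callback are all hypothetical stand-ins for the scheduler's own task store, queue and rate-limit check.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class TaskStatus(Enum):
    DELAYED = "delayed"
    QUEUED = "queued"  # assumed name for "on the queue"


@dataclass
class Task:
    id: str
    status: TaskStatus


def push_tasks_for_delayed_tasks(
    tasks: Iterable[Task],
    is_rate_limited: Callable[[Task], bool],
    queue: deque,
) -> int:
    """Push every DELAYED task that is allowed again; return how many."""
    pushed = 0
    for task in tasks:
        if task.status is not TaskStatus.DELAYED:
            continue
        if is_rate_limited(task):
            continue  # still limited, stays DELAYED until a later pass
        task.status = TaskStatus.QUEUED
        queue.append(task)
        pushed += 1
    return pushed
```

Tasks that are still limited are left untouched, so running the function periodically from a background thread is enough to drain them over time.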

jpbruinsslot · 2023-08-15T08:03:29Z

@ammar92 I've implemented cross-organisational support. Would you mind taking a look?

praseodym · 2023-08-17T11:25:59Z

Originally posted at #1413 (review), but still valid:

As far as I understand, the rate limiter is a property of the BoefjeScheduler class. This class has a separate rate_limiter instance per organisation, which means that rate limiting only works per organisation.

In larger deployments, API keys such as for Shodan will likely be shared by multiple KAT organisations. How do we implement rate limiting for that scenario?

jpbruinsslot · 2023-08-21T08:14:26Z

I've updated the code so that the rate limiter memory now lives at the application level. The identifier of the rate limiter can then be formatted with, for instance, a key, so that the limit is shared across multiple organisations.

underdarknl · 2023-10-19T11:22:01Z

In larger deployments, API keys such as for Shodan will likely be shared by multiple KAT organisations. How do we implement rate limiting for that scenario?

This also means that we should probably include both 'shodan' and the actual key in the 'group' to make sure we don't limit shodan requests across the whole install even though we might have multiple keys available.
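
To make the suggestion concrete, here is one possible way to build such a group identifier. It is an illustration, not part of the PR: the function name `rate_limit_identifier` is hypothetical, and hashing the key to a short fingerprint (rather than embedding the raw key) is an assumption of this sketch, chosen so the secret does not end up in limiter storage or logs.

```python
import hashlib


def rate_limit_identifier(service: str, api_key: str) -> str:
    """Combine the service name with a short fingerprint of the API key."""
    fingerprint = hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:12]
    return f"{service}:{fingerprint}"


# Two organisations sharing one key share one limit ...
org_a = rate_limit_identifier("shodan", "key-A")
org_b = rate_limit_identifier("shodan", "key-A")
# ... while an organisation with its own key is limited separately.
org_c = rate_limit_identifier("shodan", "key-B")
```

With identifiers like these, the limit follows the key rather than the organisation or the whole install.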

underdarknl · 2023-10-19T11:30:08Z

mula/scheduler/schedulers/boefje.py

+            # Get the identifier for the rate limiter
+            identifier_template = rate_limit.identifier
+            environment = Environment(loader=BaseLoader())
+            identifier = environment.from_string(identifier_template).render(task=task)


I feel like using Jinja here is probably too much of a good thing. Wouldn't an f-string be more than capable enough?

I think the main reason was to give more flexibility to the user to construct a rate limit identifier that can use the information embedded in the task.

@ammar92 am I correct in that assumption?
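
For comparison, a standard-library alternative can be sketched with `str.format`. An f-string itself is evaluated where it is written, so it cannot be applied to a template that is loaded at runtime from a manifest; `str.format` can, and it supports attribute access on its arguments. The `Task` and `Boefje` classes and the `render_identifier` helper below are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Boefje:
    id: str


@dataclass
class Task:
    organisation: str
    boefje: Boefje


def render_identifier(template: str, task: Task) -> str:
    """Render a rate limit identifier from a format-string template."""
    return template.format(task=task)


task = Task(organisation="org-a", boefje=Boefje(id="shodan"))
shared = render_identifier("{task.boefje.id}", task)
per_org = render_identifier("{task.organisation}:{task.boefje.id}", task)
```

The trade-off is that format strings offer only attribute and index lookup, with no conditionals or filters, so whether that is enough depends on how much flexibility the identifier templates are meant to have.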

jpbruinsslot added 30 commits July 12, 2023 15:59

Start with rate limit implementation

3248e35

Start with rate limitting implementation

7444ef2

Formatting

a6f5d62

Start writing tests

c6fa041

Rename to delayed tasks, update tests

a9aac16

Leverage tasks instead of in memory datastructure to store delayed tasks

10a7f9f

Fix consuming of the rate limiter

0b552a0

Add optional consume parameter

fda6991

Fix rate limit tests

8ec4416

Fix formatting and type hint suggestions

f5a71ea

Debugging and writing tests for edge cases

90786b9

Add order_by argument for task store

2c4a1c2

Add migration

46c1e6e

Fix migration

83e8f30

Fix tests

7f31f24

Update requirements.txt

7b162ac

Update tests timeout

8485870

Finaly made the tests work

fc17de7

Fixes for precommit

d659a16

Update tests

81223a8

Implement lock

fe1197d

Remove comment

1362292

Adding valueerror tests

f92e3d3

Fix pre-commit suggestions

3077aa6

Merge branch 'main' into feature/mula/rate-limit

880fca8

Merge branch 'main' into feature/mula/rate-limit

3496344

Implement and update rate limiting attributes from katalogus

2829e86

Merge branch 'main' into feature/mula/rate-limit

443c31e

jpbruinsslot added 5 commits August 2, 2023 09:32

Mypy

ab7d4d1

Merge branch 'main' into feature/mula/rate-limit

244ded1

Merge branch 'main' into feature/mula/rate-limit

0639733

Merge branch 'main' into feature/mula/rate-limit

4d39b20

jpbruinsslot self-assigned this Aug 14, 2023

jpbruinsslot added the mula Issues related to the scheduler label Aug 14, 2023

jpbruinsslot mentioned this pull request Aug 14, 2023

Scheduler rate limiting #1413

Closed

jpbruinsslot added 3 commits August 14, 2023 16:37

Add organisational rate limiting support

4f12d5f

Add test for organisation rate limiting

10984c0

Merge branch 'main' into feature/mula/rate-limit-2

a85001d

jpbruinsslot marked this pull request as ready for review August 15, 2023 08:02

jpbruinsslot requested a review from a team as a code owner August 15, 2023 08:02

Trying to fix tests

92a1e97

jpbruinsslot added 4 commits August 22, 2023 09:50

Merge branch 'main' into feature/mula/rate-limit-2

b51ec09

Merge branch 'main' into feature/mula/rate-limit-2

9586009

Merge branch 'main' into feature/mula/rate-limit-2

eebb018

jpbruinsslot marked this pull request as draft September 11, 2023 07:27

underdarknl reviewed Oct 19, 2023

jpbruinsslot mentioned this pull request Mar 6, 2024

Batched jobs for scheduler and task runner #2613

Open

underdarknl added this to the OpenKAT v1.17 milestone May 7, 2024

underdarknl modified the milestones: OpenKAT v1.17, OpenKAT v1.18 Aug 27, 2024

jpbruinsslot mentioned this pull request Oct 22, 2024

Create rate limiting functionality #2291

Open

jpbruinsslot linked an issue Oct 22, 2024 that may be closed by this pull request

Create rate limiting functionality #2291

Open
