Characterization
We need to characterize Ziti performance so that we can compare it against the plain internet, against other technologies, and against itself, so we can tell whether we are improving, maintaining, or degrading performance over time.
Characterization scenarios will be done across three axes:
- The model
  - This includes the numbers and interactions of services, identities and policies
- The deployment
  - This includes the number and type of instances and in which regions they are deployed. It also includes whether we are using tunnelers or native Ziti applications
- The traffic
  - This includes the number of concurrent sessions, the amount of data sent and the number of iterations
The model

Baseline model:
- 1 service
- 1 identity
- 1 edge router
- 1 of each policy
For models with multiple edge routers, do we need to set up the runtime so that only one is active, both for consistency in test results and to keep testing costs down?
For each A <-> B policy type, ensure we have at least:
- an A with a policy which has all Bs
- a B with a policy which has all As
- an A with all policies
- a B with all policies
- Ensure that the A and B we test with are worst case: they have access to the maximum number of entities on both sides and are lexically sorted last, to expose slowdowns in scans (see the selection sketch below)
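As a minimal sketch of that selection, assuming we have already counted how many policies reference each candidate entity (the `candidate` type and the sample data are illustrative scaffolding, not part of any Ziti API):

```go
package main

import (
	"fmt"
	"sort"
)

// candidate pairs an entity name with how many policies reference it.
// This is illustrative bookkeeping, not a Ziti API type.
type candidate struct {
	name     string
	policies int
}

// worstCase picks the entity to exercise in tests: the one referenced by the
// most policies, breaking ties toward the lexically last name so that scans
// walking entities in sorted order do the most work before reaching it.
func worstCase(cands []candidate) candidate {
	sort.Slice(cands, func(i, j int) bool {
		if cands[i].policies != cands[j].policies {
			return cands[i].policies > cands[j].policies
		}
		return cands[i].name > cands[j].name
	})
	return cands[0]
}

func main() {
	identities := []candidate{
		{"aaa-identity", 3},
		{"mmm-identity", 10},
		{"zzz-identity", 10},
	}
	fmt.Println("test with:", worstCase(identities).name) // prints zzz-identity
}
```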
Small model:
- 20 services
- 100 identities
- 10 edge routers
- 10 Service Policies
- 10 Edge Router Policies
- 10 Service Edge Router Policies
Medium model:
- 100 services
- 5,000 identities
- 100 edge routers
- 50 Service Policies
- 50 Edge Router Policies
- 10 Service Edge Router Policies
Large model:
- 200 services
- 100,000 identities
- 500 edge routers
- 250 Service Policies
- 250 Edge Router Policies
- 100 Service Edge Router Policies
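As a minimal sketch, the four model sizes above could be captured as data for a test-setup generator to iterate over; the `modelSize` type and the loop are hypothetical scaffolding, while the counts are the ones listed above:

```go
package main

import "fmt"

// modelSize captures the entity counts for one characterization model.
// The type is hypothetical scaffolding, not a Ziti API; a real generator
// would call the management API or shell out to the ziti CLI per entity.
type modelSize struct {
	name                      string
	services                  int
	identities                int
	edgeRouters               int
	servicePolicies           int
	edgeRouterPolicies        int
	serviceEdgeRouterPolicies int
}

var models = []modelSize{
	{"baseline", 1, 1, 1, 1, 1, 1},
	{"small", 20, 100, 10, 10, 10, 10},
	{"medium", 100, 5_000, 100, 50, 50, 10},
	{"large", 200, 100_000, 500, 250, 250, 100},
}

func main() {
	for _, m := range models {
		// Printing the plan keeps the sketch self-contained; replace this
		// with the actual entity-creation calls when wiring up a generator.
		fmt.Printf("%-8s services=%d identities=%d edge-routers=%d policies=%d/%d/%d\n",
			m.name, m.services, m.identities, m.edgeRouters,
			m.servicePolicies, m.edgeRouterPolicies, m.serviceEdgeRouterPolicies)
	}
}
```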
We can test the model in isolation, outside the context of full deployment/throughput/scale testing, to ensure that the queries the SDK depends on will scale well. Ideally permission checks would be O(1), so that the only non-constant cost would be service look-ups (since a user with access to more services will naturally take longer to list them).
This testing can be done locally, just exercising the APIs used by the SDK. If we can eliminate poor performance here, that will let us focus on the edge routers for the throughput and connection-scale testing.
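As a minimal sketch of the kind of local harness this implies, the helper below times an arbitrary operation and reports the same statistics as the results table (min, max, mean, 95th percentile); the timed operation is a stand-in placeholder, not the real SDK/API calls:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
	"time"
)

// measure runs op n times and reports min, max, mean and 95th percentile,
// mirroring the statistics in the results table below.
func measure(name string, n int, op func()) {
	durs := make([]time.Duration, n)
	for i := range durs {
		start := time.Now()
		op()
		durs[i] = time.Since(start)
	}
	sort.Slice(durs, func(i, j int) bool { return durs[i] < durs[j] })

	var total time.Duration
	for _, d := range durs {
		total += d
	}
	p95 := durs[int(math.Ceil(0.95*float64(n)))-1] // nearest-rank percentile

	fmt.Printf("%s:\n  Min : %v\n  Max : %v\n  Mean: %v\n  95th: %v\n",
		name, durs[0], durs[n-1], total/time.Duration(n), p95)
}

func main() {
	// Placeholder operation: a real run would authenticate, list services and
	// create sessions through the edge client API / SDK instead of sleeping.
	measure("Get Services", 20, func() {
		time.Sleep(time.Duration(rand.Intn(20)) * time.Millisecond)
	})
}
```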
Results
| Operation           | Metric | baseline | small    | medium   | large     |
|---------------------|--------|----------|----------|----------|-----------|
| Create API Session  | Min    | 6ms      | 8ms      | 8ms      | 15ms      |
| Create API Session  | Max    | 46ms     | 53ms     | 66ms     | 58ms      |
| Create API Session  | Mean   | 23.3ms   | 20.45ms  | 24.4ms   | 28.85ms   |
| Create API Session  | 95th   | 45.9ms   | 52.39ms  | 65.6ms   | 57.24ms   |
| Refresh API Session | Min    | 0ms      | 0ms      | 0ms      | 0ms       |
| Refresh API Session | Max    | 0ms      | 0ms      | 0ms      | 0ms       |
| Refresh API Session | Mean   | 0ms      | 0ms      | 0ms      | 0ms       |
| Refresh API Session | 95th   | 0ms      | 0ms      | 0ms      | 0ms       |
| Get Services        | Min    | 14ms     | 156ms    | 785ms    | 3521ms    |
| Get Services        | Max    | 17ms     | 187ms    | 848ms    | 3705ms    |
| Get Services        | Mean   | 16ms     | 169.6ms  | 805.4ms  | 3620.5ms  |
| Get Services        | 95th   | 17ms     | 187ms    | 848ms    | 3705ms    |
| Create Session      | Min    | 6ms      | 8ms      | 18ms     | 2033ms    |
| Create Session      | Max    | 36ms     | 49ms     | 38ms     | 4951ms    |
| Create Session      | Mean   | 15.75ms  | 20.35ms  | 24.05ms  | 3386.95ms |
| Create Session      | 95th   | 35.9ms   | 48.95ms  | 37.9ms   | 4944.65ms |
| Refresh Session     | Min    | 0ms      | 0ms      | 0ms      | 0ms       |
| Refresh Session     | Max    | 0ms      | 0ms      | 0ms      | 0ms       |
| Refresh Session     | Mean   | 0ms      | 0ms      | 0ms      | 0ms       |
| Refresh Session     | 95th   | 0ms      | 0ms      | 0ms      | 0ms       |
The deployment

We should test with a variety of instance types, from t2 on up. Until we start testing, it will be hard to say what is needed. High-bandwidth applications often need bigger instance types, even when the extra CPU and memory aren't required.
The controller should require smaller instances than the router, at least in terms of network use.
We shouldn't need to test deployment variations, such as tunneler vs. SDK-enabled application, for all scenarios. We can pick one or two scenarios to find out whether there are noticeable differences.
The traffic

There are a few different traffic types we should test:
- iperf, for sustained throughput testing. This can be done with varying degrees of parallelism.
- Something like a web service or HTTP server, for lots of concurrent, short-lived connections, to get a feel for connection setup/teardown overhead (a sketch follows below).
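A minimal sketch of that second traffic type, hammering a service with concurrent short-lived HTTP requests; the target address and the worker/request counts are placeholders, not values from this document:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const (
		workers  = 50
		requests = 200
		// Placeholder address: point this at the service as exposed by a
		// tunneler, or swap in a Ziti SDK dialer for the SDK-enabled variant.
		target = "http://localhost:8080/"
	)

	// Disabling keep-alives forces every request to pay full connection
	// setup/teardown, which is exactly the overhead we want to observe.
	client := &http.Client{Transport: &http.Transport{DisableKeepAlives: true}}

	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < requests; i++ {
				resp, err := client.Get(target)
				if err != nil {
					continue
				}
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%d short-lived requests in %v\n", workers*requests, time.Since(start))
}
```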