Add performance test and minor code refactor #50

Merged (2 commits) on May 19, 2024
Conversation

@linj121 (Contributor) commented May 16, 2024

First of all, I would like to thank the author for this project ❤️. It really makes my life a lot easier, and I have also learned a lot from its coding style and structure.

Here are the changes I made:

Performance Test

  • I've created a new performance testing suite under tests/perf.js, which can be run with npm run test:perf.
  • The reason for this is that I plan to integrate the tiktoken WASM binding into this project in the future. Before that, I would like to set up more fine-grained performance tests, so that we can measure the performance difference between js-tiktoken and tiktoken.
  • Inspired by testPerformance(messages) in test.js, I set up tests in tests/perf.js that measure the execution time of each of the following operations separately: usageInfo.usedTokens, usageInfo.promptUsedTokens, usageInfo.completionUsedTokens and usageInfo.usedUSD.
  • To eliminate the impact of cold start, a GPTTokens instance is always instantiated and usageInfo.usedTokens is read to cache the encoding in modelEncodingCache before the timing starts, so the results may look different from the original testPerformance(messages). (A minimal sketch of this warm-up-then-time approach follows the test result below.)
  • To get a more consistent result, the average execution time over 10, 100, 1000, ..., 1 million iterations is computed; it converges to around 0.019ms per call (by 'call' I mean an operation like usageInfo.usedTokens, usageInfo.promptUsedTokens, etc.) on my system (Ubuntu 22.04.3, 5.15.146.1-microsoft-standard-WSL2, x86_64, Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz) with Node.js v21.2.0.
  • Here is the latest test result (time unit: ms):
>>> Start of Test Result >>>
>>> Start of Batch 1 >>>
Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 20

-> usedTokens time: 0.1168
-> promptUsedTokens time: 0.3037
-> completionUsedTokens time: 0.1459
-> usedUSD time: 1.7766
Total time: 2.343

-> usedTokens time: 0.1139
-> promptUsedTokens time: 0.1359
-> completionUsedTokens time: 0.0042
-> usedUSD time: 0.2422
Total time: 0.4962

-> usedTokens time: 0.0684
-> promptUsedTokens time: 0.0445
-> completionUsedTokens time: 0.0027
-> usedUSD time: 0.3555
Total time: 0.4711

-> usedTokens time: 0.0309
-> promptUsedTokens time: 0.0401
-> completionUsedTokens time: 0.0016
-> usedUSD time: 0.1136
Total time: 0.1862

-> usedTokens time: 0.0555
-> promptUsedTokens time: 0.0387
-> completionUsedTokens time: 0.0023
-> usedUSD time: 0.2473
Total time: 0.3438

-> usedTokens time: 0.0297
-> promptUsedTokens time: 0.0399
-> completionUsedTokens time: 0.0022
-> usedUSD time: 0.1403
Total time: 0.2121

-> usedTokens time: 0.0276
-> promptUsedTokens time: 0.0391
-> completionUsedTokens time: 0.0023
-> usedUSD time: 0.2982
Total time: 0.3672

-> usedTokens time: 0.0424
-> promptUsedTokens time: 0.0535
-> completionUsedTokens time: 0.0024
-> usedUSD time: 0.0955
Total time: 0.1938

-> usedTokens time: 0.0288
-> promptUsedTokens time: 0.0421
-> completionUsedTokens time: 0.0049
-> usedUSD time: 0.1504
Total time: 0.2262

-> usedTokens time: 0.0308
-> promptUsedTokens time: 0.0435
-> completionUsedTokens time: 0.0062
-> usedUSD time: 0.1025
Total time: 0.183

-> usedTokens time: 0.0274
-> promptUsedTokens time: 0.0395
-> completionUsedTokens time: 0.0018
-> usedUSD time: 0.0775
Total time: 0.1462

-> usedTokens time: 0.0263
-> promptUsedTokens time: 0.0389
-> completionUsedTokens time: 0.0019
-> usedUSD time: 0.0803
Total time: 0.1474

-> usedTokens time: 0.027
-> promptUsedTokens time: 0.0391
-> completionUsedTokens time: 0.0018
-> usedUSD time: 0.0952
Total time: 0.1631

-> usedTokens time: 0.0266
-> promptUsedTokens time: 0.0383
-> completionUsedTokens time: 0.0021
-> usedUSD time: 0.0749
Total time: 0.1419

-> usedTokens time: 0.0269
-> promptUsedTokens time: 0.0386
-> completionUsedTokens time: 0.0019
-> usedUSD time: 0.0752
Total time: 0.1426

-> usedTokens time: 0.0271
-> promptUsedTokens time: 0.0393
-> completionUsedTokens time: 0.0018
-> usedUSD time: 8.2488
Total time: 8.317

-> usedTokens time: 0.0362
-> promptUsedTokens time: 0.0972
-> completionUsedTokens time: 0.0649
-> usedUSD time: 0.461
Total time: 0.6593

-> usedTokens time: 0.0333
-> promptUsedTokens time: 0.0798
-> completionUsedTokens time: 0.002
-> usedUSD time: 0.1261
Total time: 0.2412

-> usedTokens time: 0.0322
-> promptUsedTokens time: 0.0702
-> completionUsedTokens time: 0.0019
-> usedUSD time: 0.1382
Total time: 0.2425

-> usedTokens time: 0.0373
-> promptUsedTokens time: 0.0605
-> completionUsedTokens time: 0.0013
-> usedUSD time: 0.0901
Total time: 0.1892

>>> End of Batch 1 >>>
>>> Start of Batch 2 >>>
Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 10

Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 100

Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 1000

Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 10000

Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 100000

Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 500000

Testing performance...
Options: {"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]}, iterations: 1000000


Statistical Information:
Setting: {"options":{"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]},"iterations":10}
Used Tokens for Each Call: 9
Total Execution Time: 1.5428999960422516
Total Number of Iterations: 10
Total Number of Calls per Iteration: 4
Avg Execution Time (per Iteration): 0.15429ms
Avg Execution Time (per Call): 0.03857ms
Setting: {"options":{"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]},"iterations":100}
Used Tokens for Each Call: 9
Total Execution Time: 14.768799722194672
Total Number of Iterations: 100
Total Number of Calls per Iteration: 4
Avg Execution Time (per Iteration): 0.14769ms
Avg Execution Time (per Call): 0.03692ms
Setting: {"options":{"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]},"iterations":1000}
Used Tokens for Each Call: 9
Total Execution Time: 96.91169968247414
Total Number of Iterations: 1000
Total Number of Calls per Iteration: 4
Avg Execution Time (per Iteration): 0.09691ms
Avg Execution Time (per Call): 0.02423ms
Setting: {"options":{"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]},"iterations":10000}
Used Tokens for Each Call: 9
Total Execution Time: 714.3903965950012
Total Number of Iterations: 10000
Total Number of Calls per Iteration: 4
Avg Execution Time (per Iteration): 0.07144ms
Avg Execution Time (per Call): 0.01786ms
Setting: {"options":{"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]},"iterations":100000}
Used Tokens for Each Call: 9
Total Execution Time: 8780.892964661121
Total Number of Iterations: 100000
Total Number of Calls per Iteration: 4
Avg Execution Time (per Iteration): 0.08781ms
Avg Execution Time (per Call): 0.02195ms
Setting: {"options":{"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]},"iterations":500000}
Used Tokens for Each Call: 9
Total Execution Time: 39309.42104059458
Total Number of Iterations: 500000
Total Number of Calls per Iteration: 4
Avg Execution Time (per Iteration): 0.07862ms
Avg Execution Time (per Call): 0.01965ms
Setting: {"options":{"model":"gpt-3.5-turbo-0613","messages":[{"role":"user","content":"Hello world"}]},"iterations":1000000}
Used Tokens for Each Call: 9
Total Execution Time: 77574.36164027452
Total Number of Iterations: 1000000
Total Number of Calls per Iteration: 4
Avg Execution Time (per Iteration): 0.07757ms
Avg Execution Time (per Call): 0.01939ms
>>> End of Batch 2 >>>
>>> End of Test Result ( Thu May 16 2024 16:33:31 GMT-0400 (Eastern Daylight Time) ) >>>
  • The test result looks pretty good to me, but to address the issue js-tiktoken 性能太差了 ("js-tiktoken performance is too poor") #23, I'll add more test cases in the future (e.g. testing with longer message inputs, comparing js-tiktoken with the WASM tiktoken, and measuring cold-start speed). According to the test result here, the WASM tiktoken is only faster than js-tiktoken when the input size is large (923942 tokens).
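For reference, here is a minimal sketch of the warm-up-then-time idea described above. It is not the actual tests/perf.js; it only assumes the public GPTTokens constructor and the usedTokens / promptUsedTokens / completionUsedTokens / usedUSD getters from the README, plus a made-up timeCall helper.

```typescript
import { GPTTokens } from 'gpt-tokens'
import { performance } from 'perf_hooks'

const usageInfo = new GPTTokens({
    model   : 'gpt-3.5-turbo-0613',
    messages: [{ role: 'user', content: 'Hello world' }],
})

// Warm-up: read usedTokens once so the encoding is cached in
// modelEncodingCache before any timing starts (avoids measuring cold start).
void usageInfo.usedTokens

// Hypothetical helper: average execution time of fn over N iterations.
function timeCall (label: string, fn: () => unknown, iterations: number): void {
    const start = performance.now()
    for (let i = 0; i < iterations; i++) fn()
    const perCall = (performance.now() - start) / iterations
    console.log(`-> ${label}: ${perCall.toFixed(5)}ms per call (avg over ${iterations} iterations)`)
}

const iterations = 1000

timeCall('usedTokens',           () => usageInfo.usedTokens,           iterations)
timeCall('promptUsedTokens',     () => usageInfo.promptUsedTokens,     iterations)
timeCall('completionUsedTokens', () => usageInfo.completionUsedTokens, iterations)
timeCall('usedUSD',              () => usageInfo.usedUSD,              iterations)
```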

Minor refactor

I've moved modelEncodingCache and getEncodingForModelCached inside GPTTokens and made them protected static, so that they are only accessible from within the class and its subclasses. This also makes our API more encapsulated.
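A rough sketch of the shape of this change (a simplified stand-in for the real class, assuming js-tiktoken's encodingForModel helper and its Tiktoken / TiktokenModel types; the actual PR code may differ):

```typescript
import { encodingForModel, Tiktoken, TiktokenModel } from 'js-tiktoken'

class GPTTokens {
    // The cache now lives on the class instead of at module scope.
    protected static modelEncodingCache: { [key in TiktokenModel]?: Tiktoken } = {}

    // Accessible only from GPTTokens and its subclasses.
    protected static getEncodingForModelCached (model: TiktokenModel): Tiktoken {
        const cache = GPTTokens.modelEncodingCache
        if (!cache[model]) cache[model] = encodingForModel(model)
        return cache[model]!
    }

    // ...rest of the class unchanged
}
```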

NPM registry error

I got the following auth error when running npm i with the latest package-lock.json

npm ERR! code E401
npm ERR! 401 Unauthorized - GET https://srun-npm.pkg.coding.net/srun4-portal/portal-core/whatwg-url/-/whatwg-url-5.0.0.tgz - Invalid credential. 请确认输入了正确的用户名和密码。

It seems that this npm registry requires some sort of authentication (the Chinese part of the error asks you to confirm your username and password), so I replaced the registry URLs with the official registry https://registry.npmjs.org

Regression test

Added process.env.FINE_TUNE_MODEL support: const model = process.env.FINE_TUNE_MODEL || 'ft:gpt-3.5-turbo-1106:opensftp::8IWeqPit', so that developers can use their own model for testing (the original fine-tuned model listed there is somehow not accessible to me)
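A sketch of the fallback (the line itself is from the PR; the surrounding test code in test.js stays as-is, and the exact run command is not assumed here):

```typescript
// Use a developer-supplied fine-tuned model if FINE_TUNE_MODEL is set,
// otherwise fall back to the model ID that was previously hard-coded.
const model: string = process.env.FINE_TUNE_MODEL
    || 'ft:gpt-3.5-turbo-1106:opensftp::8IWeqPit'

// Exporting FINE_TUNE_MODEL before running the regression test then
// points it at your own accessible fine-tuned model.
```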

The test result (for my latest commit) looks good to me. The DeprecationWarning is caused by Node.js v21; switching to v20 solves the problem.

Testing GPT...
[1/20]: Testing gpt-3.5-turbo...
(node:725962) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Pass!
[2/20]: Testing gpt-3.5-turbo-16k...
Pass!
[3/20]: Testing gpt-4...
Pass!
[4/20]: Testing gpt-4-32k...
Ignore model gpt-4-32k:
404 The model `gpt-4-32k` does not exist or you do not have access to it.
[5/20]: Testing gpt-4-turbo-preview...
Pass!
[6/20]: Testing gpt-4-turbo...
Pass!
[7/20]: Testing gpt-4o...
Pass!
[8/20]: Testing gpt-4o-2024-05-13...
Pass!
[9/20]: Testing gpt-4-turbo-2024-04-09...
Pass!
[10/20]: Testing gpt-4-0314...
Ignore model gpt-4-0314:
404 The model `gpt-4-0314` has been deprecated, learn more here: https://platform.openai.com/docs/deprecations
[11/20]: Testing gpt-4-32k-0314...
Ignore model gpt-4-32k-0314:
404 The model `gpt-4-32k-0314` has been deprecated, learn more here: https://platform.openai.com/docs/deprecations
[12/20]: Testing gpt-4-0613...
Pass!
[13/20]: Testing gpt-4-32k-0613...
Ignore model gpt-4-32k-0613:
404 The model `gpt-4-32k-0613` does not exist or you do not have access to it.
[14/20]: Testing gpt-4-1106-preview...
Pass!
[15/20]: Testing gpt-4-0125-preview...
Pass!
[16/20]: Testing gpt-3.5-turbo-0301...
Pass!
[17/20]: Testing gpt-3.5-turbo-0613...
Pass!
[18/20]: Testing gpt-3.5-turbo-16k-0613...
Pass!
[19/20]: Testing gpt-3.5-turbo-1106...
Pass!
[20/20]: Testing gpt-3.5-turbo-0125...
Pass!
Test success!
Testing function calling...
Pass!
Testing fine-tune...
Pass!
Testing Create a fine-tuned model...
Pass!
Testing performance...
Messages: [{"role":"user","content":"Hello world"}]
GPTTokens: 1.403ms
GPTTokens: 0.351ms
GPTTokens: 0.289ms
GPTTokens: 0.284ms
GPTTokens: 1.11ms
GPTTokens: 0.412ms
GPTTokens: 0.176ms
GPTTokens: 0.3ms
GPTTokens: 0.235ms
GPTTokens: 0.265ms

I suggest setting up a CI/CD pipeline for automated testing to make development and contribution easier.

…d method of GPTTokens class; aggregate tests under ./tests, add more fine-grained performance benchmark
@linj121 changed the title from "Add fine-grained performance test and refactor two global declarations" to "Add performance test and minor code refactor" on May 16, 2024
@Cainier (Owner) commented May 19, 2024

Thank you very much for your contribution to this project:

  1. The NPM registry error is because I mistakenly set the global npm registry to a private one while developing another project. Thanks for finding and raising the issue; I will delete package-lock.json and re-run npm i to regenerate the file.

  2. The model ft:gpt-3.5-turbo-1106:opensftp::8IWeqPit only works when testing with my own access key. This is indeed a problem; I will replace it with an environment variable as per the scheme in your submission.

  3. Regarding WASM: starting from v1.1, the project switched from @dqbd/tiktoken to js-tiktoken, because using WASM on the web requires vite/webpack configuration: https://github.com/Cainier/gpt-tokens/tree/v1.0.9

If there is a need to use WASM, I will try to add a useWASM configuration option in the next version.

I will try to set up a CI/CD pipeline for automated testing and building, but the access key is a problem; maybe using GitHub environment variables can solve it.

@Cainier merged commit 0d48413 into Cainier:main on May 19, 2024. 1 check passed.