
WIS Server CPU Mode - Model returns "you." only #114

Open
NickJLange opened this issue Aug 12, 2023 · 7 comments

@NickJLange

Please also reference PR #113 for the run-as environment that produces the output below when running CPU-only in Docker. All recorded output returns "You". I'm not in a position to confirm that the recorded audio passed to the model is actually the spoken sample text (it could be anything), but I wanted to park this as an issue in case it's known or a result of leaving off the triton optimizer on macOS.

Disconnected from ASR Service
iceConnectionLog failed
iceConnectionLog disconnected
ASR Speedup: 0x faster than realtime
ASR Audio Duration: 2500 ms
ASR Infer time: 5588.092 ms
Doing ASR with model large beam size 5 detect language {detect_language} - please wait
ASR Recording - start talking and press stop when done
ASR Speedup: 0x faster than realtime
ASR Audio Duration: 3380 ms
ASR Infer time: 5573.094 ms
Doing ASR with model large beam size 5 detect language {detect_language} - please wait
ASR Recording - start talking and press stop when done
ASR Speedup: 0x faster than realtime
ASR Audio Duration: 1460 ms
ASR Infer time: 8805.115000000002 ms
Doing ASR with model large beam size 5 detect language {detect_language} - please wait
iceConnectionLog connected
iceConnectionLog disconnected
ASR Recording - start talking and press stop when done
Connected to ASR Service - start recording whenever you like
iceConnectionLog connected
iceConnectionLog checking
signalingLog complete
localDescription offer {
  "type": "offer",
  "sdp": "v=0\r\no=mozilla...THIS_IS_SDPARTA-99.0 4828736718779339078 0 IN IP4 0.0.0.0\r\ns=-\r\nt=0 0\r\na=sendrecv\r\na=fingerprint:sha-256 F8:49:9C:C7:A8:93:81:AF:7B:4E:B9:72:46:CB:9B:24:CF:83:ED:AB:E8:75:A8:38:BC:27:1D:55:15:0F:42:BF\r\na=group:BUNDLE 0 1\r\na=ice-options:trickle\r\na=msid-semantic:WMS *\r\nm=audio 60775 UDP/TLS/RTP/SAVPF 109 9 0 8 101\r\nc=IN IP4 100.2.129.248\r\na=candidate:0 1 UDP 2122252543 2600:4041:78db:4a00:60a6:4b1f:31f1:637c 65138 typ host\r\na=candidate:2 1 UDP 2122187007 192.168.100.23 60775 typ host\r\na=candidate:4 1 TCP 2105524479 2600:4041:78db:4a00:60a6:4b1f:31f1:637c 9 typ host tcptype active\r\na=candidate:5 1 TCP 2105458943 192.168.100.23 9 typ host tcptype active\r\na=candidate:0 2 UDP 2122252542 2600:4041:78db:4a00:60a6:4b1f:31f1:637c 49664 typ host\r\na=candidate:2 2 UDP 2122187006 192.168.100.23 63018 typ host\r\na=candidate:4 2 TCP 2105524478 2600:4041:78db:4a00:60a6:4b1f:31f1:637c 9 typ host tcptype active\r\na=candidate:5 2 TCP 2105458942 192.168.100.23 9 typ host tcptype active\r\na=candidate:3 1 UDP 1685987327 100.2.129.248 60775 typ srflx raddr 192.168.100.23 rport 60775\r\na=candidate:3 2 UDP 1685987326 100.2.129.248 63018 typ srflx raddr 192.168.100.23 rport 63018\r\na=sendrecv\r\na=end-of-candidates\r\na=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level\r\na=extmap:2/recvonly urn:ietf:params:rtp-hdrext:csrc-audio-level\r\na=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid\r\na=fmtp:109 maxplaybackrate=48000;stereo=1;useinbandfec=1\r\na=fmtp:101 0-15\r\na=ice-pwd:bd9641bb404f5584d2f449a5be357b18\r\na=ice-ufrag:3b957186\r\na=mid:0\r\na=msid:- {6f242098-82d5-4a9c-b7c6-68cab4f17b66}\r\na=rtcp:63018 IN IP4 100.2.129.248\r\na=rtcp-mux\r\na=rtpmap:109 opus/48000/2\r\na=rtpmap:9 G722/8000/1\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000\r\na=rtpmap:101 telephone-event/8000\r\na=setup:actpass\r\na=ssrc:3692125949 cname:{0b754e23-b20e-4754-b770-858f10541a71}\r\nm=application 62984 UDP/DTLS/SCTP webrtc-datachannel\r\nc=IN IP4 100.2.129.248\r\na=candidate:0 1 UDP 2122252543 2600:4041:78db:4a00:60a6:4b1f:31f1:637c 65324 typ host\r\na=candidate:2 1 UDP 2122187007 192.168.100.23 62984 typ host\r\na=candidate:4 1 TCP 2105524479 2600:4041:78db:4a00:60a6:4b1f:31f1:637c 9 typ host tcptype active\r\na=candidate:5 1 TCP 2105458943 192.168.100.23 9 typ host tcptype active\r\na=candidate:3 1 UDP 1685987327 100.2.129.248 62984 typ srflx raddr 192.168.100.23 rport 62984\r\na=sendrecv\r\na=end-of-candidates\r\na=ice-pwd:bd9641bb404f5584d2f449a5be357b18\r\na=ice-ufrag:3b957186\r\na=mid:1\r\na=setup:actpass\r\na=sctp-port:5000\r\na=max-message-size:1073741823\r\n"
}
iceGatheringLog complete
iceGatheringLog gathering
signalingLog new
added track to peer connection
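
For reference, here is a minimal sketch (illustrative only, not WIS code) that pulls the ICE candidates out of an SDP blob like the offer above, to see which addresses the browser is actually offering. With Docker Desktop on macOS the container sits behind the VM's NAT, so host/srflx candidates like these may be unreachable from the server side.

# sdp_candidates.py - paste the "sdp" value from the offer above into offer.sdp
with open("offer.sdp") as f:
    sdp = f.read()

for line in sdp.splitlines():
    if line.startswith("a=candidate:"):
        fields = line.split()
        # fields: foundation, component, transport, priority, address, port, "typ", type, ...
        address, port, cand_type = fields[4], fields[5], fields[7]
        print(f"{cand_type:6s} {fields[2]:4s} {address}:{port}")
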
@kristiankielhofner
Contributor

With WebRTC, chances are it's not negotiating properly and you're not actually getting audio to Whisper ("you" is a pretty common hallucination when there is no speech/audio).
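
As an illustration (this is not the WIS implementation), a quick way to sanity-check whether captured audio actually contains speech before blaming the model is an RMS energy check on the decoded PCM:

import numpy as np

def looks_silent(pcm: np.ndarray, threshold: float = 0.005) -> bool:
    # pcm: float32 samples in [-1, 1]; the threshold is a rough guess, tune per mic
    rms = float(np.sqrt(np.mean(pcm.astype(np.float64) ** 2)))
    return rms < threshold

# e.g. 2.5 s of silence at 16 kHz, like the 2500 ms capture above
print(looks_silent(np.zeros(16000 * 5 // 2, dtype=np.float32)))  # True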

I looked over your PR and will comment on it separately, but this likely isn't related to disabling triton (triton and auto-gptq are only used for LLM support). Those infer times are painful but not exactly unexpected given the configuration you're running.

WebRTC can be tough to debug - what client are you using, and can you give some network details? I fear that running on a Mac with Docker Desktop (and the VM it uses) will present network challenges that make WebRTC support somewhere between difficult and impossible.
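
One way to take WebRTC out of the picture entirely is to POST a known-good WAV straight to the server. A sketch only - the host, port, endpoint path, and parameters below are assumptions, so check the routes your WIS build actually exposes (e.g. via its FastAPI docs page):

import requests

with open("sample.wav", "rb") as f:
    resp = requests.post(
        "https://wis.local:19000/api/willow",  # hypothetical host/port/path
        params={"model": "large", "beam_size": 5},  # hypothetical params
        data=f.read(),
        verify=False,  # self-signed certs are common in local setups
    )
print(resp.status_code, resp.text)

If the transcript comes back correct over HTTP, the "you" results point squarely at the WebRTC audio path.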

@NickJLange
Author

Sure thing - I am running macOS 13.5 with Firefox 116.0.1 (64-bit), connected over loopback to the Docker-exposed web service...

@kristiankielhofner
Contributor

That's what I suspected... Can you try Chrome? It's what we do most of our testing with, and it has significantly better WebRTC support than Firefox.

I suspect you'll still have negotiation issues but it's a good first debugging step.

@NickJLange
Author

Ok, we are in business. I'm no longer sure of the exact issue - perhaps it was transient or user error.

As I try to get native libraries going, here is a performance reference on an M2 Mac with CPU-only inference on Docker 24. I think the inference numbers look great for the base/tiny models...

Original text from newspaper: Russian groups fudge freight costs to mitigate impact of G7 oil price cap. $1 billion benefit in single quarter. Customs data exposes adjustment. Baltic-India trade in spotlight.

Base model (10x faster than realtime) transcribed text: Russian Group's flood freight cost to mitigate impact of G7 oil press cap in the financial times. $1 billion benefit in single-quarter custom data exposes adjustment, Baltic India trade and spotlight.

willow-inference-server-wis-1    | [2023-08-15 04:37:28 +0000] [58] [DEBUG] WHISPER: Audio duration is 13340 ms - activating long mode
willow-inference-server-wis-1    | [2023-08-15 04:37:28 +0000] [58] [DEBUG] WHISPER: Loading audio took 9.181999999999999 ms
willow-inference-server-wis-1    | [2023-08-15 04:37:28 +0000] [58] [DEBUG] WHISPER: Feature extraction took 17.53 ms
willow-inference-server-wis-1    | [2023-08-15 04:37:28 +0000] [58] [DEBUG] WHISPER: Using system default language en
willow-inference-server-wis-1    | [2023-08-15 04:37:28 +0000] [58] [DEBUG] WHISPER: Using model base with beam size 3
willow-inference-server-wis-1    | [2023-08-15 04:37:29 +0000] [58] [DEBUG] WHISPER: Model took 1270.2649999999999 ms
willow-inference-server-wis-1    | [2023-08-15 04:37:29 +0000] [58] [DEBUG] WHISPER: Decode took 0.214 ms

Tiny: Russian Group's Fudge Freight Cost to mitigate impact of G7 oil price cap. 1 billion benefit in single quarter. Customs data exposes adjustment. Altogether in the trade and spotlight.

willow-inference-server-wis-1    | [2023-08-15 04:42:09 +0000] [58] [DEBUG] WHISPER: Audio duration is 13740 ms - activating long mode
willow-inference-server-wis-1    | [2023-08-15 04:42:09 +0000] [58] [DEBUG] WHISPER: Loading audio took 7.523000000000001 ms
willow-inference-server-wis-1    | [2023-08-15 04:42:09 +0000] [58] [DEBUG] WHISPER: Feature extraction took 18.644000000000002 ms
willow-inference-server-wis-1    | [2023-08-15 04:42:09 +0000] [58] [DEBUG] WHISPER: Using system default language en
willow-inference-server-wis-1    | [2023-08-15 04:42:09 +0000] [58] [DEBUG] WHISPER: Using model tiny with beam size 3
willow-inference-server-wis-1    | [2023-08-15 04:42:10 +0000] [58] [DEBUG] WHISPER: Model took 805.3489999999999 ms
willow-inference-server-wis-1    | [2023-08-15 04:42:10 +0000] [58] [DEBUG] WHISPER: Decode took 0.206 ms
willow-inference-server-wis-1    | [2023-08-15 04:42:10 +0000] [58] [DEBUG] WHISPER: ASR transcript:  Russian Group's Fudge Freight Cost to mitigate impact of G7 oil price cap. 1 billion benefit in single quarter. Customs data exposes adjustment. Altogether in the trade and spotlight.
willow-inference-server-wis-1    | [2023-08-15 04:42:10 +0000] [58] [DEBUG] WHISPER: Inference took 832.345 ms
willow-inference-server-wis-1    | [2023-08-15 04:42:10 +0000] [58] [DEBUG] WHISPER: Inference speedup: 16x

Large: Russian groups fudge freight costs to mitigate impact of G7 oil price cap. One billion benefit in single quarter. Customs data expose adjustment. Baltic India trade in spotlight.

willow-inference-server-wis-1    | [2023-08-15 04:44:52 +0000] [58] [DEBUG] WHISPER: Audio duration is 13520 ms - activating long mode
willow-inference-server-wis-1    | [2023-08-15 04:44:52 +0000] [58] [DEBUG] WHISPER: Loading audio took 10.224 ms
willow-inference-server-wis-1    | [2023-08-15 04:44:52 +0000] [58] [DEBUG] WHISPER: Feature extraction took 24.625999999999998 ms
willow-inference-server-wis-1    | [2023-08-15 04:44:52 +0000] [58] [DEBUG] WHISPER: Using system default language en
willow-inference-server-wis-1    | [2023-08-15 04:44:52 +0000] [58] [DEBUG] WHISPER: Using model large with beam size 3
willow-inference-server-wis-1    | [2023-08-15 04:45:02 +0000] [58] [DEBUG] WHISPER: Model took 9945.594 ms
willow-inference-server-wis-1    | [2023-08-15 04:45:02 +0000] [58] [DEBUG] WHISPER: Decode took 0.184 ms
willow-inference-server-wis-1    | [2023-08-15 04:45:02 +0000] [58] [DEBUG] WHISPER: ASR transcript:  Russian groups fudge freight costs to mitigate impact of G7 oil price cap. One billion benefit in single quarter. Customs data expose adjustment. Baltic India trade in spotlight.
willow-inference-server-wis-1    | [2023-08-15 04:45:02 +0000] [58] [DEBUG] WHISPER: Inference took 9981.130000000001 ms
willow-inference-server-wis-1    | [2023-08-15 04:45:02 +0000] [58] [DEBUG] WHISPER: Inference speedup: 1x
willow-inference-server-wis-1    | [2023-08-15 04:45:02 +0000] [58] [DEBUG] RTC DC: Russian groups fudge freight costs to mitigate impact of G7 oil price cap. One billion benefit in single quarter. Customs data expose adjustment. Baltic India trade in spotlight.
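
To put rough numbers on the differences, here is a quick sketch using the third-party jiwer package (pip install jiwer - an assumption, not part of WIS) to compute word error rate against the newspaper text:

from jiwer import wer  # third-party package, assumed installed

reference = ("Russian groups fudge freight costs to mitigate impact of G7 "
             "oil price cap. $1 billion benefit in single quarter. Customs "
             "data exposes adjustment. Baltic-India trade in spotlight.")

hypotheses = {
    "base":  "Russian Group's flood freight cost to mitigate impact of G7 oil press cap in the financial times. $1 billion benefit in single-quarter custom data exposes adjustment, Baltic India trade and spotlight.",
    "tiny":  "Russian Group's Fudge Freight Cost to mitigate impact of G7 oil price cap. 1 billion benefit in single quarter. Customs data exposes adjustment. Altogether in the trade and spotlight.",
    "large": "Russian groups fudge freight costs to mitigate impact of G7 oil price cap. One billion benefit in single quarter. Customs data expose adjustment. Baltic India trade in spotlight.",
}

for name, hyp in hypotheses.items():
    print(f"{name}: WER {wer(reference, hyp):.2f}")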

@kristiankielhofner
Contributor

I've made some progress on native Mac support. However, I'm currently hitting a crash with signal 11 when attempting to run Whisper models. I think it could be related to my MacBook M1 only having 16GB of RAM, but I'm going to continue to work through this.
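
For what it's worth, a sketch of a first debugging step for signal 11 in a Python process: the stdlib faulthandler module dumps the Python-level traceback when a native crash happens, which can help separate an out-of-memory/allocator failure from a misbehaving native extension. Enable it early in the server entry point (or set PYTHONFAULTHANDLER=1 in the environment):

import faulthandler
import sys

# Dump tracebacks for all threads to stderr on SIGSEGV/SIGFPE/SIGABRT/SIGBUS
faulthandler.enable(file=sys.stderr, all_threads=True)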

Thank you for providing these performance details - all in all not that bad considering, but native performance plus Apple Accelerate support should be substantially better.

Roughly 10.5x realtime for 13s of audio with base is still pretty slow considering a GTX 1070 does 115x for 10 seconds of audio with base - and the realtime multiple increases dramatically with longer speech segments (the same params on a GTX 1070 for ~30s of audio reach 149x realtime). As usual it's not really a fair comparison, but based on what I've seen with native Mac performance in other projects, we can likely do substantially better natively while still not approaching CUDA GPU speeds (for the time being).
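
The realtime multiple is just audio duration divided by inference time; a quick check against the numbers copied from the logs above:

runs_ms = {
    "base":  (13340, 1270.265),  # audio ms, "Model took" ms (no "Inference took" logged)
    "tiny":  (13740, 832.345),   # audio ms, "Inference took" ms
    "large": (13520, 9981.130),  # audio ms, "Inference took" ms
}
for model, (audio_ms, infer_ms) in runs_ms.items():
    print(f"{model}: {audio_ms / infer_ms:.1f}x realtime")
# base: 10.5x, tiny: 16.5x, large: 1.4x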

@kristiankielhofner
Contributor

I created a feature/mac branch. You can read the commit message to get started.

I'd be really interested to see how your testing goes with this. As I only have one Mac device (16GB MacBook Pro M1), I'm not able to do much more testing, and as I've mentioned, I think the issues I'm experiencing could be RAM related (or not).

@kristiankielhofner
Contributor

@NickJLange - Have you had a chance to try the feature/mac branch?
