Occasional panics occurring in the Emitter Transport layer #49

Open
jbeemster opened this issue Jul 18, 2020 · 3 comments
Labels
type:defect Bugs or weaknesses. The issue has to contain steps to reproduce.

Comments

@jbeemster
Member

Describe the bug
When running the tracker, occasional panics crash the whole application. They occur not in the Tracker itself but in the Emitter transport layer below it; however, it might be possible to handle the panic within the Emitter and recover gracefully.

To Reproduce
This is not easy to reproduce, but in a long-running application I see crashes consistently every 1-2 days.

Expected behavior
The tracker should not panic while performing its core function.

Environment (please complete the following information):

  • ECS Fargate on AWS
  • Golang 1.13.8
  • Container: alpine:3.7

Additional context
The stack trace:

1594737074490,fatal error: concurrent map read and map write
1594737074493,goroutine 1385994 [running]:
1594737074493,"runtime.throw(0x10170e9, 0x21)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/panic.go:774 +0x72 fp=0xc00096ea80 sp=0xc00096ea50 pc=0x42dc02
1594737074493,"runtime.mapaccess1(0xe630a0, 0xc000396060, 0xc00096eb60, 0x19844c0)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/map.go:411 +0x269 fp=0xc00096eac8 sp=0xc00096ea80 pc=0x40da49
1594737074493,"net/http.(*Transport).removeIdleConnLocked(0x1957300, 0xc00116e480, 0x8)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:983 +0x1ac fp=0xc00096eba8 sp=0xc00096eac8 pc=0x6bcc4c
1594737074493,"net/http.(*Transport).removeIdleConn(0x1957300, 0xc00116e480, 0xc000080b00)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:973 +0x80 fp=0xc00096ec08 sp=0xc00096eba8 pc=0x6bca50
1594737074493,"net/http.(*persistConn).readLoop.func1(0xc00116e480, 0xc00096ed88)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:1880 +0x58 fp=0xc00096ec30 sp=0xc00096ec08 pc=0x6ca168
1594737074493,net/http.(*persistConn).readLoop(0xc00116e480)
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:1917 +0x1148 fp=0xc00096efd8 sp=0xc00096ec30 pc=0x6c37b8
1594737074493,runtime.goexit()
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc00096efe0 sp=0xc00096efd8 pc=0x45b151
1594737074493,created by net/http.(*Transport).dialConn
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:1580 +0xb0d
@jbeemster jbeemster added the type:defect Bugs or weaknesses. The issue has to contain steps to reproduce. label Jul 18, 2020
@jbeemster
Member Author

Narrowing this down, it seems that something in this function is causing the issue: https://github.com/snowplow/snowplow-golang-tracker/blob/master/tracker/emitter.go#L100-L115

Going to see if the issue can be fixed by using a custom transport instead of the default one.

@jbeemster
Member Author

jbeemster commented Jul 18, 2020

We are also editing the global transport object here, which feels like something we should avoid. Providing a custom transport rather than using the default one seems the most likely way to clean this up and stop it from occurring.

Example of a default we could copy: https://github.com/golang/go/blob/go1.13.14/src/net/http/transport.go#L42-L54

@apaatsio

apaatsio commented Feb 6, 2024

The issue is line 109, which dereferences the pointer and therefore copies the mutex value.

defaultTransport := *defaultTransportPointer

I think it would be fixed by

defaultTransport := http.DefaultTransport.(*http.Transport).Clone()

A workaround is to pass a custom http client to InitEmitter().
