
Block RPC calls if not properly started or errored out during start #1122

Closed
dstadulis opened this issue Aug 29, 2024 · 21 comments · Fixed by #1126

@dstadulis
Collaborator

dstadulis commented Aug 29, 2024

A recent user report showed litd malfunctioning because a universe RPC call ran before the universe sub-service was online.

See TODO summary here:
#1122 (comment)

@ViktorTigerstrom
Contributor

ViktorTigerstrom commented Aug 29, 2024

Thanks for the issue report! Should we allow any RPC requests to tapd before the universe sub-service has been started?

If they should be separated, so that we would want to allow some requests prior to the universe sub-service being up, would an OK first version be to just disallow all tapd RPCs until the universe sub-service is up, and then create the separation at a later stage?

@jharveyb
Collaborator

I don't think they should be separated, or that there would be any benefit to that.

IIUC the desired fix is to disallow all tapd RPCs until all tapd sub-services are up.

@ellemouton
Member

LiT today should already take care of this. We only mark a subserver as "ready for calls" once the Start method of that subserver (for tap, this is the (s *Server) StartAsSubserver method) returns without an error. We want to keep LiT as generic as possible in terms of handling of subservers, so each subserver is expected to be ready to handle calls once that start method has returned. We really should not be doing extra calls to check if various subserver specifics are ready - the SLA here is that things should be ready once that method has returned.

@ellemouton
Member

If y'all don't want to permanently change the behaviour, then I suggest adding a "BlockTillUniverseStart" functional option or something on that call

@guggero
Member

guggero commented Aug 30, 2024

Hmm, I see. That is the correct behavior as far as I can see. We only have a screenshot of a stack trace to go from and it looked like an RPC request did come through before the internal gRPC server in tapd was started.

But perhaps tapd stopped with an error and the request came through after the gRPC server was stopped?
Though from just the code it looks like that's handled correctly as well.

Need more info from the user then, preferably actual logs.

@jharveyb
Collaborator

Reposting the stack trace as text:

2024-08-29 11:27:04.591 [DBG] RPCS: [/lnrpc.State/GetState] requested
2024-08-29 11:27:04.682 [DBG] RPCS: [/lnrpc.State/GetState] requested
2024-08-29 11:27:04.690 [DBG] RPCS: [/universerpc.Universe/ListFederationServers] requested
2024-08-29 11:27:04.692 [DBG] RPCS: [/lnrpc.Lightning/CheckMacaroonPermissions] requested
2024-08-29 11:27:04.692 [DBG] RPCS: [/lnrpc.Lightning/ListPeers] requested

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x16c62a0]

goroutine 3224 [running]:
github.com/lightninglabs/taproot-assets.(*rpcServer).ListFederationServers(0x0?, {0x32b0ad0?, 0x40026efc50?}, 0x1bb1040?)
	github.com/lightninglabs/taproot-assets@…/rpcserver.go:5654 +0x20
github.com/lightninglabs/taproot-assets/taprpc/universerpc._Universe_ListFederationServers_Handler.func1({0x32b0ad0?, 0x40026efc50?}, {0x1bb1040?, …)
	github.com/lightninglabs/taproot-assets@…-2110839696cb/taprpc/universerpc/universe_grpc.pb.go:688 +0xd0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).middlewareUnaryServerInterceptor.func7({0x32b0ad0, …)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:832 +0xec
google.golang.org/grpc.getChainUnaryHandler.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80})
	google.golang.org/grpc@…/server.go:1163 +0xa0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).MacaroonUnaryServerInterceptor.func5({0x32b0ad0, …)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:689 +0x90
google.golang.org/grpc.getChainUnaryHandler.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80})
	google.golang.org/grpc@…/server.go:1163 +0xa0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).rpcStateUnaryServerInterceptor.func3({0x32b0ad0, …)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:781 +0x108
google.golang.org/grpc.getChainUnaryHandler.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80})
	google.golang.org/grpc@…/server.go:1163 +0xa0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.errorLogUnaryServerInterceptor.func1({0x32b0ad0?, 0x40026efc50?}, {0…)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:605 +0x48
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80}, …)
	google.golang.org/grpc@…/server.go:1154 +0x88
github.com/lightninglabs/taproot-assets/taprpc/universerpc._Universe_ListFederationServers_Handler({0x1e5e5a0, 0x400056e600}, {0x32b0ad0, 0x40026e…)
	github.com/lightninglabs/taproot-assets@…-2110839696cb/taprpc/universerpc/universe_grpc.pb.go:690 +0x148
google.golang.org/grpc.(*Server).processUnaryRPC(0x40008541e0, {0x32b0ad0, 0x40026ef990}, {0x32c2920, 0x4000588340}, 0x40026b19e0, 0x4000591c20, …)
	google.golang.org/grpc@…/server.go:1343 +0xb40
google.golang.org/grpc.(*Server).handleStream(0x40008541e0, {0x32c2920, 0x4000588340}, 0x40026b19e0)
	google.golang.org/grpc@…/server.go:1737 +0x95c
google.golang.org/grpc.(*Server).serveStreams.func1.1()
	google.golang.org/grpc@…/server.go:986 +0x88
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 3223
	google.golang.org/grpc@…/server.go:997 +0x14c

@Roasbeef
Member

So today we have a middleware interceptor that'll bounce all calls until the server is fully started:

// checkRPCState checks whether a call to the given server is allowed in the
// current RPC state.
func (r *InterceptorChain) checkRPCState(srv interface{}) error {
	// The StateService is being accessed, we allow the call regardless of
	// the current state.
	_, ok := srv.(lnrpc.StateServer)
	if ok {
		return nil
	}

	r.RLock()
	state := r.state
	r.RUnlock()

	switch state {
	// Do not accept any RPC calls (unless to the state service) until the
	// server has started.
	case waitingToStart:
		return ErrWaitingToStart

	// If the RPC server or tapd server is active, we allow all calls.
	case rpcActive, serverActive:

	default:
		return fmt.Errorf("unknown RPC state: %v", state)
	}

	return nil
}

We then set the state to active after all the sub-systems have started here:

taproot-assets/server.go

Lines 374 to 378 in 72b93f8

// We transition the RPC state to Active, as the RPC server is up.
interceptorChain.SetRPCActive()

// We transition the server state to Active, as the server is up.
interceptorChain.SetServerActive()

From the trace, either the rpcserver or FederationDB was nil at that point, which is puzzling (all the sub-system structs have already been initialized at that point). So perhaps some mutation occurred somewhere?

@lukegao209
Contributor

This issue is coming from our team. When the terminal is restarted while lnd and tapd are still in the process of syncing, the terminal crashes continuously due to tapd receiving various requests (such as listing assets, subscribeRfqEvents, universe calls).

@guggero
Member

guggero commented Sep 3, 2024

@lukegao209 could you send us a full log (including the stack trace) from the beginning of a startup sequence until it panics?

@guggero
Member

guggero commented Sep 9, 2024

@lukegao209 also, what port are you issuing the ListFederationServers call on? The main litd port (8443) or the integrated tapd port (10029)?

@lukegao209
Contributor

@lukegao209 also, what port are you issuing the ListFederationServers call on? The main litd port (8443) or the integrated tapd port (10029)?

port 10029

@ellemouton
Member

@lukegao209 - ok cool, thanks 🙏 This makes sense then - rather point your requests at LiT's port 8443, as then LiT will block calls to tapd until it is ready.

@Roasbeef re

So today we have a middleware interceptor that'll bounce all calls until the server is fully started:

This is actually not the case for the subserver startup of Tap.

  • this is done for the main startup of Tap when it runs in its own process: see RunUntilShutdown, which sets up the interceptor chain for Tap that does this blocking
  • but for StartAsSubserver, which LiT uses, the interceptor chain in Tap is not set up

@lukegao209
Contributor

In conclusion, for the terminal: lnd requests should go to port 10009, and requests for the other services (tapd, loop, pool, etc.) should all go to port 8443?

@ellemouton
Member

you can also point LND requests to 8443 👍

@lukegao209
Contributor

thanks

@lukegao209
Contributor

btw, will terminal's accounts support taproot asset channels?

@guggero
Member

guggero commented Sep 9, 2024

btw, will terminal's accounts support taproot asset channels?

not out of the box, no. we'll need to update the litd account system once taproot asset channels are fully implemented.

@lukegao209
Contributor

If Taproot Assets are implemented through parsing the invoice’s custom_data, maybe I can start working on supporting it now.

@Roasbeef
Member

Roasbeef commented Sep 9, 2024

@ellemouton

but for StartAsSubserver which LiT uses , the interceptor chain in Tap is not set up

Gotcha, ok that seems to be the core issue here. We should retain the interceptor chain for tapd, either using the same system, or a more explicit check.

@ellemouton
Member

@guggero - with our latest offline discussion, do you rate we can close this and move it to the Tap repo?

@guggero
Member

guggero commented Sep 12, 2024

Yes. TODOs are:

  • tapd: return false in waitForReady if tapd was shut down or errored out
  • tapd: don't set initialized=true when startup (StartAsSubserver) fails
  • tapd: add server nil check to global error interceptors

Transferring to tapd repo.

@guggero guggero changed the title Block universe RPC calls to universe service until service is online Block RPC calls if not properly started or errored out during start Sep 12, 2024
@guggero guggero transferred this issue from lightninglabs/lightning-terminal Sep 12, 2024
@dstadulis dstadulis added this to the v0.4.2 milestone Sep 12, 2024
@dstadulis dstadulis added the gRPC label Sep 12, 2024