[bug]: Tapd crashed during minting some unexpected non-final batches #1125

Open
btcwer opened this issue Sep 17, 2024 · 4 comments
Labels
bug Something isn't working needs triage


@btcwer

btcwer commented Sep 17, 2024

Background

Tapd crashed shortly after starting up in a Bitcoin testnet environment. The logs below show that the crash is caused by a nil pointer dereference.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x12de1fd]

Your environment

Tapd version: 0.4.1-alpha
LND version: 0.18.2-beta
BTC Core version: v27.0.0, with full node on testnet3
OS: Ubuntu 20.04.6 LTS

Steps to reproduce

Tapd's logs are as follows:

./tapd --network=testnet --debuglevel=trace --lnd.host=127.0.0.1:10009 --lnd.macaroonpath=/home/bittap/.lnd/data/chain/bitcoin/testnet/admin.macaroon --lnd.tlspath=/home/bittap/.lnd/tls.cert --databasebackend=postgres --postgres.host=127.0.0.1 --postgres.port=5432 --postgres.user=postgres --postgres.password=Abc666666 --postgres.dbname=tapd
2024-09-13 13:57:01.552 [WRN] CONF: open /home/bittap/.tapd/tapd.conf: no such file or directory
2024-09-13 13:57:01.552 [INF] CONF: Attempting to establish connection to lnd...
2024-09-13 13:57:01.558 [INF] CONF: lnd connection initialized
2024-09-13 13:57:01.558 [INF] CONF: Opening postgres database at: postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable
2024-09-13 13:57:01.558 [INF] TADB: Using SQL database 'postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable'
2024-09-13 13:57:01.564 [INF] TADB: Attempting to apply migration(s) (current_db_version=21, latest_migration_version=21)
2024-09-13 13:57:01.564 [INF] TADB: Database version after migration: 21
2024-09-13 13:57:01.564 [INF] CONF: Configuring testnet.universe.lightning.finance:10029 as initial Universe federation server
2024-09-13 13:57:01.565 [INF] TSVR: Version: 0.4.1-alpha commit=, build=production, logging=default, debuglevel=trace
2024-09-13 13:57:01.565 [INF] TSVR: Active network: testnet3
2024-09-13 13:57:01.565 [INF] RPCS: Validating RPC requests based on macaroon at: /home/bittap/.tapd/data/testnet/admin.macaroon
2024-09-13 13:57:01.568 [INF] GRDN: Starting ChainPlanter
2024-09-13 13:57:01.577 [INF] TSVR: Shutdown complete

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x12de1fd]

goroutine 1 [running]:
github.com/lightninglabs/taproot-assets/commitment.(*TapCommitment).CommittedAssets(0x0?)
        /home/bittap/bittap/source/taproot-assets-0.4.1/commitment/tap.go:529 +0x1d
github.com/lightninglabs/taproot-assets/tapdb.marshalMintingBatch({0x1f13dc0, 0xc00026dcc0}, {0x1f38668, 0xc0003898f0}, {0x4, 0x3, {0xc00013ac80, 0x278, 0x280}, {0x1, ...}, ...})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1211 +0x5e5
github.com/lightninglabs/taproot-assets/tapdb.(*AssetMintingStore).FetchNonFinalBatches.func1.1({0x4, 0x3, {0xc00013ac80, 0x278, 0x280}, {0x1, 0x1}, {0x3, 0x1}, 0x2bd796, ...})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1017 +0x5b
github.com/lightninglabs/taproot-assets/fn.MapErr[...]({0xc00018c280?, 0x2, 0xc000380005}, 0xc00046d408?)
        /home/bittap/bittap/source/taproot-assets-0.4.1/fn/func.go:83 +0xf8
github.com/lightninglabs/taproot-assets/tapdb.(*AssetMintingStore).FetchNonFinalBatches.func1({0x1f38668, 0xc0003898f0})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1020 +0x137
github.com/lightninglabs/taproot-assets/tapdb.(*TransactionExecutor[...]).ExecTx(0x1efe060, {0x1f13dc0, 0xc00026dcc0}, {0x1f016a0, 0xc00015643b}, 0xc0002a5ca0)
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/interfaces.go:241 +0x1c2
github.com/lightninglabs/taproot-assets/tapdb.(*AssetMintingStore).FetchNonFinalBatches(0xc0005ce700, {0x1f13dc0, 0xc00026dcc0})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1005 +0xc5
github.com/lightninglabs/taproot-assets/tapgarden.(*ChainPlanter).Start.func1()
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapgarden/planter.go:332 +0xc5
sync.(*Once).doSlow(0x0?, 0xc000136ba0?)
        /usr/local/go/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
        /usr/local/go/src/sync/once.go:65
github.com/lightninglabs/taproot-assets/tapgarden.(*ChainPlanter).Start(0xc0005c39d0?)
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapgarden/planter.go:318 +0x50
github.com/lightninglabs/taproot-assets.(*Server).initialize(0xc00026dbd0, 0xc000136a80)
        /home/bittap/bittap/source/taproot-assets-0.4.1/server.go:172 +0x8b2
github.com/lightninglabs/taproot-assets.(*Server).RunUntilShutdown(0xc00026dbd0, 0xc0001360c0)
        /home/bittap/bittap/source/taproot-assets-0.4.1/server.go:325 +0x551
main.main()
        /home/bittap/bittap/source/taproot-assets-0.4.1/cmd/tapd/main.go:78 +0x5be
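The frame `(*TapCommitment).CommittedAssets(0x0?)` suggests the method was called with a nil receiver. In Go, calling a method on a nil pointer is itself legal; the panic only happens once the method reads a field through that pointer. A minimal sketch of that behavior, using a hypothetical `commitment` type rather than tapd's real `TapCommitment`:

```go
package main

import "fmt"

// commitment is a hypothetical stand-in for a type like TapCommitment;
// its field is illustrative only.
type commitment struct {
	assets map[string]uint64
}

// CommittedAssets reads a field through the receiver, so calling it on a
// nil *commitment dereferences nil and panics.
func (c *commitment) CommittedAssets() []string {
	names := make([]string, 0, len(c.assets))
	for name := range c.assets {
		names = append(names, name)
	}
	return names
}

// callSafely runs CommittedAssets and converts a panic into an error so
// the behavior can be observed without crashing the program.
func callSafely(c *commitment) (names []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return c.CommittedAssets(), nil
}

func main() {
	var root *commitment // nil, like a missing root commitment
	_, err := callSafely(root)
	fmt.Println(err)
}
```

The `recover` in `callSafely` only exists to make the panic observable in this sketch; it is not meant as a fix.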

Cause

For some reason, there were unfinished (non-final) minting batches in the database. Tapd had been running properly for a month, but crashed during a recent restart. I don't know why this state can't be handled gracefully instead of crashing.
[Screenshot 2024-09-13 224036]

Solution

The function marshalMintingBatch() in tapdb/asset_minting.go should validate assetRoot before using it, like below:

		assetRoot := batch.RootAssetCommitment
		if assetRoot != nil {
			assetsInBatch := assetRoot.CommittedAssets() // crashes here without this nil check
			...
		}
btcwer added the bug (Something isn't working) and needs triage labels Sep 17, 2024
@jharveyb
Collaborator

Thanks for the detailed issue!

I initially thought this could be an issue with the on-disk state of one of those Broadcast batches, but I think it's simpler. A Broadcast batch must have a root Commitment, but the err here goes unchecked because the err variable is reused (and the linter does not currently flag this as unchecked).

Could you try out this branch with that DB? Or apply the patch in some other way if you prefer.

https://github.com/lightninglabs/taproot-assets/tree/batch_marshal_fixes

@dstadulis
Collaborator

@btcwer thank you for such a well-written report

@btcwer
Author

btcwer commented Sep 18, 2024

I applied this patch, but it didn't resolve the problem: when marshalMintingBatch() fails, the daemon still shuts down.

./tapd --network=testnet --debuglevel=trace --lnd.host=127.0.0.1:10009 --lnd.macaroonpath=/home/bittap/.lnd/data/chain/bitcoin/testnet/admin.macaroon --lnd.tlspath=/home/bittap/.lnd/tls.cert --databasebackend=postgres --postgres.host=127.0.0.1 --postgres.port=5432 --postgres.user=postgres --postgres.password=Abc666666 --postgres.dbname=tapd
2024-09-18 01:56:38.442 [WRN] CONF: open /home/bittap/.tapd/tapd.conf: no such file or directory
2024-09-18 01:56:38.442 [INF] CONF: Attempting to establish connection to lnd...
2024-09-18 01:56:38.448 [INF] CONF: lnd connection initialized
2024-09-18 01:56:38.448 [INF] CONF: Opening postgres database at: postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable
2024-09-18 01:56:38.448 [INF] TADB: Using SQL database 'postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable'
2024-09-18 01:56:38.454 [INF] TADB: Attempting to apply migration(s) (current_db_version=21, latest_migration_version=21)
2024-09-18 01:56:38.454 [INF] TADB: Database version after migration: 21
2024-09-18 01:56:38.456 [INF] CONF: Configuring testnet.universe.lightning.finance:10029 as initial Universe federation server
2024-09-18 01:56:38.456 [INF] TSVR: Version: 0.4.1-alpha commit=, build=production, logging=default, debuglevel=trace
2024-09-18 01:56:38.456 [INF] TSVR: Active network: testnet3
2024-09-18 01:56:38.456 [INF] RPCS: Validating RPC requests based on macaroon at: /home/bittap/.tapd/data/testnet/admin.macaroon
2024-09-18 01:56:38.460 [INF] GRDN: Starting ChainPlanter
2024-09-18 01:56:38.469 [ERR] TSVR: Shutting down because error in main method: unable to initialize RPC server: unable to start asset minter: unable to parse batch: invalid commitment to asset sprouts: batch 02231d758f0132fb3d25e950534996a2c0449a1cf9452275d3aa2d7663792c5ce3
2024-09-18 01:56:38.469 [INF] TSVR: Shutdown complete

unable to initialize RPC server: unable to start asset minter: unable to parse batch: invalid commitment to asset sprouts: batch 02231d758f0132fb3d25e950534996a2c0449a1cf9452275d3aa2d7663792c5ce3

@jharveyb
Collaborator

Yes, I expected tapd to still exit; I just wanted to confirm that the panic is now prevented.

Can you inspect the DB entries for that batch? A first step would be to check whether a TX was created and, if so, whether it was ever confirmed.

Also, could you provide more details about that batch, such as whether you created it with an older version or whether it had a large number of assets?

Perhaps we should adjust the minter to skip batches that were stored in a bad state, but I'd prefer to find out how we got there first.

Projects
Status: 👀 In review
Development

No branches or pull requests

3 participants