PBSS + v1.3.6/v1.3.7 OOM, please keep v1.3.5 version #132

Closed
du5 opened this issue Jan 3, 2024 · 14 comments

@du5
Contributor

du5 commented Jan 3, 2024

PBSS + v1.3.6/v1.3.7 OOM, please keep v1.3.5 version

Because of the code changes in bnb-chain/bsc#2155, v1.3.8 cannot start from this snapshot. Please wait for a new snapshot to be released.

@du5
Contributor Author

du5 commented Jan 3, 2024

If you capture any relevant metrics, please collect them and report them to the BSC team:

https://github.com/bnb-chain/bsc/issues/new

@zzzckck

zzzckck commented Jan 3, 2024

AFAIK, v1.3.7 does not have any memory-related changes; I have no idea why v1.3.7 has the OOM issue while v1.3.6 does not.

@du5
Contributor Author

du5 commented Jan 3, 2024

According to feedback from a community user, he hit an OOM after running the PBSS node for two days and could no longer start it. The same problem occurred after I upgraded to 1.3.7, but after a different running time. There seems to be no specific pattern, so locating the problem may be difficult.

@du5
Contributor Author

du5 commented Jan 3, 2024

This is the stdout output from restarting after the OOM:

root@snap-helper /opt # ./pbss.sh
INFO [01-02|15:39:12.848] Starting Geth on BSC mainnet...
INFO [01-02|15:39:12.848] Bumping default cache on mainnet         provided=1024 updated=4096
INFO [01-02|15:39:12.849] Maximum peer count                       ETH=256 LES=0 total=256
INFO [01-02|15:39:12.850] Using pebble as db engine
INFO [01-02|15:39:12.925] Using pebble as the backing database
INFO [01-02|15:39:12.925] Allocated cache and file handles         database=/opt/geth.pbss/geth/chaindata cache=1.60GiB handles=524,288 "memory table"=409.50MiB
INFO [01-02|15:39:13.073] Found legacy ancient chain path          location=/opt/geth.pbss/geth/chaindata/ancient
INFO [01-02|15:39:13.076] Opened ancient database                  database=/opt/geth.pbss/geth/chaindata/ancient readonly=false frozen=34,629,314
INFO [01-02|15:39:13.078] All are provided, state scheme set to already existing scheme=path
INFO [01-02|15:39:13.084] Set global gas cap                       cap=50,000,000
INFO [01-02|15:39:13.084] Initializing the KZG library             backend=gokzg
INFO [01-02|15:39:13.141] Capped dirty cache size                  provided=1024.00MiB adjusted=256.00MiB
INFO [01-02|15:39:13.141] Clean cache size                         provided=614.00MiB
INFO [01-02|15:39:13.142] Allocated trie memory caches             clean=614.00MiB dirty=256.00MiB
INFO [01-02|15:39:13.160] Using pebble as the backing database
INFO [01-02|15:39:13.160] Allocated cache and file handles         database=/opt/geth.pbss/geth/chaindata         cache=1.60GiB handles=524,288 "memory table"=409.50MiB
INFO [01-02|15:39:13.281] Found legacy ancient chain path          location=/opt/geth.pbss/geth/chaindata/ancient
INFO [01-02|15:39:13.282] Read ancientdb item counts               items=0
INFO [01-02|15:39:13.283] Opened ancientdb with nodata mode        database=/opt/geth.pbss/geth/chaindata/ancient frozen=34,629,314
INFO [01-02|15:39:13.285] Parlia                                   chainConfig="{ChainID: 56 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 5184000, Bruno: 13082000, Berlin: 31302048, YOLO v3: <nil>, CatalystBlock: <nil>, London: 31302048, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: 18907621, Gibbs: 23846001, Nano: 21962149, Moran: 22107423, Planck: 27281024,Luban: 29020050, Plato: 30720096, Hertz: 31302048, Hertzfix: 34140700, ShanghaiTime: 1705996800, KeplerTime: 1705996800, Engine: parlia}"
INFO [01-02|15:39:13.481] Initialising Ethereum protocol           network=56 dbversion=8
INFO [01-02|15:39:14.253] new async node buffer                    limit=256.00MiB layers=74
WARN [01-02|15:39:15.329] Path-based state scheme is an experimental feature sync=false
INFO [01-02|15:39:15.509] Initialised chain configuration          config="{ChainID: 56 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 5184000, Bruno: 13082000, Berlin: 31302048, YOLO v3: <nil>, CatalystBlock: <nil>, London: 31302048, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: 18907621, Gibbs: 23846001, Nano: 21962149, Moran: 22107423, Planck: 27281024,Luban: 29020050, Plato: 30720096, Hertz: 31302048, Hertzfix: 34140700, ShanghaiTime: 1705996800, KeplerTime: 1705996800, Engine: parlia}"
INFO [01-02|15:39:16.205] Loaded most recent local block           number=34,719,314 hash=69a3b8..11f2bd root=5257e5..b31cd1 td=68,977,079 age=5d19h40m
INFO [01-02|15:39:16.283] Loaded most recent local finalized block number=34,719,312 hash=9eb38f..84e2a5 root=74bf91..38355e td=68,977,075 age=5d19h40m
INFO [01-02|15:39:16.363] Loaded last snap-sync pivot marker       number=34,580,824
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xcd121c]

goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0xcfee40?, {0x28e40c2?, 0xc03b2298d8?}, 0xc03b2299a8?, 0x248caa0?, 0xc03efd6ed0?)
        /opt/bsc/core/rawdb/freezer_resettable.go:126 +0x5c
github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
        /opt/bsc/core/rawdb/accessors_state.go:180
github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0x1b8872ccbeaa9682?, 0xc591320e457d591f?, 0xc03349d750)
        /opt/bsc/trie/triedb/pathdb/history.go:548 +0x85
github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0xc002910050, {0x3, 0x77, 0xc4, 0x5, 0xd2, 0xe5, 0x36, 0x52, 0x82, ...})
        /opt/bsc/trie/triedb/pathdb/database.go:363 +0x205
github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7faf11c48890?, {0x3, 0x77, 0xc4, 0x5, 0xd2, 0xe5, 0x36, 0x52, 0x82, ...})
        /opt/bsc/trie/database.go:320 +0x45
github.com/ethereum/go-ethereum/core.NewBlockChain({0x3376ad8?, 0xc000126600}, 0x0?, 0x7ffffffe805afca8?, 0x0?, {0x33653c0?, 0xc0012b1100?}, {{0x0, 0x0}, 0x0, ...}, ...)
        /opt/bsc/core/blockchain.go:403 +0x14b0
github.com/ethereum/go-ethereum/eth.New(0xc0001aac40, 0xc00171f800)
        /opt/bsc/eth/backend.go:252 +0x170f
github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0xc00171f800)
        /opt/bsc/cmd/utils/flags.go:2154 +0x167
main.makeFullNode(0xc00153fbf0?)
        /opt/bsc/cmd/geth/config.go:181 +0x255
main.geth(0xc001729b80)
        /opt/bsc/cmd/geth/main.go:341 +0xf3
github.com/urfave/cli/v2.(*Command).Run(0xc0017ffb80, 0xc001729b80, {0xc0001aa000, 0xe, 0xe})
        /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:274 +0x9eb
github.com/urfave/cli/v2.(*App).RunContext(0xc0006ab2c0, {0x334ef10?, 0xc0001ac000}, {0xc0001aa000, 0xe, 0xe})
        /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x616
github.com/urfave/cli/v2.(*App).Run(...)
        /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
main.main()
        /opt/bsc/cmd/geth/main.go:284 +0x47

root@snap-helper /opt # cat pbss.sh 
geth --datadir=geth.pbss --history.transactions=0 --tries-verify-mode=local --db.engine=pebble --maxpeers=256 --syncmode=full --ipcpath=/opt/ipc.ipc --port=30311 --discovery.port=30311 --disablesnapprotocol=true --pruneancient=true --config=config.toml --state.scheme=path
root@snap-helper /opt # cat config.toml 
[Eth]
NetworkId = 56
LightPeers = 100
TrieTimeout = 150000000000
StateScheme = "path"

[Eth.Miner]
GasCeil = 140000000
GasPrice = 3000000000
Recommit = 10000000000

[Eth.TxPool]
Locals = []
NoLocals = true
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 3000000000
PriceBump = 10
AccountSlots = 200
GlobalSlots = 8000
AccountQueue = 200
GlobalQueue = 4000

[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000

[Node]
IPCPath = "geth.ipc"
HTTPHost = "localhost"
InsecureUnlockAllowed = false
HTTPPort = 8545
HTTPVirtualHosts = ["localhost"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]

[Node.P2P]
MaxPeers = 200
NoDiscovery = false
ListenAddr = ":30311"
EnableMsgEvents = false

@du5
Contributor Author

du5 commented Jan 9, 2024

Other users have reported that version 1.3.6 has the same problem and they may need to use version 1.3.5.

bnb-chain/bsc#2141

@du5 du5 changed the title PBSS + V1.3.7 OOM, please keep v1.3.6 version PBSS + V1.3.7 OOM, please keep v1.3.5 version Jan 9, 2024
@sysvm

sysvm commented Jan 11, 2024

@du5 which snapshot did you use? Did you perform any operations before the panic happened, such as restarting geth?

@du5
Contributor Author

du5 commented Jan 13, 2024

> @du5 which snapshot did you use? Did you perform any operations before the panic happened, such as restarting geth?

The snapshot I hit the problem with is https://snapshots.48.club/geth.pbss.34712063.tar.zst. geth exited abruptly, with no warning and no panic error, so based on these symptoms I identified it as an OOM.

When the OOM occurs and you start geth again, a panic is logged.

After that, I re-extracted the snapshot and synced with v1.3.6, which worked for me, and it has been running normally since. However, I saw that other users also reported this problem on v1.3.6, and that it was resolved by using v1.3.5. The problem occurs with snapshots built by either the BSC team or 48Club, so it seems that the snapshot itself is not damaged.

By the way, the geth.pbss.34712063.tar.zst snapshot has been deleted, but the latest snapshot was produced by syncing from it.
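
One way to confirm the OOM diagnosis described above, rather than inferring it from a silent exit, is to look for OOM-killer records in the kernel log. The sketch below scans kernel log text for such records naming a given process. It assumes the message wording "Out of memory: Killed process <pid> (<name>)" used by recent Linux kernels; older kernels word it differently, so the pattern is an assumption to adapt. The helper name findOOMKills is hypothetical, and the sample log lines in main are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// oomLine matches e.g. "Out of memory: Killed process 1234 (geth)".
// Assumption: this is the wording used by recent Linux kernels.
var oomLine = regexp.MustCompile(`Out of memory: Killed process (\d+) \(([^)]+)\)`)

// findOOMKills returns the PIDs of processes named procName that the
// given kernel log text reports as killed by the OOM killer.
func findOOMKills(kernelLog, procName string) []string {
	var pids []string
	sc := bufio.NewScanner(strings.NewReader(kernelLog))
	for sc.Scan() {
		m := oomLine.FindStringSubmatch(sc.Text())
		if m != nil && m[2] == procName {
			pids = append(pids, m[1])
		}
	}
	return pids
}

func main() {
	// Made-up sample lines; real input would be saved kernel log output.
	sample := strings.Join([]string{
		"[100.0] Out of memory: Killed process 4321 (geth) total-vm:1kB",
		"[101.0] Out of memory: Killed process 99 (other) total-vm:1kB",
		"[102.0] usb 1-1: new high-speed USB device",
	}, "\n")
	fmt.Println("geth PIDs killed by OOM:", findOOMKills(sample, "geth"))
}
```

In practice the log text could come from saved output of `dmesg` or `journalctl -k`. An empty result does not rule out memory exhaustion, for example when the process is stopped by a container or service memory limit that logs elsewhere.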

@du5 du5 changed the title PBSS + V1.3.7 OOM, please keep v1.3.5 version PBSS + v1.3.6/v1.3.7 OOM, please keep v1.3.5 version Jan 22, 2024
@du5 du5 closed this as completed Jan 22, 2024
@xux1217

xux1217 commented Jan 23, 2024

I see this issue has been closed, so which version should we use with PBSS? The README still says we need to use v1.3.5.

@zzzckck

zzzckck commented Jan 24, 2024

The latest v1.3.7 is OK for running PBSS, but it may have issues with some snapshots provided by 48Club because of a --pruneancient compatibility issue.
Meanwhile, another release, v1.3.8, is coming, likely next week. It would be best to try v1.3.8 once it is ready.

@du5
Contributor Author

du5 commented Jan 24, 2024

> I see this issue has been closed, so which version should we use with PBSS? The README still says we need to use v1.3.5.

Because of a series of problems caused by pruneancient, we have decided not to use this flag in the future. There are many problems with the future of bsc-geth. Turning this flag on in version 1.3.x does not prune the database, and the database size continues to grow.

I have multiple nodes where pruneancient is also turned on. The smallest database is 1.1 TB and the largest is 1.9 TB. I think the problem is in the logic of the pruneancient function, not in the snapshot.

Regarding the conflict between pbss and pruneancient, I still recommend using version v1.3.5

@xux1217

xux1217 commented Jan 25, 2024

I used bsc-geth v1.3.5 with this snapshot, "https://snapshots.48.club/geth.pbss.35485953.tar.zst", and still hit the OOM.

After a restart, the process reports "panic: runtime error: invalid memory address or nil pointer dereference":


goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0xcfe340?, {0x28e1341?, 0xc017550b68?}, 0xc017550c68?, 0x248a500?, 0xc01872a420?)
        /home/runner/work/bsc/bsc/core/rawdb/freezer_resettable.go:125 +0x5c
github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
        /home/runner/work/bsc/bsc/core/rawdb/accessors_state.go:180
github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0x13206fae5cdc8042?, 0xbd42451522faaccd?, 0xc01349d750)
        /home/runner/work/bsc/bsc/trie/triedb/pathdb/history.go:548 +0x85
github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0xc0113eb450, {0xa3, 0x1a, 0x76, 0xb8, 0x13, 0xe6, 0x1d, 0x22, 0x42, ...})
        /home/runner/work/bsc/bsc/trie/triedb/pathdb/database.go:363 +0x205
github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7faef645daa8?, {0xa3, 0x1a, 0x76, 0xb8, 0x13, 0xe6, 0x1d, 0x22, 0x42, ...})
        /home/runner/work/bsc/bsc/trie/database.go:320 +0x45
github.com/ethereum/go-ethereum/core.NewBlockChain({0x33a14d8?, 0xc0134763c0}, 0x0?, 0x0?, 0x0?, {0x338fdc0?, 0xc00127f100?}, {{0x0, 0x0}, 0x0, ...}, ...)
        /home/runner/work/bsc/bsc/core/blockchain.go:403 +0x14b0
github.com/ethereum/go-ethereum/eth.New(0xc0010520e0, 0xc0014b1000)
        /home/runner/work/bsc/bsc/eth/backend.go:252 +0x170f
github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0xc0014b1000)
        /home/runner/work/bsc/bsc/cmd/utils/flags.go:2156 +0x167
main.makeFullNode(0xc001c3fbf0?)
        /home/runner/work/bsc/bsc/cmd/geth/config.go:175 +0x255
main.geth(0xc001a21340)
        /home/runner/work/bsc/bsc/cmd/geth/main.go:341 +0xf3
github.com/urfave/cli/v2.(*Command).Run(0xc001aac000, 0xc001a21340, {0xc000134000, 0x12, 0x12})
        /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:274 +0x9eb
github.com/urfave/cli/v2.(*App).RunContext(0xc0013a0f00, {0x3379910?, 0xc000130010}, {0xc000134000, 0x12, 0x12})
        /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x616
github.com/urfave/cli/v2.(*App).Run(...)
        /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
main.main()
        /home/runner/work/bsc/bsc/cmd/geth/main.go:284 +0x47

So I think we should not use the snapshot with the PBSS flag.

@du5
Contributor Author

du5 commented Jan 25, 2024

@xux1217 If the database is damaged, downgrading cannot repair it at this time. You need to download it again. Downgrading must be done before the database is damaged.

@xux1217

xux1217 commented Jan 25, 2024

I am sure that I first downloaded the snapshot and then started it directly with bsc-geth v1.3.5; there was no downgrade.

my start cmd: ./geth --config ./config.toml --datadir /data/geth.full --syncmode=full --db.engine=pebble --cache 8000 --rpc.allow-unprotected-txs --history.transactions=0 --tries-verify-mode=local --diffblock=5000 --http --http.corsdomain=* --http.vhosts=* --pruneancient --state.scheme path

and the config.toml:

[Eth]
NetworkId = 56
LightPeers = 100
TrieTimeout = 150000000000
StateScheme = "path"

[Eth.Miner]
GasCeil = 140000000
GasPrice = 3000000000
Recommit = 10000000000

[Eth.TxPool]
Locals = []
NoLocals = true
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 3000000000
PriceBump = 10
AccountSlots = 200
GlobalSlots = 8000
AccountQueue = 200
GlobalQueue = 4000

[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000

[Node]
IPCPath = "geth.ipc"
HTTPHost = "0.0.0.0"
InsecureUnlockAllowed = false
HTTPPort = 8545
HTTPVirtualHosts = ["*"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia","debug"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]

[Node.P2P]
MaxPeers = 200
NoDiscovery = false
StaticNodes = []
ListenAddr = ":30311"
EnableMsgEvents = false

[Node.LogConfig]
FilePath = "bsc.log"
MaxBytesSize = 10485760
Level = "info"
FileRoot = ""

> @xux1217 If the database is damaged, downgrading cannot repair it at this time. You need to download it again. Downgrading must be done before the database is damaged.

@SECTOR-1

I have the same issue with a fresh download and v1.3.5:
Jan 25 20:43:18 orangepi5 bash[194799]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 25 20:43:18 orangepi5 bash[194799]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbc4664]
Jan 25 20:43:18 orangepi5 bash[194799]: goroutine 1 [running]:
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0x400052c230?, {0x25a47b7?, 0x4011d54d00?}, 0x40001031e8?, 0x40001032e8?, 0x214da80?)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/rawdb/freezer_resettable.go:125 +0x34
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/rawdb/accessors_state.go:180
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0xdfacbb468a60fb71?, 0xc9c7e760d93b58c2?, 0x40144b3728)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/triedb/pathdb/history.go:548 +0x70
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0x4011976140, {0xf2, 0x63, 0x8c, 0x88, 0x1b, 0x4f, 0xe9, 0x8a, 0x71, ...})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/triedb/pathdb/database.go:363 +0x17c
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7f57c525e8?, {0xf2, 0x63, 0x8c, 0x88, 0x1b, 0x4f, 0xe9, 0x8a, 0x71, ...})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/database.go:320 +0x44
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core.NewBlockChain({0x3065038?, 0x40016fa7c8}, 0x4?, 0x0?, 0x0?, {0x3053c20?, 0x4000410700?}, {{0x0, 0x0}, 0x0, ...}, ...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/blockchain.go:403 +0x1174
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/eth.New(0x400019e8c0, 0x40013d0000)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/eth/backend.go:252 +0x1234
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0x40013d0000)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/utils/flags.go:2156 +0x120
Jan 25 20:43:18 orangepi5 bash[194799]: main.makeFullNode(0x4001a3fbf8?)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/geth/config.go:175 +0x208
Jan 25 20:43:18 orangepi5 bash[194799]: main.geth(0x40016a35c0)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/geth/main.go:341 +0xbc
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*Command).Run(0x400077e160, 0x40016a35c0, {0x40001a6000, 0x15, 0x16})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:274 +0x73c
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*App).RunContext(0x40013ca000, {0x303d7c0?, 0x40001a0020}, {0x40001a6000, 0x15, 0x16})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x568
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*App).Run(...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
