[Bug] Chia Harvester stopped sending Partials to pool #18804

Open
chankitsing opened this issue Nov 1, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@chankitsing

What happened?

I am running
OS Name Microsoft Windows Server 2022 Standard
Processor Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz, 2401 Mhz, 12 Core(s), 24 Logical Processor(s)
Total Physical Memory 448 GB
Installed Physical Memory (RAM) 448 GB

My harvester periodically stops sending partials to the pool (please refer to the attached screenshot: harvester hang). The screenshot may not reflect the actual time and date, but it shows how this happened.
Each time it stopped sending partials I had to restart the Chia UI to make it work again. So I monitored the harvester closely by setting the log level to "INFO" and running Chia via the CLI from Windows PowerShell.

I adapted a script to restart the harvester every hour, but the harvester still stops sending partials to the pool after a certain amount of time.
Digging further into the log file, I found the Chia timelord logging the following warning almost every second:
WARNING Will not infuse XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX because its reward chain challenge is not in the chain (repeated multiple times with different addresses)
(Please refer to the attachment. In it, the harvester stopped sending partials to the pool at 15:52, I restarted the harvester at 16:38, and I restarted the Chia timelord using the CLI at 16:53:17.)
Google Drive : https://drive.google.com/file/d/1v20QkhlkyutXTwIyRq7ZtxAnDM_S8djR/view?usp=sharing

Over time, the timelord logs WARNING Will not infuse XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX because its reward chain challenge is not in the chain
more and more frequently.

Perhaps this is something the dev team needs to look into. This issue has existed for many months; only recently have I had time to dig in.
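
To check whether the "Will not infuse" warning really grows more frequent over time, the log can be bucketed by minute. The following is only a rough sketch: it assumes each log line starts with an ISO-style timestamp (for example `2024-11-01T15:52:01.123`) followed by a space, and the function name `warnings_per_minute` is made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Text of the warning quoted in this issue.
NEEDLE = "Will not infuse"


def warnings_per_minute(lines):
    """Count lines containing NEEDLE, grouped by the minute of their timestamp.

    Assumption: the first space-separated token of each log line is an
    ISO-style timestamp. Lines that do not parse are skipped.
    """
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # not a timestamped log line
        counts[when.strftime("%Y-%m-%d %H:%M")] += 1
    return counts
```

Passing an open log file (for example `warnings_per_minute(open(path, encoding="utf-8", errors="replace"))`) and printing the sorted items gives one count per minute, which can be compared with the time the partials stopped.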

Version

2.4.4

What platform are you using?

Windows

What ui mode are you using?

GUI

Relevant log output

No response

@chankitsing chankitsing added the bug Something isn't working label Nov 1, 2024
@Chia-Network Chia-Network deleted a comment Nov 4, 2024
@BrandtH22
Contributor

Hey @chankitsing , that other commenter provided a spam link. Do not proceed with any troubleshooting via the link and information they provided.

All support provided by a CNI team member will occur here in Github or on our official discord server: https://discord.gg/chia

I have relayed this ticket and its content to our engineering team for their insights but at first glance:

  • The will not infuse message is separate and not related to the issue with sending partials to the pool

What script are you using to start chia and restart the harvester?? (the issues with sending partials could be that the harvester is not properly connecting to the chia services when restarted, providing the script you are using can help identify if that is the cause)

@BrandtH22
Contributor

Hey @chankitsing , followup question: are you intending to run a timelord? (it is not needed for standard farming)

@chankitsing
Author

chankitsing commented Nov 5, 2024

I have blocked the spam, @BrandtH22.

@BrandtH22, this issue is not caused by the harvester restarting and failing to connect to the services. It happened while farming with the Chia GUI on Windows: after several hours, with the Chia GUI still running, the harvester stops sending partials to the pool. To fix it, I need to restart the Chia GUI so that it starts farming again.

For troubleshooting, I start the Chia services via the CLI and then run the script below from the shell.

=========== Script that I am using=============
$countrestart = 0      # number of harvester restarts so far
$DST1 = Get-Date       # time the script was started
$Timelord = 0          # outer-loop passes left until the next timelord restart

while ($true) {
    $duration = 60     # minutes until the next harvester restart
    $countrestart++
    cd C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon

    # Restart the harvester
    #.\chia.exe harvestor -r;
    .\chia.exe stop harvester
    Write-Host ('restarting Harvester...')
    Start-Sleep -Seconds 30
    .\chia.exe start harvester
    Write-Host ('count of restarting harvester', $countrestart)

    # Restart the timelord on the first pass and on every fourth pass after that
    if ($Timelord -eq 0) {
        .\chia.exe stop timelord
        Write-Host ('restarting Timelord...')
        Start-Sleep -Seconds 30
        .\chia.exe start timelord
        $Timelord = 4
    }

    Write-Host ('Hours to restart Timelord', $Timelord)
    $Timelord--

    while ($duration -gt 0) {
        $DST = Get-Date

        # Decrease the duration
        $duration--

        # Display the countdown
        Write-Host ('Countdown until restart harvester', $duration, 'minutes. Count down until restart Timelord', $Timelord,'hours and ', $duration, 'at', $DST, '. Started since', $DST1)

        # Wait for 1 minute
        Start-Sleep -Seconds 60

        # Start the harvester again if its process is no longer running
        if (Get-Process -Name Start_harvester -ErrorAction SilentlyContinue) {
            Write-Host ('Chia Harvester are running', $DST)
        }
        else {
            .\chia.exe start harvester
        }
    }
}

========================
I have not amended any files for Chia or the timelord.

A note to the devs: I have been using the Chia GUI (Windows) for farming, and this issue has been bothering me for quite some time.
I now use the CLI to start all Chia services and the script above to restart them every few hours, to make sure they stay alive and keep sending partials to the pool.

Restarting the harvester does not fix the issue: the harvester still stops sending partials to the pool after some time. But restarting the timelord seems to buy more time before the harvester stops sending partials to the pool.

If I remember correctly, this issue has been happening since 2.1.x (fingers crossed). I have tried every release since then, and even a fresh install of Windows Server. It didn't stop the issue.

Perhaps next I will just start the Chia farmer without the timelord instead of running a full node.

@BrandtH22
Contributor

Hey @chankitsing , what cli command are you running to start the chia services?? The above is only showing the restart script.

I am thinking there are a few things occurring:

  • incorrect start command is being used leading to unneeded services running
  • the config file might need to be recreated and repopulated with custom settings if the config file has never been recreated (stop chia, rename the config.yaml, run chia init, then copy any custom settings from the old config to the new. I can provide more detailed instructions as needed.)
  • there might be multiple instances of the chia services running due to the start command and restart scripts. Let's start by reviewing the start command to verify

@chankitsing
Author

Hi @BrandtH22

I am using this command to start the chia services via CLI (current running environment)
===========================script==============================
PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> .\chia.exe start all
chia_harvester: started
chia_timelord_launcher: started
chia_timelord: started
chia_farmer: started
chia_full_node: started
chia_wallet: started
chia_data_layer: started
chia_data_layer_http: started

=================================================================================

To verify the running process

==========================Script====================================================
PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> Get-Process -Name start_*

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    270      48   159716      85288     156.50   1288   1 start_data_layer
    191      29    58580      51540       1.17   1308   1 start_data_layer_http
    244      52   167624      96808   2,421.42   9816   1 start_farmer
    204      40   159268      86528     168.66   8752   1 start_full_node
    351      85   713072     619740     339.45  10428   1 start_full_node
    205      40   157860      84868      43.83  14860   1 start_full_node
    431      64  1385840     222268     312.14  12920   1 start_harvester
    238      50   158780      87488       7.23  11596   1 start_timelord
    205      41   166428      93308   1,153.25   7912   1 start_wallet
    285      55   246592     166348   1,365.36  10984   1 start_wallet

==================================================================================
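
To make it easier to spot whether a service is running more often than expected, the `Get-Process` listing can be tallied by process name. This is a minimal sketch that works on the pasted text of the listing; the function name `count_processes` is hypothetical, and it assumes every data row starts with the numeric Handles column and ends with the process name.

```python
from collections import Counter


def count_processes(get_process_output):
    """Tally rows of a pasted Get-Process table by ProcessName (last column).

    Assumption: data rows start with a numeric Handles value; header,
    separator and prompt lines do not and are ignored.
    """
    counts = Counter()
    for line in get_process_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            counts[fields[-1]] += 1
    return counts
```

More than one entry under the same name is not necessarily a problem, since a service may spawn worker processes of its own; the tally is mainly useful for comparing one listing against another taken after a restart.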

@chankitsing
Author

hi @BrandtH22

My time: 6 Nov 2024, 10:19 AM (GMT+8).
I am going to stop the shell script (i.e. stop the periodic restarting of the Chia services).
Now I am going to stop all the Chia services using:

PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> .\chia.exe stop all
chia_harvester: Stopped
chia_timelord_launcher: Not running
chia_timelord: Stopped
chia_farmer: Stopped
chia_full_node: Stopped
chia_wallet: Stopped
chia_data_layer: Stopped
chia_data_layer_http: Stopped
PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon>

Back up config.yaml by renaming it to config - copy(2).yaml.
Back up all the log files to a new folder, "backup".
Then start the Chia services:

PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> Get-Process -Name start_*
PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> .\chia start all
can't find C:\Users\Administrator\.chia\mainnet\config\config.yaml
** please run chia init to migrate or create new config files **
PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> .\chia init
Chia directory C:\Users\Administrator\.chia\mainnet
Found private CA in C:\Users\Administrator\.chia\mainnet, using it to generate TLS certificates
Setting the xch destination for the farmer reward (1/8 plus fees, solo and pooling) to xch__________________________
Setting the xch destination address for pool reward (7/8 for solo only) to xch________________________
To change the XCH destination addresses, edit the xch_target_address entries in C:\Users\Administrator\.chia\mainnet\config\config.yaml.

To see your keys, run 'chia keys show --show-mnemonic-seed'
PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> .\chia start all
chia_harvester: started
chia_timelord_launcher: started
chia_timelord: started
chia_farmer: started
chia_full_node: started
chia_wallet: started
chia_data_layer: started
chia_data_layer_http: started
PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon> Get-Process -Name start_*

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    276      49   159868      85020       2.53   7188   1 start_data_layer
    194      29    58864      51796       1.08   6264   1 start_data_layer_http
    249      52   159328      84792       3.20  14416   1 start_farmer
    206      40   157272      84012       4.56    780   1 start_full_node
    206      40   158404      85016       5.67   3260   1 start_full_node
    206      40   156820      83620       6.17   6820   1 start_full_node
    207      40   157876      84612       5.25   8036   1 start_full_node
    206      40   156872      83580       5.58  10856   1 start_full_node
    206      40   157748      83992       5.98  11204   1 start_full_node
    369      82   625624     531216      20.28  11664   1 start_full_node
    206      40   157404      84028       5.34  14748   1 start_full_node
    248      48   163036      87504       3.34  13680   1 start_harvester
    243      50   158556      87032       2.58   5096   1 start_timelord
    290      54   220224     142480       8.30   3060   1 start_wallet
    207      41   163248      89360      50.02   5988   1 start_wallet

PS C:\Users\Administrator\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon>

I have not started my restart script. Let's wait for the issue to reproduce; it may take hours or days to occur.
Since I have time, let's compare the two config.yaml files.

The comparison between the two YAML files shows that several lines differ. Below are the key details:

Line Differences: Differences were found on numerous lines, where values for specific keys vary between the two files.

Extra Lines in File 1: The first file has additional lines not present in the second file. Here are some examples:

reset_sync_for_fingerprint: null
rpc_port: 9256
wallet_peers_file_path: wallet/db/wallet_peers.dat

Structural Differences: There are some differences in section orders and potential configurations that exist in one file but not the other, especially around network, SSL, and wallet peer configurations.

===============================================================================
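
A comparison like the one summarised above can be reproduced by walking both configs as nested dictionaries and listing the keys that exist only in the old file. The sketch below assumes both files have already been loaded into plain dicts (for instance with PyYAML's `yaml.safe_load`, which is not shown here); `missing_keys` is a name invented for this example.

```python
def missing_keys(old, new, prefix=""):
    """Return dotted paths of keys present in `old` but absent from `new`.

    Both arguments are nested dicts, e.g. the result of loading the backed-up
    and the freshly generated config.yaml. Only key presence is compared,
    not values.
    """
    missing = []
    for key, value in old.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if key not in new:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(new[key], dict):
            # Descend into sections that exist in both files.
            missing.extend(missing_keys(value, new[key], path))
    return missing
```

Calling `missing_keys(old_config, new_config)` returns a list of dotted paths, which makes it easier to see which custom settings still need to be copied from the old config into the new one.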

@chankitsing
Author

chankitsing commented Nov 6, 2024

@BrandtH22, there is a problem after the changes above: the new config file does not contain my pool information, so Chia was not sending partials to the pool from 10:19 AM to 11:17 AM.
image
image

I found this out when I started the Chia GUI.
The mainnet DB is synced,
but the pool information is not there.
I needed to revert to the old config file
and start the Chia services via the CLI again (11:23 AM).
Partials are being sent to the pool again.
image
image

@BrandtH22
Contributor

Hey @chankitsing , we are seeing a couple issues:

  1. incorrect start command is starting unneeded services. To resolve, please use the start command chia start farmer and do not use chia start all; starting all starts the unneeded datalayer and timelord services

  2. wallet resync needed for the pool information. In the GUI go to settings / advanced / resync. Upon clicking resync the GUI will close, and when reopened the wallet will resync. As the wallet syncs the pool information will be added to the config file.

  3. harvester issues. If there is a bad disk then anytime the harvester attempts to read plots on that disk it can cause the harvester to stall or crash. Once we have completed the above, let's run a chia plots check; let us know if any errors or warnings are reported by the end of the check. Next step would be to set the harvester logs to debug and let it run until the harvester stalls, then provide those logs (in the config you can add log_level: DEBUG to the logging field under harvester)
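
When the debug logs are collected, it helps to know exactly when the partials stopped. One possible way to find that window is to look for long gaps between consecutive log lines that mention a partial being submitted. This is a sketch under assumptions: the marker text `Submitting partial` and the leading ISO-style timestamp on each line are guesses about the log format and should be adjusted to what the log actually contains, and `find_gaps` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def find_gaps(lines, needle="Submitting partial", max_gap=timedelta(minutes=30)):
    """Return (previous, next) timestamp pairs of matching lines that are
    further apart than max_gap.

    Assumptions: `needle` marks a partial submission in the log, and the
    first space-separated token of each line is an ISO-style timestamp.
    """
    gaps = []
    previous = None
    for line in lines:
        if needle not in line:
            continue
        try:
            current = datetime.fromisoformat(line.split(" ", 1)[0])
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if previous is not None and current - previous > max_gap:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Each returned pair brackets a period with no matching lines, so the debug output just after the first timestamp of a pair is the place to look. A stall that is still ongoing at the end of the log has no closing line and will not be reported by this sketch.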

@chankitsing
Author

Hi @BrandtH22

On it. Completed 1 and 2.
Checking the plots at the moment; will update once it's complete.

11:20AM - log_level debug
The Chia GUI is farming at the moment, the restart script is not running, and log_level has been set to debug.
I will provide an update if the harvester stalls and stops sending partials to the pool.

Chia service running at the moment

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    244      52   164976      93876      14.63  13584   1 start_farmer
    344      80   645800     552684     162.58  14712   1 start_full_node
    205      40   158512      85692      66.67  15328   1 start_full_node
    447      65  1384992     223336     359.23   6040   1 start_harvester
    273      50   175556     101096      10.08   1392   1 start_wallet

2 participants