Incorrect work of examine() in case of halt groups #902

Open
kr-sc opened this issue Aug 7, 2023 · 13 comments

Comments

kr-sc (Contributor) commented Aug 7, 2023

I launch the spike simulator with 4 harts:

spike \
  --rbb-port=7777 \
  --isa=rv64gc \
  --dm-progsize=6 \
  --dm-sba=0 \
  --dm-abstract-rti=0 \
  -p4 \
  -m0x80000000:0xa00000 \
  <some_elf_file>

and then run the following script:

for LOG in {1..2}; \
  do openocd \
    -c "adapter driver remote_bitbang" \
    -c "remote_bitbang host 127.0.0.1" \
    -c "remote_bitbang port 7777" \
    -c 'gdb_port disabled' \
    -f spike_smp4_rtoshw4.cfg \
    -c "log_output ${LOG}.log" \
    -c init \
    -c targets \
    -c shutdown -d3; \
done;
spike_smp4_rtoshw4.cfg:
proc init_targets {} {
	set _CHIPNAME riscv
	jtag newtap $_CHIPNAME cpu -irlen 5 -expected-id 0x10e31913

	set _TARGETNAME_0 $_CHIPNAME.cpu0
	set _TARGETNAME_1 $_CHIPNAME.cpu1
	set _TARGETNAME_2 $_CHIPNAME.cpu2
	set _TARGETNAME_3 $_CHIPNAME.cpu3

	target create $_TARGETNAME_0 riscv -chain-position $_CHIPNAME.cpu -coreid 0 -rtos hwthread
	target create $_TARGETNAME_1 riscv -chain-position $_CHIPNAME.cpu -coreid 1 -rtos hwthread
	target create $_TARGETNAME_2 riscv -chain-position $_CHIPNAME.cpu -coreid 2 -rtos hwthread
	target create $_TARGETNAME_3 riscv -chain-position $_CHIPNAME.cpu -coreid 3 -rtos hwthread

	target smp $_TARGETNAME_0 $_TARGETNAME_1 $_TARGETNAME_2 $_TARGETNAME_3
}

Result of the first iteration:

TargetName         Type       Endian TapName            State
--  ------------------ ---------- ------ ------------------ ------------
 0  riscv.cpu0         riscv      little riscv.cpu          running
 1  riscv.cpu1         riscv      little riscv.cpu          running
 2  riscv.cpu2         riscv      little riscv.cpu          running
 3* riscv.cpu3         riscv      little riscv.cpu          running

Result of the second iteration:

TargetName         Type       Endian TapName            State
--  ------------------ ---------- ------ ------------------ ------------
 0  riscv.cpu0         riscv      little riscv.cpu          running
 1  riscv.cpu1         riscv      little riscv.cpu          unknown
 2  riscv.cpu2         riscv      little riscv.cpu          unknown
 3* riscv.cpu3         riscv      little riscv.cpu          unknown

Also, if you add additional printing in riscv-013.c, in the function riscv013_get_hart_state(), you can see that the dmstatus values differ between the two runs:

From 8ca3f30a292977f7b94ba057a86fab7459b8f2c8 Mon Sep 17 00:00:00 2001
From: Kirill Radkin <[email protected]>
Date: Mon, 7 Aug 2023 12:57:39 +0300
Subject: [PATCH] addtional printing

---
 src/target/riscv/riscv-013.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/target/riscv/riscv-013.c b/src/target/riscv/riscv-013.c
index 24713a93c..70f905c35 100644
--- a/src/target/riscv/riscv-013.c
+++ b/src/target/riscv/riscv-013.c
@@ -2496,6 +2496,7 @@ static int riscv013_get_hart_state(struct target *target, enum riscv_hart_state
 		return ERROR_FAIL;
 
 	uint32_t dmstatus;
+	LOG_INFO("DMSTATUS == %" PRIx32, dmstatus);
 	if (dmstatus_read(target, &dmstatus, true) != ERROR_OK)
 		return ERROR_FAIL;
 	if (get_field(dmstatus, DM_DMSTATUS_ANYHAVERESET)) {
-- 
2.34.1

If I understand it correctly, this situation is connected to how examine() is implemented in riscv-013.c. At the end of examine(), OpenOCD adds all targets to a halt group (because we have an SMP configuration in this case). This leads to an interesting situation: during the first connection the harts are halted and resumed one by one (as they should be), but by the second connection the harts have already been merged into a halt group. Because of that, when OpenOCD tries to halt the first target, all harts end up halted.
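
For context, halt-group membership is programmed through the dmcs2 debug module register (0x32 in the RISC-V Debug Specification). Below is a minimal sketch of the grouping step, assuming the register/field names generated in OpenOCD's debug_defines.h and using dmi_write() as a stand-in for the actual DMI write helper; it is not the actual implementation:

/* Sketch: put the hart(s) currently selected in dmcontrol into halt
 * group <group>.  hgselect = 0 addresses halt groups (as opposed to
 * external trigger groups); group 0 means "no group". */
static int set_halt_group(struct target *target, unsigned int group)
{
	uint32_t dmcs2 = set_field(0, DM_DMCS2_HGSELECT, 0);
	dmcs2 = set_field(dmcs2, DM_DMCS2_HGWRITE, 1);
	dmcs2 = set_field(dmcs2, DM_DMCS2_GROUP, group);
	return dmi_write(target, DM_DMCS2, dmcs2);
}

Because this write changes state inside the debug module itself, the grouping survives an OpenOCD disconnect, which is why the second connection sees different behaviour.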

kr-sc (Contributor, Author) commented Aug 25, 2023

@timsifive, can you confirm that my reasoning is correct?

JanMatCodasip (Collaborator) commented Aug 29, 2023

@kr-sc I haven't yet tried to reproduce it locally. What values of dmstatus are you getting, please? Are they unexpected?

Also, your added LOG_INFO print in the patch is placed before the dmstatus_read() call - can you check whether that is a typo?

My expectation is that the examination should pass even if the harts are members of a halt/resume group. For the examination, it is important that the given hart being examined can be halted for a brief period of time (and then resumed, if running previously).

If in that process some other hart(s) get halted/resumed also (due to the halt/resume groups), that should not break the examination as such.

kr-sc (Contributor, Author) commented Aug 30, 2023

> I haven't yet tried to reproduce it locally. What values of dmstatus are you getting, please? Are they unexpected?

I get values that differ only in the all/any resumeack bits; maybe it's not important here.

> Also, your added LOG_INFO print in the patch is placed before the dmstatus_read() call - can you check whether that is a typo?

Oh, it should be after dmstatus_read(), of course.
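
For clarity, the corrected placement (the same one-line addition, moved below the read):

	uint32_t dmstatus;
	if (dmstatus_read(target, &dmstatus, true) != ERROR_OK)
		return ERROR_FAIL;
	LOG_INFO("DMSTATUS == %" PRIx32, dmstatus);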

> My expectation is that the examination should pass even if the harts are members of a halt/resume group. For the examination, it is important that the given hart being examined can be halted for a brief period of time (and then resumed, if running previously).

The main problem is that when we halt only the first target during the second examine() call (if it was not already halted, for example), we halt all the targets that were added to the halt group during the previous examine() call, and we lose the information about each hart's state before the second examine.

timsifive (Collaborator) commented:

Can you add -d to your OpenOCD invocations, capture the output, and attach it here?

kr-sc (Contributor, Author) commented Sep 5, 2023

1.log
2.log

After the second examine, targets 1-3 are in the halted state (not unknown; maybe somebody fixed that in newer versions), but before the first connection all of them were running.

kr-sc (Contributor, Author) commented Sep 5, 2023

P.S. The unknown hart state was fixed here: 198edca

timsifive (Collaborator) commented:

The difference seems to be simply that in the first log the harts are running while in the second log they are halted.

@aap-sc do you have time to look at this? It looks related to (or maybe even fixed by) #900.

timsifive (Collaborator) commented:

Wait, is @kr-sc's latest comment saying this was fixed by #900? If so, can we just close this?

aap-sc (Collaborator) commented Sep 8, 2023

@timsifive not quite. What @kr-sc says is that the unknown status was fixed. However, the underlying issue is still there.

It's just that instead of the unknown status, the harts will end up halted. I'll try to produce an updated version of the logs. My understanding is that this is related to how halt groups are handled by OpenOCD during examine: combining harts into a halt group is an invasive change which is not reverted upon disconnect/re-examine. It's been a while since I last looked at this one, so my memory may fail me.

Actually, @kr-sc was going to look at this one eventually; however, since halt groups are not supported by Syntacore HW, this task is not of the highest priority. This issue was filed just to document the problem.

That being said, I'll double-check the situation.

kr-sc (Contributor, Author) commented Sep 18, 2023

> The difference seems to be simply that in the first log the harts are running while in the second log they are halted.

Yes, but both runs connect to the same simulator session, whose targets were in the running state before the first connection. After the second one, they end up halted. I think that is incorrect behaviour of OpenOCD, because we didn't send any commands during these two connections, yet we changed the harts' state.

timsifive (Collaborator) commented:

To summarize:

  1. When first connecting, all harts are running. After examine() all harts are still running. examine() did put all harts in the same halt group.
  2. When connecting a second time, all harts are also running. However, during examine(), halting the first hart halts all the other harts as well (because they're in the same halt group). After examine(), hart 0 is running (because it was running when examine() first looked at it), but the remaining harts are halted (because they were halted when examine() first looked at them).

I agree that's a bug. Not sure when I'll get to it. One solution might be to clear all halt groups before examining the first hart. (But examine() is called per "target" without any extra context, so it's not as straightforward as it might seem.)
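
A hedged sketch of that idea follows (helper and field names are illustrative, following debug_defines.h; real code would also have to handle hartselhi for systems with more than 1024 harts):

/* Sketch only: write group 0 ("no group") to dmcs2 for every hart, so a
 * halt group left over from a previous session cannot distort the
 * per-hart halt/resume probing that examine() performs. */
static int clear_all_halt_groups(struct target *target, unsigned int hart_count)
{
	for (unsigned int hartid = 0; hartid < hart_count; hartid++) {
		/* Select the hart in dmcontrol. */
		uint32_t dmcontrol = set_field(0, DM_DMCONTROL_DMACTIVE, 1);
		dmcontrol = set_field(dmcontrol, DM_DMCONTROL_HARTSELLO, hartid);
		if (dmi_write(target, DM_DMCONTROL, dmcontrol) != ERROR_OK)
			return ERROR_FAIL;
		/* Drop the selected hart out of its halt group. */
		uint32_t dmcs2 = set_field(0, DM_DMCS2_HGWRITE, 1);
		dmcs2 = set_field(dmcs2, DM_DMCS2_GROUP, 0);
		if (dmi_write(target, DM_DMCS2, dmcs2) != ERROR_OK)
			return ERROR_FAIL;
	}
	return ERROR_OK;
}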

kr-sc (Contributor, Author) commented Sep 26, 2023

> One solution might be to clear all halt groups before examining the first hart.

Halt groups are used to implement the OpenOCD SMP target, so I think we should clear the halt groups before OpenOCD disconnects from the target.
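
A sketch of where that cleanup could hook in, assuming a clear_all_halt_groups() helper like the one sketched above (riscv013_deinit_target() exists in riscv-013.c, but the call shown here is an illustration, not current behaviour):

static void riscv013_deinit_target(struct target *target)
{
	/* Existing teardown elided.  Before the debug session goes away,
	 * revert the invasive grouping that examine() performed.  The hart
	 * count would come from the examined configuration; 4 matches the
	 * spike example above. */
	if (target_was_examined(target))
		clear_all_halt_groups(target, 4);
}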

aap-sc (Collaborator) commented Sep 26, 2023

> so I think we should clear the halt groups before OpenOCD disconnects from the target.

I think we should reset halt groups even upon the initial connection. It makes examine even more invasive :(, but the resulting behavior is more predictable.

kr-sc changed the title from "Incorrect work of examine()" to "Incorrect work of examine() in case of halt groups" on Jan 30, 2024