UMON debugging interface in Xemu in general and in the MEGA65 emulator #335

lgblgblgb · 2022-04-11T17:01:50Z

lgblgblgb
Apr 11, 2022
Maintainer

A new, generic, OS- and emulation-independent debug interface called umon (meant to replace uartmon in the future):

very old issue (unfortunately): CORE: Implement a generic external monitor framework in Xemu (initially used by MEGA65 emulation only though) #11

Current situation and problems:

bad MAP values shown: MEGA65: MAPL and MAPH values in uart monitor reported incorrectly #286
missing load command: MEGA65: uartmon load command #332
trace continue command (t0) crash problem: MEGA65: trace continue (t0) crashes xemu #333

Please comment here, or add new problems here, before opening any new issue, as this is currently a very unsupported and vague area, unfortunately :( It would be useful to first collect all problems, information, ideas, etc. Thanks.

Short term goals:

HAVE complete documentation of the debug protocol used on MEGA65
collect information/ideas/issues here
decide if it's better to disable uartmon (temporarily now?, or even remove if umon can replace it in the future) in Xemu/MEGA65 since it's known to be very problematic now

gurcei · 2022-04-27T14:35:22Z

gurcei
Apr 27, 2022

decide if it's better to disable uartmon...

Hmm, for as long as this 220-uartmon-update branch is effective as a temporary workaround for getting a 'modest' degree of uart-based debugging capabilities within xemu, I think we shouldn't disable it right now, but only disable it once umon completely replaces it.

Current situation and problems:

A few items there look familiar, that I might have mentioned along the way as I made my tweaks on the 220-uartmon-update branch. I suspect a few other fixes and additions for missing things were done on commits there, but I'd have to revise those past efforts to recall these things :)

I also suspect that Paul still occasionally fiddles with the output of the uart monitor, inadvertently breaking any utility that attempts to make use of the uart monitor (which I've learnt to live with over the years), but anyways, there might be things he's tweaked recently that ought to get reflected in xemu's uart monitor too (as we spot them).

I'm also wondering, since you're planning on ditching uartmon in favour of umon, 'perhaps' there might be merit in merging in at least some of my tweaks from 220-uartmon-update into uartmon (saving me from having to rebase my efforts from time to time to keep up with your latest work). Since you plan to ditch uartmon anyway, perhaps merging my stuff in won't be such a problem, as it'll get ditched in time for the new umon implementation anyway? But anyways, not fussed, I can get by for now as-is :)

0 replies

lgblgblgb · 2022-04-28T09:20:47Z

lgblgblgb
Apr 28, 2022
Maintainer Author

@gurcei Let's try something. I've opened a new branch, umon; it would be useful if you did some PRs with your changes. I can then still transfer those into the "mainline" with cherry-pick and other mechanisms, keeping the author etc. info.

Just please do not refer to '220' as the issue in commit and PR messages, but rather to this discussion number (#335).

I think the major issue with rebasing your changes (as you mentioned you did) is mostly about mega65.c, as there were several changes there. I don't think it's much of a problem in other places.

0 replies

lgblgblgb · 2022-04-28T12:10:10Z

lgblgblgb
Apr 28, 2022
Maintainer Author

And to mention some plans as well ... So what I would like to create:

"umon" connection via TCP/IP only, supporting (auto-detected on the connection) the usual protocol we use now, or even HTTP, embedding commands into HTTP requests (thus someone can easily write a platform-independent "GUI" in JavaScript for umon). I even consider a third mode, websockets, allowing a low-latency binary protocol over HTTP instead of the slower HTTP requests
connection handling runs in its own thread, to avoid the horrific lag it creates now by being handled only once per frame
the actual command handlers still need to run in the "main" emulator thread, communicating with the connection thread through lock-protected queues
command handlers can be executed at every scanline rather than at every frame like now
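
To make the thread split above a bit more concrete, here is a minimal sketch of a lock-protected command queue. It is not Xemu code: the names (`cmd_queue_t`, `cmd_queue_push`, `cmd_queue_pop`) and the sizes are made up for illustration, and POSIX threads are only an assumption here (Xemu may well use another mutex API). The connection thread pushes complete command lines, and the main emulator thread pops them, for example once per scanline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define CMD_MAX_LEN   128   // hypothetical limits, not taken from Xemu
#define CMD_QUEUE_LEN 16

typedef struct {
    char items[CMD_QUEUE_LEN][CMD_MAX_LEN];   // ring buffer of command lines
    int head, count;
    pthread_mutex_t lock;
} cmd_queue_t;

static void cmd_queue_init ( cmd_queue_t *q )
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
}

// Called from the connection thread. Returns false if the queue is full or the command is too long.
static bool cmd_queue_push ( cmd_queue_t *q, const char *cmd )
{
    assert(cmd != NULL);
    if (strlen(cmd) >= CMD_MAX_LEN)
        return false;
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < CMD_QUEUE_LEN) {
        strcpy(q->items[(q->head + q->count) % CMD_QUEUE_LEN], cmd);
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

// Called from the main emulator thread. 'out' must have room for CMD_MAX_LEN bytes.
// Returns false at once if there is nothing to do, so the emulation is not held up.
static bool cmd_queue_pop ( cmd_queue_t *q, char *out )
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        strcpy(out, q->items[q->head]);
        q->head = (q->head + 1) % CMD_QUEUE_LEN;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

A second queue of the same kind, going in the opposite direction, could carry the answers back to the connection thread.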

More longER term plan:

Mega ;) project: re-work the whole memory decoding mechanism in Xemu, allowing "cost free" run without any breakpoint etc, but still allowing to have even data/code access r/w breakpoints etc, based on certain conditions (in this case the emulator slows down): #209

It would be even faster (without active breakpoint event) than current solution and would be much cleaner and easier to maintain. It would also provide a mechanism to run-time redefine the memory handlers for debug-aware (like conditional breakpoints).

This quite huge change is much more than the umon only, but certainly an important factor for that too.

0 replies

ki-bo · 2022-04-28T18:45:00Z

ki-bo
Apr 28, 2022

Let me post the stacktrace of the macOS crash that seems to consistently occur under macOS on different branches. It happens once t0 is executed. I post this here because it may be this is a general bug that may be looked at when the umon branch gets further developed (following the ideas discussed here).

This is the stacktrace when built on umon branch:

==76094==ERROR: AddressSanitizer: SEGV on unknown address 0x000125544000 (pc 0x7ff802899816 bp 0x00030a05a160 sp 0x00030a05a130 T0)
==76094==The signal is caused by a READ memory access.
    #0 0x7ff802899816 in _platform_memmove$VARIANT$Rosetta+0x126 (libsystem_platform.dylib:x86_64+0x6816)
    #1 0x10c5e28d6 in __asan_memcpy+0x296 (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x458d6)
    #2 0x102908684 in vic4_render_scanline vic4.c:1484
    #3 0x1028cff7b in emulation_loop mega65.c:758
    #4 0x1028cce2e in main mega65.c:848
    #5 0x2036e551d in start+0x1cd (dyld:x86_64+0x551d)

==76094==Register values:
rax = 0x0000000000000000  rbx = 0x0000000000000c80  rcx = 0x0000000125544ca0  rdx = 0x0000000000000c80  
rdi = 0x0000000125544c80  rsi = 0x0000000125544000  rbp = 0x000000030a05a160  rsp = 0x000000030a05a130  
 r8 = 0x0000000000000032   r9 = 0x0000000000000030  r10 = 0x0000000000000000  r11 = 0x0000100024aa8b20  
r12 = 0x00000002037603a0  r13 = 0x000000030a05b628  r14 = 0x0000000125544000  r15 = 0x0000000125544c80  
AddressSanitizer can not provide additional info.

Seems we get some more info from Clang's address sanitiser when crashing on the now rebased 220-uartmon-update branch from @gurcei (the link points to the commit I built from that branch):

WRITE of size 4 at 0x00011e101900 thread T0
    #0 0x100adcf00 in vic4_render_scanline vic4.c:1490
    #1 0x100aa3fe8 in emulation_loop mega65.c:837
    #2 0x100aa1118 in main mega65.c:927
    #3 0x1018ad084 in start+0x200 (dyld:arm64e+0x5084)

0x00011e101900 is located 5888 bytes inside of 16384-byte region [0x00011e100200,0x00011e104200)
freed by thread T11 here:
    #0 0x101f47c94 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3fc94)
    #1 0x1a1b417a4 in IOGPUResourceListDestroy+0x20 (IOGPU:arm64e+0x157a4)
    #2 0x1a1b31088 in IOGPUMetalCommandBufferStorageDealloc+0x9c (IOGPU:arm64e+0x5088)
    #3 0x1a1b2fb6c in -[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:]+0xe8 (IOGPU:arm64e+0x3b6c)
    #4 0x18f9ce808 in -[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:]+0x84 (Metal:arm64e+0x20808)
    #5 0x1a1b37a00 in __IOGPUNotificationQueueSetDispatchQueue_block_invoke+0xa0 (IOGPU:arm64e+0xba00)
    #6 0x186ef6284 in _dispatch_client_callout4+0x10 (libdispatch.dylib:arm64e+0x4284)
    #7 0x186f12538 in _dispatch_mach_msg_invoke+0x1cc (libdispatch.dylib:arm64e+0x20538)
    #8 0x186efd780 in _dispatch_lane_serial_drain+0x174 (libdispatch.dylib:arm64e+0xb780)
    #9 0x186f13258 in _dispatch_mach_invoke+0x1c4 (libdispatch.dylib:arm64e+0x21258)
    #10 0x186efd780 in _dispatch_lane_serial_drain+0x174 (libdispatch.dylib:arm64e+0xb780)
    #11 0x186efe434 in _dispatch_lane_invoke+0x1b8 (libdispatch.dylib:arm64e+0xc434)
    #12 0x186efd780 in _dispatch_lane_serial_drain+0x174 (libdispatch.dylib:arm64e+0xb780)
    #13 0x186efe400 in _dispatch_lane_invoke+0x184 (libdispatch.dylib:arm64e+0xc400)
    #14 0x186f08c94 in _dispatch_workloop_worker_thread+0x284 (libdispatch.dylib:arm64e+0x16c94)
    #15 0x1870b635c in _pthread_wqthread+0x11c (libsystem_pthread.dylib:arm64e+0x335c)
    #16 0x1870b507c in start_wqthread+0x4 (libsystem_pthread.dylib:arm64e+0x207c)

previously allocated by thread T0 here:
    #0 0x101f47b58 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3fb58)
    #1 0x1a1b41718 in IOGPUResourceListInit+0x60 (IOGPU:arm64e+0x15718)
    #2 0x1a1b30d40 in IOGPUMetalCommandBufferStorageCreateExt+0xc0 (IOGPU:arm64e+0x4d40)
    #3 0x1a1b31d18 in IOGPUMetalCommandBufferStoragePoolCreateStorage+0x68 (IOGPU:arm64e+0x5d18)
    #4 0x1a1b2f798 in -[IOGPUMetalCommandBuffer initWithQueue:retainedReferences:synchronousDebugMode:]+0xc4 (IOGPU:arm64e+0x3798)
    #5 0x1e8bdb1c0  (AGXMetal13_3:arm64e+0x1f21c0)
    #6 0x1e8bdd55c  (AGXMetal13_3:arm64e+0x1f455c)
    #7 0x101b59fcc in METAL_ActivateRenderCommandEncoder+0x1e4 (libSDL2-2.0.0.dylib:arm64+0xb1fcc)
    #8 0x101b59264 in METAL_RunCommandQueue+0x39c (libSDL2-2.0.0.dylib:arm64+0xb1264)
    #9 0x101ade328 in FlushRenderCommands+0x24 (libSDL2-2.0.0.dylib:arm64+0x36328)
    #10 0x101ae46ac in SDL_RenderPresent_REAL+0x40 (libSDL2-2.0.0.dylib:arm64+0x3c6ac)
    #11 0x100b33174 in xemu_update_screen emutools.c:1246
    #12 0x100ada978 in vic4_close_frame_access vic4.c:251
    #13 0x100aa545c in update_emulator mega65.c:668
    #14 0x100aa35c8 in emulation_loop mega65.c:778
    #15 0x100aa1118 in main mega65.c:927
    #16 0x1018ad084 in start+0x200 (dyld:arm64e+0x5084)

Thread T11 created by T0 here:
    <empty stack>

SUMMARY: AddressSanitizer: heap-use-after-free vic4.c:1490 in vic4_render_scanline
Shadow bytes around the buggy address:
  0x007023c402d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c402e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c402f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c40300: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c40310: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x007023c40320:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c40330: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c40340: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c40350: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c40360: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x007023c40370: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

Maybe this is helpful to understand what's happening?

0 replies

lgblgblgb · 2022-09-05T21:48:09Z

lgblgblgb
Sep 5, 2022
Maintainer Author

Current situation: I have some work done on the new communication layer. It's multi-threaded multi-connection multi-mode "stuff" ;) - I ran out of superlatives at the end ... - capable of supporting (with auto-detection, so on the same TCP/IP port) text-based communication, HTTP and websocket. The latter allows using a web browser as a client as well, without any need for a special proxy or software in between.

However, at this point there are some considerable decisions waiting. For example, the old communication framework did some ugly echoing, ie: the received command is echoed back, even allowing this with slow typing. Surely it may reflect MEGA65's serial-over-USB connection better, but it would really complicate things if clients (like m65dbg) really depend on this feature.
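
As an illustration of how auto-detection on a single port could work, here is a rough sketch that looks at the first bytes received on a new connection. This is only a guess at one possible approach, not the actual umon code: `umon_mode_t`, `umon_detect_mode` and `contains_nocase` are hypothetical names. It relies on the fact that an HTTP request starts with a method name, and that a websocket session starts as an HTTP GET carrying an `Upgrade: websocket` header.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { UMON_MODE_TEXT, UMON_MODE_HTTP, UMON_MODE_WEBSOCKET } umon_mode_t;

// Case-insensitive substring search within the first 'len' bytes of 'buf'.
static int contains_nocase ( const char *buf, size_t len, const char *needle )
{
    const size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > len)
        return 0;
    for (size_t i = 0; i + nlen <= len; i++) {
        size_t j = 0;
        while (j < nlen && tolower((unsigned char)buf[i + j]) == tolower((unsigned char)needle[j]))
            j++;
        if (j == nlen)
            return 1;
    }
    return 0;
}

// Guess the connection mode from the data received so far (ideally the full request header).
static umon_mode_t umon_detect_mode ( const char *buf, size_t len )
{
    assert(buf != NULL);
    static const char *methods[] = { "GET ", "POST ", "PUT ", "HEAD ", "OPTIONS " };
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        const size_t mlen = strlen(methods[i]);
        if (len >= mlen && memcmp(buf, methods[i], mlen) == 0) {
            // Only a GET request can be upgraded to a websocket session
            if (i == 0 && contains_nocase(buf, len, "upgrade: websocket"))
                return UMON_MODE_WEBSOCKET;
            return UMON_MODE_HTTP;
        }
    }
    return UMON_MODE_TEXT;   // anything else is treated as the classic text protocol
}
```

The obvious weak point of such a heuristic is a text-mode command that happens to start like an HTTP method, so a real implementation would need to make sure the monitor command set cannot collide with it.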

0 replies

lgblgblgb · 2024-02-27T18:12:26Z

lgblgblgb
Feb 27, 2024
Maintainer Author

After a huge gap (...) I'm here again with the memory decoder rewrite. This is another project, not strictly started because of umon or the like, but nevertheless it's a must here as well. My current plans:

The CPU emulator will get a callback mechanism to signal the umon layer (if requested to do so!) about instruction fetch. So umon can decide whether it's a breakpoint or not, and can even initiate a CPU stop (breakpoint). Advantage: in contrast to the current model, it does not need breakpoint checks in the main loop, which is impossible anyway in general, since Xemu by default executes several CPU steps at once (a scanline's worth), otherwise it would be very slow. In the CPU emulator, a signaling mechanism is neat, since it only needs to check a single bool variable to see if it needs to signal umon (breakpoint mode active), but otherwise it can keep the basic model of multi-opcode execution. There is no need for the main emulation loop to fall back to single-opcode execution, which results in very expensive context switches all the time between the running environments (CPU emulator vs main loop). umon of course needs to activate and deactivate the signal method on demand, and only ask the CPU emulator to signal when it's really used. Of course, in the callback umon can examine the CPU registers, including the PC.
In theory the CPU emulator can even signal umon about accepting an NMI or IRQ. I'm not sure if it's needed.
Other than CPU opcode fetch / execution, the other area which needs more attention is the "memory watch mode", ie to see if a program reads/writes a certain address. Here comes the new memory decoder, which can be reconfigured at run time to the slower callback method (and back to normal mode), so umon can be signaled about memory reads/writes, including:
- CPU read/writes given by 16-bit CPU address, though the resulting linear address will be passed through as well
- Linear read/writes, ie non-16-bit CPU addresses, with the ZP based 32 (28 ...) bit addressing
- Even DMA accesses can signal umon, if requested by umon [no idea, if it's useful or not]
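
A tiny sketch of the "single bool check" idea for the opcode fetch signal follows. None of this is the real Xemu CPU emulator: `fetch_hook_t`, `cpu_set_fetch_hook`, `cpu_run` and the one-byte "opcodes" are made up to keep the example self-contained. The point is only that the multi-opcode loop stays intact, and the hook costs one flag test per opcode while it is inactive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical hook type: returns true if the CPU should stop before executing at 'pc'.
typedef bool (*fetch_hook_t)( uint16_t pc );

static bool         fetch_hook_active = false;   // the single flag tested per opcode
static fetch_hook_t fetch_hook        = NULL;

// The debugger turns the hook on and off on demand (NULL means off).
static void cpu_set_fetch_hook ( fetch_hook_t hook )
{
    fetch_hook = hook;
    fetch_hook_active = (hook != NULL);
}

// Stand-in for real opcode execution: here every "opcode" is one byte long.
static uint16_t cpu_pc = 0;
static void cpu_execute_one ( void ) { cpu_pc++; }

// Runs up to 'max_ops' opcodes in one go (think: a scanline's worth of them).
// Returns the number actually executed; fewer than max_ops means the hook asked for a stop.
static int cpu_run ( int max_ops )
{
    assert(max_ops >= 0);
    int done = 0;
    while (done < max_ops) {
        if (fetch_hook_active && fetch_hook(cpu_pc))
            break;   // stop before the opcode at cpu_pc is executed
        cpu_execute_one();
        done++;
    }
    return done;
}

// Example hook on the debugger side: stop at one fixed address.
static bool stop_at_0x0005 ( uint16_t pc ) { return pc == 0x0005; }
```

With the hook set to `stop_at_0x0005` and the PC starting at 0, `cpu_run(10)` stops after five opcodes with the PC still pointing at $0005. After `cpu_set_fetch_hook(NULL)` the loop runs at full batch size again.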

Please note: these are only hooks/callbacks/signal methods, not umon itself. They are intended to be used by umon, but are not directly part of the umon project. Also, this plan says nothing about, and does nothing with, the debugging itself: ie, umon can use the CPU opcode fetch notification to implement a breakpoint, or even multiple or conditional ones, even checking the opcode too (not just the PC). The CPU emulator does not need to know anything about this, and there is no need to complicate the "expensive" main loop, which would slow down everything all the time (even if no debugging is in use at some point).

0 replies

lgblgblgb · 2024-02-28T15:01:05Z

lgblgblgb
Feb 28, 2024
Maintainer Author

Memory watch planned internal Xemu API

For maximal performance and flexibility, this API largely exposes Xemu's internal structures to the debugger implementation (like the umon "server" built into Xemu). This has a disadvantage, however: it requires some effort to understand and to use well. Implementing an abstraction layer here would worsen the emulation performance even more (ie: in memory watch mode, Xemu is already much slower ...), which may not be worth the price. At 40.5MHz, ideally about every clock cycle has some memory access (opcode, data, etc), that is, about 40 million calls per second through this API ... This also means that the debugger must be as efficient as possible, with only the fastest and most minimal code in the callbacks provided through this API.

The other two important factors:

Only register callbacks when they're really needed, and unregister them when they're not needed anymore, allowing Xemu to regain a significant emulation performance boost.
However, do not register/unregister all the time like crazy either, since those events require invalidating the full memory decoding cache used by Xemu and resolving everything again, which has its cost.

Memory decoding in Xemu/MEGA65 and the notion of 'slot'

MEGA65's memory decoding is multi-layered and quite complex. To emulate this efficiently at a clock speed of about 40MHz, we must be clever; the naive approach of doing all the work at access time cannot be used seriously. The 16 bit CPU address space is broken down into 256 byte long 'pages' called slots. This is a must, since 256 bytes is the smallest granularity MEGA65 can map. C64 style banking surely has 4K/8K granularity, but because of MAP, we must use the smallest (yes, MAP is 8K based, however the mapped region can be anywhere in the physical address space, at any 256 byte boundary). Thus, normally, the slot can be thought of as the high byte of the 16 bit CPU address, and the offset within that page is the low byte of the address.

However, we have memory accesses which are not 16 bit address based, which is unique to MEGA65 (or C65) and unheard of on the C64. These are the 32 bit addressing modes of the MEGA65 CPU, and the memory accesses done by the DMA. For sure, the 28 bit usable address space of MEGA65 is simply too large to divide into 256 byte long pages. So, for non 16 bit accesses, only a single virtual slot is used for a given task. That means slots $00-$FF are the normal 16 bit CPU addressing slots, and any slot number greater than that means something special, like one assigned to DMA source reading, or DMA list reading, or CPU 32 bit direct addressing, and so on.

The reason it's called slot and not page is exactly the fact that only the first 256 slots are real pages of the CPU's 64K address space; the extra slots are not really "pages" but rather cache entries with an ever-changing linear mapping. But to simplify Xemu and make it faster, I had to unify these within a single framework, thus I opted for the notion of slot to avoid confusion about what it means (not always pages!).
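
For the "normal" slots, the relation between a 16 bit CPU address and its slot/offset pair is plain byte splitting. A small sketch (the helper names are made up for illustration and are not part of Xemu):

```c
#include <assert.h>
#include <stdint.h>

// Split a 16 bit CPU address into the slot number (high byte) and the offset (low byte).
static inline uint16_t addr16_to_slot ( uint16_t addr ) { return addr >> 8; }
static inline uint8_t  addr16_to_ofs8 ( uint16_t addr ) { return addr & 0xFF; }

// Rebuild the 16 bit CPU address. Only meaningful for the "real page" slots $00-$FF.
static inline uint16_t slot_to_addr16 ( uint16_t slot, uint8_t ofs8 )
{
    assert(slot <= 0xFF);   // special slots have no 16 bit CPU address
    return (uint16_t)((slot << 8) | ofs8);
}
```

So CPU address $D020 lives in slot $D0 at offset $20. For the special slots above $FF this relation does not hold, since they are not tied to any fixed page.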

Slots:

Slot number/name	Description
`$00` ... `$FF`	CPU 16 bit addressing slots, think the slot number as the high byte of the 16 bit CPU address. Also used if inline DMA is used to fetch DMA list
`MEM_SLOT_DMA_LIST`	This slot is used by the DMA emulation to read the DMA list (unless if it's an inline DMA ...)
`MEM_SLOT_DMA_SOURCE`	Used by DMA emulation to access the source channel during a DMA operation
`MEM_SLOT_DMA_TARGET`	Used by DMA emulation to access the target channel during a DMA operation
`MEM_SLOT_CPU_LINEAR`	Used by CPU emulation when 32 (28 ...) bit addressing mode is used (thus not 16 bit)

There can be other - special - slots, but those are not suited to be handled by this API, since they are for internal use, and it would even be a huge problem to try to put a custom callback on those (probably it would result in runaway recursion). Be sure not to register callbacks for other slots! This is an internal API, so there is no boundary or value check.

Other than the $00 ... $FF slots, they must be referred to by these "names" (macros). Though they have numeric (greater than $FF) values, no code should depend on those numeric values directly, since they can change!

These values (and names) are important, since registering watcher callbacks must refer to these IDs. See later about callback registration.

Memory watch callbacks

The heart of the ability to implement memory watch features is the opportunity for the debugger to provide its own memory reading/writing function for a (or some, or all) given slot(s). This means:

The callback must be registered/unregistered, so the memory decoding layer knows if it should be applied on memory decoding or not
The callback then must implement the actual memory read/write as well since it takes control over the normal realm of memory decoding. It is not that hard, only a macro must be used, but it's important not to forget this
Callbacks in theory can be registered/unregistered on a per-slot basis, thus it's possible to have fast emulation in general while using the slower path only for one or some given slot(s). However, the debugger implementation then needs to take care of all the registration/unregistration events, and must be very careful with this.

The CPU state is somewhat undefined in these callbacks, since a read/write event can and will occur in the middle of an opcode, not at the beginning. This also means that though you can stop the CPU (this is not in the scope of this API, TODO), you cannot abort the opcode already being processed; it will be finished, and then the CPU will stop, if requested. Also, the PC value may point to the "middle of an opcode" during these callbacks. If you need the PC value of the opcode itself, you can use the old_pc element of the cpu65 structure (however note that prefixed opcodes are executed separately - with the scheme MEGA65 uses, it's not even easy to do differently, ie we cannot know if a NEG will be a prefix or not, only later ... - thus a prefixed opcode will show the old_pc of the opcode after the prefix, though the cpu65 struct holds info about the prefix in use). This also means that the memory watch feature is not so well suited for implementing breakpoints, since the opcode is already being executed, and you cannot stop that. Fear not: for the breakpoint feature, another API will be used, which is tied to the CPU emulator instead (this API document is about the memory decoding based features only).

The callbacks themselves should be written something like this:

static Uint8 some_read_callback ( const Uint16 slot, const Uint8 ofs8 )
{
    // Insert your code here to do something useful for debugging
    return MEM_WATCH_CB_READ(slot, ofs8); // the callback though MUST perform the read operation at some point, and return that value at the end of the function!
}
static void some_write_callback ( const Uint16 slot, const Uint8 ofs8, const Uint8 data )
{
    // Insert your code here to do something useful for debugging
    MEM_WATCH_CB_WRITE(slot, ofs8, data); // the callback though MUST perform the write operation at some point with the provided 'data'!
}

In this example we can also see the simplest case possible: the callbacks do nothing but revert to the normal memory read/write. That's not so useful, since it's what happens without the extra step of watch callbacks as well. Of course do not do this, since in effect it's the very same as not registering a callback, only slower. The examples above only make sense if the "Insert your code here" part is filled with something useful to do for debugging purposes. Using "static" functions is perfectly OK, since we'll register these callbacks via function pointers anyway. See later.

Warning: MEM_WATCH_CB_WRITE and MEM_WATCH_CB_READ must never be used for anything else but this functionality: providing the intended read/write the callback is meant for. Do not use them to implement features like reading/writing more bytes (not associated with the callback params), or elsewhere to read/write memory (there are other solutions for those tasks!). In special cases though, the callback can decide to write data other than what it got, as some kind of "odd debug feature" (I can't think of one right now), or fake the data in a read callback by returning a value not obtained with MEM_WATCH_CB_READ.

The interpretation of the parameters is a bit crazy, but that's for the sake of emulation efficiency in general; it's what Xemu uses internally anyway:

slot: contains the memory slot number. Slot numbers $00-$FF mean the 256 possible 256 byte long "pages" of the CPU address space. If slot is bigger than $FF, it means a special usage, like 32 bit BP based addressing, DMA ...
ofs8: contains the memory offset within a 256 byte long page
data: in case of a write callback, this holds the value to be written

The callback must decode the addresses itself if it needs them.

CPU 16 bit address: can be got with: MEM_WATCH_GET_ADDR16(slot,ofs8) but only if the slot was not bigger than $FF, since otherwise it wasn't a 16 bit access anyway, and slot then has a special meaning instead! In this case, there is no 16 bit CPU address associated at all, thus the formula above mustn't be used!
Linear 32 bit address: can be got with MEM_WATCH_GET_READ_ADDR32(slot,ofs8) or MEM_WATCH_GET_WRITE_ADDR32(slot,ofs8) (depending on whether it's a read or write event). This address is always valid. For example, if the C64 KERNAL is banked in, and there is a memory watch for some address there, then the linear address will correctly tell the real address of the C64 KERNAL within the MEGA65 linear memory map (ie, ROM starts at $2:0000). Also, staying with this example, if you try to write to the banked (not mapped!) KERNAL, you write to the RAM "behind" it, as we know, so the linear address in this case is decoded to the RAM! In both cases though, the CPU address itself can be the very same.

Be sure you note the difference in linear addressing: there are two macros to get the linear address, one for read and one for write callbacks! If you mix them by mistake, the result can be chaotic!

If the CPU 16 bit or the linear address is needed in the callback, it's better not to calculate it every time (if the callback uses that info more than once) but only once, like:

const Uint16 cpu_addr = MEM_WATCH_GET_ADDR16(slot,ofs8);  // assuming this callback is only registered for slots $00-$FF ...
const Uint32 linear_addr = MEM_WATCH_GET_READ_ADDR32(slot,ofs8); // for READ callbacks!! Use the WRITE variant otherwise

And then use the cpu_addr and/or linear_addr. Of course, obtain these values only if they are needed at all. It looks simple, but these are macros which may expand to more complex formulas, which shouldn't be evaluated more than once, to save processor power for emulation.
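
Putting the pieces above together, a write callback that watches one particular 16 bit address could look roughly like this. To make the sketch compile outside Xemu, the types and the two macros are replaced by simple stand-ins (a fake 64K array and a plain shift/or formula), which are assumptions and surely differ from the real definitions. `WATCHED_ADDR`, `watch_hits`, `watch_last_value` and `watch_write_callback` are made-up names as well.

```c
#include <assert.h>
#include <stdint.h>

// --- Stand-ins so this compiles outside Xemu; the real definitions live in Xemu itself ---
typedef uint8_t  Uint8;
typedef uint16_t Uint16;
static Uint8 fake_ram[0x10000];
#define MEM_WATCH_GET_ADDR16(slot, ofs8)      ((Uint16)(((slot) << 8) | (ofs8)))
#define MEM_WATCH_CB_WRITE(slot, ofs8, data)  (fake_ram[MEM_WATCH_GET_ADDR16(slot, ofs8)] = (data))
// --- End of stand-ins ---

#define WATCHED_ADDR 0xD020   // hypothetical address of interest

static unsigned int watch_hits = 0;
static Uint8 watch_last_value = 0;

static void watch_write_callback ( const Uint16 slot, const Uint8 ofs8, const Uint8 data )
{
    assert(slot <= 0xFF);   // this callback is meant for 16 bit CPU slots only
    const Uint16 cpu_addr = MEM_WATCH_GET_ADDR16(slot, ofs8);   // decoded once, reused below
    if (cpu_addr == WATCHED_ADDR) {
        watch_hits++;              // keep the work here as small as possible
        watch_last_value = data;
    }
    MEM_WATCH_CB_WRITE(slot, ofs8, data);   // the write itself MUST still happen
}
```

Such a callback would only need to be registered for slot $D0, so accesses to every other slot keep running on the fast path.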

Register/unregister callbacks

The last thing to get to know is how to register/unregister callbacks. As already mentioned, it's important to only register memory watch callbacks when they're really needed, and to unregister them when there is no need for them anymore. Both read and write callbacks can be registered for any of the slots mentioned in the slot table above. This means that a memory access event will only invoke a watch callback registered for the given type (read or write); also, in theory, quite different callbacks can be registered for many-many slots. The performance bottleneck for emulation in general is when a memory watch callback must be called (especially if that callback is complicated and slow), thus it's perfectly OK (in fact, superb!) to register only a single slot, if we know that's the only place where the access event we're looking for can happen. Then there is no slow-down in emulation at all for the other slots.

mem_watch_register_callback(first_slot, last_slot, read_callback, write_callback);

That is all. This registers, for a range of slots, the read callback (function pointer) "read_callback" and the write callback "write_callback". Either or both of the function pointers can be NULL, meaning no callback. If there was a callback before for the given slot and type (read/write), giving NULL means "unregistering"; a non-NULL value means registering, or changing the registered callback (if there was one before).

Note: registering/unregistering callbacks invalidates Xemu's memory decoding structures for the given slot range. This has some cost and performance impact. Thus, if possible do not change registered callbacks very-very frequently.
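
One way for a debugger to keep the number of register/unregister events low is to count the active watches per slot and only touch the registration when a slot gets its first watch or loses its last one. The sketch below is an assumption about how the debugger side could do this, not existing code: the `mem_watch_register_callback` here is only a stand-in that records its arguments (its exact parameter types in Xemu are a guess), and `watch_add`, `watch_remove`, `on_write` and `watches_in_slot` are made-up names. It handles write watches only, so the read callback is always passed as NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  Uint8;
typedef uint16_t Uint16;
typedef Uint8 (*mem_read_cb_t)( const Uint16 slot, const Uint8 ofs8 );
typedef void  (*mem_write_cb_t)( const Uint16 slot, const Uint8 ofs8, const Uint8 data );

// --- Stand-in for the Xemu side: only records what was registered ---
static mem_read_cb_t  reg_read[0x100];
static mem_write_cb_t reg_write[0x100];
static int invalidations = 0;   // how many times the decoder structures would be invalidated
static void mem_watch_register_callback ( int first_slot, int last_slot, mem_read_cb_t rd, mem_write_cb_t wr )
{
    assert(first_slot >= 0 && first_slot <= last_slot && last_slot <= 0xFF);
    for (int slot = first_slot; slot <= last_slot; slot++) {
        reg_read[slot] = rd;
        reg_write[slot] = wr;
    }
    invalidations++;
}
// --- End of stand-in ---

static int watches_in_slot[0x100];   // debugger-side bookkeeping

static void on_write ( const Uint16 slot, const Uint8 ofs8, const Uint8 data )
{
    (void)slot; (void)ofs8; (void)data;   // a real callback would inspect these and perform the write
}

// Add a write watch on a 16 bit address: registers only when the slot gets its first watch.
static void watch_add ( Uint16 addr )
{
    const int slot = addr >> 8;
    if (watches_in_slot[slot]++ == 0)
        mem_watch_register_callback(slot, slot, NULL, on_write);
}

// Remove a write watch: unregisters only when the slot has no watches left.
static void watch_remove ( Uint16 addr )
{
    const int slot = addr >> 8;
    assert(watches_in_slot[slot] > 0);
    if (--watches_in_slot[slot] == 0)
        mem_watch_register_callback(slot, slot, NULL, NULL);
}
```

Adding watches on $D020 and then $D021 results in a single registration for slot $D0, and the callback is only removed after both watches are gone.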

Again: do not try to register callbacks on "unknown" slots. This is an internal Xemu API which does not have sanity or boundary checking. Some "hidden" slots should never be touched. So, only use slots $00-$FF or the ones mentioned by macro name in the slot table. This also means that the extra ones (above $FF) cannot normally be specified by range, because of the danger of accidentally modifying something which shouldn't ever be touched. A non-range, single slot register/unregister call can be achieved with first_slot being the same value as last_slot. Sure, a generic CPU 16 bit address watch, without knowing the exact address, can in theory register all the "normal" slots for a single read and write callback, eg first_slot being 0 and last_slot being $FF. That covers all "normal" CPU accesses (plus inline DMA ...) but not the 32 bit ZP based addressing and not the DMA operations (except for the inline DMA list reading), because those operate with the "special" slots, ie slot numbers > $FF.

The ability to register individual slots means that you can register different callbacks for different slots, so you can exploit the fact that the address range for eg 16 bit operations is already pre-selected for you into 256 parts of the 64K memory. However, if that's not desired, one can register all the slots to the same callback and use one's own logic there to decide what to do based on the address.

It's important to note that callbacks for slots $00-$FF can query both the 16 bit and the resulting 32 bit address, but for slots above $FF only the 32 bit address is valid!

Note: a single memory access fires a single slot. So if the CPU writes $D000 (I/O banked in, let's assume) then only slot $D0 will fire, though the callback can also get the linear address within the MEGA65 I/O space (depending on the I/O mode you're in: C64, C65, MEGA65 ...). If I/O is mapped with a MAP instruction to - let's say - $8000, then the callback for $80 will fire, but again the linear address is set to the real I/O address. So in these two cases you'll find CPU address $D000 or $8000, but the same linear address, since they refer to the very same MEGA65 physical (=linear) address. However! If you use 32 bit ZP based addressing to access that I/O reg, only slot MEM_SLOT_CPU_LINEAR will fire, again with the correct 32 bit (linear) address, but in this case there is no valid CPU address, since it's not a 16 bit memory operation! The same goes for DMA, when MEM_SLOT_DMA_TARGET will be active.

0 replies

lgblgblgb · 2024-10-28T13:20:20Z

lgblgblgb
Oct 28, 2024
Maintainer Author

@gurcei I've started to refactor the code. I had to realize that I don't know, or have already forgotten, the exact details of how this debugging protocol should work. There are very ugly things I've done in the past to make it work; honestly, I haven't even always understood my own code, wondering "what did this crazy dude think with all of this" (crazy dude = myself). Do we have any document on the protocol? If not, I plan to write one while doing this (unless you want to do it!), as it can be interesting for others as well, who (eg) want to write a debugger, or whatever.

Now only one thing from this bug topic:

Now, I am at the part of placing breakpoints. I'm really unsure at this point, though, what to do about having multiple breakpoints, how to remove a breakpoint ... The current stuff I'm developing can do multiple breakpoints, however I am really unsure how to reflect this via the protocol. Some may want to dynamically add/remove breakpoints. IIRC (I can be wrong!) MEGA65 only allows a single breakpoint?

1 reply

gurcei Oct 28, 2024

Heya @lgblgblgb, so far, the main point of documentation for the protocol is inside the mega65-book, in "Appendix O - Machine Language Monitor", in the section "The Matrix/Serial Monitor". All commands are listed and described there.

Yep, presently, the mega65's monitor only supports one hardware breakpoint.

Hmm, so if your system can do multiple breakpoints, and at present, the protocol doesn't permit this, then it might prove tricky to make multiple breakpoints happen via the existing protocol.

Just as extra food for thought on this, the approach I took in m65dbg was to offer an extra software breakpoint, but only one of those at a time (one hard, one soft). So upon the user typing "sbreak " (or "sc ") followed by an address, m65dbg replaces the code at that address with a "JMP *". The "sc" (soft continue) will then unpause the cpu, but keep monitoring the PC register, and if it sees the PC register stuck on that address, then it will intentionally pause the cpu and cause the break itself, then replace the "JMP *" with the original code belonging at that address.
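
For illustration only (this is not m65dbg's actual code), the soft breakpoint trick described above can be sketched like this. The target memory is a plain array here, all names are made up, and "stuck on the address" is simplified to a single PC comparison. `JMP *` is the 3-byte absolute jump, opcode $4C, followed by the address itself, low byte first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint8_t  mem[0x10000];     // stand-in for the target's memory
static uint8_t  saved_code[3];    // original bytes under the soft breakpoint
static uint16_t soft_bp_addr;
static bool     soft_bp_set = false;

// Plant "JMP *" at 'addr', remembering the code that was there.
static void soft_break_set ( uint16_t addr )
{
    assert(!soft_bp_set);      // only one soft breakpoint at a time
    assert(addr <= 0xFFFD);    // the three bytes must fit
    memcpy(saved_code, mem + addr, 3);
    mem[addr]     = 0x4C;          // JMP absolute
    mem[addr + 1] = addr & 0xFF;   // low byte of the target: the address itself
    mem[addr + 2] = addr >> 8;     // high byte
    soft_bp_addr = addr;
    soft_bp_set = true;
}

// Poll with the current PC while the CPU runs. Returns true once the breakpoint is hit,
// in which case the original code is put back (the caller would then pause the CPU).
static bool soft_break_poll ( uint16_t pc )
{
    if (!soft_bp_set || pc != soft_bp_addr)
        return false;
    memcpy(mem + soft_bp_addr, saved_code, 3);
    soft_bp_set = false;
    return true;
}
```

The obvious limitation is that the patched bytes are visible to the running program, which is one more reason why a hook inside the emulator is the cleaner solution on the Xemu side.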

On the other hand, another option could be that you 'extend' the protocol to support more breakpoints (provide extra commands to list existing breakpoints, and delete them), and the hardware/vhdl might eventually catch up to your protocol improvements in time.
