-
Hmm, for as long as this 220-uartmon-update branch is effective as a temporary workaround for getting a 'modest' degree of uart-based debugging capabilities within xemu, I think we shouldn't disable it right now, but only disable it once umon completely replaces it.
A few items there look familiar; I might have mentioned them along the way as I made my tweaks on the 220-uartmon-update branch. I suspect a few other fixes and additions for missing things were done in commits there, but I'd have to revisit those past efforts to recall them :) I also suspect that Paul still occasionally fiddles with the output of the uart monitor, inadvertently breaking any utility that attempts to make use of it (which I've learnt to live with over the years). Anyway, there might be things he's tweaked recently that ought to get reflected in xemu's uart monitor too (as we spot them).
I'm also wondering: since you're planning on ditching uartmon in favour of umon, 'perhaps' there might be merit in merging at least some of my tweaks from 220-uartmon-update into uartmon (saving me from having to rebase my efforts from time to time to keep up with your latest work). Since you plan to ditch uartmon anyway, perhaps merging my stuff in won't be such a problem, as it'll get replaced in time by the new umon implementation? But anyway, not fussed, I can get by for now as-is :)
-
@gurcei Let's try something. I've opened a new branch Just please do not refer to '220' as the issue in commit and PR messages; use this discussion number (#335) instead. I think the major issue with rebasing your changes (as you mentioned you did) is more about
-
And to mention some plans as well ... So, what I would like to create:
Longer-term plan: a mega ;) project: rework the whole memory decoding mechanism in Xemu, allowing a "cost free" run when there are no breakpoints etc, but still allowing even data/code access r/w breakpoints etc, based on certain conditions (in which case the emulator slows down): #209
It would be even faster (with no active breakpoint) than the current solution, and would be much cleaner and easier to maintain. It would also provide a mechanism to redefine the memory handlers at run time for debug-aware purposes (like conditional breakpoints). This quite huge change is about much more than umon alone, but it is certainly an important factor for that too.
-
Let me post the stacktrace of the crash that seems to occur consistently under macOS on different branches. It happens once t0 is executed. I post it here because this may be a general bug that could be looked at when the umon branch gets further developed (following the ideas discussed here). This is the stacktrace when built on
Seems we get some more info from Clang's address sanitiser when crashing on the now rebased 220-uartmon-update branch from @gurcei (the link points to the commit I built from that branch):
Maybe this is helpful to understand what's happening?
-
Current situation: I have done some work on the new communication layer. It's multi-threaded multi-connection multi-mode "stuff" ;) - I have run out of superlatives at the end ... - capable of supporting text-based communication, HTTP and websocket (with auto detection, so all on the same TCP/IP port). The latter allows a web browser to be used as a client as well, without any need for a special proxy or software in between.
However, at this point some significant decisions are pending. For example, the old communication framework did some ugly echoing, ie: the received command is echoed back, and this works even with slow typing. Surely it may reflect MEGA65's serial-over-USB connection better, but it would really complicate things if clients (like m65dbg) really depend on this feature.
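To illustrate what "auto detection on the same TCP/IP port" could look like, here is a minimal sketch in C. It is not the actual Xemu code: `detect_mode`, `starts_with_nocase` and the `conn_mode` enum are hypothetical names, and the sketch assumes the first chunk of received data already holds the complete request headers. It only looks at `GET` requests, since a websocket handshake always starts as an HTTP `GET` carrying an `Upgrade: websocket` header; anything else is treated as plain text monitor input.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical connection modes, for illustration only. */
enum conn_mode { MODE_TEXT, MODE_HTTP, MODE_WEBSOCKET };

/* Case-insensitive "does s start with prefix?" (stops safely at the end of s). */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Guess the mode from the first NUL-terminated chunk received on a connection. */
static enum conn_mode detect_mode(const char *buf)
{
    assert(buf != NULL);
    /* Anything not starting like an HTTP GET is plain text monitor input. */
    if (strncmp(buf, "GET ", 4) != 0)
        return MODE_TEXT;
    /* The request line must carry the HTTP version before its first newline. */
    const char *eol = strchr(buf, '\n');
    const char *ver = strstr(buf, " HTTP/1.");
    if (eol == NULL || ver == NULL || ver > eol)
        return MODE_TEXT;
    /* Scan the header lines for a websocket upgrade request. */
    for (const char *p = eol; *p; p++)
        if (starts_with_nocase(p, "\nupgrade: websocket"))
            return MODE_WEBSOCKET;
    return MODE_HTTP;
}
```

With this sketch, a line such as `m 1000` would be classified as text, a plain `GET / HTTP/1.1` request as HTTP, and the same request with an `Upgrade: websocket` header as websocket. A real implementation would also have to cope with headers arriving in several TCP segments.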
-
After a huge gap (...) I'm here again with the memory decoder rewrite. This is another project, not strictly started because of umon or the like, but nevertheless it's a must here as well. My current plans:
Please note: these are only hooks/callbacks/signal methods, not umon itself. They are intended to be used by umon, but are not directly part of the umon project. Also, this plan says nothing about, and does nothing with, the debugging itself: ie, umon can use the CPU opcode fetch notification to implement a breakpoint, or even multiple or conditional ones, even checking the opcode too (not just the PC). The CPU emulator does not need to know anything about this, nor is there a need to complicate the "expensive" main loop, which would slow down everything all the time (even if no debugging is in use at that point).
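The separation described above (the emulator only provides a notification, the debugger decides what a breakpoint is) can be sketched as follows. This is only a model under stated assumptions: all names (`opcode_fetch_hook`, `debugger_on_fetch`, `cpu_fetch` and so on) are hypothetical, and the `NULL` check in `cpu_fetch` is a simplification. The plan in this thread is to wire such notifications through the memory decoder, so that no per-instruction check is needed when no debugger is active.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook type: gets the PC and the opcode byte just fetched. */
typedef void (*opcode_fetch_hook_t)(uint16_t pc, uint8_t opcode);

/* NULL means that no debugger is interested in opcode fetches. */
static opcode_fetch_hook_t opcode_fetch_hook = NULL;

/* ---- Debugger side: the emulator core knows nothing about this part ---- */
#define MAX_BREAKPOINTS 8
static uint16_t breakpoints[MAX_BREAKPOINTS];
static int num_breakpoints = 0;
static int stop_requested = 0;

static void debugger_on_fetch(uint16_t pc, uint8_t opcode)
{
    (void)opcode;   /* a conditional breakpoint could test the opcode too */
    for (int i = 0; i < num_breakpoints; i++)
        if (breakpoints[i] == pc)
            stop_requested = 1;
}

/* Adds a breakpoint and installs the hook; returns -1 if the table is full. */
static int debugger_add_breakpoint(uint16_t pc)
{
    if (num_breakpoints >= MAX_BREAKPOINTS)
        return -1;
    breakpoints[num_breakpoints++] = pc;
    opcode_fetch_hook = debugger_on_fetch;
    return 0;
}

/* Removes all breakpoints and uninstalls the hook again. */
static void debugger_clear_breakpoints(void)
{
    num_breakpoints = 0;
    opcode_fetch_hook = NULL;
}

/* ---- Emulator side: stand-in for a single opcode fetch ---- */
static void cpu_fetch(uint16_t pc, const uint8_t *mem)
{
    assert(mem != NULL);
    if (opcode_fetch_hook != NULL)
        opcode_fetch_hook(pc, mem[pc]);
}
```

The point of the design is that multiple or conditional breakpoints are purely a matter of what `debugger_on_fetch` does; the emulator side stays the same.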
-
Memory watch planned internal Xemu API
For maximal performance and flexibility, this API largely exposes Xemu's internal structures to the debugger implementation (like the umon "server" built into Xemu). The disadvantage is that it takes some effort to understand and to use well. Implementing an abstraction layer here would worsen the emulation performance even more (ie: in memory watch mode, Xemu is already much slower ...), which may not be worth the price. At 40.5MHz, ideally about every clock cycle has some memory access (opcode, data etc), that is, about 40 million calls per second through this API ... This also means that the debugger must be as efficient as possible, with only the fastest, minimal code in the callbacks provided with this API. The other two important factors:
Memory decoding in Xemu/MEGA65 and the notion of 'slot'
MEGA65's memory decoding is multi-layered and quite complex. To emulate this efficiently at a clock speed of about 40MHz, we must be clever; the naive approach of doing all the work at access time cannot seriously be used. The 16 bit CPU address space is broken down into 256 byte long 'pages' called
However, we have memory accesses which are not 16 bit address based, which is unique to MEGA65 (or C65) and unheard of on the C64. These are the 32 bit addressing modes of the MEGA65 CPU, and the memory accesses done by the DMA. For sure, the 28 bit usable address space of MEGA65 is simply too large to divide into 256 byte long pages. So, for non 16 bit accesses, only a single virtual slot is used for a given task. That means slots $00-$FF are the normal 16 bit CPU addressing slots, and any slot number greater than that means something special, like one assigned to DMA source reading, or DMA list reading, or CPU 32 bit direct addressing, and so on. The reason it's called
Slots:
There can be other - special - slots, but those are not suited to be handled by this API, since they are for internal use, and trying to put a custom callback on them would even be a huge problem (probably resulting in runaway recursion). Be sure not to register callbacks for other slots! This is an internal API, so there is no boundary or value check. Other than the $00 ... $FF slots, they must be referred to by these "names" (macros), though they have numeric (greater than
These values (and names) are important, since registering watcher callbacks must refer to these IDs. See later about callback registration.
Memory watch callbacks
The heart of the ability to implement memory watch features is the opportunity for the debugger to provide its own memory reading/writing function for a given slot (or some, or all slots). This means:
The CPU state is somewhat undefined in these callbacks, since a read/write event can and will occur in the middle of an opcode, not at the beginning. This also means that though you can stop the CPU (this is not in the scope of this API, TODO), you cannot abort the opcode already being processed: it will be finished, and then the CPU will stop, if requested. Also, the PC value may point to the "middle of an opcode" during these callbacks. If you need the PC value of the opcode itself, you can use the
The callbacks themselves should be written something like these:
In this example we can also see the simplest case possible: the callbacks do not do anything, they just revert to the normal memory read/write. Of course, do not do this: it has the very same effect as not registering a callback at all, only slower. The examples above only make sense if the "Insert your code here" part is filled with something useful for debugging purposes. Using "static" functions is perfectly OK, since we'll register these callbacks via function pointers anyway. See later.
Warning: the interpretation of the parameters is a bit crazy, but that's for the sake of emulation efficiency in general, as it is what Xemu uses internally anyway:
The callback must decode the addresses itself if it needs them.
Be sure to note the difference in linear addressing: there are two macros to get the linear address, one for read and one for write callbacks! If you mix them up by mistake, the result can be chaotic! If the CPU 16 bit address or the linear address is needed in the callback, it's better not to calculate it every time (if the callback uses that info more than once) but only once, like:
And then use the
Register/unregister callbacks
The last thing to get to know is how to register/unregister callbacks. As already mentioned, it's important to register memory watch callbacks only when they're really needed, and to unregister them when they are not needed anymore. Both read and write callbacks can be registered for any of the slots mentioned in the slot table above. This means that only a memory access event of the registered type (read or write) will cause a watch callback invocation; also, in theory, quite different callbacks can be registered for many, many slots. The performance bottleneck for emulation is in general the point when a memory watch callback must be called (especially if that callback is complicated and slow), thus it's perfectly OK (in fact, superb!) to register only a single slot if we know that the only access event we are looking for is there. Then, for the other slots, there is no slow-down in emulation at all.
This is all. This registers, for a range of slots, the read callback (function pointer) "read_callback" and the write callback "write_callback". Either or both of the function pointers can be NULL, meaning no callback. If there was a callback before for the given slot and type (read/write), giving NULL means "unregistering"; a non-NULL value means registering, or changing the registered callback (if there was one before).
Note: registering/unregistering callbacks invalidates Xemu's memory decoding structures for the given slot range. This has some cost and performance impact. Thus, if possible, do not change registered callbacks very frequently.
Again: do not try to register callbacks on "unknown" slots. This is an internal Xemu API that does not have sanity or boundary checking. Some "hidden" slots should never be touched. So, only use slots $00-$FF or the ones mentioned by macro name in the slot table. This also means that the extra ones (above $FF) cannot be specified by range in the normal way, because of the danger of accidentally modifying something which should never be touched. A non-range, single slot register/unregister call can be achieved with
The ability to register for individual slots means that you can register different callbacks for different slots, so you can exploit the fact that the address range for eg 16 bit operations is already pre-selected for you into 256 parts of the 64K memory. However, if that's not desired, one can register all the slots to the same callback and use one's own logic there to decide what to do based on the address. It's important to note that $00-$FF slot callbacks can query both the 16 bit and the resulting 32 bit address, but for slots above $FF only the 32 bit address is valid!
Note: a single memory access fires a single slot. So if the CPU writes $D000 (with I/O banked in, let's assume) then only slot $D0 will fire, though it also carries the linear address in the MEGA65 I/O space (depending on the I/O mode you're in: C64, C65, MEGA65 ...).
If I/O is mapped with a MAP instruction to - let's say - $8000, then the callback for $80 will fire, but again the linear address is set to the real I/O address. In these two cases you'll find CPU address $D000 or $8000 respectively, but the same linear address, since they refer to the very same MEGA65 physical (=linear) address. However! If you use 32 bit ZP based addressing to access that I/O reg, only slot
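The slot and registration rules described above can be modelled with a small self-contained toy in C. This is a sketch, not the real Xemu API: every name here (`watch_register_range`, `cpu_read`, `cpu_write`, the callback typedefs) is hypothetical, the callback signatures are simplified to a plain 16 bit address, and only slots $00-$FF are modelled (no special slots above $FF, no linear addresses). It also checks for a callback at access time, which the real memory decoder is designed to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slots $00-$FF: one per 256 byte page of the 16 bit CPU address space. */
#define NUM_CPU_SLOTS 0x100

/* Hypothetical, simplified callback types (the real signatures differ). */
typedef uint8_t (*watch_read_cb)(uint16_t cpu_addr);
typedef void (*watch_write_cb)(uint16_t cpu_addr, uint8_t data);

static uint8_t memory[0x10000];
static watch_read_cb read_watch[NUM_CPU_SLOTS];
static watch_write_cb write_watch[NUM_CPU_SLOTS];

/* Register callbacks for slots first..last; a NULL pointer unregisters. */
static void watch_register_range(unsigned first, unsigned last,
                                 watch_read_cb rd, watch_write_cb wr)
{
    /* The toy checks its bounds; the API described above does not. */
    assert(first <= last && last < NUM_CPU_SLOTS);
    for (unsigned slot = first; slot <= last; slot++) {
        read_watch[slot] = rd;
        write_watch[slot] = wr;
    }
}

/* The slot of a 16 bit access is simply the high byte of the address. */
static uint8_t cpu_read(uint16_t addr)
{
    watch_read_cb cb = read_watch[addr >> 8];
    return cb ? cb(addr) : memory[addr];
}

static void cpu_write(uint16_t addr, uint8_t data)
{
    watch_write_cb cb = write_watch[addr >> 8];
    if (cb)
        cb(addr, data);
    else
        memory[addr] = data;
}

/* Example debugger callback: count the write, then do the normal write. */
static unsigned write_hits = 0;

static void count_writes(uint16_t addr, uint8_t data)
{
    write_hits++;
    memory[addr] = data;
}
```

For example, after `watch_register_range(0xD0, 0xD0, NULL, count_writes)`, a write to $D020 goes through `count_writes`, while a write to $D120 (slot $D1) does not. Calling `watch_register_range(0xD0, 0xD0, NULL, NULL)` removes the watch again.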
-
@gurcei I've started to refactor the code. I had to realize that I don't know, or have already forgotten, the exact details of how this debugging protocol should work. There are very ugly things I've done in the past to make it work; honestly, I haven't even always understood my own code, as in "what was this crazy dude thinking with all of this" (crazy dude = myself). Do we have any document on the protocol? If not, I plan to write one while doing this (unless you want to do it!), as it can be interesting for others as well, eg for someone who wants to write a debugger, or whatever.
Now only one thing from this bug topic: I am now at the part of placing breakpoints. I'm really unsure at this point, though, what to do about having multiple breakpoints, how to remove a breakpoint ... The current stuff I'm developing can do multiple breakpoints, however I am really unsure how to reflect this via the protocol. Some may want to dynamically add/remove breakpoints. IIRC (I can be wrong!) MEGA65 only allows a single breakpoint?
-
Generic new, OS- and emulation independent debug interface called umon (meant to replace uartmon in the future):
Current situation and problems:
@gurcei and @ki-bo
Please comment here, or add new problems here, before forming any issue, as currently it's a very unsupported and vague area, unfortunately :( It would be useful to collect all problems, information, ideas, etc first. Thanks.
Short term goals:
umon can replace it in the future) in Xemu/MEGA65, since it's known to be very problematic now