Illegal class number with inject_snap error handling #50

dehorsley · 2020-06-09T09:33:46Z

If the snap command given to inject_snap generates an error, the code to print the message back to the user hits an error eg:

$ inject_snap -w blag
cls_chk: illegal class number 0 pid 25020
cls_chk error 1 in cls_rcv()

Seen on both 32 and 64bit FSL10.

The text was updated successfully, but these errors were encountered:

wehimwich · 2020-06-09T12:27:20Z

Thank for you for the nice example.

The problem is that inject_snap wants to schedule fserr to retrieve the error message. It is using the obsolete linkage where the information was passed with class-I/O. The new linkage uses a buffer in shared memory in order to avoid class-I/O inside ddout. However, that buffer can't be shared among multiple programs and ddout definitely get priority.

Possible solutions:

Remove using fserr in inject_snap, just print the error code and number, and then pass the error to ddout for expansion in the log. Pro: quick and easy; Con: no local error message.
Redo the interface to fserr to use a different message queue. Pro: local error message, gets rid of shared memory buffer; Con: lots of new infrastructure for one user.
Redo the interface for fserr to use the arg buffer in skd_util.c to pass information in and out. Pro: local error message, simpler than (2), gets rid of shared memory buffer; Con: have to add passing a buffer back to scheduler in the skd_util.c, that might not be too hard and it might be useful for other things.

(3) seems best. One question is how soon is this needed?

dehorsley · 2020-06-10T01:06:10Z

Ahh I was getting close then. I did find that shared memory buffer.

One question is how soon is this needed?

Not urgent, I don't think it needs to hold up 10.0.0

What about a modified version of option 1? Say:

factor out the error expansion of fserr into a routine (in clib say)
inject_snap can do the expansion offline.
fserr and ddout otherwise stay as they are

I don't think it matters so much if inject_snap has to readfserr.ctl.

dehorsley · 2020-06-10T01:07:42Z

Come to think of it, does fserr really need to be an online program? Does anything other than ddout use it? Can ddout parse the '*err.ctl' files on startup and do the error expansion itself?

wehimwich · 2020-06-10T23:48:05Z

I looked at passing a buffer back from a program in skd_util.c That looks like it won't be bad. Then the running of fserr could be a subroutine in clib as you rightly suggest. I think it might be a bit too much to bite off for beta2 though. It is too close to the heart of the FS for quick and dirty work, but I think it could be handled by just adding a couple entry points into skd_util.c and a couple minor change to its existing code to send and retrieve the data using the existing arg buffer. Then the buffer can be removed from shared memory.

In the meantime. inject_snap could changed to print the (non-expanded) error message so there is less cosmetic damage. At least the expanded version goes into the log. That should be okay for beta2.

I am inclined to keep fserr separate from ddout. It would be okay for ddout to read fserr/sterr.ctl once at start-up, but inject_snap would have to read it every time an error is encountered since each instance of fserr is a fresh copy. I suppose it might not be too bad (on a modern computer), but it seems like a lot of thrashing. We'll see.

dehorsley · 2020-06-11T01:41:59Z

In the meantime. inject_snap could changed to print the (non-expanded) error message so there is less cosmetic damage. At least the expanded version goes into the log. That should be okay for beta2.

Sounds good

It would be okay for ddout to read fserr/sterr.ctl once at start-up, but inject_snap would have to read it every time an error is encountered since each instance of fserr is a fresh copy. I suppose it might not be too bad (on a modern computer), but it seems like a lot of thrashing. We'll see.

To quote Knuth, "premature optimization is the root of all evil (or at least most of it)" :)

If it results in easier to manage code, I think we can afford a few milliseconds on the error path. inject_snap only need load the error definitions on error, and on machine with even a modest amount of free memory those files would be cached after the first read (by ddout) anyway.

This partially adddresses #50 until a more complete solution is available.

wehimwich · 2020-06-17T17:35:15Z

To quote Knuth, "premature optimization is the root of all evil (or at least most of it)" :)

Of course. :) I just wonder if it is more work to redo it entirely rather than fix the (now admittedly outdated) optimization. Anyway for the time being, I have pushed a change that completes the first step:

Print the non-expanded error info rather than generate a cls_chk error
Correct or replace entirely the fserr interface

dehorsley added the bug Something isn't working label Jun 9, 2020

wehimwich added a commit that referenced this issue Jun 17, 2020

Fix cls_chk error when trying to expand error message

c650bc9

This partially adddresses #50 until a more complete solution is available.

wehimwich added enhancement New feature or request and removed bug Something isn't working labels Jun 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Illegal class number with inject_snap error handling #50

Illegal class number with inject_snap error handling #50

dehorsley commented Jun 9, 2020

wehimwich commented Jun 9, 2020 •

edited

Loading

dehorsley commented Jun 10, 2020

dehorsley commented Jun 10, 2020

wehimwich commented Jun 10, 2020

dehorsley commented Jun 11, 2020

wehimwich commented Jun 17, 2020 •

edited

Loading

Illegal class number with inject_snap error handling #50

Illegal class number with inject_snap error handling #50

Comments

dehorsley commented Jun 9, 2020

wehimwich commented Jun 9, 2020 • edited Loading

dehorsley commented Jun 10, 2020

dehorsley commented Jun 10, 2020

wehimwich commented Jun 10, 2020

dehorsley commented Jun 11, 2020

wehimwich commented Jun 17, 2020 • edited Loading

wehimwich commented Jun 9, 2020 •

edited

Loading

wehimwich commented Jun 17, 2020 •

edited

Loading