Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illegal class number with inject_snap error handling #50

Open
dehorsley opened this issue Jun 9, 2020 · 6 comments
Open

Illegal class number with inject_snap error handling #50

dehorsley opened this issue Jun 9, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@dehorsley
Copy link
Member

If the snap command given to inject_snap generates an error, the code to print the message back to the user hits an error eg:

$ inject_snap -w blag
cls_chk: illegal class number 0 pid 25020
cls_chk error 1 in cls_rcv()

Seen on both 32 and 64bit FSL10.

@dehorsley dehorsley added the bug Something isn't working label Jun 9, 2020
@wehimwich
Copy link
Member

wehimwich commented Jun 9, 2020

Thank for you for the nice example.

The problem is that inject_snap wants to schedule fserr to retrieve the error message. It is using the obsolete linkage where the information was passed with class-I/O. The new linkage uses a buffer in shared memory in order to avoid class-I/O inside ddout. However, that buffer can't be shared among multiple programs and ddout definitely get priority.

Possible solutions:

  1. Remove using fserr in inject_snap, just print the error code and number, and then pass the error to ddout for expansion in the log. Pro: quick and easy; Con: no local error message.
  2. Redo the interface to fserr to use a different message queue. Pro: local error message, gets rid of shared memory buffer; Con: lots of new infrastructure for one user.
  3. Redo the interface for fserr to use the arg buffer in skd_util.c to pass information in and out. Pro: local error message, simpler than (2), gets rid of shared memory buffer; Con: have to add passing a buffer back to scheduler in the skd_util.c, that might not be too hard and it might be useful for other things.

(3) seems best. One question is how soon is this needed?

@dehorsley
Copy link
Member Author

Ahh I was getting close then. I did find that shared memory buffer.

One question is how soon is this needed?

Not urgent, I don't think it needs to hold up 10.0.0

What about a modified version of option 1? Say:

  • factor out the error expansion of fserr into a routine (in clib say)
  • inject_snap can do the expansion offline.
  • fserr and ddout otherwise stay as they are

I don't think it matters so much if inject_snap has to readfserr.ctl.

@dehorsley
Copy link
Member Author

Come to think of it, does fserr really need to be an online program? Does anything other than ddout use it? Can ddout parse the '*err.ctl' files on startup and do the error expansion itself?

@wehimwich
Copy link
Member

I looked at passing a buffer back from a program in skd_util.c That looks like it won't be bad. Then the running of fserr could be a subroutine in clib as you rightly suggest. I think it might be a bit too much to bite off for beta2 though. It is too close to the heart of the FS for quick and dirty work, but I think it could be handled by just adding a couple entry points into skd_util.c and a couple minor change to its existing code to send and retrieve the data using the existing arg buffer. Then the buffer can be removed from shared memory.

In the meantime. inject_snap could changed to print the (non-expanded) error message so there is less cosmetic damage. At least the expanded version goes into the log. That should be okay for beta2.

I am inclined to keep fserr separate from ddout. It would be okay for ddout to read fserr/sterr.ctl once at start-up, but inject_snap would have to read it every time an error is encountered since each instance of fserr is a fresh copy. I suppose it might not be too bad (on a modern computer), but it seems like a lot of thrashing. We'll see.

@dehorsley
Copy link
Member Author

In the meantime. inject_snap could changed to print the (non-expanded) error message so there is less cosmetic damage. At least the expanded version goes into the log. That should be okay for beta2.

Sounds good

It would be okay for ddout to read fserr/sterr.ctl once at start-up, but inject_snap would have to read it every time an error is encountered since each instance of fserr is a fresh copy. I suppose it might not be too bad (on a modern computer), but it seems like a lot of thrashing. We'll see.

To quote Knuth, "premature optimization is the root of all evil (or at least most of it)" :)

If it results in easier to manage code, I think we can afford a few milliseconds on the error path. inject_snap only need load the error definitions on error, and on machine with even a modest amount of free memory those files would be cached after the first read (by ddout) anyway.

wehimwich added a commit that referenced this issue Jun 17, 2020
This partially adddresses #50 until a more complete solution is available.
@wehimwich
Copy link
Member

wehimwich commented Jun 17, 2020

To quote Knuth, "premature optimization is the root of all evil (or at least most of it)" :)

Of course. :) I just wonder if it is more work to redo it entirely rather than fix the (now admittedly outdated) optimization. Anyway for the time being, I have pushed a change that completes the first step:

  • Print the non-expanded error info rather than generate a cls_chk error
  • Correct or replace entirely the fserr interface

@wehimwich wehimwich added enhancement New feature or request and removed bug Something isn't working labels Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants