Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some improvements to the socket code. #1081

Merged
merged 8 commits into from
May 8, 2024

Conversation

bcoconni
Copy link
Member

@bcoconni bcoconni commented May 2, 2024

While investigating the "socket" code to address the topics discussed in #1079, I have found a number of small details that could be improved. I don't think these will resolve the discussion #1079 but I'm pushing this PR nonetheless for the record.

Minor changes / documentation

  • The first thing that stroke me is that the 2 constructors of FGfdmSocket have actually very different purposes as one constructor is returning a socket for a client connection while the other one returns a socket for a server connection. To document that, I have added comments to FGfdmSocket.h. Note that the documentation has been generated by an AI (GitHub Copilot) and reviewed by a non native English speaker (myself) so brace yourself before reading 😄
  • I've moved connected=true to a location that makes its purpose more obvious as the previous location of this line of code was misleadingly suggesting that connected=true was indicating that the call to accept() had succeeded.
  • In FGInputSocket.cpp, when the command quitis issued to JSBSim, the method FGfdmSocket::Send() is now called to return "Closing connection" instead of calling FGfdmSocket::Reply(). This avoids displaying the prompt JSBSim> just after the message "Connection closed" which looks weird to me.

Improvements to the error management

  • If the calls to bind() or listen() fail, FGfdmSocket now closes the socket sckt and sets it to INVALID_SOCKET which is a good practice in order to release unused file descriptors to the OS (see StackOverflow Should I call close() on bind() or listen() error)
  • Following the same idea, the destructor ~FGfdmSocket() shutdowns sckt_in and closes sckt (in that order otherwise it crashes according to my tests) to release the file descriptors to the OS.
  • The call to recv in FGfdmSocket::Receive() is now enclosed in an if (Protocol == ptTCP) { ... } as calling recv is meaningless for UDP sockets.
  • It is now checked that sckt != INVALID_SOCKET before calling send in FGfdmSocket::Send(). Also send is now using sckt_in if the socket is using the TCP protocol. Indeed this method is called from the class FGOutputSocket which can define a socket with either protocol UDP or TCP.
  • It is now checked that sckt != INVALID_SOCKET before calling select in FGfdmSocket::WaitUntilReadable(). Also I have added an assert to check that Protocol == ptTCP so that we are sure that this method is never called for a UDP socket. At the moment, FGfdmSocket::WaitUntilReadable() is only called by the class FGInputSocket which is guaranteed to be a TCP socket so the assert passes.

Copy link

codecov bot commented May 2, 2024

Codecov Report

Attention: Patch coverage is 0% with 59 lines in your changes are missing coverage. Please review.

Project coverage is 24.91%. Comparing base (c67311c) to head (cbfb902).
Report is 2 commits behind head on master.

Files Patch % Lines
src/input_output/FGfdmSocket.cpp 0.00% 57 Missing ⚠️
src/input_output/FGInputSocket.cpp 0.00% 1 Missing ⚠️
src/input_output/FGfdmSocket.h 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1081      +/-   ##
==========================================
- Coverage   24.95%   24.91%   -0.05%     
==========================================
  Files         168      168              
  Lines       18072    18104      +32     
==========================================
  Hits         4510     4510              
- Misses      13562    13594      +32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@seanmcleod
Copy link
Member

seanmcleod commented May 2, 2024

Yep, I've found the socket code over the years when I've needed to take a look at things always a bit confusing in terms of the same class handling client versus server plus tcp and udp all in one.

Will take a look at the commits.

first thing that stroke me

You did mention not being a native English speaker 😉 Should be "...that struck me..", like being struck by a bolt of lightning as opposed to stroking a cat.

@jonsberndt
Copy link
Contributor

jonsberndt commented May 2, 2024 via email

@bcoconni
Copy link
Member Author

bcoconni commented May 3, 2024

Yep, I've found the socket code over the years when I've needed to take a look at things always a bit confusing in terms of the same class handling client versus server plus tcp and udp all in one.

Indeed but I'm wondering which would be the more painful ? A Swiss army knife class such as FGfdmSocket, a collection of independent classes with lots of code duplication or a pure object-oriented class hierarchy which avoids code duplication but is difficult to grasp ?

first thing that stroke me

You did mention not being a native English speaker 😉 Should be "...that struck me..", like being struck by a bolt of lightning as opposed to stroking a cat.

Haha 😆 You've been warned !
Well, I confirm that these server/client constructors struck me but did not stroke me 😄

@bcoconni
Copy link
Member Author

bcoconni commented May 3, 2024

I have to confess that when I wrote the initial implementation of the telnet interface I kind of made it up as I went. It was the first time I wrote any socket code. Feels like a hundred years ago. 😊

I guess that's where we all are. I'm no socket/network code specialist either and I learn as I go. But note that the current code still looks very much like the code you wrote many years ago, that's not so bad for "made up" code !

@bcoconni
Copy link
Member Author

bcoconni commented May 3, 2024

I have just pushed another commit to this PR:

  • Code legibility improvement: new #defines for closesocket and SD_BOTH have been added to remove a number of #ifdef _WIN32.
  • Error management: the value returned by select is now tested for errors. Also the call to select() is now using a syntax that should be compatible with both Windows and BSD/Unix platforms. In particular, the first parameter of select has been set to FD_SETSIZE:
    • This first parameter is ignored by Windows anyway (see their documentation about select)
    • For BSD/Unix platforms, FD_SETSIZE is the maximum number of socket descriptors which avoid weird situation where sckt_in == FD_SETSIZE and sckt_in+1>FD_SETSIZE. In addition, this makes this piece of code more robust as it no longer depends on the value of sckt_in.

@bcoconni
Copy link
Member Author

bcoconni commented May 3, 2024

Another commit to the PR to check the returned value of socket functions and print an error message when SOCKET_ERROR is returned.

@bcoconni
Copy link
Member Author

bcoconni commented May 3, 2024

Following my previous commit, I now see that Windows is returning lots of errors "Socket error: A non-blocking socket operation could not be completed immediately" in TestInputSocket.py. Similarly, Linux is returning lots of "Socket error: Resource temporarily unavailable". I guess these are due to the fact that telnet and JSBSim are running synchronously on the same thread in TestInputSocket.py so the telnet socket is not responding immediatly when JSBSim is interrogating it.

In addition, the messages lack context (where the error message has been issued from ?) and are flooding the console. So I have pushed one last commit to add context to the messages and print them only if debug_lvl is strictly positive.

@seanmcleod
Copy link
Member

seanmcleod commented May 3, 2024

In terms of tons of "error messages" now, I think this particular case of printing/logging an error is misplaced and shouldn't be there.

while ((num_chars = recv(sckt_in, buf, sizeof buf, 0)) > 0) {
data.append(buf, num_chars);
}
if (num_chars == SOCKET_ERROR) PrintSocketError("Receive - TCP data reception");

So, in this case since we've set the socket for the telnet connection to be non-blocking, the return value of SOCKET_ERROR/-1 isn't necessarily an error, particularly when the error code/reason is Would Block. The OS is simply telling us in this case that there is no waiting data from the telnet client.

And in particular if the FDM is running at 120Hz and you have say a human sending intermittent telnet commands there are going to be thousands of these sorts of "error messages" logged in just say a 10s period when the human isn't sending a telnet command.

So I would change this code to only log an error message along these lines (having to still deal with Windows vs Unix last error checking differences):

if (num_chars == SOCKET_ERROR && WSAGetLastError() != WSAEWOULDBLOCK)
   PrintSocketError("Receive - TCP data reception");

@bcoconni
Copy link
Member Author

bcoconni commented May 4, 2024

@seanmcleod Good point ! I have amended the code according to your feedback.

@seanmcleod
Copy link
Member

Yep, looks good.

One more thing 😉

As I mentioned in the original discussion - #1079 I'm pretty sure that the following block of code, although it can also be simplified, needs to be executed for Unix systems as well.

#ifdef _WIN32
// when nothing received and the error isn't "would block"
// then assume that the client has closed the socket.
if (num_chars == 0) {
DWORD err = WSAGetLastError();
if (err != WSAEWOULDBLOCK) {
cout << "Socket Closed. Back to listening" << endl;
closesocket(sckt_in);
sckt_in = INVALID_SOCKET;
}
}
#endif

In other words, in the case that num_chars == 0 we need to perform the socket shutdown for all operating systems.

So I'd suggest replacing that #ifdef block with.

if (num_chars == 0) {
  PrintSocketError("Receive - TCP data reception- Socket Closed. Back to listening"); 
  closesocket(sckt_in); 
  sckt_in = INVALID_SOCKET;   
}

This would fix the issue that the user reported when running JSBSim on Mac (Unix based), i.e. killing the telnet connection with ctrl-c or equivalent as opposed to sending quit via the telnet connection resulted in her not being able to re-establish a telnet connection.

Actually, thinking about it, we probably want to perform this also in the case of -1 (SOCKET_ERROR) being returned when last error isn't WOULDBLOCK.

@seanmcleod
Copy link
Member

It seems there is always one more thing with regards to the socket code 😉

So in the UDP case, in the same Receive() method, note that the UDP socket has also been set to be non-blocking in the ctor. Which means we can expect to get a lot of returns on line 314 with num_chars == SOCKET_ERROR that aren't really errors to log. We need similar logic that we have above for the TCP case.

// this is for FGUDPInputSocket
if (sckt != INVALID_SOCKET && Protocol == ptUDP) {
struct sockaddr addr;
socklen_t fromlen = sizeof addr;
int num_chars = recvfrom(sckt, buf, sizeof buf, 0, (struct sockaddr*)&addr, &fromlen);
if (num_chars > 0) data.append(buf, num_chars);
if (num_chars == SOCKET_ERROR) PrintSocketError("Receive - UDP data reception");
}

…uld block" + some further error management improvements.
@bcoconni
Copy link
Member Author

bcoconni commented May 7, 2024

@seanmcleod Just pushed another commit that implements your suggestions (both of them).

I took this opportunity to add a check of the value returned by send when sending the prompt JSBSim>:

send(sckt_in, "JSBSim> ", 8, 0);

I've also added some additional asserts in FGfdmSocket::Close() and FGfdmSocket::Reply() as both of these methods are using sckt_in which is only initialized for TCP sockets.

@seanmcleod
Copy link
Member

@bcoconni it looks good.

@bcoconni bcoconni merged commit 56c6662 into JSBSim-Team:master May 8, 2024
30 checks passed
@bcoconni bcoconni deleted the fix-socket2 branch May 8, 2024 21:01
@bcoconni
Copy link
Member Author

bcoconni commented May 9, 2024

Thanks for the detailed review, @seanmcleod
The PR is now merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants