Multiple processes running concurrently using libmodbus over Modbus-Rtu #731

Closed
hfelek opened this issue Dec 27, 2023 · 6 comments

@hfelek

hfelek commented Dec 27, 2023

libmodbus version

 3.1.10

OS and/or distribution

 Raspberry Pi 4, Raspberry Pi OS (Linux)

Environment

  Architecture:                    armv7l
  Byte Order:                      Little Endian
  CPU(s):                          4
  Vendor ID:                       ARM
  Model name:                      Cortex-A72
  CPU max MHz:                     1500.0000
  CPU min MHz:                     600.0000

Description

 Two processes scan two devices over two UART ports of a Raspberry Pi 4, using Modbus RTU over an RS-485 connection.

Both processes use libmodbus. When either process runs by itself, read and write operations work without any problem. However, when the two processes run concurrently (started 2 seconds apart), the process started later cannot complete long register operations (120 registers, 2 bytes each): modbus_read_registers or modbus_write_registers returns a "Connection timed out" error. Digging into the related functions, I found that the failure comes from _modbus_rtu_select in src/modbus-rtu.c, which times out while waiting on the related file descriptor. Changing the byte timeout and response timeout only delays the moment the error is returned, and the processes still do not both function properly. Both processes scan their devices at a baud rate of 115200. When I change the baud rate for one of the devices to 2000000, one process still functions properly and the other can read long register ranges, but on write operations (regardless of the register count) its error ratio becomes 50% over a continuous operation cycle.

I haven't found any previous issues about using libmodbus in two concurrently running processes. I have been trying to figure out the cause or a solution for a week but couldn't make further progress. I can give more details if there are questions, but the issue seems to be software related rather than hardware related. Note that when TCP/IP is used in either or both of the processes, no issue is observed.

libmodbus output with debug mode enabled

For the process started later on.
Reading device registers!
[01][03][00][00][00][64][44][21]
Waiting for a confirmation...
ERROR Connection timed out: select
Connection timed out
<01><03><01><06><02><00><49><52><2D><4F><33><32><31><30><30><30><30><30><00><00><00><00><00><80><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><7C>Devices' identification process has failed!

@mhei
Contributor

mhei commented Dec 27, 2023

You should tell us more about the Modbus server device, the hardware used on the Raspi side (RS-232/RS-485...), what UART drivers are involved, etc.
I cannot imagine that this is a libmodbus issue; most probably it is a timing issue in the whole setup. What timeouts did you use? What timeout is configured on the server device? How long does a usual response take? Can you provide a full trace of a good case and a bad one?
I cannot really understand your trace above: you request to read 100 registers starting at register address 0, but the response length does not seem to match (the 3rd byte is 0x01, but it should be 0xc8 for 100 x 2 bytes = 200 bytes). Or, if the length is assumed to be correct, then the server's response is just too long.
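
The length check described above can be reproduced with a few lines. This is only a sketch of the standard layout of a "read holding registers" (0x03) response in Modbus RTU; the helper name `expected_read_response` is made up for illustration and is not part of libmodbus.

```python
def expected_read_response(quantity):
    """Return (byte_count_field, total_frame_length) for a 0x03 response.

    Hypothetical helper for illustration, not a libmodbus function.
    """
    byte_count = 2 * quantity  # each register is 2 bytes
    # address (1) + function (1) + byte count (1) + data + CRC (2)
    total = 1 + 1 + 1 + byte_count + 2
    return byte_count, total


# Request taken from the trace: [01][03][00][00][00][64][44][21]
request = bytes.fromhex("0103000000644421")
quantity = int.from_bytes(request[4:6], "big")  # quantity field, 0x0064
byte_count, total = expected_read_response(quantity)
print(quantity, hex(byte_count), total)  # 100 0xc8 205
```

A well-formed reply to this request would therefore carry 0xc8 in its third byte and be 205 bytes long in total, which is the mismatch pointed out for the trace.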

@hfelek
Author

hfelek commented Dec 28, 2023

Yes, you are right about the details of the issue. I have looked for probable sources of the problem in different setups. I also agree that this is probably not a libmodbus issue, but the behavior of the implementation may vary across operating systems and hardware.

I will describe my real setup to give more insight into the issue. The setup in my first comment was only a test to see whether there was a software issue in the processes.

Kernel release version: 5.15.84-v7l+
OS version: Raspbian GNU/Linux 11 (bullseye)

Setup

This is the actual setup I want to use in my final project. There are two programs running on different UART ports (both PL011 UARTs) of the RPi 4, with baud rates of 2000000 and 115200. To make the statements clearer, I will call the process with the 2000000 baud rate 'Process 1' and the process with the 115200 baud rate 'Process 2', matching the settings listed below.

Process 1

        "Baudrate": 2000000,
        "Byte Timeout": {
            "sec": 0,
            "usec":50
        },
        "Response Timeout": {
            "sec": 0,
            "usec": 20000
        },

Step 1: Available RS485 slaves are detected.
Step 2: Continuous Write-Read Partition in an infinite loop. Write operation to 16 registers with 'modbus_write_registers' and read operation from 1 register.

Process 2

        "Baudrate": 115200,
        "Byte Timeout": {
            "sec": 0,
            "usec":50
        },
        "Response Timeout": {
            "sec": 0,
            "usec": 20000
        },

Baud rate: 115200
Step 1: Available RS485 slaves are already known. Read-register operations of length 100, 100 and 26 are executed in three steps.
Step 2: Continuous Write-Read Partition in an infinite loop. Write operation to 15 registers with 'modbus_write_registers' and read operation from 1 register.

Tests

//Single run of Process 1 - Step 2: logs in a continuous loop.
Total number of scan cycles -> 7000
Slot Number: 1, Successful: 7000, Failed: 0 
Duration for 1000 main cycles: 3.121561 seconds
Total number of scan cycles -> 8000
Slot Number: 1, Successful: 8000, Failed: 0 
//Single run of Process 2 - Step 2: logs in a continuous loop.
Duration for 1000 main cycles: 7.027475 seconds
Slot Number: 1, Successful: 2000, Failed: 0 
Duration for 1000 main cycles: 7.039133 seconds
Slot Number: 1, Successful: 3000, Failed: 0 
//Process 1 logs when both processes are running
Total number of scan cycles -> 3000
Slot Number: 1, Successfull: 1268, Failed: 1732 
Duration for 1000 main cycles: 14.515857 seconds
Total number of scan cycles -> 4000
Slot Number: 1, Successful: 1786, Failed: 2214 
Duration for 1000 main cycles: 12.842769 seconds
//Process 2 logs when both processes are running
Duration for 1000 main cycles: 7.081596 seconds
Slot Number: 1, Successful: 32000, Failed: 0 
Duration for 1000 main cycles: 7.081268 seconds
Slot Number: 1, Successful: 33000, Failed: 0 
Duration for 1000 main cycles: 7.033406 seconds
Slot Number: 1, Successful: 34000, Failed: 0 

These logs may be unnecessary, but I feel that, together with the logs from the other cases, they may point to the source of the problem.
To summarize: while each process runs by itself there is no communication problem, but when the two processes run concurrently, communication problems start to occur. I tried reading and writing different register counts, and this is what I saw for Process 1:
16-register write, 1-register read -> all the errors occur on the write operation; the read operation works properly.
16-register write, 16-register read -> errors occur on both read and write operations in similar ratios.
1-register write, 16-register read -> fewer errors occur on both read and write operations, and most of them occur on read operations. Logs for this case are given below; an 'r' or 'w' in the log shows which operation the error occurred on.

Duration for 1000 main cycles: 3.587441 seconds
rrwrwrwrwrwrwrrrrrrTotal number of scan cycles -> 43000
Slot Number: 1, Successful: 42816, Failed: 184 
Duration for 1000 main cycles: 3.588082 seconds
rrrrrTotal number of scan cycles -> 44000
Slot Number: 1, Successfull: 43811, Failed: 189 
Duration for 1000 main cycles: 3.537136 seconds
rrrrrrrrrrrTotal number of scan cycles -> 45000
Slot Number: 1, Successful: 44800, Failed: 200 

Error log for Process 1 while both processes are running concurrently

Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
Waiting for a confirmation...
<01><10><00><0A><00><10><E1><C7>
[01][03][00][1A][00][01][A5][CD]
Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
Waiting for a confirmation...
ERROR Connection timed out: select
w[01][03][00][1A][00][01][A5][CD]
Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
Waiting for a confirmation...
ERROR Connection timed out: select
w[01][03][00][1A][00][01][A5][CD]
Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
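
The short frames in this trace can be checked offline to confirm that the replies which did arrive are intact. The following is a minimal sketch of the standard Modbus RTU CRC-16 (reflected polynomial 0xA001, initial value 0xFFFF, low byte sent first); the function names are illustrative and are not the libmodbus API.

```python
def crc16_modbus(data):
    """Bitwise CRC-16/MODBUS over the given bytes."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def frame_crc_ok(frame):
    """True if the last two bytes of the frame match the CRC of the rest."""
    payload, tail = frame[:-2], frame[-2:]
    # Modbus RTU transmits the CRC low byte first.
    return crc16_modbus(payload) == int.from_bytes(tail, "little")


# Frames copied from the trace above.
frames = {
    "read request": bytes.fromhex("0103001A0001A5CD"),
    "read reply": bytes.fromhex("0103020000B844"),
    "write reply": bytes.fromhex("0110000A0010E1C7"),
}
for name, frame in frames.items():
    print(name, frame_crc_ok(frame))
# read request True
# read reply True
# write reply True
```

All three frames pass, so in this excerpt the failed writes are those where no reply was seen before the timeout, not replies that arrived corrupted.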

Using the same timeouts for different baud rates may seem odd, but I wanted to report the values I actually tested with. Increasing the timeout for the process reduces communication errors, but the duration of 1000 cycles then increases in proportion to the timeout.
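
The link between timeout and cycle duration can be checked against the Process 1 logs above. This is a rough sketch under two assumptions that the logs do not state explicitly: each failed transaction costs about one full response timeout (20000 usec), and the duration printed after the 4000-cycle counters covers cycles 3000 to 4000.

```python
def estimate_duration(baseline_s, failures, response_timeout_s):
    """Estimated time for a window of cycles, assuming every failure
    adds one full response timeout on top of the error-free baseline."""
    return baseline_s + failures * response_timeout_s


BASELINE_S = 3.121561         # Process 1 alone, 1000 cycles
RESPONSE_TIMEOUT_S = 0.020    # "Response Timeout": 20000 usec
failures = 2214 - 1732        # "Failed" counter at 4000 minus at 3000 cycles
measured_s = 12.842769        # duration logged with both processes running

estimate_s = estimate_duration(BASELINE_S, failures, RESPONSE_TIMEOUT_S)
print(failures, round(estimate_s, 3), measured_s)  # 482 12.762 12.842769
```

Under those assumptions the estimate lands within about 0.1 s of the measured value, which fits the observation that most of the extra time is spent waiting out timeouts.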

Keeping in mind that I am not very experienced on the Linux side, I think the problem is caused by interrupt and process scheduling latencies within the OS. I haven't tried my setup on a different OS, but the problem doesn't seem to be a hardware issue. Perhaps the timing used for checking the related file descriptors could be changed on the libmodbus side for this case.

@watsocd

watsocd commented Dec 28, 2023 via email

@mhei
Contributor

mhei commented Dec 29, 2023

The timeouts are really tight with respect to the baud rates, and I wonder whether the whole approach makes sense with Linux as a non-RTOS system. As already mentioned, I'd also try to slow down. And since you use RS-485, you can connect a third observer system to each RS-485 line and let it sniff the Modbus traffic. That way you can at least see whether the Modbus server's replies are correct and complete, and so on.
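
To put numbers on how tight the timeouts are, the wire time per character and per frame can be compared with the configured 50 usec byte timeout. This is a sketch that assumes 10 bits per character (8N1: start bit, 8 data bits, stop bit); the thread does not state the serial format, and 8E1 would give 11 bits.

```python
BITS_PER_CHAR = 10      # assumption: 8N1; the thread does not state the format
BYTE_TIMEOUT_US = 50    # "Byte Timeout" from the posted settings


def char_time_us(baud, bits_per_char=BITS_PER_CHAR):
    """Time on the wire for one character, in microseconds."""
    return bits_per_char * 1_000_000 / baud


def frame_time_us(n_bytes, baud, bits_per_char=BITS_PER_CHAR):
    """Time on the wire for a back-to-back frame of n_bytes."""
    return n_bytes * char_time_us(baud, bits_per_char)


for baud in (115200, 2_000_000):
    ct = char_time_us(baud)
    print(baud, round(ct, 1), ct > BYTE_TIMEOUT_US)
# 115200 86.8 True
# 2000000 5.0 False

# 41-byte write request for 16 registers (7 header + 32 data + 2 CRC bytes)
print(frame_time_us(41, 2_000_000))            # 205.0
# 205-byte reply to a 100-register read
print(round(frame_time_us(205, 115200), 1))    # 17795.1
```

At 115200 baud a single character already takes longer than the 50 usec byte timeout, and at 2000000 baud the remaining margin is only a few tens of microseconds, which leaves very little room for scheduling or driver latency on a general-purpose kernel.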

@hfelek
Author

hfelek commented Jan 3, 2024

To let everyone know: the issue is related to the RPi 4 UART driver. The driver for the related chip needs to be changed. There are related discussions on the Raspberry Pi forum.

Thank you for the help, @mhei.

@stephane
Owner

OK, thank you for the update @hfelek and @mhei for the help.
