Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition when tracing is enabled #135

Open
wants to merge 5,588 commits into
base: master
Choose a base branch
from

Conversation

simonbeaudoin0935
Copy link

@simonbeaudoin0935 simonbeaudoin0935 commented Jul 20, 2020

I discovered a very snicky and tricky race condition scenario when integrating tracealyzer code into our project.

I suspect it might be a problem with other ports as well that behave like the CortexR5

A little background on CortexR5 : When the IRQ line (comming from the interrupt controller, to which every peripheral IRQ lines connect) of the processor rises and the IRQ Enable bit in the status register of the CPU permits it, the CPU traps into interrupt mode. Several things happen. First, the CPU finishes the instruction it was performing. Second, it places the content of the CPSR register into the SPSR_irq register. And third, the mode of the CPU is changed to IRQ_Mode and /!\ THE IRQ ENABLE BIT IN CPSR_irq IS AUTOMATICALLY CLEARED /!\ . The reason is to ensure that upon landing into IRQ code, we find ourselves automatically in a critical section because we cannot be interrupted again because the bit is cleared. The programmer can, if he wants, re-enable IRQs inside IRQ code itself to allow interrupt nesting. But it has to be wanted and meditated.

Now, inside portASM.S, at the end of 'FreeRTOS_IRQ_Handler' assembly function, a call to 'vTaskSwitchContextConst' is made if the variable 'ulPortYieldRequired' was set by someone while executing the interrupt. Before branching to that function, a 'CPSID i' instruction was placed to ensure that interrupts are disabled in case someone re-enabled it. Inside 'vTaskSwitchContext', there is the macro 'traceTASK_SWITCHED_OUT' that gets populated when tracing is enabled.

The bug is right there.. If the macro is populated and inside that macro there is a matching call to 'ulPortSetInterruptMask' and 'vPortClearInterruptMask', a race condition can occure is there is a OS Tick timer interrupt waiting at the interrupt controller's door. Upon calling 'vTaskSwitchContext', the interrupts might not masked in the interrupt controller, the only barrier against the CPU servicing that tick interrupt while already performing the function is that the IRQ Enable bit cleared. 'ulPortSetInterruptMask'
does what's its supposed to do, but doesn't take into account the IRQ Enable bit in CPSR. Wheter or not the bit was cleared, the function sets it at the end. When calling the matching 'vPortClearInterruptMask', the function clears the interrupt mask in the interrupt controller. Because the IRQ Enable bit (that was cleared) has been set no matter what in 'ulPortSetInterruptMask', the CPU services the OS Tick Interrupt right away.

The bug is there : instead of completing the 'vTaskSwitchContext' function, the CPU re-enters the switch context path right after 'traceTASK_SWITCHED_OUT' thus corrupting the CPU state and eventually triggering either an undefined instruction, data or instruction abort.

I know it's a very specific scenario (FreeRTOS + Trace added and OS Tick interrupt waiting just at the right time), but it was a big problem for us. The fact that the problem would occur frequently enough to be fixed had to to with the fact that each call to tracing macro would take some time; enough for a OS Tick to happen. We used tracealyser in streaming mode, thus data would get flushed out to the fast uart in the FPGA at each passage onto a trace macro. That's how this vulnerability was exposed.

The same pull request has been opened on FreeRTOS-Kernel repo since it's not referenced by this one.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Srinivas Neeli and others added 30 commits June 4, 2020 10:27
CANFD2.0 core uses BRAM for storing acceptance filter ID and MASK
registers. So by default acceptance filter ID(AFID) and MASK (AFMASK)
registers updating with random data. Due to random data on AFID and
AFMASK registers, not able to receive all CAN id's.

Initializing AFID and AFMASK with Zero before enabling Acceptance
filter to receive all packets irrespective of ID and Mask.

Signed-off-by: Srinivas Neeli <[email protected]>

Acked-for-series: Naga Sureshkumar Relli <[email protected]>
- update the lmx and lmk configuration with the
new settings.
- add the adc and dac frequency lists
- apply clang-format

Signed-off-by: Dragan Cvetic <[email protected]>
Acked-by: Conall O'Griofa <[email protected]>

Acked-by: Conall O'Griofa <[email protected]>
Update doxygen comments in xilpm client library of ZynqMP
and Versal for documentation purpose.

Signed-off-by: Tejas Patel <[email protected]>
Acked-by: Jyotheeswar Reddy Mutthareddyvari <[email protected]>

Acked-by: Jyotheeswar Reddy Mutthareddyvari <[email protected]>
H10 chip topology has some new nodes. That includes CPMv5
(power node), GTYP and GTM (device) nodes. Added these as
part of topology support in PM library

Signed-off-by: Amit Sunil Dhamne <[email protected]>
Acked-by: Jyotheeswar Reddy Mutthareddyvari <[email protected]>

Acked-by: Jyotheeswar Reddy Mutthareddyvari <[email protected]>
PL devices should not be down during init finalize if PL subsystem
is online. So check for PL subsystem state before powering down
PL in init finalize.

Signed-off-by: Rajan Vaja <[email protected]>
Acked-by: Jolly Shah <[email protected]>

Acked-by: Jolly Shah <[email protected]>
…ller's I2C.

This is a fix for the I2C multi-master conflict arising in I2C Controllers.
All the I2C masters should have the same frequency.

Signed-off-by: Raviteja Narayanam <[email protected]>

Acked-by: Amanda Nicole Baze <[email protected]>
Add an example to test SD raw transfers to and from SD/eMMC.

Signed-off-by: Manish Narani <[email protected]>

Acked-for-series: Sai Krishna Potthuri <[email protected]>
To support automation testing, add dependencies.props file at
"xilpm/data/dependencies.props" for xilpm example.

Signed-off-by: Ravi Patel <[email protected]>
Acked-by: Jolly Shah <[email protected]>

Acked-by: Jolly Shah <[email protected]>
1. Correct include of XRFClk_Init for ZCU111
2. Add the LMK, ADC and DAC LMX config array dimensions
3. Add ADC and DAC frequency list for ZCU111

Signed-off-by: Dragan Cvetic <[email protected]>
Acked-by: Anand Ashok Dumbre <[email protected]>

Acked-by: Anand Ashok Dumbre <[email protected]>
Set NPI Root PCSR PCOMPLETE bit to indicate that NPD is operational
. This is done after Housecleaning is complete. Used to indicate to
debugging tools that NPI root is initialized and can be used to
access NPI slaves

Signed-off-by: Amit Sunil Dhamne <[email protected]>
Acked-by: Jolly Shah <[email protected]>

Acked-by: Jolly Shah <[email protected]>
Assert macros for non void functions are returning 0 on assert
failure. It is incorrect, as 0 is return code  for success.

Signed-off-by: Mubin Usman Sayyed <[email protected]>
Acked-by: Siva Durga Prasad Paladugu <[email protected]>

Acked-by: Siva Durga Prasad Paladugu <[email protected]>
Updated Changelog for uartps driver

Signed-off-by: Raviteja Narayanam <[email protected]>

Acked-for-series: Srinivas Goud <[email protected]>
Currently the driver enables all the modes including 6G and 12G modes
even when only 3G mode is supported from IP configuration. This leads to
assertions while enabling the modes.

This patch fixes the assertion by enabling the modes based on IP Line
rate configuration.

Signed-off-by: Vishal Sagar <[email protected]>
Acked-by: Praveen Vuppala <[email protected]>

Acked-by: Praveen Vuppala <[email protected]>
To avoid the unnecessary triggering of a power up/down request,
check the power state first.

Signed-off-by: Ravi Patel <[email protected]>

Acked-for-series: Felix Burton <[email protected]>
Existing PSM->PLM IPI communication is blocking which may leads to
dead lock condition if PSM and PLM sends IPI to each other at
nearly same time and waiting for each other to process IPI.

To avoid this, make PSM->PLM IPI as non-blocking call by using
shared PSM RAM address to log the power up/down events and act
on those events at PLM side.
When any direct power up/down interrupt occurs, PSM sets
corresponding event at shared address and triggers IPI interrupt
(without waiting for IPI to free) to notify PLM that PSM event is
arrived. On the other side, PLM will process all the events
(which is set by PSM) and clears it when processed.

Signed-off-by: Ravi Patel <[email protected]>

Acked-for-series: Felix Burton <[email protected]>
To support DP v1.2 protocol, the DP159 retimer chip had to be programmed.
This patch removes support for DP v1.2 protocol by removing the DP159
programming.This makes the dp14rxss support only DP v1.4 protocol.

Signed-off-by: Rajesh Gugulothu <[email protected]>

Acked-for-series: Vishal Sagar <[email protected]>
Update changelog for axicdma driver.

Signed-off-by: Radhey Shyam Pandey <[email protected]>

Acked-for-series: Harini Katakam <[email protected]>
This patch removes magic number used in XSecure_SssDmaLoopBack

Signed-off-by: Harsha <[email protected]>

Acked-for-series: VNSL Durga Challa <[email protected]>
This reverts commit 3d2e0d6.

IAR compiler is throwing compilation error given below for the
functions which are returning data types other than int. Hence
reverting this commit to avoid IAR build failures.

[Pe120]: return value type does not match the function type

Signed-off-by: Mubin Usman Sayyed <[email protected]>
Acked-by: Appana Durga Kedareswara rao <[email protected]>
Signed-off-by: Kapil Usgaonkar <[email protected]>

Acked by : Pankaj Kumbhare <[email protected]>
Signed-off-by: Niharika S <[email protected]>
Acked-by: Kondalarao Polisetti [email protected]
Added golden string "Successfully ran" to example pass console
message.

Signed-off-by: Mubin Usman Sayyed <[email protected]>
Acked-by: Siva Durga Prasad Paladugu <[email protected]>

Acked-by: Siva Durga Prasad Paladugu <[email protected]>
Updated changelog file for wdttb driver.

Signed-off-by: Srinivas Neeli <[email protected]>

Acked-for-series: Srinivas Goud <[email protected]>
This patch adds support of Xilfpga for Versal platform.
As of now it supports two APIs, XFpga_Initialize() and
XFpga_PL_BitStream_Load().

Signed-off-by: Nava kishore Manne <[email protected]>

Acked-for-series: Appana Durga Kedareswara rao <[email protected]>
Add extern C in the AIE driver header, as the AIE driver is written
in C, but it can be used by CPP application.

Signed-off-by: Wendy Liang <[email protected]>

Acked-for-series: Tejus Siddagangaigah <[email protected]>
Add the following timer APIs:
* set timer trigger low/high event value API
* reset timer API
* set reset timer event API

Signed-off-by: Wendy Liang <[email protected]>

Acked-for-series: Tejus Siddagangaigah <[email protected]>
This patch adds "Successfully ran" golden string to XilPuf
examples.

Signed-off-by: Kalyani Akula <[email protected]>

Acked-for-series: VNSL Durga Challa <[email protected]>
This patch updates XilNvm examples with "Successfully ran"
golden string.

Signed-off-by: Kalyani Akula <[email protected]>

Acked-for-series: VNSL Durga Challa <[email protected]>
…e to one subsystem

This routine checks all subsystems that could request a device to determine whether only one subsystem has allocated the device at a time.

Signed-off-by: Saeed Nowshadi <[email protected]>

Acked-for-series: Jolly Shah <[email protected]>
Kalyani Akula and others added 20 commits June 4, 2020 11:22
This patch moves few xilnvm_utils APIs to xilnvm efuse
example as those are not common to both all XilNvm components.
And removed all conversion APIs added as those are no more
required and we are acheiving the same functionality using
bit-wise operators in XilNvm.

Signed-off-by: Kalyani Akula <[email protected]>

Acked-for-series: VNSL Durga Challa <[email protected]>
This patch addressess most of the optimization comments.
- Removing the use Convesion APIs to convert bit array
  to byte array and acheiving the same functionality with
  bit-wise operators.
- Reduced number of cache reloads.
- Added a single interface for writing any eFuse.
- Common logic is added to single API
- Updated independent APIs to provide easy access to user
  for run time usage.

Signed-off-by: Kalyani Akula <[email protected]>

Acked-for-series: VNSL Durga Challa <[email protected]>
This patch resolves review comments in xilnvm_efuse_versal_example.
- Removes Puf sec ctrl bits programming as it is moved to xilpuf.
- Made some restruturing in example so that only once the library
 API is called to program all requested eFuses.

Signed-off-by: Kalyani Akula <[email protected]>

Acked-for-series: VNSL Durga Challa <[email protected]>
This patch updates xilpuf_example.c with updated xilnvm
struture names and variables.

Signed-off-by: Kalyani Akula <[email protected]>

Acked-for-series: VNSL Durga Challa <[email protected]>
This patch updates version-less source in misc folder

Signed-off-by: Sharath Kumar Dasari <[email protected]>

Acked-by: Vikram Sreenivasa Batchali <[email protected]>
This patch updates MiscCtrlBits pointer in XNvm_EfuseData
structure when PPK2_INVLD macro is set in xilnvm efuse example.

Signed-off-by: Kalyani Akula <[email protected]>
Acked-by: VNSL Durga Challa <[email protected]>
…ZCU104 and ZCU111 boards if I2C0 is not enabled in design"

This reverts commit fd234fe.

Signed-off-by: Vikram Sreenivasa Batchali <[email protected]>
Acked-by: Krishna Chaitanya Patakamuri <[email protected]>
This patch updates following changes in the examples
    - Style related issues
    - Adding goto END whenever failure is encountered
    - Updated help for user config parameters in xilpuf_example.h

Signed-off-by: Harsha <[email protected]>

Acked-for-series: Mohan Marutirao Dhanawade <[email protected]>
This patch replaces the if else condition with conditional macros and adds
generate PUF Key.

Signed-off-by: Harsha <[email protected]>

Acked-for-series: Mohan Marutirao Dhanawade <[email protected]>
This patch replaces the for loop used to print array in the examples
by XPuf_ShowData function.It also removes printing of GCM tag for xilpuf_example.c as
it is not required.

Signed-off-by: Harsha <[email protected]>

Acked-for-series: Mohan Marutirao Dhanawade <[email protected]>
Currently due to the manner in which the PUF helper data is printed, there is a mismatch
in the endianness of the syndrome data and chash and aux. This patch fixes the issue by
printing the helper data using an array so that there is no mismatch in endianness.

Signed-off-by: Harsha <[email protected]>

Acked-for-series: Mohan Marutirao Dhanawade <[email protected]>
This patch corrects the endianness of the black key before programming so
that the programmed black key can be used for secure boot.

Signed-off-by: Harsha <[email protected]>

Acked-for-series: Mohan Marutirao Dhanawade <[email protected]>
removing Xilinx copyright and licenses identifier for aes256.c and .h files
as these are not owned by Xilinx.
and removed extra copyright line for v_hdmirxss driver.

Signed-off-by: Manikanta Sreeram <[email protected]>
Acked-by: Prasad Gutti <[email protected]>
Acked-by: Prasad Gutti <[email protected]>
Updated src version with mdd version for rfdc

Signed-off-by: Meena Paleti <[email protected]>

Acked-by: Siva Addepalli<[email protected]>
This patch adds zeroization of the BBRAM key before every write.
As per hardware design, without zeroization before every write,
CRC check will fail for BBRAM write.

Signed-off-by: Harsha <[email protected]>
Acked-by: Mohan Marutirao Dhanawade <[email protected]>
Updated src version with mdd version for hdcp22_cipher_dp

Signed-off-by: Meena Paleti <[email protected]>

Acked-by: Siva Addepalli<[email protected]>
The patch has changes in misc folder to enusre that
versionless Versal PLM is identical to  the PLM build from Vitis.

Signed-off-by: Sharath Kumar Dasari <[email protected]>
Acked-by: Vikram Sreenivasa Batchali <[email protected]>
I discovered a very snicky and tricky race condition scenario when integrating tracealyzer code into our project.

I suspect it might be a problem with other ports as well that behave like the CortexR5

A little background on CortexR5 : When the IRQ line (comming from the interrupt controller, to which every peripheral IRQ lines connect) of the processor rises and the IRQ Enable bit in the status register of the CPU permits it, the CPU traps into interrupt mode. Several things happen. First, the CPU finishes the instruction it was performing. Second, it places the content of the CPSR register into the SPSR_irq register. And third, the mode of the CPU is changed to IRQ_Mode and /!\ THE IRQ ENABLE BIT IN CPSR_irq IS AUTOMATICALLY CLEARED /!\ . The reason is to ensure that upon landing into IRQ code, we find ourselves automatically in a critical section because we cannot be interrupted again because the bit is cleared. The programmer can, if he wants, re-enable IRQs inside IRQ code itself to allow interrupt nesting. But it has to be wanted and meditated.

Now, inside portASM.S, at the end of 'FreeRTOS_IRQ_Handler' assembly function, a call to 'vTaskSwitchContextConst' is made if the variable 'ulPortYieldRequired' was set by someone while executing the interrupt. Before branching to that function, a 'CPSID i' instruction was placed to ensure that interrupts are disabled in case someone re-enabled it. Inside 'vTaskSwitchContext', there is the macro 'traceTASK_SWITCHED_OUT' that gets populated when tracing is enabled.

The bug is right there.. If the macro is populated and inside that macro there is a matching call to 'ulPortSetInterruptMask' and 'vPortClearInterruptMask', a race condition can occure is there is a OS Tick timer interrupt waiting at the interrupt controller's door. Upon calling 'vTaskSwitchContext', the interrupts might not masked in the interrupt controller, the only barrier against the CPU servicing that tick interrupt while already performing the function is that the IRQ Enable bit cleared. 'ulPortSetInterruptMask'
does what's its supposed to do, but doesn't take into account the IRQ Enable bit in CPSR. Wheter or not the bit was cleared, the function sets it at the end. When calling the matching 'vPortClearInterruptMask', the function clears the interrupt mask in the interrupt controller. Because the IRQ Enable bit (that was cleared) has been set no matter what in 'ulPortSetInterruptMask', the CPU services the OS Tick Interrupt right away.

The bug is there : instead of completing the 'vTaskSwitchContext' function, the CPU re-enters the switch context path right after 'traceTASK_SWITCHED_OUT' thus corrupting the CPU state and eventually triggering either an undefined instruction, data or instruction abort.

I know it's a very specific scenario (FreeRTOS + Trace added and OS Tick interrupt waiting just at the right time), but it was a big problem for us. The fact that the problem would occur frequently enough to be fixed had to to with the fact that each call to tracing macro would take some time; enough for a OS Tick to happen. We used tracealyser in streaming mode, thus data would get flushed out to the fast uart in the FPGA at each passage onto a trace macro. That's how this vulnerability was exposed. 

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Comment on lines +159 to +172


/* Adding the necessary stuff in order to be able to determine from C code wheter or not the IRQs are enabled at the processor level (not interrupt controller level) */
#define GET_CPSR() ({u32 rval = 0U; \
__asm__ __volatile__(\
"mrs %0, cpsr\n"\
: "=r" (rval)\
);\
rval;\
})

#define CPSR_IRQ_ENABLE_MASK 0x80U

#define IS_IRQ_DISABLED() ({unsigned int val = 0; val = (GET_CPSR() & CPSR_IRQ_ENABLE_MASK) ? 1 : 0; val;})
Copy link
Author

@simonbeaudoin0935 simonbeaudoin0935 Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in the FreeRTOS-Kernel repo the file pseudo_asm_gcc.h doesn't exist, there is no ready to use macro like mrcprs() to retreive the CPSR, thats why I redefine it here under an other name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.