Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DPI related bug-fixes #5

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

shubham-cdot
Copy link
Contributor

No description provided.

shubham-cdot and others added 3 commits January 13, 2021 16:19
For TCP sessions, L4 payload data length of first packet
(SYN packet) is zero. So the first packet of TCP session
will not be sent to nDPI due to data_len != 0 checking in
dpi_ndpi_session_first_packet().

However, nDPI uses TCP SYN packets internally for connection
tracking and other purposes. For example, in nDPI library
function ndpi_detection_process_packet():
a) ndpi_connection_tracking() is called for connection
tracking and updating TCP flag states
b) app detection is given up for TCP sessions in some cases if
the first packet of session sent to nDPI is not a SYN packet

Hence, remove data_len != 0 check from the function
dpi_ndpi_session_first_packet() and send the first packet of
a session to nDPI irrespective of data length.

Co-authored-by: Subhajit Chatterjee <[email protected]>
Signed-off-by: Shubham Shrivastava <[email protected]>
Add USE_NDPI flag to initialise ‘engines’ and 'engines_len'
before calling dpi_session_first_packet() function in order
to avoid error due to missing dpi_engine_procs for nDPI engine
when dataplane compilation is done without USE_NDPI flag.

Co-authored-by: Subhajit Chatterjee <[email protected]>
Signed-off-by: Shubham Shrivastava <[email protected]>
Use loop variable 'j' instead of 'i' during flow cleanup.

Co-authored-by: Subhajit Chatterjee <[email protected]>
Signed-off-by: Shubham Shrivastava <[email protected]>
@pjaitken
Copy link
Contributor

@shubham-cdot could you tell us why you had to remove the zero data length check from dpi_ndpi_session_first_packet?

Are there some applications which are not being identified? Is this particular to 3.4?

Can you give us details of, "app detection is given up for TCP sessions in some cases if the first packet of session sent to nDPI is not a SYN packet"?

I'm trying to understand why this wasn't an issue for us before, and I'd appreciate any information which we could use to improve our internal DPI testing.

Thanks.

@shubham-cdot
Copy link
Contributor Author

@pjaitken In ndpi_detection_process_packet function (file: ndpi_main.c):

if((flow->num_processed_pkts == 1) &&
(ret.master_protocol == NDPI_PROTOCOL_UNKNOWN) &&
(ret.app_protocol == NDPI_PROTOCOL_UNKNOWN) &&
flow->packet.tcp &&
(flow->packet.tcp->syn == 0) &&
(flow->guessed_protocol_id == 0)) {
u_int8_t protocol_was_guessed;

    /*
      This is a TCP flow
      - whose first packet is NOT a SYN
      - no protocol has been detected

      We don't see how future packets can match anything
      hence we giveup here
    */
    ret = ndpi_detection_giveup(ndpi_str, flow, 0, &protocol_was_guessed);

}

The above checking was first added to nDPI via commit id e4f01976a and is present since nDPI version 3.0:

commit e4f01976a66f1943bde7b253b62430d36c6d9e74
Author: Luca [email protected]
Date: Thu Aug 30 11:10:30 2018 +0200

Added missing categorization when giveup/guess is called
Added optimization for TCP flows that do not start with a SYN packet: early giveup is performed
Code cleanup

As per our understanding, app detection will be given up for a TCP flow as per above checking in case of TCP flows for which nDPI is unable to detect the app protocol using the first packet sent to nDPI after the SYN packet, by applying both:

  • protocol guessing (for example based on IP address, TCP ports etc.) and
  • protocol dissectors

This would be more likely to happen for protocols running on non-standard ports for which multiple packets of the flow are required for app identification.

We tried to simulate this scenario but currently unable to do so due to limitations of test setup at our end. For more info, please refer to packet handling in ndpi_detection_process_packet.
Thanks!

@pjaitken
Copy link
Contributor

@shubham-cdot, I'm copying some internal feedback from @dfawcus.

@@ -315,8 +316,7 @@ dpi_ndpi_session_first_packet(struct npf_session *se __unused,
if (!flow->dest_id)
goto dest_id_error;

if (data_len != 0 && !dpi_ndpi_process_pkt(
(struct dpi_engine_flow *)flow, mbuf, dir))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawcus

In dpi.c:dpi_get_data_len() there is a comment around where the UDP length is validated, intended to simply cause packets to be dropped if the length is invalid.

The above change would seem to make the call graph to here broken, in that such packets now will not be dropped, so there would need to be another way of dealing with that case.

It is also worth keeping in mind that (eventually) UDP packets will have the possibility of having 0 traditional data, while actually carrying elsewhere because of the UDP-Options draft which is in progress.

It strikes me that the data_length field should maybe be removed from the prototype for this function, as it arose when the mbuf was not passed in, and we passed instead a data+length tuple. It is rather superfluous at the moment, as it can be extracted from the mbuf - see dpi_get_data_length().

#else
uint8_t engines[] = { IANA_USER };
size_t engines_len = 1;
#endif /* USER_NDPI */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawcus

Not directly related, but possibly important.

I believe that when the app-fw was created, 0 data-length packets were not fed through it, and so did not count towards the number of initial packets it had to see before converging on an answer. So TCP flags only updates would not count.

The context of this PR implies that this is no longer the case, and it may inform upon why we had issues with the initial count behaviour in the post Qosmos world. So there may be something to be updated in the app-fw now as to which packets it counts towards that initial set - possibly 0-data length TCP packet should not be counted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly copied to the wrong place, this makes more sense in relation to the prior commit.
(I can't recall which commit I added the comment to in the internal PR system).

size_t engines_len = 2;
#else
uint8_t engines[] = { IANA_USER };
size_t engines_len = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawcus

The second line would be better outwith of the conditional and written as:

size_t engines_len = ARRAY_SIZE(engines);

Which raises another question, each other call to dpi_session_first_packet() passes an final pair or parameters, being 'engines_length' and 'engines', however 'engines_length' was a constant '2' in each case.

This change implies that something related to that pair should be altered - somehow.

size_t engines_len = 2;
#else
uint8_t engines[] = {IANA_USER};
size_t engines_len = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawcus

ARRAY_SIZE, as earlier.

size_t engines_len = 2;
#else
uint8_t engines[] = {IANA_USER};
size_t engines_len = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawcus

ARRAY_SIZE, as earlier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this repeated code should be in a little helper function so USE_NDPI is kept to a minimum, and there's only one place to add additional DPI engines in future.

@subhajit-cdot
Copy link

@pjaitken What if we remove the check for UDP packets from dpi.c:dpi_get_data_len() ? How will it impact the DPI feature? The issue might arise for some protocol detection (TCP based) running on non standard ports. The nDPI might give up if data_len == 0 packets are skipped.

We can use ARRAY_SIZE() to find num of engines rather than static entry , however can you elaborate "there's only one place to add additional DPI engines in future." This is true if we use num of engines = 2 ?

Regards,
S

@dfawcus
Copy link
Contributor

dfawcus commented Feb 3, 2021

What if we remove the check for UDP packets from dpi.c:dpi_get_data_len() ? How will it impact the DPI feature? The issue might arise for some protocol detection (TCP based) running on non standard ports. The nDPI might give up if data_len == 0 packets are skipped.

My point is that the code I referred to needs a mechanism to force some packets to be dropped. The code originally made use of indicating the length was zero to achieve that. If now a length of zero can not be used to signal that the packet should be dropped, then some other mechanism needs to be provided.

I am not suggesting disallowing a length zero packet to processed, I am suggesting that you need to add a alternate mechanism to force some packets to be dropped.

@pjaitken
Copy link
Contributor

pjaitken commented Feb 3, 2021

@subhajit-cdot, I mean that engines[] and engines_len should be set in only one place (ie, in a new function) rather than in multiple places throughout the code.
If a new engine DPI engine is added then it will only need to be added in that one function rather than in multiple places.

@shubham-cdot
Copy link
Contributor Author

My point is that the code I referred to needs a mechanism to force some packets to be dropped. The code originally made use of indicating the length was zero to achieve that. If now a length of zero can not be used to signal that the packet should be dropped, then some other mechanism needs to be provided.

In order to distinguish invalid data length case from 0 (valid) data length, we can do one of the following:

  1. add an error flag to dpi.c:dpi_get_data_len() as an output argument to indicate invalid data length case
  2. add new function just to check for invalid data length

In case of option 2, the checking for invalid data length will be moved from dpi_get_data_len() to the new function.

@shubham-cdot
Copy link
Contributor Author

@pjaitken @dfawcus Few queries:

a) In the original DANOS 2009 code, the check for data_len != 0 is done only in ndpi.c:dpi_ndpi_session_first_packet() i.e. for the first packet of session. Will this check not be required for subsequent packets of the session?

b) In dpi.c:dpi_get_data_len(), data length is validated only for UDP. Will similar check not be required for TCP?

@subhajit-cdot
Copy link

Hello @pjaitken
Can you please update about the current status of the two pending bug fixes (data_len and engines)?
Thanks
Subhajit

VyattaInternal pushed a commit that referenced this pull request Mar 2, 2021
se_sen field in struct session may be NULL in two cases for newly
created sessions not yet added to sentry hash list, or after sesssion
removed from sentry hash list during reclaim in session gc.

This causes the following segmentaion fault infrequently.

[Current thread is 1 (Thread 0x7ff5c3fff700 (LWP 30235))]
 #0  0x000055ce31a8978d in csync_get_session_from_init_sentry (
    cse=<synthetic pointer>, cs=<synthetic pointer>, sp=0x7ff5f004ef36)
    at ../src/npf/csync/csync_session_unpack.c:38
 #1  csync_session_unpack_update (csu=0x7ff5f004ef2e)
    at ../src/npf/csync/csync_session_unpack.c:71
 #2  csync_unpack_session (size=<optimized out>, msg=0x7ff5f004ef26)
    at ../src/npf/csync/csync_session_unpack.c:421
 #3  csync_recv_session_update (frame=<optimized out>)
    at ../src/npf/csync/csync_session_unpack.c:501
 #4  0x000055ce31b2d57d in csync_restore_sessions (n=<optimized out>,
    flist=<optimized out>) at ../src/csync/csync_transfer.c:218
 #5  csync_pull_batch (info=0x7ff58400b880) at
../src/csync/csync_transfer.c:506
 #6  csync_xfer_backup (pipe=0x7ff58400b8e0, arg=0x7ff58400b880)
    at ../src/csync/csync_transfer.c:301
 #7  0x00007ff6629618d3 in ?? () from
/usr/lib/x86_64-linux-gnu/libczmq.so.4
 #8  0x00007ff6611ad4a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
 #9  0x00007ff660eefd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Fixed by doing a safe derefernce and checking for NULL.

VRVDR-54586
VyattaInternal pushed a commit that referenced this pull request Mar 3, 2021
se_sen field in struct session may be NULL in two cases for newly
created sessions not yet added to sentry hash list, or after sesssion
removed from sentry hash list during reclaim in session gc.

This causes the following segmentaion fault infrequently.

[Current thread is 1 (Thread 0x7ff5c3fff700 (LWP 30235))]
 #0  0x000055ce31a8978d in csync_get_session_from_init_sentry (
    cse=<synthetic pointer>, cs=<synthetic pointer>, sp=0x7ff5f004ef36)
    at ../src/npf/csync/csync_session_unpack.c:38
 #1  csync_session_unpack_update (csu=0x7ff5f004ef2e)
    at ../src/npf/csync/csync_session_unpack.c:71
 #2  csync_unpack_session (size=<optimized out>, msg=0x7ff5f004ef26)
    at ../src/npf/csync/csync_session_unpack.c:421
 #3  csync_recv_session_update (frame=<optimized out>)
    at ../src/npf/csync/csync_session_unpack.c:501
 #4  0x000055ce31b2d57d in csync_restore_sessions (n=<optimized out>,
    flist=<optimized out>) at ../src/csync/csync_transfer.c:218
 #5  csync_pull_batch (info=0x7ff58400b880) at
../src/csync/csync_transfer.c:506
 #6  csync_xfer_backup (pipe=0x7ff58400b8e0, arg=0x7ff58400b880)
    at ../src/csync/csync_transfer.c:301
 #7  0x00007ff6629618d3 in ?? () from
/usr/lib/x86_64-linux-gnu/libczmq.so.4
 #8  0x00007ff6611ad4a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
 #9  0x00007ff660eefd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Fixed by doing a safe derefernce and checking for NULL.

VRVDR-54586

(cherry picked from commit 5605da3)
VyattaInternal pushed a commit that referenced this pull request Mar 3, 2021
se_sen field in struct session may be NULL in two cases for newly
created sessions not yet added to sentry hash list, or after sesssion
removed from sentry hash list during reclaim in session gc.

This causes the following segmentaion fault infrequently.

[Current thread is 1 (Thread 0x7ff5c3fff700 (LWP 30235))]
 #0  0x000055ce31a8978d in csync_get_session_from_init_sentry (
    cse=<synthetic pointer>, cs=<synthetic pointer>, sp=0x7ff5f004ef36)
    at ../src/npf/csync/csync_session_unpack.c:38
 #1  csync_session_unpack_update (csu=0x7ff5f004ef2e)
    at ../src/npf/csync/csync_session_unpack.c:71
 #2  csync_unpack_session (size=<optimized out>, msg=0x7ff5f004ef26)
    at ../src/npf/csync/csync_session_unpack.c:421
 #3  csync_recv_session_update (frame=<optimized out>)
    at ../src/npf/csync/csync_session_unpack.c:501
 #4  0x000055ce31b2d57d in csync_restore_sessions (n=<optimized out>,
    flist=<optimized out>) at ../src/csync/csync_transfer.c:218
 #5  csync_pull_batch (info=0x7ff58400b880) at
../src/csync/csync_transfer.c:506
 #6  csync_xfer_backup (pipe=0x7ff58400b8e0, arg=0x7ff58400b880)
    at ../src/csync/csync_transfer.c:301
 #7  0x00007ff6629618d3 in ?? () from
/usr/lib/x86_64-linux-gnu/libczmq.so.4
 #8  0x00007ff6611ad4a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
 #9  0x00007ff660eefd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Fixed by doing a safe derefernce and checking for NULL.

VRVDR-54586

(cherry picked from commit 5605da3)
@pjaitken
Copy link
Contributor

@shubham-cdot, @subhajit-cdot,

The data_len != 0 check should be removed. All valid packets should be sent to the DPI engine.

Should packets with an invalid length be sent to the DPI engine or not? If the DPI engine validates packets itself, and does not overrun the packet or crash, then we can simply send all traffic into the engine. If we don't send invalid length packets to the DPI engine, then should further packets be sent to the engine after an invalid length packet has been blocked?

If we agree to block invalid length packets, then a new "invalid length" check should be added - for all packets (not just the first) and for all protocols (not just UDP). It seems sensible to add the invalid length check to dpi_get_data_len and return an output argument. It would be useful to return the length of other transport protocols too, though this could be done later.

The DPI statistics should only count packets which are being sent to the DPI engine.

Regarding engines, engines[] and engines_len should be set in a new function so this is done in only one place rather than in multiple places throughout the code.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants