Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CRASH ] We got a segfault here #3464

Closed
ar45 opened this issue Sep 11, 2024 · 5 comments · Fixed by #3467
Closed

[CRASH ] We got a segfault here #3464

ar45 opened this issue Sep 11, 2024 · 5 comments · Fixed by #3467

Comments

@ar45
Copy link
Contributor

ar45 commented Sep 11, 2024

(unsigned int)*(unsigned long *)(*connected_ts) < curl_conn_lifetime)) {

Perhaps this is supposed to be (unsigned int *)(unsigned long *)(*connected_ts) < curl_conn_lifetime))

@ar45 ar45 changed the title We got a segfault here [CRASH ] We got a segfault here Sep 11, 2024
ar45 added a commit to ar45/opensips that referenced this issue Sep 11, 2024
when setting `connected_ts` set the address which is being used as value?
liviuchircu pushed a commit that referenced this issue Sep 11, 2024
when setting `connected_ts` set the address which is being used as value?

(cherry picked from commit a3dd31a)
liviuchircu pushed a commit that referenced this issue Sep 11, 2024
when setting `connected_ts` set the address which is being used as value?

(cherry picked from commit a3dd31a)
@liviuchircu
Copy link
Member

liviuchircu commented Sep 11, 2024

Hi @ar45! I'm afraid the fix in #3467 might not be entirely accurate, could you please pull to latest and test out the following additional patch, available here?

Long story short:

  • the map_get() gives you a changeable 8-byte holder (hence void **), so you can store arbitrary data inside the map
  • here, I only needed to store an unsigned int as map value (the "last used" timestamp), so I abused the void * holder and only stored a 4-byte unsigned int as being the holder itself, casted as unsigned long.
  • there is no need to check for *connected_ts, as there is no point in testing the timestamp value itself, only the holder ((void **)connected_ts)
  • indeed, the bug was in the line you suggested above, because of extra * (fixed by attached patch)

Let me know how it goes, thank you!

EDIT: new gist revision pushed, hopefully it's the last one :)

@ar45
Copy link
Contributor Author

ar45 commented Sep 11, 2024

Hi @liviuchircu Thanks for the quick merge.

I tested the additional patch at https://gist.githubusercontent.com/liviuchircu/2fd29529c06680b3eebac34c85eaa1f3/raw/214d7faf3626760379f37233a0275054587ac3d4/rest-client-ptr-management-quickfix and that segfaults at *(unsigned long *)(*connected_ts) = (unsigned long)get_ticks();

The pointer to pointer business is confusing here...

@ar45
Copy link
Contributor Author

ar45 commented Sep 11, 2024

@liviuchircu Attached is a corrected patch based on what you provided.
rest-client-ptr-management-quickfix-2.patch

Besides that. I see the way it is implemented, the curl_conn_lifetime is just an arbitrary value which says every curl_conn_lifetime we expect to reconnect and therefore, grab a lock until done. The connected_ts is also not updated on subsequent requests within curl_conn_lifetime.

I would expect that curl_conn_lifetime would control the CURLOPT_MAXLIFETIME_CONN or CURLOPT_MAXAGE_CONN or CURLOPT_TCP_KEEPIDLE.

while no_concurrent_connects would control that whenever curl does not have an exiting connection, the connect phase would be wrapped within the hash_lock(rcl_parallel_connects, he);

@liviuchircu
Copy link
Member

liviuchircu commented Sep 12, 2024

Thank you for the feedback, @ar45! Indeed, I updated the gist afterwards, but the RAW URL (linked in the email) seems to be unique for each revision, so you still got sent to the old one. Now that part is functional -- all in all, there was just an extra * to be removed in order to fix the logic, once you add up both of our fix-patches. I will take care of backporting it.

With regards to the connection "MRU timestamping" logic, I just wanted to put out a starting point, which we could then adjust based on various cURL library behaviors we find out. For example, I haven't found any way to control the long-lived connection timeouts in the cURLs I tried, so I assumed they default to the OS settings (documented in curl_conn_lifetime). So if a shared timestamp isn't updated at all within curl_conn_lifetime seconds on a given hostname (no OpenSIPS worker has used it), the module assumes all cURL long-lived TCP connections will have been dropped at that point and will re-activate the "one process connects at a time, others get a quick error" logic.

Cheers!

@ar45
Copy link
Contributor Author

ar45 commented Sep 12, 2024

Thanks @liviuchircu for the quick response and updates.

I think that using the following callbacks we can keep track internally if there is an actual connect happening.

curl_easy_setopt(curl, CURLOPT_RESOLVER_START_FUNCTION, start_resolve_cb);

curl_easy_setopt(curl, CURLOPT_OPENSOCKETFUNCTION, opensocket_cb);

When opensocket_cb is called, we try to acquire the lock for rcl_parallel_connects and when the connect is done we release it.

in the callback to CURLOPT_OPENSOCKETFUNCTION we can also implement check_blacklist_rule([bl_name], ip[, port [, proto]]) opensips check_blacklist

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants