-
-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve code support #401
improve code support #401
Conversation
Add support in inline and multi-line code snippets from pip Example: https://pypi.org/project/openai-function-call/
Add support in inline and multi-line code snippets powered by hljs Example: https://medium.com/@jxnlco/seamless-integration-with-openai-and-pydantic-a-powerful-duo-for-output-parsing-fcb1e616167b
When fetching Medium pages without javascript enabled, code snippets are structured as <pre><span>code</span></pre> Example: https://medium.com/@jxnlco/seamless-integration-with-openai-and-pydantic-a-powerful-duo-for-output-parsing-fcb1e616167b
Use `<lb/>` instead of `\n` for code snippets
Codecov Report
@@ Coverage Diff @@
## master #401 +/- ##
==========================================
+ Coverage 96.81% 96.82% +0.01%
==========================================
Files 22 22
Lines 3323 3334 +11
==========================================
+ Hits 3217 3228 +11
Misses 106 106
|
Hi @idoshamun, thanks for the PR! Could you please add coverage for the line that is missing? |
@adbar will do! |
@adbar we should be ready to go! |
@idoshamun Thanks for adding a corresponding test! While your changes make perfect sense the cases tackled by them are quite specific and subject to change over time. I just want to make sure that the software can continue being used as a general-purpose extractor, the questions above arise from this fact. |
@adbar I'll give it a look hopefully during the weekend :) |
@idoshamun I tried to address part of our concerns, feel free to have a look at the code. The tests are failing right now but this is something else, I'll retry later. |
@adbar sorry for the delay on my side kinda busy these days. We'll do my best to review all changes asap |
@adbar thanks for the fixes, I only have one concern. I don't see where we replace spaces with |
It's at the end of this long line: I still need to address the questions above but we're moving forward, thanks for your help! |
@adbar this long line switches the encoded space to a regular space. But I don't see where we do the other way around. It was removed in your changes but I didn't find the alternative |
@idoshamun I see what you mean, you added lines like this: The tests are passing though, so either it's not so important or you didn't add a corresponding test. Could you please check if the current version works for you? If not feel free to put back the corresponding lines along with a suitable test case. |
@adbar I think I saw in one of the commits you removed the spaces in the tests. I'll give it a look |
@idoshamun yes, my bad, it was in 52688ea because I didn't understand the point. Now I see. |
@adbar is there anything I can do at this point to help? |
@idoshamun I was able to solve most problems mentioned above and will now merge the PR as substantial progress has been made. The spacing issue remains, it's not clear to me how to address it at this point. You could work on it in another PR, as well as add further tests if you witness issues with the robustness of code extraction. |
@adbar thanks a lot. Can you give me some pointers of where to look for the spaces? What will be the best place? |
@idoshamun This commit was fine but I'm not sure it works with the latest changes, the function 6ca5a20#diff-abef8bfc97e60c84035950c7da832f5ca07580db79e4fb9e77ef0f8b82112b30R229 |
The problem is that the spaces are stripped at some point, I can look further into it if you don't find the culprit. |
I'm gonna have a look now at what will be a better way to implement the spaces. Any pointers? |
This PR improves support in detecting and handling code snippets as follows: