-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ruler fixes #6
Ruler fixes #6
Conversation
Fixes the following bug: ``` $ echo -e "- - -\n\ntest\n- - -\n" | ./smu <hr /> <p>test- - -</p> ```
This is ugly, since the rulers are redefined in `p_end_regex`. Better solutions welcome!
I think the problem here is same as: #1. There are certain things which should interrupt a paragraph, but they don't. I feel like the regex is just trying to hack around this limitation. I think a cleaner approach would be keeping track of weather we're in a paragraph or not and then having a flag The following is a very fast and dirty patch, only used a proof of concept. It solves both the ruler issue and the list issue from some trivial testing. From b0e9c578686af72901db4049a1f678a01d44358a Mon Sep 17 00:00:00 2001
From: NRK <[email protected]>
Date: Mon, 27 Jun 2022 05:28:25 +0600
Subject: [PATCH] [WIP]: rework paragraph handling
---
smu.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/smu.c b/smu.c
index 28d413c..45d756b 100644
--- a/smu.c
+++ b/smu.c
@@ -17,6 +17,7 @@ typedef struct {
char *search;
int process;
char *before, *after;
+ int interrupt_paragraph;
} Tag;
static int doamp(const char *begin, const char *end, int newblock); /* Parser for & */
@@ -40,6 +41,7 @@ static Parser parsers[] = { dounderline, docomment, dolineprefix,
dolist, doparagraph, dogtlt, dosurround, dolink,
doshortlink, dohtml, doamp, doreplace };
static int nohtml = 0;
+static int in_paragraph = 0;
static Tag lineprefix[] = {
{ " ", 0, "<pre><code>", "\n</code></pre>" },
@@ -51,7 +53,7 @@ static Tag lineprefix[] = {
{ "### ", 1, "<h3>", "</h3>" },
{ "## ", 1, "<h2>", "</h2>" },
{ "# ", 1, "<h1>", "</h1>" },
- { "- - -\n", 1, "<hr />", ""},
+ { "- - -\n", 1, "<hr />", "", 1},
};
static Tag underline[] = {
@@ -186,15 +188,16 @@ dohtml(const char *begin, const char *end, int newblock) {
int
dolineprefix(const char *begin, const char *end, int newblock) {
- unsigned int i, j, l;
+ unsigned int i, j, l, k = 0;
char *buffer;
const char *p;
if(newblock)
p = begin;
- else if(*begin == '\n')
+ else if(*begin == '\n') {
p = begin + 1;
- else
+ k = 1;
+ } else
return 0;
for(i = 0; i < LENGTH(lineprefix); i++) {
l = strlen(lineprefix[i].search);
@@ -204,10 +207,14 @@ dolineprefix(const char *begin, const char *end, int newblock) {
continue;
if(*begin == '\n')
fputc('\n', stdout);
+ if (in_paragraph && lineprefix[i].interrupt_paragraph)
+ fputs("</p>", stdout);
fputs(lineprefix[i].before, stdout);
+ if (in_paragraph && lineprefix[i].interrupt_paragraph)
+ fputs("<p>", stdout);
if(lineprefix[i].search[l-1] == '\n') {
fputc('\n', stdout);
- return l - 1;
+ return l - 1 + k;
}
if(!(buffer = malloc(BUFSIZ)))
eprint("Malloc failed.");
@@ -362,6 +369,8 @@ dolist(const char *begin, const char *end, int newblock) {
for(p++; p != end && (*p == ' ' || *p == '\t'); p++);
indent = p - q;
buffer = ereallocz(buffer, BUFSIZ);
+ if (in_paragraph)
+ fputs("</p>", stdout);
if(!newblock)
fputc('\n', stdout);
fputs(ul ? "<ul>\n" : "<ol>\n", stdout);
@@ -421,6 +430,8 @@ dolist(const char *begin, const char *end, int newblock) {
free(buffer);
p--;
while(*(--p) == '\n');
+ if (in_paragraph)
+ fputs("<p>", stdout);
return -(p - begin + 1);
}
@@ -436,8 +447,10 @@ doparagraph(const char *begin, const char *end, int newblock) {
if(p - begin <= 1)
return 0;
fputs("<p>", stdout);
+ in_paragraph = 1;
process(begin, p, 0);
fputs("</p>\n", stdout);
+ in_paragraph = 0;
return -(p - begin);
}
--
2.35.1
|
I was trying to keep the paragraph handling out of the other handler functions to separate the different concerns. But it looks like your are right and I have to admit defeat on that, since it won't work at all for lists. |
The way smu currently handles paragraphs is not HTML compliant, as the current output has paragraphs which contain block-level elements (e.g. another paragraph). Per 9.3.1 Paragraphs: the P element of the HTML 4.01 specification:
Fixing this will cause large diffs in the test output, which might obscure side effects of the main changes in this PR. I will try to untangle the different issues into multiple PRs. |
I'm not very well versed on html standard but do note that the patch I proposed would end up generating some empty paragraph:
All the parsers already recieve |
Bugs reported in Gottox#22 and Gottox#23.
Suggestions on how to make the new tests pass in a simpler way are very welcome!