Ruler fixes #6

karlb · 2022-06-26T19:34:55Z

Bugs reported in Gottox#22 and Gottox#23.

Suggestions on how to make the new tests pass in a simpler way are very welcome!

Bugs reported in Gottox#22 and Gottox#23.

Closes Gottox#22.

Fixes the following bug: ``` $ echo -e "- - -\n\ntest\n- - -\n" | ./smu <hr /> test- - - ```

This is ugly, since the rulers are redefined in `p_end_regex`. Better solutions welcome!

N-R-K · 2022-06-26T23:30:28Z

I think the problem here is same as: #1. There are certain things which should interrupt a paragraph, but they don't.

I feel like the regex is just trying to hack around this limitation. I think a cleaner approach would be keeping track of weather we're in a paragraph or not and then having a flag interrupt_paragraph (or something along those lines) on Tag-s which are supposed to interrupt a paragraph.

The following is a very fast and dirty patch, only used a proof of concept. It solves both the ruler issue and the list issue from some trivial testing.

From b0e9c578686af72901db4049a1f678a01d44358a Mon Sep 17 00:00:00 2001
From: NRK <[email protected]>
Date: Mon, 27 Jun 2022 05:28:25 +0600
Subject: [PATCH] [WIP]: rework paragraph handling

---
 smu.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/smu.c b/smu.c
index 28d413c..45d756b 100644
--- a/smu.c
+++ b/smu.c
@@ -17,6 +17,7 @@ typedef struct {
 	char *search;
 	int process;
 	char *before, *after;
+	int interrupt_paragraph;
 } Tag;
 
 static int doamp(const char *begin, const char *end, int newblock);       /* Parser for & */
@@ -40,6 +41,7 @@ static Parser parsers[] = { dounderline, docomment, dolineprefix,
                             dolist, doparagraph, dogtlt, dosurround, dolink,
                             doshortlink, dohtml, doamp, doreplace };
 static int nohtml = 0;
+static int in_paragraph = 0;
 
 static Tag lineprefix[] = {
 	{ "    ",	0,	"<pre><code>", "\n</code></pre>" },
@@ -51,7 +53,7 @@ static Tag lineprefix[] = {
 	{ "### ",	1,	"<h3>",		"</h3>" },
 	{ "## ",	1,	"<h2>",		"</h2>" },
 	{ "# ",		1,	"<h1>",		"</h1>" },
-	{ "- - -\n",	1,	"<hr />",	""},
+	{ "- - -\n",	1,	"<hr />",	"", 1},
 };
 
 static Tag underline[] = {
@@ -186,15 +188,16 @@ dohtml(const char *begin, const char *end, int newblock) {
 
 int
 dolineprefix(const char *begin, const char *end, int newblock) {
-	unsigned int i, j, l;
+	unsigned int i, j, l, k = 0;
 	char *buffer;
 	const char *p;
 
 	if(newblock)
 		p = begin;
-	else if(*begin == '\n')
+	else if(*begin == '\n') {
 		p = begin + 1;
-	else
+		k = 1;
+	} else
 		return 0;
 	for(i = 0; i < LENGTH(lineprefix); i++) {
 		l = strlen(lineprefix[i].search);
@@ -204,10 +207,14 @@ dolineprefix(const char *begin, const char *end, int newblock) {
 			continue;
 		if(*begin == '\n')
 			fputc('\n', stdout);
+		if (in_paragraph && lineprefix[i].interrupt_paragraph)
+			fputs("</p>", stdout);
 		fputs(lineprefix[i].before, stdout);
+		if (in_paragraph && lineprefix[i].interrupt_paragraph)
+			fputs("<p>", stdout);
 		if(lineprefix[i].search[l-1] == '\n') {
 			fputc('\n', stdout);
-			return l - 1;
+			return l - 1 + k;
 		}
 		if(!(buffer = malloc(BUFSIZ)))
 			eprint("Malloc failed.");
@@ -362,6 +369,8 @@ dolist(const char *begin, const char *end, int newblock) {
 	for(p++; p != end && (*p == ' ' || *p == '\t'); p++);
 	indent = p - q;
 	buffer = ereallocz(buffer, BUFSIZ);
+	if (in_paragraph)
+		fputs("</p>", stdout);
 	if(!newblock)
 		fputc('\n', stdout);
 	fputs(ul ? "<ul>\n" : "<ol>\n", stdout);
@@ -421,6 +430,8 @@ dolist(const char *begin, const char *end, int newblock) {
 	free(buffer);
 	p--;
 	while(*(--p) == '\n');
+	if (in_paragraph)
+		fputs("<p>", stdout);
 	return -(p - begin + 1);
 }
 
@@ -436,8 +447,10 @@ doparagraph(const char *begin, const char *end, int newblock) {
 	if(p - begin <= 1)
 		return 0;
 	fputs("<p>", stdout);
+	in_paragraph = 1;
 	process(begin, p, 0);
 	fputs("</p>\n", stdout);
+	in_paragraph = 0;
 	return -(p - begin);
 }
 
-- 
2.35.1

karlb · 2022-07-17T08:25:37Z

I was trying to keep the paragraph handling out of the other handler functions to separate the different concerns. But it looks like your are right and I have to admit defeat on that, since it won't work at all for lists.
Unless I require lists to start after an empty line. But that would make users unhappy by both diverging from CommonMark and breaking backwards compatibility. So the moderate increase in complexity seems to be the lesser evil.

karlb · 2022-07-17T09:14:37Z

The way smu currently handles paragraphs is not HTML compliant, as the current output has paragraphs which contain block-level elements (e.g. another paragraph).

Per 9.3.1 Paragraphs: the P element of the HTML 4.01 specification:

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

Fixing this will cause large diffs in the test output, which might obscure side effects of the main changes in this PR. I will try to untangle the different issues into multiple PRs.

N-R-K · 2022-07-17T10:53:21Z

I'm not very well versed on html standard but do note that the patch I proposed would end up generating some empty paragraph: . If these aren't html compliant then we'd probably have to attack the issue from another angle.

I was trying to keep the paragraph handling out of the other handler functions to separate the different concerns.

All the parsers already recieve newblock as an argument. Maybe possible to do something similar with paragraph.

karlb · 2022-07-17T10:57:30Z

This PR is superseded by combining #8 and #10.

Maybe possible to do something similar with paragraph.

Yes, I think so too. But I wanted to fix some problems before going into another rabbit hole, so it's #10 for now.

Thanks for all the feedback!

karlb added 4 commits June 26, 2022 21:33

Add test for ruler handling bugs

55906ce

Bugs reported in Gottox#22 and Gottox#23.

Fix wrong consumed input counter in dolineprefix

d8fab6c

Closes Gottox#22.

Fix off by one error in dolineprefix

bfc1bec

Fixes the following bug: ``` $ echo -e "- - -\n\ntest\n- - -\n" | ./smu <hr /> test- - - ```

End paragraphs before rulers

cc60737

This is ugly, since the rulers are redefined in `p_end_regex`. Better solutions welcome!

karlb mentioned this pull request Jun 26, 2022

Bad horizontal break substitution in Master branch Gottox/smu#22

Open

karlb mentioned this pull request Jul 17, 2022

Uncritical ruler fixes #8

Merged

karlb closed this Jul 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ruler fixes #6

Ruler fixes #6

karlb commented Jun 26, 2022 •

edited

Loading

N-R-K commented Jun 26, 2022

karlb commented Jul 17, 2022

karlb commented Jul 17, 2022

N-R-K commented Jul 17, 2022

karlb commented Jul 17, 2022

Ruler fixes #6

Ruler fixes #6

Conversation

karlb commented Jun 26, 2022 • edited Loading

N-R-K commented Jun 26, 2022

karlb commented Jul 17, 2022

karlb commented Jul 17, 2022

N-R-K commented Jul 17, 2022

karlb commented Jul 17, 2022

karlb commented Jun 26, 2022 •

edited

Loading