Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruler fixes #6

Closed
wants to merge 4 commits into from
Closed

Ruler fixes #6

wants to merge 4 commits into from

Conversation

karlb
Copy link
Owner

@karlb karlb commented Jun 26, 2022

Bugs reported in Gottox#22 and Gottox#23.

Suggestions on how to make the new tests pass in a simpler way are very welcome!

Fixes the following bug:
```
$ echo -e "- - -\n\ntest\n- - -\n" | ./smu
<hr />
<p>test- - -</p>
```
This is ugly, since the rulers are redefined in `p_end_regex`. Better
solutions welcome!
@N-R-K
Copy link
Collaborator

N-R-K commented Jun 26, 2022

I think the problem here is same as: #1. There are certain things which should interrupt a paragraph, but they don't.

I feel like the regex is just trying to hack around this limitation. I think a cleaner approach would be keeping track of weather we're in a paragraph or not and then having a flag interrupt_paragraph (or something along those lines) on Tag-s which are supposed to interrupt a paragraph.

The following is a very fast and dirty patch, only used a proof of concept. It solves both the ruler issue and the list issue from some trivial testing.

From b0e9c578686af72901db4049a1f678a01d44358a Mon Sep 17 00:00:00 2001
From: NRK <[email protected]>
Date: Mon, 27 Jun 2022 05:28:25 +0600
Subject: [PATCH] [WIP]: rework paragraph handling

---
 smu.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/smu.c b/smu.c
index 28d413c..45d756b 100644
--- a/smu.c
+++ b/smu.c
@@ -17,6 +17,7 @@ typedef struct {
 	char *search;
 	int process;
 	char *before, *after;
+	int interrupt_paragraph;
 } Tag;
 
 static int doamp(const char *begin, const char *end, int newblock);       /* Parser for & */
@@ -40,6 +41,7 @@ static Parser parsers[] = { dounderline, docomment, dolineprefix,
                             dolist, doparagraph, dogtlt, dosurround, dolink,
                             doshortlink, dohtml, doamp, doreplace };
 static int nohtml = 0;
+static int in_paragraph = 0;
 
 static Tag lineprefix[] = {
 	{ "    ",	0,	"<pre><code>", "\n</code></pre>" },
@@ -51,7 +53,7 @@ static Tag lineprefix[] = {
 	{ "### ",	1,	"<h3>",		"</h3>" },
 	{ "## ",	1,	"<h2>",		"</h2>" },
 	{ "# ",		1,	"<h1>",		"</h1>" },
-	{ "- - -\n",	1,	"<hr />",	""},
+	{ "- - -\n",	1,	"<hr />",	"", 1},
 };
 
 static Tag underline[] = {
@@ -186,15 +188,16 @@ dohtml(const char *begin, const char *end, int newblock) {
 
 int
 dolineprefix(const char *begin, const char *end, int newblock) {
-	unsigned int i, j, l;
+	unsigned int i, j, l, k = 0;
 	char *buffer;
 	const char *p;
 
 	if(newblock)
 		p = begin;
-	else if(*begin == '\n')
+	else if(*begin == '\n') {
 		p = begin + 1;
-	else
+		k = 1;
+	} else
 		return 0;
 	for(i = 0; i < LENGTH(lineprefix); i++) {
 		l = strlen(lineprefix[i].search);
@@ -204,10 +207,14 @@ dolineprefix(const char *begin, const char *end, int newblock) {
 			continue;
 		if(*begin == '\n')
 			fputc('\n', stdout);
+		if (in_paragraph && lineprefix[i].interrupt_paragraph)
+			fputs("</p>", stdout);
 		fputs(lineprefix[i].before, stdout);
+		if (in_paragraph && lineprefix[i].interrupt_paragraph)
+			fputs("<p>", stdout);
 		if(lineprefix[i].search[l-1] == '\n') {
 			fputc('\n', stdout);
-			return l - 1;
+			return l - 1 + k;
 		}
 		if(!(buffer = malloc(BUFSIZ)))
 			eprint("Malloc failed.");
@@ -362,6 +369,8 @@ dolist(const char *begin, const char *end, int newblock) {
 	for(p++; p != end && (*p == ' ' || *p == '\t'); p++);
 	indent = p - q;
 	buffer = ereallocz(buffer, BUFSIZ);
+	if (in_paragraph)
+		fputs("</p>", stdout);
 	if(!newblock)
 		fputc('\n', stdout);
 	fputs(ul ? "<ul>\n" : "<ol>\n", stdout);
@@ -421,6 +430,8 @@ dolist(const char *begin, const char *end, int newblock) {
 	free(buffer);
 	p--;
 	while(*(--p) == '\n');
+	if (in_paragraph)
+		fputs("<p>", stdout);
 	return -(p - begin + 1);
 }
 
@@ -436,8 +447,10 @@ doparagraph(const char *begin, const char *end, int newblock) {
 	if(p - begin <= 1)
 		return 0;
 	fputs("<p>", stdout);
+	in_paragraph = 1;
 	process(begin, p, 0);
 	fputs("</p>\n", stdout);
+	in_paragraph = 0;
 	return -(p - begin);
 }
 
-- 
2.35.1

@karlb
Copy link
Owner Author

karlb commented Jul 17, 2022

I was trying to keep the paragraph handling out of the other handler functions to separate the different concerns. But it looks like your are right and I have to admit defeat on that, since it won't work at all for lists.
Unless I require lists to start after an empty line. But that would make users unhappy by both diverging from CommonMark and breaking backwards compatibility. So the moderate increase in complexity seems to be the lesser evil.

@karlb
Copy link
Owner Author

karlb commented Jul 17, 2022

The way smu currently handles paragraphs is not HTML compliant, as the current output has paragraphs which contain block-level elements (e.g. another paragraph).

Per 9.3.1 Paragraphs: the P element of the HTML 4.01 specification:

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

Fixing this will cause large diffs in the test output, which might obscure side effects of the main changes in this PR. I will try to untangle the different issues into multiple PRs.

@karlb karlb mentioned this pull request Jul 17, 2022
@N-R-K
Copy link
Collaborator

N-R-K commented Jul 17, 2022

I'm not very well versed on html standard but do note that the patch I proposed would end up generating some empty paragraph: <p></p>. If these aren't html compliant then we'd probably have to attack the issue from another angle.

I was trying to keep the paragraph handling out of the other handler functions to separate the different concerns.

All the parsers already recieve newblock as an argument. Maybe possible to do something similar with paragraph.

@karlb
Copy link
Owner Author

karlb commented Jul 17, 2022

This PR is superseded by combining #8 and #10.

Maybe possible to do something similar with paragraph.

Yes, I think so too. But I wanted to fix some problems before going into another rabbit hole, so it's #10 for now.

Thanks for all the feedback!

@karlb karlb closed this Jul 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants