Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: auto_link() fails to include trailing slashes in URLs #9165

Closed
corenominal opened this issue Sep 3, 2024 · 13 comments · Fixed by #9169
Closed

Bug: auto_link() fails to include trailing slashes in URLs #9165

corenominal opened this issue Sep 3, 2024 · 13 comments · Fixed by #9169
Labels
bug Verified issues on the current code behavior or pull requests that will fix them

Comments

@corenominal
Copy link

corenominal commented Sep 3, 2024

PHP Version

8.2

CodeIgniter4 Version

4.5.4

CodeIgniter4 Installation Method

Composer (using codeigniter4/appstarter)

Which operating systems have you tested for this bug?

Linux

Which server did you use?

apache

Database

No response

What happened?

Using the URL Helper auto_link() function to automatically turn URLs contained in a string into links/HTML anchor elements. If the last character of the URL is a /, it is not included in either the anchor href value or text value.

Steps to Reproduce

Input:

$string = 'This is a test link to https://example.com/.';
$html = auto_link($string);
echo $html;

Output:

This is a test link to <a href="https://example.com">https://example.com</a>/.

Expected Output

This is a test link to <a href="https://example.com/">https://example.com/</a>.

Anything else?

I came across this as it seems that the default behaviour of Chrome and Firefox is to include the trailing slash when copying the URL from the browser's address bar. I tend to do this a lot and noticed that the auto_link() function always missed the trailing slash when creating anchor elements.

@corenominal corenominal added the bug Verified issues on the current code behavior or pull requests that will fix them label Sep 3, 2024
@kenjis
Copy link
Member

kenjis commented Sep 5, 2024

#158 The first commit of auto_link().

The regex is not changed from the beginning.

preg_match_all('#(\w*://|www\.)[^\s()<>;]+\w#i', $str, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)

But the regex in CI3 is the following:

preg_match_all('#(\w*://|www\.)[a-z0-9]+(-+[a-z0-9]+)*(\.[a-z0-9]+(-+[a-z0-9]+)*)+(/([^\s()<>;]+\w)?/?)?#i', $str, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)

It seems it was changed in bcit-ci/CodeIgniter@8c9e510.
But before the change, there was already /? (not …+\w#i but …+\w/?#i) in the end.

@corenominal
Copy link
Author

Thanks for looking into this. I messed up with my PR, apologies.

@kenjis kenjis mentioned this issue Sep 5, 2024
5 tasks
@kenjis
Copy link
Member

kenjis commented Sep 5, 2024

No problem. It is almost impossible for anyone to send a perfect PR at first.

I sent #9169. Please check.

@kenjis
Copy link
Member

kenjis commented Sep 5, 2024

@codeigniter4/core-team
This is clearly documented, but why auto_link() recognizes a string starting with ://?

The link <a href="://codeigniter.com">://codeigniter.com</a> does not seem to work.
If I put the link in the Welcome page, the URL will be http://localhost:8080/://codeigniter.com.

Note
The only URLs recognized are those that start with www. or with ://.
https://codeigniter4.github.io/CodeIgniter4/helpers/url_helper.html#auto_link

'This one: ://codeigniter.com must not break this one: http://codeigniter.com',
'This one: <a href="://codeigniter.com">://codeigniter.com</a> must not break this one: <a href="http://codeigniter.com">http://codeigniter.com</a>',

@michalsn
Copy link
Member

michalsn commented Sep 5, 2024

I guess that :// would support cases for: https, http, ftp, etc.

@kenjis
Copy link
Member

kenjis commented Sep 6, 2024

Yes, 'test abc://www.codeigniter.com test' will be 'test <a href="abc://www.codeigniter.com">abc://www.codeigniter.com</a> test'. But it does not make sense.

@michalsn
Copy link
Member

michalsn commented Sep 6, 2024

The range of what will "catch" on this rule is large, but thanks to this we have support for all direct links that are supported by many apps, like slack:// - https://api.slack.com/reference/deep-linking#client

@neznaika0
Copy link
Contributor

Telegram API https://core.telegram.org/api/links

@kenjis
Copy link
Member

kenjis commented Sep 6, 2024

Okay, the current behavior seems useful, and can catch unknown schemes.
But catching ://codeigniter.com does not make sense.
Should it catch [a-z0-9]+://?

@corenominal
Copy link
Author

RFC 3986 states that the scheme component is required.

The scheme and path components are required, though the path may be empty (no characters).

I would think that it should catch [a-z0-9]+://.

@kenjis
Copy link
Member

kenjis commented Sep 6, 2024

scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
https://datatracker.ietf.org/doc/html/rfc3986#section-3.1

[a-z][a-z0-9+\-.]*:// is better?

@michalsn
Copy link
Member

michalsn commented Sep 7, 2024

[a-z][a-z0-9+\-.]*:// is better?

Yes.

@kenjis
Copy link
Member

kenjis commented Sep 8, 2024

I sent #9180

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Verified issues on the current code behavior or pull requests that will fix them
Projects
None yet
4 participants