-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doesn't handle embedded comments #1
Comments
hmm, seems like a reasonable expectation. also seems like comment matching or stripping would be beyond the scope of this though. not sure what others think. perhaps we can add something to the readme that suggests that users strip comments first? |
👍 |
I think the only thing that we should change is first var match2 = re.exec('functionfoo(){}'); use case |
haha, but it's intentional.. lol because the |
@tunnckoCore Off the top of my head, use: E.g. |
@blakeembrey Hmm.. nope exactly. More use cases? |
Always.. regexes are not enough. |
I can't think of any reason why a regex shouldn't work (and correctly handle comments) for ES5 functions, but it won't work for ES6 functions:
|
I was wrong about the "nit". I'd forgotten that JavaScript doesn't provide a way to make |
Something like this should work (though note the parameter list includes comments):
e.g.: generatefunction /* 1 */ test /* 2 */ ( /* 3 */ foo /* 4 */ , /* 5 */ bar /* 6 */ ) /* 7 */ {
return foo;
}
function str (re) {
return re.toString().slice(1, -1);
}
var fn = str(/^ function (?: space+ (ident) )? space* \( space* (params)? space* \) space* \{ (body) \} $ /);
var params = str(/ ident (?: space* , space* ident )* /);
var space = str(/ (?: (?: \s+ ) | (?: \/ \* [\s\S]*? \* \/ ) | (?: \/\/ [^\r\n]* ) ) /);
var ident = str(/ (?: [a-zA-Z_$] [\w$]* ) /);
var body = str(/ (?: [\s\S]*? ) /);
fn = fn.replace(/params/g, params)
.replace(/space/g, space)
.replace(/ident/g, ident)
.replace(/body/g, body)
.replace(/\s+/g, '');
var re = new RegExp(fn);
console.log("test: %j", test.toString());
console.log(re);
console.log(re.exec(test.toString())) function
regex/^function(?:(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))+((?:[a-zA-Z_$][\w$]*)))?(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*\((?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*((?:[a-zA-Z_$][\w$]*)(?:(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*,(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*(?:[a-zA-Z_$][\w$]*))*)?(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*\)(?:(?:\s+)|(?:\/\*[\s\S]*?\*\/)|(?:\/\/[^\r\n]*))*\{((?:[\s\S]*?))\}$/ match
|
Hm. Looks good, but what about |
The regex originally included them. Put them back in if you want them :-) var fn = str(/^ function (?: space+ (ident) )? space* \( (params)? \) space* \{ (body) \} $ /);
var params = str(/ space* ident (?: space* , space* ident )* space* /);
|
Hm, good but I keep thinking of that
we already have libs for stripping comments, so its normal first to strip them after that using this regex. I dont know, others @regexhq/owners ? |
but we should only handle this case |
That's already handled by this regex, as well as other fixes e.g.: current regex// invalid identifiers: shouldn't match, but does
require('function-regex')().exec("function 1 (2, 3, 4) { }")
this regex// invalid identifiers: doesn't match
re.exec("function 1 (2, 3, 4) { }")
At any rate, this approach is designed to be easy to read, understand and modify, which is more than can be said for hacking on the regex directly. If you're happy for this regex to break on comments, you can just remove them from the var space = str(/ (?: \s+ ) /); |
I noticed that as well. Before we go to far down this path, what are the use cases we're discussing? IMHO the regex should do what's necessary to meet the requirements of the lib. is this going to be used in esprima or acorn? is the penalty of stripping comments first too much for this too be useful? or should it be used for fast parsing for code context? I'm not really nitpicking on this, I think the dialog is great. The bigger point is that the lib should have a well-defined purpose. that will answer your questions |
? Why would it be used in a parser? Parsers are tools for doing this correctly in all cases. This is a tool for doing it quickly in cases where a) it's sufficient (i.e. ES5 functions) and b) the documented exceptions (e.g. not supporting comments) don't matter. |
right, that's my point |
I use it in parse-function and I think we should not handle cases with comments and add notice in the readme to strip them before use the regex. |
btw |
There's no harm in the name, but that's not the kind of parser I was referring to. I wouldn't expect Acorn or Esprima to silently choke on such a fundamental feature of valid/real-world JavaScript as comments :-)
|
There are several issues with the regex:
functionfoo () { }
to be a valid function declarationNit:\{([\w\W\s\S]*)\}
is a roundabout way of saying\{(.*)\}
(though I'd go with\{(.*?)\}
)test
output:
The text was updated successfully, but these errors were encountered: