Hierarchy
Autolinker.htmlParser.HtmlNode
Autolinker.htmlParser.CommentNode
Represents an HTML comment node that has been parsed by the\nAutolinker.htmlParser.HtmlParser.
\n\nSee this class's superclass (Autolinker.htmlParser.HtmlNode) for more\ndetails.
\nThe text inside the comment tag. This text is stripped of any leading or\ntrailing whitespace.
\nDefaults to: ''
The offset of the HTML node in the original text that was parsed.
\nDefaults to: 0
The text that was matched for the HtmlNode.
\n\nDefaults to: ''
The configuration options for this class, specified\n in an Object.
\nOverrides: Autolinker.htmlParser.HtmlNode.constructor
Returns the comment inside the comment tag.
\nRetrieves the offset of the HtmlNode. This is the offset of the\nHTML node in the original string that was parsed.
\nReturns a string name for the type of node that this class represents.
\nHierarchy
Autolinker.htmlParser.HtmlNode
Autolinker.htmlParser.ElementNode
Represents an HTML element node that has been parsed by the Autolinker.htmlParser.HtmlParser.
\n\nSee this class's superclass (Autolinker.htmlParser.HtmlNode) for more\ndetails.
\ntrue if the element (tag) is a closing tag, false if it's an opening\ntag.
Defaults to: false
The offset of the HTML node in the original text that was parsed.
\nDefaults to: 0
The name of the tag that was matched.
\nDefaults to: ''
The text that was matched for the HtmlNode.
\n\nDefaults to: ''
The configuration options for this class, specified\n in an Object.
\nOverrides: Autolinker.htmlParser.HtmlNode.constructor
Retrieves the offset of the HtmlNode. This is the offset of the\nHTML node in the original string that was parsed.
\nReturns the HTML element's (tag's) name. Ex: for an <img> tag,\nreturns \"img\".
\nReturns a string name for the type of node that this class represents.
\nDetermines if the HTML element (tag) is a closing tag. Ex: <div>\nreturns false, while </div> returns true.
Hierarchy
Autolinker.htmlParser.HtmlNode
Autolinker.htmlParser.EntityNode
Represents a known HTML entity node that has been parsed by the Autolinker.htmlParser.HtmlParser.\nEx: '&nbsp;', or '&#160;' (which will be retrievable from the getText\nmethod).
\n\nNote that this class will only be returned from the HtmlParser for the set of\nchecked HTML entity nodes defined by the Autolinker.htmlParser.HtmlParser.htmlCharacterEntitiesRegex.
\n\nSee this class's superclass (Autolinker.htmlParser.HtmlNode) for more\ndetails.
\nThe offset of the HTML node in the original text that was parsed.
\nDefaults to: 0
The text that was matched for the HtmlNode.
\n\nDefaults to: ''
The configuration properties for the Match instance,\nspecified in an Object (map).
\nRetrieves the offset of the HtmlNode. This is the offset of the\nHTML node in the original string that was parsed.
\nReturns a string name for the type of node that this class represents.
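\nSubclasses
Autolinker.htmlParser.CommentNode
Autolinker.htmlParser.ElementNode
Autolinker.htmlParser.EntityNode
Autolinker.htmlParser.TextNode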
Represents an HTML node found in an input string. An HTML node is one of the\nfollowing:
\n\nThe offset of the HTML node in the original text that was parsed.
\nDefaults to: 0
The text that was matched for the HtmlNode.
\n\nDefaults to: ''
The configuration properties for the Match instance,\nspecified in an Object (map).
\nFiles
An HTML parser implementation which simply walks an HTML string and returns an array of\nHtmlNodes that represent the basic HTML structure of the input string.
\n\nAutolinker uses this to only link URLs/emails/mentions within text nodes, effectively ignoring / \"walking\naround\" HTML tags.
\nFactory method to create a CommentNode.
\nThe offset of the match within the original HTML\n string.
\nThe full text of the tag (comment) that was\n matched, including its <!-- and -->.
\nThe full text of the comment that was matched.
\nFactory method to create an ElementNode.
\nThe offset of the match within the original HTML\n string.
\nThe full text of the tag (element) that was\n matched, including its attributes.
\nThe name of the tag. Ex: An <img> tag would\n be passed to this method as \"img\".
\ntrue if it's a closing tag, false\n otherwise.
Factory method to create an EntityNode.
\nThe offset of the match within the original HTML\n string.
\nThe text that was matched for the HTML entity (such\n as '&nbsp;').
\nFactory method to create a TextNode.
\nThe offset of the match within the original HTML\n string.
\nThe text that was matched.
\nParses an HTML string and returns a simple array of HtmlNodes\nto represent the HTML structure of the input string.
\nThe HTML to parse.
\nParses text and HTML entity nodes from a given string. The input string\nshould not have any HTML tags (elements) within it.
\nThe offset of the text node match within the\n original HTML string.
\nThe string of text to parse. This is from an HTML\n text node.
\nAn array of HtmlNodes to\n represent the TextNodes and\n EntityNodes found.
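The split into text and entity nodes described above can be pictured with a small sketch. This is not Autolinker's implementation: the function name, the plain-object node shape, and the `'text'`/`'entity'` type names are assumptions made for illustration, and the entity pattern is assumed to be the list of common entities (`&nbsp;`, `&#160;`, `&lt;`, and so on).

```javascript
// Hypothetical helper, not Autolinker's code. The entity list is an assumed
// version of the "common HTML character entities" pattern.
const entityRe = /(&nbsp;|&#160;|&lt;|&#60;|&gt;|&#62;|&quot;|&#34;|&#39;)/gi;

function parseTextAndEntityNodesSketch(offset, text) {
  const nodes = [];
  let lastIdx = 0;
  let match;

  entityRe.lastIndex = 0; // the regex is global, so reset its scan position

  while ((match = entityRe.exec(text)) !== null) {
    // Emit any plain text that sits before this entity.
    if (match.index > lastIdx) {
      nodes.push({ type: 'text', offset: offset + lastIdx, text: text.slice(lastIdx, match.index) });
    }
    nodes.push({ type: 'entity', offset: offset + match.index, text: match[0] });
    lastIdx = match.index + match[0].length;
  }

  // Emit whatever text remains after the last entity.
  if (lastIdx < text.length) {
    nodes.push({ type: 'text', offset: offset + lastIdx, text: text.slice(lastIdx) });
  }
  return nodes;
}
```

Each node's offset is relative to the original HTML string, which is why the `offset` argument is added to every index found inside `text`.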
\nHierarchy
Autolinker.htmlParser.HtmlNode
Autolinker.htmlParser.TextNode
Represents a text node that has been parsed by the Autolinker.htmlParser.HtmlParser.
\n\nSee this class's superclass (Autolinker.htmlParser.HtmlNode) for more\ndetails.
\nThe offset of the HTML node in the original text that was parsed.
\nDefaults to: 0
The text that was matched for the HtmlNode.
\n\nDefaults to: ''
The configuration properties for the Match instance,\nspecified in an Object (map).
\nRetrieves the offset of the HtmlNode. This is the offset of the\nHTML node in the original string that was parsed.
\nReturns a string name for the type of node that this class represents.
\nHierarchy
Autolinker.matcher.Matcher
Autolinker.matcher.Url
Matcher to find URL matches in an input string.
\n\nSee this class's superclass (Autolinker.matcher.Matcher) for more details.
\ntrue to decode percent-encoded characters in URL matches, false to keep\n the percent-encoded characters.
Example when true: https://en.wikipedia.org/wiki/San_Jos%C3%A9 will\n be displayed as https://en.wikipedia.org/wiki/San_José.
Defaults to: true
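As a rough illustration of what decoding for display involves, the sketch below uses the standard `decodeURIComponent` function. The helper name `decodeForDisplay` is hypothetical, and whether Autolinker decodes exactly this way is an assumption.

```javascript
// Hypothetical helper: decode percent-encoded characters for display only.
function decodeForDisplay(url) {
  try {
    return decodeURIComponent(url);
  } catch (e) {
    // decodeURIComponent throws a URIError on malformed sequences (e.g. a
    // lone '%'), so fall back to the original text in that case.
    return url;
  }
}
```

The fallback matters because matched URLs come from arbitrary user text and may contain a stray `%`.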
The Object form of Autolinker.stripPrefix.
\nDefaults to: {scheme: true, www: true}
true to remove the trailing slash from URL matches, false to keep\n the trailing slash.
Example when true: http://google.com/ will be displayed as\n http://google.com.
Defaults to: true
The regular expression to match closing parenthesis in a URL match. See\nopenParensRe for more information.
\nDefaults to: /\\)/g
The regular expression to match URLs with an optional scheme, port\nnumber, path, query string, and hash anchor.
\n\nExample matches:
\n\nhttp://google.com\nwww.google.com\ngoogle.com/path/to/file?q1=1&q2=2#myAnchor\n
\n\nThis regular expression will have the following capturing groups:
\n\nThe regular expression to match opening parenthesis in a URL match.
\n\nThis is to determine if we have unbalanced parenthesis in the URL, and to\ndrop the final parenthesis that was matched if so.
\n\nEx: The text \"(check out: wikipedia.com/something(disambiguation))\"\nshould only autolink the inner \"wikipedia.com/something(disambiguation)\"\npart, so if we find that we have unbalanced parenthesis, we will drop the\nlast one for the match.
\nDefaults to: /\\(/g
A regular expression to use to check the character before a protocol-relative\nURL match. We don't want to match a protocol-relative URL if it is part\nof another word.
\n\nFor example, we want to match something like \"Go to: //google.com\",\nbut we don't want to match something like \"abc//google.com\"
\n\nThis regular expression is used to test the character before the '//'.
\n\nwordCharRegExp
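A minimal sketch of the preceding-character test for protocol-relative URLs follows. The function name and the use of `/\w/` as the word-character pattern are assumptions; the library may use a broader, Unicode-aware pattern.

```javascript
// Assumed word-character pattern; the real one may cover more characters.
const wordCharRe = /\w/;

// Hypothetical helper: `matchIndex` is the index of the leading '//' in `text`.
function isProtocolRelativeMatchAllowed(text, matchIndex) {
  if (matchIndex === 0) return true; // nothing precedes the match
  return !wordCharRe.test(text.charAt(matchIndex - 1));
}
```

For "Go to: //google.com" the character before `//` is a space, so the match is kept; for "abc//google.com" it is `c`, so the match is rejected.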
\nThe configuration properties for the Match instance,\n specified in an Object (map).
\nOverrides: Autolinker.matcher.Matcher.constructor
Determine if there's an invalid character after the TLD in a URL. Valid\ncharacters after TLD are ':/?#'. Exclude scheme matched URLs from this\ncheck.
\nThe matched URL, if there was one. Will be an\n empty string if the match is not a URL match.
\nThe match URL string for a scheme\n match. Ex: 'http://yahoo.com'. This is used to match something like\n 'http://localhost', where we won't double check that the domain name\n has at least one '.' in it.
\nthe position where the invalid character was found. If\n no such character was found, returns -1
\nDetermines if a match found has an unmatched closing parenthesis,\nsquare bracket or curly bracket. If so, the symbol will be removed\nfrom the match itself, and appended after the generated anchor tag.
\n\nA match may have an extra closing parenthesis at the end of the match\nbecause the regular expression must include parenthesis for URLs such as\n\"wikipedia.com/something_(disambiguation)\", which should be auto-linked.
\n\nHowever, an extra parenthesis will be included when the URL itself is\nwrapped in parentheses, such as in the case of:\n \"(wikipedia.com/something_(disambiguation))\"\nIn this case, the last closing parenthesis should not be part of the\nURL itself, and this method will return true.
For square brackets in URLs such as in PHP arrays, the same behavior as\nparentheses discussed above should happen:\n \"[http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3]\"\nThe closing square bracket should not be part of the URL itself, and this\nmethod will return true.
The full match string from the matcherRegex.
\ntrue if there is an unbalanced closing parenthesis or\n square bracket at the end of the matchStr, false otherwise.
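The balance check described above can be sketched by counting opening and closing parentheses. The function name is hypothetical and only parentheses are handled here; square and curly brackets would follow the same pattern with their own pair of regular expressions.

```javascript
const openParensRe = /\(/g;
const closeParensRe = /\)/g;

// Hypothetical helper: true if the match ends in ')' and has more closing
// than opening parentheses.
function hasUnbalancedClosingParen(matchStr) {
  if (matchStr.charAt(matchStr.length - 1) !== ')') return false;

  // String.prototype.match with a global regex returns all matches, or null.
  const opens = (matchStr.match(openParensRe) || []).length;
  const closes = (matchStr.match(closeParensRe) || []).length;
  return closes > opens;
}
```

When the check returns true, the caller would drop the final character from the match so that it lands after the generated anchor tag instead of inside it.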
Files
NOTE: This is a private utility class for internal use by the framework. Don't rely on its existence.
Used by Autolinker to filter out false URL positives from the\nUrlMatcher.
\n\nDue to the limitations of regular expressions (including the missing feature\nof look-behinds in JS regular expressions), we cannot always determine the\nvalidity of a given match. This class applies a bit of additional logic to\nfilter out any false positives that have been matched by the\nUrlMatcher.
\nRegex to test for a full protocol, with the two trailing slashes. Ex: 'http://'
\nDefaults to: /^[A-Za-z][-.+A-Za-z0-9]*:\\/\\//
Regex to determine if at least one word char exists after the protocol (i.e. after the ':')
\nRegex to determine if the string is a valid IP address
\nDefaults to: /[0-9][0-9]?[0-9]?\\.[0-9][0-9]?[0-9]?\\.[0-9][0-9]?[0-9]?\\.[0-9][0-9]?[0-9]?(:[0-9]*)?\\/?$/
Regex to find the URI scheme, such as 'mailto:'.
\n\nThis is used to filter out 'javascript:' and 'vbscript:' schemes.
\nDefaults to: /^[A-Za-z][-.+A-Za-z0-9]*:/
Determines if a given URL match found by the UrlMatcher\nis valid. Will return false for:
1) URL matches which do not have at least one period ('.') in the\n domain name (effectively skipping over matches like \"abc:def\").\n However, URL matches with a protocol will be allowed (ex: 'http://localhost')\n2) URL matches which do not have at least one word character in the\n domain name (effectively skipping over matches like \"git:1.0\").\n3) A protocol-relative url match (a URL beginning with '//') whose\n previous character is a word character (effectively skipping over\n strings like \"abc//google.com\")
\n\nOtherwise, returns true.
The matched URL, if there was one. Will be an\n empty string if the match is not a URL match.
\nThe match URL string for a protocol\n match. Ex: 'http://yahoo.com'. This is used to match something like\n 'http://localhost', where we won't double check that the domain name\n has at least one '.' in it.
\ntrue if the match given is valid and should be\n processed, or false if the match is invalid and/or should just not be\n processed.
Determines if the URI scheme is a valid scheme to be autolinked. Returns\nfalse if the scheme is 'javascript:' or 'vbscript:'.
The match URL string for a full URI scheme\n match. Ex: 'http://yahoo.com' or 'mailto:a@a.com'.
\ntrue if the scheme is a valid one, false otherwise.
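A sketch of the scheme filter, built on the URI scheme regular expression listed earlier in this class. The function name is hypothetical, and treating a string with no recognizable scheme as valid is an assumption.

```javascript
// URI scheme pattern as listed in this class's properties.
const uriSchemeRegex = /^[A-Za-z][-.+A-Za-z0-9]*:/;

// Hypothetical helper: rejects only the 'javascript:' and 'vbscript:' schemes.
function isValidUriSchemeSketch(uriSchemeMatch) {
  const found = uriSchemeRegex.exec(uriSchemeMatch);
  if (found === null) return true; // assumption: no scheme means nothing to reject

  const scheme = found[0].toLowerCase(); // schemes are case-insensitive
  return scheme !== 'javascript:' && scheme !== 'vbscript:';
}
```

Lower-casing before the comparison keeps variants such as `JavaScript:` from slipping through.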
Determines if a URL match does not have at least one word character after\nthe protocol (i.e. in the domain name).
\n\nAt least one letter character must exist in the domain name after a\nprotocol match. Ex: skip over something like \"git:1.0\"
\nThe matched URL, if there was one. Will be an\n empty string if the match is not a URL match.
\nThe match URL string for a protocol\n match. Ex: 'http://yahoo.com'. This is used to know whether or not we\n have a protocol in the URL string, in order to check for a word\n character after the protocol separator (':').
\ntrue if the URL match does not have at least one word\n character in it after the protocol, false otherwise.
Determines if a URL match does not have either:
\n\na) a full protocol (i.e. 'http://'), or\nb) at least one dot ('.') in the domain name (for a non-full-protocol\n match).
\n\nEither situation is considered an invalid URL (ex: 'git:d' does not have\neither the '://' part, or at least one dot in the domain name. If the\nmatch was 'git:abc.com', we would consider this valid.)
\nThe matched URL, if there was one. Will be an\n empty string if the match is not a URL match.
\nThe match URL string for a protocol\n match. Ex: 'http://yahoo.com'. This is used to match something like\n 'http://localhost', where we won't double check that the domain name\n has at least one '.' in it.
\ntrue if the URL match does not have a full protocol,\n or at least one dot ('.') in a non-full-protocol match.
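The "full protocol or dot" rule can be sketched with the full-protocol regular expression listed earlier in this class. The function name is hypothetical, and for simplicity the sketch looks for a dot anywhere in the match rather than only in the domain name.

```javascript
// Full-protocol pattern as listed in this class's properties.
const hasFullProtocolRegex = /^[A-Za-z][-.+A-Za-z0-9]*:\/\//;

// Hypothetical helper: true means the match should be treated as invalid.
function lacksProtocolAndDot(urlMatch, protocolUrlMatch) {
  const hasFullProtocol =
    Boolean(protocolUrlMatch) && hasFullProtocolRegex.test(protocolUrlMatch);

  // Simplification: checks for a '.' anywhere in the match.
  return Boolean(urlMatch) && !hasFullProtocol && urlMatch.indexOf('.') === -1;
}
```

With these rules 'git:d' is rejected, while 'git:abc.com' (has a dot) and 'http://localhost' (has a full protocol) are accepted.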
Files
NOTE: This is a private utility class for internal use by the framework. Don't rely on its existence.
Used by Autolinker to filter out false URL positives from the\nUrlMatcher.
\n\nDue to the limitations of regular expressions (including the missing feature\nof look-behinds in JS regular expressions), we cannot always determine the\nvalidity of a given match. This class applies a bit of additional logic to\nfilter out any false positives that have been matched by the\nUrlMatcher.
\nRegex to test for a full protocol, with the two trailing slashes. Ex: 'http://'
\nDefaults to: /^[A-Za-z][-.+A-Za-z0-9]*:\\/\\//
Regex to determine if at least one word char exists after the protocol (i.e. after the ':')
\nRegex to determine if the string is a valid IP address
\nDefaults to: /[0-9][0-9]?[0-9]?\\.[0-9][0-9]?[0-9]?\\.[0-9][0-9]?[0-9]?\\.[0-9][0-9]?[0-9]?(:[0-9]*)?\\/?$/
Regex to find the URI scheme, such as 'mailto:'.
\n\nThis is used to filter out 'javascript:' and 'vbscript:' schemes.
\nDefaults to: /^[A-Za-z][-.+A-Za-z0-9]*:/
Determines if a given URL match found by the UrlMatcher\nis valid. Will return false for:
1) URL matches which do not have at least one period ('.') in the\n domain name (effectively skipping over matches like \"abc:def\").\n However, URL matches with a protocol will be allowed (ex: 'http://localhost')\n2) URL matches which do not have at least one word character in the\n domain name (effectively skipping over matches like \"git:1.0\").\n However, URL matches with a protocol will be allowed (ex: 'intra-net://271219.76')\n3) A protocol-relative url match (a URL beginning with '//') whose\n previous character is a word character (effectively skipping over\n strings like \"abc//google.com\")
\n\nOtherwise, returns true.
The matched URL, if there was one. Will be an\n empty string if the match is not a URL match.
\nThe match URL string for a protocol\n match. Ex: 'http://yahoo.com'. This is used to match something like\n 'http://localhost', where we won't double check that the domain name\n has at least one '.' in it.
\ntrue if the match given is valid and should be\n processed, or false if the match is invalid and/or should just not be\n processed.
Determines if the URI scheme is a valid scheme to be autolinked. Returns\nfalse if the scheme is 'javascript:' or 'vbscript:'.
The match URL string for a full URI scheme\n match. Ex: 'http://yahoo.com' or 'mailto:a@a.com'.
\ntrue if the scheme is a valid one, false otherwise.
Determines if a URL match does not have either:
\n\na) a full protocol (i.e. 'http://'), or\nb) at least one word character after the protocol (i.e. in the domain name)
\n\nAt least one letter character must exist in the domain name after a\nprotocol match. Ex: skip over something like \"git:1.0\"
\nThe matched URL, if there was one. Will be an\n empty string if the match is not a URL match.
\nThe match URL string for a protocol\n match. Ex: 'http://yahoo.com'. This is used to know whether or not we\n have a protocol in the URL string, in order to check for a word\n character after the protocol separator (':').
\ntrue if the URL match does not have a full protocol, or\nat least one word character in it, false otherwise.
Determines if a URL match does not have either:
\n\na) a full protocol (i.e. 'http://'), or\nb) at least one dot ('.') in the domain name (for a non-full-protocol\n match).
\n\nEither situation is considered an invalid URL (ex: 'git:d' does not have\neither the '://' part, or at least one dot in the domain name. If the\nmatch was 'git:abc.com', we would consider this valid.)
\nThe matched URL, if there was one. Will be an\n empty string if the match is not a URL match.
\nThe match URL string for a protocol\n match. Ex: 'http://yahoo.com'. This is used to match something like\n 'http://localhost', where we won't double check that the domain name\n has at least one '.' in it.
\ntrue if the URL match does not have a full protocol,\n or at least one dot ('.') in a non-full-protocol match.
Global variables and functions.
\nThe string form of a regular expression that would match all of the\nalphabetic (\"letter\") chars, emoji, and combining marks in the unicode character set\nwhen placed in a RegExp character class ([]). This includes all\ninternational alphabetic characters.
These would be the characters matched by unicode regex engines \\p{L}\\p{M}\nescapes and emoji characters.
The string form of a regular expression that would match all of the\nalphabetic (\"letter\") chars in the unicode character set when placed in a\nRegExp character class ([]). This includes all international alphabetic\ncharacters.
These would be the characters matched by unicode regex engines \\p{L}\nescape (\"all letters\").
Taken from the XRegExp library: http://xregexp.com/ (thanks @https://github.com/slevithan)\nSpecifically: http://xregexp.com/v/3.2.0/xregexp-all.js, the 'Letter'\n regex's bmp
\n\nVERY IMPORTANT: This set of characters is defined inside of a Regular\n Expression literal rather than a string literal to prevent UglifyJS from\n compressing the unicode escape sequences into their actual unicode\n characters. If Uglify compresses these into the unicode characters\n themselves, this results in the error \"Range out of order in character\n class\" when these characters are used inside of a Regular Expression\n character class ([]). See usages of this const. Alternatively, we can set\n the UglifyJS option ascii_only to true for the build, but that doesn't\n help others who are pulling in Autolinker into their own build and running\n UglifyJS themselves.
The string form of a regular expression that would match all of the\nletters, combining marks, and decimal number chars in the unicode character\nset when placed in a RegExp character class ([]).
These would be the characters matched by unicode regex engines\n[\\p{L}\\p{M}\\p{Nd}] escape (\"all letters, combining marks, and decimal\nnumbers\")
The string form of a regular expression that would match all of the\nletters and decimal number chars in the unicode character set when placed in\na RegExp character class ([]).
These would be the characters matched by unicode regex engines\n[\\p{L}\\p{Nd}] escape (\"all letters and decimal numbers\")
Regular expression to match the range of ASCII control characters (0-31), and\nthe delete char (127)
\nDefaults to: /[\\x00-\\x1F\\x7F]/
The string form of a regular expression that would match all of the\ndecimal number chars in the unicode character set when placed in a RegExp\ncharacter class ([]).
These would be the characters matched by unicode regex engines \\p{Nd}\nescape (\"all decimal numbers\")
Taken from the XRegExp library: http://xregexp.com/ (thanks @https://github.com/slevithan)\nSpecifically: http://xregexp.com/v/3.2.0/xregexp-all.js, the 'Decimal_Number'\n regex's bmp
\n\nVERY IMPORTANT: This set of characters is defined inside of a Regular\n Expression literal rather than a string literal to prevent UglifyJS from\n compressing the unicode escape sequences into their actual unicode\n characters. If Uglify compresses these into the unicode characters\n themselves, this results in the error \"Range out of order in character\n class\" when these characters are used inside of a Regular Expression\n character class ([]). See usages of this const. Alternatively, we can set\n the UglifyJS option ascii_only to true for the build, but that doesn't\n help others who are pulling in Autolinker into their own build and running\n UglifyJS themselves.
Regular expression to match ASCII digits
\nDefaults to: /[0-9]/
A regular expression that is simply the character class of the characters\nthat may be used in a domain name, minus the '-' or '.'
\nA regular expression to match domain names of a URL or email address.\nEx: 'google', 'yahoo', 'some-other-company', etc.
\nThe string form of a regular expression that would match all emoji characters\nBased on the emoji regex defined in this article: https://thekevinscott.com/emojis-in-javascript/
\nThe regular expression that matches common HTML character entities.
\n\nIgnoring &amp; as it could be part of a query string -- handling it separately.
\nDefaults to: /(&nbsp;|&#160;|&lt;|&#60;|&gt;|&#62;|&quot;|&#34;|&#39;)/gi
The regular expression used to pull out HTML tags from a string. Handles namespaced HTML tags and\nattribute names, as specified by http://www.w3.org/TR/html-markup/syntax.html.
\n\nCapturing groups:
\n\n<!-- and -->.
Regular expression to match upper and lowercase ASCII letters
\nDefaults to: /[A-Za-z]/
The string form of a regular expression that would match all of the\ncombining mark characters in the unicode character set when placed in a\nRegExp character class ([]).
These would be the characters matched by unicode regex engines \\p{M}\nescape (\"all marks\").
Taken from the XRegExp library: http://xregexp.com/ (thanks @https://github.com/slevithan)\nSpecifically: http://xregexp.com/v/3.2.0/xregexp-all.js, the 'Mark'\n regex's bmp
\n\nVERY IMPORTANT: This set of characters is defined inside of a Regular\n Expression literal rather than a string literal to prevent UglifyJS from\n compressing the unicode escape sequences into their actual unicode\n characters. If Uglify compresses these into the unicode characters\n themselves, this results in the error \"Range out of order in character\n class\" when these characters are used inside of a Regular Expression\n character class ([]). See usages of this const. Alternatively, we can set\n the UglifyJS option ascii_only to true for the build, but that doesn't\n help others who are pulling in Autolinker into their own build and running\n UglifyJS themselves.
Regular expression to match quote characters
\nDefaults to: /['"]/
Regular expression to match whitespace
\nDefaults to: /\\s/
Captures the tag name from the start of the tag to the current character\nindex, and converts it to lower case
\nAssigns (shallow copies) the properties of src onto dest, if the\ncorresponding property on dest === undefined.
The destination object.
\nThe source object.
\nThe destination object (dest)
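A minimal sketch of this "fill in missing properties" behavior follows. The function name `defaultsSketch` is hypothetical, and restricting the copy to own properties of `src` is an assumption.

```javascript
// Hypothetical helper: copy a property from src only when dest lacks a value.
function defaultsSketch(dest, src) {
  for (const prop in src) {
    // Assumption: only own properties of src are considered.
    if (Object.prototype.hasOwnProperty.call(src, prop) && dest[prop] === undefined) {
      dest[prop] = src[prop];
    }
  }
  return dest; // the same object that was passed in, now mutated
}
```

Because the test is `=== undefined`, a property explicitly set to `null`, `0`, or `''` on `dest` is left alone.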
Truncates the str at len - ellipsisChars.length, and adds the ellipsisChars to the\nend of the string (by default, three periods: '...'). If the str length does not exceed\nlen, the string will be returned unchanged.
The string to truncate and add an ellipsis to.
\nThe length to truncate the string at.
\nThe ellipsis character(s) to add to the end of str\n when truncated. Defaults to '...'
Defaults to: '...'
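The truncation rule can be sketched as follows. The function name is hypothetical, and the `'...'` default is taken from the parameter description above rather than from the library source.

```javascript
// Hypothetical helper: keep (truncateLen - ellipsisChars.length) characters
// of str, then append ellipsisChars. Default taken from the parameter docs.
function ellipsisSketch(str, truncateLen, ellipsisChars = '...') {
  if (str.length <= truncateLen) return str; // short enough: unchanged

  // substring() treats a negative end index as 0, so a very small
  // truncateLen simply yields the ellipsis characters on their own.
  return str.substring(0, truncateLen - ellipsisChars.length) + ellipsisChars;
}
```

For example, truncating 'abcdefghij' to 7 characters keeps 'abcd' and appends '...', so the result is exactly 7 characters long.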
Once we've decided to emit an open tag, that means we can also emit the\ntext node before it.
\nA function to match domain names of a URL or email address.\nEx: 'google', 'yahoo', 'some-other-company', etc.
\nSupports Array.prototype.indexOf() functionality for old IE (IE8 and below).
The array to find an element of.
\nThe element to find in the array, and return the index of.
\nThe index of the element, or -1 if it was not found.
Parses an HTML string, calling the callbacks to notify of tags and text.
\n\nThis file previously used a regular expression to find html tags in the input\ntext. Unfortunately, we ran into a bunch of catastrophic backtracking issues\nwith certain input text, causing Autolinker to either hang or just take a\nreally long time to parse the string.
\n\nThe current code is intended to be an O(n) algorithm that walks through\nthe string in one pass, and tries to be as cheap as possible. We don't need\nto implement the full HTML spec, but rather simply determine where the string\nlooks like an HTML tag, and where it looks like text (so that we can autolink\nthat).
\n\nThis state machine parser is intended just to be a simple but performant\nparser of HTML for the subset of requirements we have. We simply need to:
\n\nWe don't need to:
\n\nThe other intention behind this is that we didn't want to add external\ndependencies on the Autolinker utility which would increase its size. For\ninstance, adding htmlparser2 adds 125kb to the minified output file,\nincreasing its final size from 47kb to 172kb (at the time of writing). It\nalso doesn't work exactly correctly, treating the string \"<3 blah blah blah\"\nas an HTML tag.
\n\nReference for HTML spec:
\n\nhttps://www.w3.org/TR/html51/syntax.html#sec-tokenization\n
\nThe HTML to parse
\nCallback function to call when an open\n tag is parsed. Called with the tagName as its argument.
\nCallback function to call when a close\n tag is parsed. Called with the tagName as its argument. If a self-closing\n tag is found, onCloseTag is called immediately after onOpenTag.
Callback function to call when text (i.e.\n not an HTML tag) is parsed. Called with the text (string) as its first\n argument, and offset (number) into the string as its second.
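To make the callback contract concrete, here is a heavily simplified single-pass scanner. It is not Autolinker's state machine: the function name is hypothetical, and it ignores comments, doctypes, a '>' inside quoted attribute values, and a '<' appearing inside a tag. It only shows how `onOpenTag`, `onCloseTag`, and `onText` might be driven, and why "<3" is treated as text.

```javascript
// Hypothetical, simplified scanner; not Autolinker's parser.
function parseHtmlSketch(html, { onOpenTag, onCloseTag, onText }) {
  const letterRe = /[A-Za-z]/;            // a tag name must start with a letter
  const tagNameRe = /^[A-Za-z][^\s\/>]*/; // name runs up to whitespace, '/' or '>'
  let textStart = 0; // start of the text run that has not been emitted yet
  let i = 0;

  while (i < html.length) {
    if (html.charAt(i) !== '<') { i++; continue; }

    const isCloseTag = html.charAt(i + 1) === '/';
    const nameStart = i + (isCloseTag ? 2 : 1);

    // "<3 blah" does not start a tag, so keep treating it as text.
    if (!letterRe.test(html.charAt(nameStart))) { i++; continue; }

    const end = html.indexOf('>', nameStart);
    if (end === -1) break; // unterminated tag: the rest is emitted as text

    // Convert the tag name to lower case before reporting it.
    const tagName = tagNameRe.exec(html.slice(nameStart, end))[0].toLowerCase();

    // Emit the text that preceded this tag, with its offset.
    if (i > textStart) onText(html.slice(textStart, i), textStart);

    if (isCloseTag) {
      onCloseTag(tagName);
    } else {
      onOpenTag(tagName);
      // Self-closing tag: report the close immediately after the open.
      if (html.charAt(end - 1) === '/') onCloseTag(tagName);
    }

    textStart = end + 1;
    i = end + 1;
  }

  // Emit any trailing text.
  if (textStart < html.length) onText(html.slice(textStart), textStart);
}
```

The scanner only ever moves forward through the string, which is the property the description above is after; a regular expression with nested quantifiers offers no such guarantee.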
\nCauses the main loop to re-consume the current character, such as after\nencountering a \"parse error\" that changed state and needs to reconsume\nthe same character in that new state.
\nRemoves array elements based on a filtering function. Mutates the input\narray.
\n\nUsing this instead of the ES5 Array.prototype.filter() function, to allow\nAutolinker compatibility with IE8, and also to prevent creating many new\narrays in memory for filtering.
\nThe array to remove elements from. This array is\n mutated.
\nA function which should return true to\n remove an element.
The mutated input arr.
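An in-place removal of this kind can be sketched as below. The function name is hypothetical; iterating from the end is one common way to splice safely while looping, and is an assumption about the approach rather than a statement of the library's code.

```javascript
// Hypothetical helper: remove every element for which fn returns true,
// mutating arr instead of allocating a new array.
function removeSketch(arr, fn) {
  // Walk backwards so that splice() never shifts an element we have yet to visit.
  for (let i = arr.length - 1; i >= 0; i--) {
    if (fn(arr[i]) === true) {
      arr.splice(i, 1);
    }
  }
  return arr;
}
```

Note the inverted sense compared with `Array.prototype.filter()`: here a `true` return value removes the element rather than keeping it.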
Resets the state back to the Data state, and removes the current tag.
\n\nWe'll generally run this function whenever a \"parse error\" is\nencountered, where the current tag that is being read no longer looks\nlike a real HTML tag.
\nPerforms the functionality of what modern browsers do when String.prototype.split() is called\nwith a regular expression that contains capturing parenthesis.
For example:
\n\n// Modern browsers:\n\"a,b,c\".split( /(,)/ ); // --> [ 'a', ',', 'b', ',', 'c' ]\n\n// Old IE (including IE8):\n\"a,b,c\".split( /(,)/ ); // --> [ 'a', 'b', 'c' ]\n
\n\nThis method emulates the functionality of modern browsers for the old IE case.
\nThe string to split.
\nThe regular expression to split the input str on. The splitting\n character(s) will be spliced into the array, as in the \"modern browsers\" example in the\n description of this method.\n Note #1: the supplied regular expression must have the 'g' flag specified.\n Note #2: for simplicity's sake, the regular expression does not need\n to contain capturing parenthesis - it will be assumed that any match has them.
The split array of strings, with the splitting character(s) included.
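One way to emulate the capturing split is to walk the matches with `RegExp.prototype.exec` and push both the text between matches and the matches themselves. The function name is hypothetical and the sketch is an approximation of the described behavior, not the library's code.

```javascript
// Hypothetical helper: split str on splitRegex, keeping the separators.
function splitAndCaptureSketch(str, splitRegex) {
  if (!splitRegex.global) {
    throw new Error("splitRegex must have the 'g' flag set");
  }

  const result = [];
  let lastIdx = 0;
  let match;

  splitRegex.lastIndex = 0; // start scanning from the beginning

  while ((match = splitRegex.exec(str)) !== null) {
    result.push(str.substring(lastIdx, match.index)); // text before the separator
    result.push(match[0]);                            // the separator itself
    lastIdx = match.index + match[0].length;

    // Guard against zero-length matches, which would otherwise loop forever.
    if (match[0].length === 0) splitRegex.lastIndex++;
  }
  result.push(str.substring(lastIdx)); // text after the last separator
  return result;
}
```

The 'g' flag is required because `exec` only advances through the string when the regular expression tracks `lastIndex`.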
\nStarts a new HTML tag at the current index, ignoring any previous HTML\ntag that was being read.
\n\nWe'll generally run this function whenever we read a new '<' character,\nincluding when we read a '<' character inside of an HTML tag that we were\npreviously reading.
\nFor DOCTYPES in particular, we don't care about the attributes. Just\nadvance to the '>' character and emit the tag, unless we find a '<'\ncharacter in which case we'll start a new tag.
\n\nExample doctype tag:\n <!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
\n\nActual spec: https://www.w3.org/TR/html51/syntax.html#doctype-state
\nFunction that should never be called but is used to check that every\nenum value is handled using TypeScript's 'never' type.
\nA truncation feature where the ellipsis will be placed at the end of the URL.
\nThe maximum length of the truncated output URL string.
\nThe characters to place within the url, e.g. \"..\".
\nThe truncated URL.
\nDate: 2015-10-05\nAuthor: Kasper Søfren soefritz@gmail.com (https://github.com/kafoso)
\n\nA truncation feature, where the ellipsis will be placed in the dead-center of the URL.
\nA URL.
\nThe maximum length of the truncated output URL string.
\nThe characters to place within the url, e.g. \"..\".
\nThe truncated URL.
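A rough sketch of dead-center truncation under simple assumptions: the ellipsis counts toward the maximum length, the head gets the extra character when the remaining budget is odd, and the default of `'..'` is taken from the example in the parameter description. This is an illustration, not the library's actual implementation.

```typescript
// Hypothetical sketch of middle truncation; names and defaults are assumptions.
function truncateMiddle(url: string, truncateLen: number, ellipsisChars: string = '..'): string {
    if (url.length <= truncateLen) {
        return url; // nothing to do
    }

    const available = truncateLen - ellipsisChars.length; // characters left for the URL itself
    if (available <= 0) {
        return ellipsisChars.slice(0, Math.max(truncateLen, 0));
    }

    const headLen = Math.ceil(available / 2);
    const tailLen = Math.floor(available / 2);
    // slice(-0) would return the whole string, so handle a zero-length tail explicitly
    const tail = tailLen > 0 ? url.slice(-tailLen) : '';

    return url.slice(0, headLen) + ellipsisChars + tail;
}

console.log(truncateMiddle('http://example.com/path', 11)); // http:..path
```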
\nDate: 2015-10-05\nAuthor: Kasper Søfren soefritz@gmail.com (https://github.com/kafoso)
\n\nA truncation feature, where the ellipsis will be placed at a section within\nthe URL making it still somewhat human readable.
\nA URL.
\nThe maximum length of the truncated output URL string.
\nThe characters to place within the url, e.g. \"...\".
\nThe truncated URL.
\nGlobal variables and functions.
\nThe string form of a regular expression that would match all of the\nalphabetic (\"letter\") chars, emoji, and combining marks in the unicode character set\nwhen placed in a RegExp character class ([]). This includes all\ninternational alphabetic characters. These would be the characters matched by a unicode regex engine's \\p{L}\\p{M}\nescapes, plus emoji characters.
The string form of a regular expression that would match all of the\nalphabetic (\"letter\") chars in the unicode character set when placed in a\nRegExp character class ([]). This includes all international alphabetic\ncharacters. These would be the characters matched by a unicode regex engine's \\p{L}\nescape (\"all letters\").
Taken from the XRegExp library: http://xregexp.com/ (thanks @https://github.com/slevithan)\nSpecifically: http://xregexp.com/v/3.2.0/xregexp-all.js, the 'Letter'\n regex's bmp
\n\nVERY IMPORTANT: This set of characters is defined inside of a Regular\n Expression literal rather than a string literal to prevent UglifyJS from\n compressing the unicode escape sequences into their actual unicode\n characters. If Uglify compresses these into the unicode characters\n themselves, this results in the error \"Range out of order in character\n class\" when these characters are used inside of a Regular Expression\n character class ([]
). See usages of this const. Alternatively, we can set\n the UglifyJS option ascii_only
to true for the build, but that doesn't\n help others who are pulling in Autolinker into their own build and running\n UglifyJS themselves.
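To make the "regular expression literal rather than string literal" point concrete, here is a small sketch. The ranges below are a heavily abbreviated, hypothetical stand-in for the real Unicode range lists; the point is only that `.source` keeps the escape sequences as text, so they can be spliced into a character class later.

```typescript
// Abbreviated, hypothetical stand-ins for the real range lists.
// Written as regex literals so the \x.. and \u.... escapes survive as text.
const letterChars: string = /A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\u02C1/.source;
const digitChars: string = /0-9\u0660-\u0669/.source;

// The strings are only meaningful once placed inside a character class ([]).
const wordRe = new RegExp('^[' + letterChars + digitChars + ']+$');

console.log(wordRe.test('caf\u00E942')); // true
console.log(wordRe.test('a b'));         // false (space is not in the class)
```

The literals themselves are never executed as patterns; they serve purely as a container whose `.source` yields the range text.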
The string form of a regular expression that would match all of the\nletters, combining marks, and decimal number chars in the unicode character\nset when placed in a RegExp character class ([]). These would be the characters matched by a unicode regex engine's\n[\\p{L}\\p{M}\\p{Nd}] escape (\"all letters, combining marks, and decimal\nnumbers\").
The string form of a regular expression that would match all of the\nletters and decimal number chars in the unicode character set when placed in\na RegExp character class ([]). These would be the characters matched by a unicode regex engine's\n[\\p{L}\\p{Nd}] escape (\"all letters and decimal numbers\").
Regular expression to match the range of ASCII control characters (0-31), and\nthe delete char (127)
\nDefaults to: /[\\x00-\\x1F\\x7F]/
The string form of a regular expression that would match all of the\ndecimal number chars in the unicode character set when placed in a RegExp\ncharacter class ([]). These would be the characters matched by a unicode regex engine's \\p{Nd}\nescape (\"all decimal numbers\").
Taken from the XRegExp library: http://xregexp.com/ (thanks @https://github.com/slevithan)\nSpecifically: http://xregexp.com/v/3.2.0/xregexp-all.js, the 'Decimal_Number'\n regex's bmp
\n\nVERY IMPORTANT: This set of characters is defined inside of a Regular\n Expression literal rather than a string literal to prevent UglifyJS from\n compressing the unicode escape sequences into their actual unicode\n characters. If Uglify compresses these into the unicode characters\n themselves, this results in the error \"Range out of order in character\n class\" when these characters are used inside of a Regular Expression\n character class ([]
). See usages of this const. Alternatively, we can set\n the UglifyJS option ascii_only
to true for the build, but that doesn't\n help others who are pulling in Autolinker into their own build and running\n UglifyJS themselves.
Regular expression to match ASCII digits
\nDefaults to: /[0-9]/
A regular expression that is simply the character class of the characters\nthat may be used in a domain name, minus the '-' or '.'
\nA regular expression to match domain names of a URL or email address.\nEx: 'google', 'yahoo', 'some-other-company', etc.
\nThe string form of a regular expression that would match all emoji characters.\nBased on the emoji regex defined in this article: https://thekevinscott.com/emojis-in-javascript/
\nRegular expression to match upper and lowercase ASCII letters
\nDefaults to: /[A-Za-z]/
The string form of a regular expression that would match all of the\ncombining mark characters in the unicode character set when placed in a\nRegExp character class ([]). These would be the characters matched by a unicode regex engine's \\p{M}\nescape (\"all marks\").
Taken from the XRegExp library: http://xregexp.com/ (thanks @https://github.com/slevithan)\nSpecifically: http://xregexp.com/v/3.2.0/xregexp-all.js, the 'Mark'\n regex's bmp
\n\nVERY IMPORTANT: This set of characters is defined inside of a Regular\n Expression literal rather than a string literal to prevent UglifyJS from\n compressing the unicode escape sequences into their actual unicode\n characters. If Uglify compresses these into the unicode characters\n themselves, this results in the error \"Range out of order in character\n class\" when these characters are used inside of a Regular Expression\n character class ([]
). See usages of this const. Alternatively, we can set\n the UglifyJS option ascii_only
to true for the build, but that doesn't\n help others who are pulling in Autolinker into their own build and running\n UglifyJS themselves.
Regular expression to match quote characters
\nDefaults to: /['"]/
Regular expression to match whitespace
\nDefaults to: /\\s/
Captures the tag name from the start of the tag to the current character\nindex, and converts it to lower case
\nAssigns (shallow copies) the properties of src onto dest, if the\ncorresponding property on dest === undefined.
The destination object.
\nThe source object.
\nThe destination object (dest).
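A short sketch of the "fill in only what is undefined" behavior described above. The `Dict` type alias is an assumption made for this example, and the body is illustrative rather than the library's actual code.

```typescript
// Hypothetical helper type for this sketch.
type Dict = { [key: string]: unknown };

// Copies each own property of `src` onto `dest`, but only where `dest`
// does not already have a defined value. Mutates and returns `dest`.
function defaults(dest: Dict, src: Dict): Dict {
    for (const prop of Object.keys(src)) {
        if (dest[prop] === undefined) {
            dest[prop] = src[prop];
        }
    }
    return dest;
}

const cfg = defaults({ newWindow: false, className: undefined }, { newWindow: true, className: 'link', email: true });
console.log(cfg); // { newWindow: false, className: 'link', email: true }
```

Because the check is `=== undefined`, falsy values already present on `dest` (such as `false` or `''`) are kept.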
Truncates the str at len - ellipsisChars.length, and adds the ellipsisChars to the\nend of the string (by default, three periods: '...'). If the str length does not exceed\nlen, the string will be returned unchanged.
The string to truncate and add an ellipsis to.
\nThe length to truncate the string at.
\nThe ellipsis character(s) to add to the end of str
\n when truncated. Defaults to '...'
Defaults to: ...
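The truncation rule above can be sketched as follows, assuming the documented default of `'...'` for `ellipsisChars`. This is a simplified illustration and not necessarily how the library implements it.

```typescript
// Hypothetical sketch: keep the result at `truncateLen` characters in total,
// including the ellipsis characters.
function ellipsis(str: string, truncateLen: number, ellipsisChars: string = '...'): string {
    if (str.length <= truncateLen) {
        return str; // short enough, return unchanged
    }
    return str.substring(0, truncateLen - ellipsisChars.length) + ellipsisChars;
}

console.log(ellipsis('abcdefghij', 8));       // abcde...
console.log(ellipsis('abcdefghij', 6, '..')); // abcd..
console.log(ellipsis('short', 10));           // short
```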
Once we've decided to emit an open tag, that means we can also emit the\ntext node before it.
\nA function to match domain names of a URL or email address.\nEx: 'google', 'yahoo', 'some-other-company', etc.
\nSupports Array.prototype.indexOf() functionality for old IE (IE8 and below).
The array to find an element of.
\nThe element to find in the array, and return the index of.
\nThe index of the element, or -1 if it was not found.
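A minimal sketch of such a fallback, using a plain loop with strict equality. The generic signature is an assumption added for type safety in this example.

```typescript
// Hypothetical loop-based fallback for environments without Array.prototype.indexOf().
function indexOf<T>(arr: T[], element: T): number {
    for (let i = 0; i < arr.length; i++) {
        if (arr[i] === element) {
            return i;
        }
    }
    return -1;
}

console.log(indexOf(['a', 'b', 'c'], 'b')); // 1
console.log(indexOf(['a', 'b', 'c'], 'z')); // -1
```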
Parses an HTML string, calling the callbacks to notify of tags and text.
\n\nThis file previously used a regular expression to find html tags in the input\ntext. Unfortunately, we ran into a bunch of catastrophic backtracking issues\nwith certain input text, causing Autolinker to either hang or just take a\nreally long time to parse the string.
\n\nThe current code is intended to be an O(n) algorithm that walks through\nthe string in one pass, and tries to be as cheap as possible. We don't need\nto implement the full HTML spec, but rather simply determine where the string\nlooks like an HTML tag, and where it looks like text (so that we can autolink\nthat).
\n\nThis state machine parser is intended just to be a simple but performant\nparser of HTML for the subset of requirements we have. We simply need to:
\n\nWe don't need to:
\n\nThe other intention behind this is that we didn't want to add external\ndependencies on the Autolinker utility which would increase its size. For\ninstance, adding htmlparser2 adds 125kb to the minified output file,\nincreasing its final size from 47kb to 172kb (at the time of writing). It\nalso doesn't work exactly correctly, treating the string \"<3 blah blah blah\"\nas an HTML tag.
\n\nReference for HTML spec:
\n\nhttps://www.w3.org/TR/html51/syntax.html#sec-tokenization\n
\nThe HTML to parse
\nCallback function to call when an open\n tag is parsed. Called with the tagName as its argument.
\nCallback function to call when a close\n tag is parsed. Called with the tagName as its argument. If a self-closing\n tag is found, onCloseTag is called immediately after onOpenTag.
Callback function to call when text (i.e.\n not an HTML tag) is parsed. Called with the text (string) as its first\n argument, and offset (number) into the string as its second.
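To illustrate the callback contract and the single-pass idea, here is a deliberately tiny state machine. It is a sketch under stated assumptions, not Autolinker's parser: `parseHtmlSketch`, `ParseCallbacks`, and the state names are all hypothetical, and it ignores comments, doctypes, and quoted attribute values that contain '<' or '>'.

```typescript
// Illustrative sketch only; NOT Autolinker's actual parser.
interface ParseCallbacks {
    onOpenTag: (tagName: string) => void;
    onCloseTag: (tagName: string) => void;
    onText: (text: string, offset: number) => void;
}

type State = 'data' | 'tagOpen' | 'endTagOpen' | 'tagName' | 'afterName';

function isLetter(ch: string): boolean {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

function isWhitespace(ch: string): boolean {
    return ch === ' ' || ch === '\t' || ch === '\n' || ch === '\r' || ch === '\f';
}

function parseHtmlSketch(html: string, cbs: ParseCallbacks): void {
    let state = 'data' as State;
    let textStart = 0;  // start of the text run not yet emitted
    let tagStart = 0;   // index of the '<' of the tag being read
    let nameStart = 0;  // index of the first character of the tag name
    let tagName = '';
    let isClosing = false;

    // A '<' always starts a new candidate tag, dropping any tag in progress.
    function startTag(idx: number): void {
        tagStart = idx;
        state = 'tagOpen';
    }

    // Emit the text before the tag, then the tag itself, and return to 'data'.
    function emitTag(endIdx: number): void {
        if (tagStart > textStart) {
            cbs.onText(html.slice(textStart, tagStart), textStart);
        }
        if (isClosing) {
            cbs.onCloseTag(tagName);
        } else {
            cbs.onOpenTag(tagName);
            if (html.charAt(endIdx - 1) === '/') {
                cbs.onCloseTag(tagName); // self-closing tag: close right after open
            }
        }
        textStart = endIdx + 1;
        state = 'data';
    }

    for (let i = 0; i < html.length; i++) {
        const ch = html.charAt(i);
        switch (state) {
            case 'data':
                if (ch === '<') startTag(i);
                break;
            case 'tagOpen':
                if (ch === '/') { isClosing = true; state = 'endTagOpen'; }
                else if (isLetter(ch)) { isClosing = false; nameStart = i; state = 'tagName'; }
                else if (ch === '<') startTag(i);
                else state = 'data'; // e.g. "<3" is just text
                break;
            case 'endTagOpen':
                if (isLetter(ch)) { nameStart = i; state = 'tagName'; }
                else if (ch === '<') startTag(i);
                else state = 'data';
                break;
            case 'tagName':
                if (ch === '>') { tagName = html.slice(nameStart, i).toLowerCase(); emitTag(i); }
                else if (ch === '<') startTag(i);
                else if (isWhitespace(ch) || ch === '/') {
                    tagName = html.slice(nameStart, i).toLowerCase();
                    state = 'afterName';
                }
                break;
            case 'afterName':
                if (ch === '>') emitTag(i);
                else if (ch === '<') startTag(i);
                break;
        }
    }

    // Whatever is left (including an unterminated tag) is treated as text.
    if (textStart < html.length) {
        cbs.onText(html.slice(textStart), textStart);
    }
}
```

Each character is visited once and no backtracking occurs, which is the property the description above is after. Input such as "<3 you" falls back to text because '3' cannot start a tag name.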
\nCauses the main loop to re-consume the current character, such as after\nencountering a \"parse error\" that changed state and needs to reconsume\nthe same character in that new state.
\nRemoves array elements based on a filtering function. Mutates the input\narray.
\n\nUsing this instead of the ES5 Array.prototype.filter() function, to allow\nAutolinker compatibility with IE8, and also to prevent creating many new\narrays in memory for filtering.
\nThe array to remove elements from. This array is\n mutated.
\nA function which should return true to\n remove an element.
The mutated input arr.
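An in-place removal along these lines can be sketched with a backwards loop, so that removing an element does not shift the indices still to be visited. The generic signature is an assumption made for this example.

```typescript
// Hypothetical sketch: removes, in place, every element for which `fn` returns true.
function remove<T>(arr: T[], fn: (item: T) => boolean): T[] {
    // Iterate from the end so splice() never affects indices not yet visited
    for (let i = arr.length - 1; i >= 0; i--) {
        if (fn(arr[i])) {
            arr.splice(i, 1);
        }
    }
    return arr;
}

const nums = [1, 2, 3, 4];
remove(nums, n => n % 2 === 0);
console.log(nums); // [ 1, 3 ]
```

Unlike `Array.prototype.filter()`, no new array is allocated; the same array instance is returned.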
Resets the state back to the Data state, and removes the current tag.
\n\nWe'll generally run this function whenever a \"parse error\" is\nencountered, where the current tag that is being read no longer looks\nlike a real HTML tag.
\nPerforms the functionality of what modern browsers do when String.prototype.split() is called\nwith a regular expression that contains capturing parentheses.
For example:
\n\n// Modern browsers:\n\"a,b,c\".split( /(,)/ ); // --> [ 'a', ',', 'b', ',', 'c' ]\n\n// Old IE (including IE8):\n\"a,b,c\".split( /(,)/ ); // --> [ 'a', 'b', 'c' ]\n
\n\nThis method emulates the functionality of modern browsers for the old IE case.
\nThe string to split.
\nThe regular expression to split the input str on. The splitting\n character(s) will be spliced into the array, as in the \"modern browsers\" example in the\n description of this method.\n Note #1: the supplied regular expression must have the 'g' flag specified.\n Note #2: for simplicity's sake, the regular expression does not need\n to contain capturing parentheses - it will be assumed that any match has them.
The split array of strings, with the splitting character(s) included.
\nStarts a new HTML tag at the current index, ignoring any previous HTML\ntag that was being read.
\n\nWe'll generally run this function whenever we read a new '<' character,\nincluding when we read a '<' character inside of an HTML tag that we were\npreviously reading.
\nFor DOCTYPES in particular, we don't care about the attributes. Just\nadvance to the '>' character and emit the tag, unless we find a '<'\ncharacter in which case we'll start a new tag.
\n\nExample doctype tag:\n <!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
\n\nActual spec: https://www.w3.org/TR/html51/syntax.html#doctype-state
\nFunction that should never be called but is used to check that every\nenum value is handled using TypeScript's 'never' type.
\nA truncation feature where the ellipsis will be placed at the end of the URL.
\nThe maximum length of the truncated output URL string.
\nThe characters to place within the url, e.g. \"..\".
\nThe truncated URL.
\nDate: 2015-10-05\nAuthor: Kasper Søfren soefritz@gmail.com (https://github.com/kafoso)
\n\nA truncation feature, where the ellipsis will be placed in the dead-center of the URL.
\nA URL.
\nThe maximum length of the truncated output URL string.
\nThe characters to place within the url, e.g. \"..\".
\nThe truncated URL.
\nDate: 2015-10-05\nAuthor: Kasper Søfren soefritz@gmail.com (https://github.com/kafoso)
\n\nA truncation feature, where the ellipsis will be placed at a section within\nthe URL making it still somewhat human readable.
\nA URL.
\nThe maximum length of the truncated output URL string.
\nThe characters to place within the url, e.g. \"...\".
\nThe truncated URL.
\n"use strict"; -Object.defineProperty(exports, "__esModule", { value: true }); -var tslib_1 = require("tslib"); -var html_node_1 = require("./html-node"); -/** - * @class Autolinker.htmlParser.CommentNode - * @extends Autolinker.htmlParser.HtmlNode - * - * Represents an HTML comment node that has been parsed by the - * {@link Autolinker.htmlParser.HtmlParser}. - * - * See this class's superclass ({@link Autolinker.htmlParser.HtmlNode}) for more - * details. - */ -var CommentNode = (function (_super) { - tslib_1.__extends(CommentNode, _super); - /** - * @method constructor - * @param {Object} cfg The configuration options for this class, specified - * in an Object. - */ - function CommentNode(cfg) { - var _this = _super.call(this, cfg) || this; - /** - * @cfg {String} comment (required) - * - * The text inside the comment tag. This text is stripped of any leading or - * trailing whitespace. - */ - _this.comment = ''; // default value just to get the above doc comment in the ES5 output and documentation generator - _this.comment = cfg.comment; - return _this; - } - /** - * Returns a string name for the type of node that this class represents. - * - * @return {String} - */ - CommentNode.prototype.getType = function () { - return 'comment'; - }; - /** - * Returns the comment inside the comment tag. - * - * @return {String} - */ - CommentNode.prototype.getComment = function () { - return this.comment; - }; - return CommentNode; -}(html_node_1.HtmlNode)); -exports.CommentNode = CommentNode; - -//# sourceMappingURL=comment-node.js.map -- - diff --git a/docs/api/source/element-node.html b/docs/api/source/element-node.html deleted file mode 100644 index eb965796..00000000 --- a/docs/api/source/element-node.html +++ /dev/null @@ -1,90 +0,0 @@ - - - - -
"use strict"; -Object.defineProperty(exports, "__esModule", { value: true }); -var tslib_1 = require("tslib"); -var html_node_1 = require("./html-node"); -/** - * @class Autolinker.htmlParser.ElementNode - * @extends Autolinker.htmlParser.HtmlNode - * - * Represents an HTML element node that has been parsed by the {@link Autolinker.htmlParser.HtmlParser}. - * - * See this class's superclass ({@link Autolinker.htmlParser.HtmlNode}) for more - * details. - */ -var ElementNode = (function (_super) { - tslib_1.__extends(ElementNode, _super); - /** - * @method constructor - * @param {Object} cfg The configuration options for this class, specified - * in an Object. - */ - function ElementNode(cfg) { - var _this = _super.call(this, cfg) || this; - /** - * @cfg {String} tagName (required) - * - * The name of the tag that was matched. - */ - _this.tagName = ''; // default value just to get the above doc comment in the ES5 output and documentation generator - /** - * @cfg {Boolean} closing (required) - * - * `true` if the element (tag) is a closing tag, `false` if its an opening - * tag. - */ - _this.closing = false; // default value just to get the above doc comment in the ES5 output and documentation generator - _this.tagName = cfg.tagName; - _this.closing = cfg.closing; - return _this; - } - /** - * Returns a string name for the type of node that this class represents. - * - * @return {String} - */ - ElementNode.prototype.getType = function () { - return 'element'; - }; - /** - * Returns the HTML element's (tag's) name. Ex: for an <img> tag, - * returns "img". - * - * @return {String} - */ - ElementNode.prototype.getTagName = function () { - return this.tagName; - }; - /** - * Determines if the HTML element (tag) is a closing tag. Ex: <div> - * returns `false`, while </div> returns `true`. 
- * - * @return {Boolean} - */ - ElementNode.prototype.isClosing = function () { - return this.closing; - }; - return ElementNode; -}(html_node_1.HtmlNode)); -exports.ElementNode = ElementNode; - -//# sourceMappingURL=element-node.js.map -- - diff --git a/docs/api/source/entity-node.html b/docs/api/source/entity-node.html deleted file mode 100644 index b69a2c06..00000000 --- a/docs/api/source/entity-node.html +++ /dev/null @@ -1,56 +0,0 @@ - - - - -
"use strict"; -Object.defineProperty(exports, "__esModule", { value: true }); -var tslib_1 = require("tslib"); -var html_node_1 = require("./html-node"); -/** - * @class Autolinker.htmlParser.EntityNode - * @extends Autolinker.htmlParser.HtmlNode - * - * Represents a known HTML entity node that has been parsed by the {@link Autolinker.htmlParser.HtmlParser}. - * Ex: '&nbsp;', or '&#160;' (which will be retrievable from the {@link #getText} - * method. - * - * Note that this class will only be returned from the HtmlParser for the set of - * checked HTML entity nodes defined by the {@link Autolinker.htmlParser.HtmlParser#htmlCharacterEntitiesRegex}. - * - * See this class's superclass ({@link Autolinker.htmlParser.HtmlNode}) for more - * details. - */ -var EntityNode = (function (_super) { - tslib_1.__extends(EntityNode, _super); - function EntityNode() { - return _super !== null && _super.apply(this, arguments) || this; - } - /** - * Returns a string name for the type of node that this class represents. - * - * @return {String} - */ - EntityNode.prototype.getType = function () { - return 'entity'; - }; - return EntityNode; -}(html_node_1.HtmlNode)); -exports.EntityNode = EntityNode; - -//# sourceMappingURL=entity-node.js.map -- - diff --git a/docs/api/source/env.html b/docs/api/source/env.html deleted file mode 100644 index 440cff65..00000000 --- a/docs/api/source/env.html +++ /dev/null @@ -1,25 +0,0 @@ - - - - -
"use strict"; -Object.defineProperty(exports, "__esModule", { value: true }); -exports.env = {}; - -//# sourceMappingURL=env.js.map -- - diff --git a/docs/api/source/html-node.html b/docs/api/source/html-node.html deleted file mode 100644 index 3edff205..00000000 --- a/docs/api/source/html-node.html +++ /dev/null @@ -1,92 +0,0 @@ - - - - -
"use strict"; -Object.defineProperty(exports, "__esModule", { value: true }); -/** - * @abstract - * @class Autolinker.htmlParser.HtmlNode - * - * Represents an HTML node found in an input string. An HTML node is one of the - * following: - * - * 1. An {@link Autolinker.htmlParser.ElementNode ElementNode}, which represents - * HTML tags. - * 2. A {@link Autolinker.htmlParser.CommentNode CommentNode}, which represents - * HTML comments. - * 3. A {@link Autolinker.htmlParser.TextNode TextNode}, which represents text - * outside or within HTML tags. - * 4. A {@link Autolinker.htmlParser.EntityNode EntityNode}, which represents - * one of the known HTML entities that Autolinker looks for. This includes - * common ones such as &quot; and &nbsp; - */ -var HtmlNode = (function () { - /** - * @method constructor - * @param {Object} cfg The configuration properties for the Match instance, - * specified in an Object (map). - */ - function HtmlNode(cfg) { - /** - * @cfg {Number} offset (required) - * - * The offset of the HTML node in the original text that was parsed. - */ - this.offset = 0; // default value just to get the above doc comment in the ES5 output and documentation generator - /** - * @cfg {String} text (required) - * - * The text that was matched for the HtmlNode. - * - * - In the case of an {@link Autolinker.htmlParser.ElementNode ElementNode}, - * this will be the tag's text. - * - In the case of an {@link Autolinker.htmlParser.CommentNode CommentNode}, - * this will be the comment's text. - * - In the case of a {@link Autolinker.htmlParser.TextNode TextNode}, this - * will be the text itself. - * - In the case of a {@link Autolinker.htmlParser.EntityNode EntityNode}, - * this will be the text of the HTML entity. - */ - this.text = ''; // default value just to get the above doc comment in the ES5 output and documentation generator - this.offset = cfg.offset; - this.text = cfg.text; - } - /** - * Retrieves the {@link #offset} of the HtmlNode. 
This is the offset of the - * HTML node in the original string that was parsed. - * - * @return {Number} - */ - HtmlNode.prototype.getOffset = function () { - return this.offset; - }; - /** - * Retrieves the {@link #text} for the HtmlNode. - * - * @return {String} - */ - HtmlNode.prototype.getText = function () { - return this.text; - }; - return HtmlNode; -}()); -exports.HtmlNode = HtmlNode; - -//# sourceMappingURL=html-node.js.map -- - diff --git a/docs/api/source/html-parser-old.html b/docs/api/source/html-parser-old.html deleted file mode 100644 index 12785384..00000000 --- a/docs/api/source/html-parser-old.html +++ /dev/null @@ -1,268 +0,0 @@ - - - - -
"use strict"; -Object.defineProperty(exports, "__esModule", { value: true }); -var utils_1 = require("../utils"); -var comment_node_1 = require("./comment-node"); -var element_node_1 = require("./element-node"); -var entity_node_1 = require("./entity-node"); -var text_node_1 = require("./text-node"); -/** - * @private - * @property {RegExp} htmlRegex - * - * The regular expression used to pull out HTML tags from a string. Handles namespaced HTML tags and - * attribute names, as specified by http://www.w3.org/TR/html-markup/syntax.html. - * - * Capturing groups: - * - * 1. The "!DOCTYPE" tag name, if a tag is a <!DOCTYPE> tag. - * 2. If it is an end tag, this group will have the '/'. - * 3. If it is a comment tag, this group will hold the comment text (i.e. - * the text inside the `<!--` and `-->`. - * 4. The tag name for a tag without attributes (other than the <!DOCTYPE> tag) - * 5. The tag name for a tag with attributes (other than the <!DOCTYPE> tag) - */ -var htmlRegex = (function () { - var commentTagRegex = /!--([\s\S]+?)--/, tagNameRegex = /[0-9a-zA-Z][0-9a-zA-Z:]*/, attrNameRegex = /[^\s"'>\/=\x00-\x1F\x7F]+/, // the hex range accounts for excluding control chars, and the delete char - attrValueRegex = /(?:"[^"]*?"|'[^']*?'|[^'"=<>`\s]+)/, // double quoted, single quoted, or unquoted attribute values - optionalAttrValueRegex = '(?:\\s*?=\\s*?' + attrValueRegex.source + ')?'; // optional '=[value]' - var getNameEqualsValueRegex = function (group) { - return '(?=(' + attrNameRegex.source + '))\\' + group + optionalAttrValueRegex; - }; - return new RegExp([ - // for <!DOCTYPE> tag. Ex: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">) - '(?:', - '<(!DOCTYPE)', - // Zero or more attributes following the tag name - '(?:', - '\\s+', - // Either: - // A. attr="value", or - // B. 
"value" alone (To cover example doctype tag: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">) - // *** Capturing Group 2 - Pseudo-atomic group for attrNameRegex - '(?:', getNameEqualsValueRegex(2), '|', attrValueRegex.source + ')', - ')*', - '>', - ')', - '|', - // All other HTML tags (i.e. tags that are not <!DOCTYPE>) - '(?:', - '<(/)?', - // *** Capturing Group 3: The slash or an empty string. Slash ('/') for end tag, empty string for start or self-closing tag. - '(?:', - commentTagRegex.source, - '|', - // Handle tag without attributes. - // Doing this separately from a tag that has attributes - // to fix a regex time complexity issue seen with the - // example in https://github.com/gregjacobs/Autolinker.js/issues/172 - '(?:', - // *** Capturing Group 5 - The tag name for a tag without attributes - '(' + tagNameRegex.source + ')', - '\\s*/?', - ')', - '|', - // Handle tag with attributes - // Doing this separately from a tag with no attributes - // to fix a regex time complexity issue seen with the - // example in https://github.com/gregjacobs/Autolinker.js/issues/172 - '(?:', - // *** Capturing Group 6 - The tag name for a tag with attributes - '(' + tagNameRegex.source + ')', - '\\s+', - // Zero or more attributes following the tag name - '(?:', - '(?:\\s+|\\b)', - // *** Capturing Group 7 - Pseudo-atomic group for attrNameRegex - getNameEqualsValueRegex(7), - ')*', - '\\s*/?', - ')', - ')', - '>', - ')' - ].join(""), 'gi'); -})(); -/** - * @private - * @property {RegExp} htmlCharacterEntitiesRegex - * - * The regular expression that matches common HTML character entities. - * - * Ignoring & as it could be part of a query string -- handling it separately. 
- */ -var htmlCharacterEntitiesRegex = /(&nbsp;|&#160;|&lt;|&#60;|&gt;|&#62;|&quot;|&#34;|&#39;)/gi; -/** - * @class Autolinker.htmlParser.HtmlParser - * @extends Object - * - * An HTML parser implementation which simply walks an HTML string and returns an array of - * {@link Autolinker.htmlParser.HtmlNode HtmlNodes} that represent the basic HTML structure of the input string. - * - * Autolinker uses this to only link URLs/emails/mentions within text nodes, effectively ignoring / "walking - * around" HTML tags. - */ -var HtmlParser = (function () { - function HtmlParser() { - } - /** - * Parses an HTML string and returns a simple array of {@link Autolinker.htmlParser.HtmlNode HtmlNodes} - * to represent the HTML structure of the input string. - * - * @param {String} html The HTML to parse. - * @return {Autolinker.htmlParser.HtmlNode[]} - */ - HtmlParser.prototype.parse = function (html) { - var currentResult, lastIndex = 0, textAndEntityNodes, nodes = []; // will be the result of the method - while ((currentResult = htmlRegex.exec(html)) !== null) { - var tagText = currentResult[0], commentText = currentResult[4], // if we've matched a comment - tagName = currentResult[1] || currentResult[5] || currentResult[6], // The <!DOCTYPE> tag (ex: "!DOCTYPE"), or another tag (ex: "a" or "img") - isClosingTag = !!currentResult[3], offset = currentResult.index, inBetweenTagsText = html.substring(lastIndex, offset); - // Push TextNodes and EntityNodes for any text found between tags - if (inBetweenTagsText) { - textAndEntityNodes = this.parseTextAndEntityNodes(lastIndex, inBetweenTagsText); - nodes.push.apply(nodes, textAndEntityNodes); - } - // Push the CommentNode or ElementNode - if (commentText) { - nodes.push(this.createCommentNode(offset, tagText, commentText)); - } - else { - nodes.push(this.createElementNode(offset, tagText, tagName, isClosingTag)); - } - lastIndex = offset + tagText.length; - } - // Process any remaining text after the last HTML element. 
Will process all of the text if there were no HTML elements. - if (lastIndex < html.length) { - var text = html.substring(lastIndex); - // Push TextNodes and EntityNodes for any text found between tags - if (text) { - textAndEntityNodes = this.parseTextAndEntityNodes(lastIndex, text); - // Note: the following 3 lines were previously: - // nodes.push.apply( nodes, textAndEntityNodes ); - // but this was causing a "Maximum Call Stack Size Exceeded" - // error on inputs with a large number of html entities. - textAndEntityNodes.forEach(function (node) { return nodes.push(node); }); - } - } - return nodes; - }; - /** - * Parses text and HTML entity nodes from a given string. The input string - * should not have any HTML tags (elements) within it. - * - * @private - * @param {Number} offset The offset of the text node match within the - * original HTML string. - * @param {String} text The string of text to parse. This is from an HTML - * text node. - * @return {Autolinker.htmlParser.HtmlNode[]} An array of HtmlNodes to - * represent the {@link Autolinker.htmlParser.TextNode TextNodes} and - * {@link Autolinker.htmlParser.EntityNode EntityNodes} found. 
- */ - HtmlParser.prototype.parseTextAndEntityNodes = function (offset, text) { - var nodes = [], textAndEntityTokens = utils_1.splitAndCapture(text, htmlCharacterEntitiesRegex); // split at HTML entities, but include the HTML entities in the results array - // Every even numbered token is a TextNode, and every odd numbered token is an EntityNode - // For example: an input `text` of "Test &quot;this&quot; today" would turn into the - // `textAndEntityTokens`: [ 'Test ', '&quot;', 'this', '&quot;', ' today' ] - for (var i = 0, len = textAndEntityTokens.length; i < len; i += 2) { - var textToken = textAndEntityTokens[i], entityToken = textAndEntityTokens[i + 1]; - if (textToken) { - nodes.push(this.createTextNode(offset, textToken)); - offset += textToken.length; - } - if (entityToken) { - nodes.push(this.createEntityNode(offset, entityToken)); - offset += entityToken.length; - } - } - return nodes; - }; - /** - * Factory method to create an {@link Autolinker.htmlParser.CommentNode CommentNode}. - * - * @private - * @param {Number} offset The offset of the match within the original HTML - * string. - * @param {String} tagText The full text of the tag (comment) that was - * matched, including its <!-- and -->. - * @param {String} commentText The full text of the comment that was matched. - */ - HtmlParser.prototype.createCommentNode = function (offset, tagText, commentText) { - return new comment_node_1.CommentNode({ - offset: offset, - text: tagText, - comment: commentText.trim() - }); - }; - /** - * Factory method to create an {@link Autolinker.htmlParser.ElementNode ElementNode}. - * - * @private - * @param {Number} offset The offset of the match within the original HTML - * string. - * @param {String} tagText The full text of the tag (element) that was - * matched, including its attributes. - * @param {String} tagName The name of the tag. Ex: An <img> tag would - * be passed to this method as "img". - * @param {Boolean} isClosingTag `true` if it's a closing tag, false - * otherwise. 
- * @return {Autolinker.htmlParser.ElementNode} - */ - HtmlParser.prototype.createElementNode = function (offset, tagText, tagName, isClosingTag) { - return new element_node_1.ElementNode({ - offset: offset, - text: tagText, - tagName: tagName.toLowerCase(), - closing: isClosingTag - }); - }; - /** - * Factory method to create a {@link Autolinker.htmlParser.EntityNode EntityNode}. - * - * @private - * @param {Number} offset The offset of the match within the original HTML - * string. - * @param {String} text The text that was matched for the HTML entity (such - * as '&nbsp;'). - * @return {Autolinker.htmlParser.EntityNode} - */ - HtmlParser.prototype.createEntityNode = function (offset, text) { - return new entity_node_1.EntityNode({ offset: offset, text: text }); - }; - /** - * Factory method to create a {@link Autolinker.htmlParser.TextNode TextNode}. - * - * @private - * @param {Number} offset The offset of the match within the original HTML - * string. - * @param {String} text The text that was matched. - * @return {Autolinker.htmlParser.TextNode} - */ - HtmlParser.prototype.createTextNode = function (offset, text) { - return new text_node_1.TextNode({ offset: offset, text: text }); - }; - return HtmlParser; -}()); -exports.HtmlParser = HtmlParser; - -//# sourceMappingURL=html-parser-old.js.map -- - diff --git a/docs/api/source/html-parser.html b/docs/api/source/html-parser.html deleted file mode 100644 index 0250ac93..00000000 --- a/docs/api/source/html-parser.html +++ /dev/null @@ -1,885 +0,0 @@ - - - - -
"use strict"; -/*! - * Modified version of htmlparser2 which has been stripped down to only provide - * the functionality needed by Autolinker in order to make the final bundle as - * small as possible. Original: - * - * Original copyright: - * - * Copyright 2010, 2011, Chris Winberry <chris@winberry.net>. All rights reserved. - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: -*/ -Object.defineProperty(exports, "__esModule", { value: true }); -var i = 0; -var TEXT = i++; -var BEFORE_TAG_NAME = i++; //after < -var IN_TAG_NAME = i++; -var IN_SELF_CLOSING_TAG = i++; -var BEFORE_CLOSING_TAG_NAME = i++; -var IN_CLOSING_TAG_NAME = i++; -var AFTER_CLOSING_TAG_NAME = i++; -//attributes -var BEFORE_ATTRIBUTE_NAME = i++; -var IN_ATTRIBUTE_NAME = i++; -var AFTER_ATTRIBUTE_NAME = i++; -var BEFORE_ATTRIBUTE_VALUE = i++; -var IN_ATTRIBUTE_VALUE_DQ = i++; // " -var IN_ATTRIBUTE_VALUE_SQ = i++; // ' -var IN_ATTRIBUTE_VALUE_NQ = i++; -//declarations -var BEFORE_DECLARATION = i++; // ! -var IN_DECLARATION = i++; -//processing instructions -var IN_PROCESSING_INSTRUCTION = i++; // ? 
-//comments -var BEFORE_COMMENT = i++; -var IN_COMMENT = i++; -var AFTER_COMMENT_1 = i++; -var AFTER_COMMENT_2 = i++; -//cdata -var BEFORE_CDATA_1 = i++; // [ -var BEFORE_CDATA_2 = i++; // C -var BEFORE_CDATA_3 = i++; // D -var BEFORE_CDATA_4 = i++; // A -var BEFORE_CDATA_5 = i++; // T -var BEFORE_CDATA_6 = i++; // A -var IN_CDATA = i++; // [ -var AFTER_CDATA_1 = i++; // ] -var AFTER_CDATA_2 = i++; // ] -//special tags -var BEFORE_SPECIAL = i++; //S -var BEFORE_SPECIAL_END = i++; //S -var BEFORE_SCRIPT_1 = i++; //C -var BEFORE_SCRIPT_2 = i++; //R -var BEFORE_SCRIPT_3 = i++; //I -var BEFORE_SCRIPT_4 = i++; //P -var BEFORE_SCRIPT_5 = i++; //T -var AFTER_SCRIPT_1 = i++; //C -var AFTER_SCRIPT_2 = i++; //R -var AFTER_SCRIPT_3 = i++; //I -var AFTER_SCRIPT_4 = i++; //P -var AFTER_SCRIPT_5 = i++; //T -var BEFORE_STYLE_1 = i++; //T -var BEFORE_STYLE_2 = i++; //Y -var BEFORE_STYLE_3 = i++; //L -var BEFORE_STYLE_4 = i++; //E -var AFTER_STYLE_1 = i++; //T -var AFTER_STYLE_2 = i++; //Y -var AFTER_STYLE_3 = i++; //L -var AFTER_STYLE_4 = i++; //E -var BEFORE_ENTITY = i++; //& -var BEFORE_NUMERIC_ENTITY = i++; //# -var IN_NAMED_ENTITY = i++; -var IN_NUMERIC_ENTITY = i++; -var IN_HEX_ENTITY = i++; //X -var j = 0; -var SPECIAL_NONE = j++; -var SPECIAL_SCRIPT = j++; -var SPECIAL_STYLE = j++; -function parseHtml(html, ontext) { - var _state = TEXT, _buffer = html, _sectionStart = 0, _index = 0, _baseState = TEXT, _special = SPECIAL_NONE, _decodeEntities = true, _xmlMode = false; - // TEMPORARY - var _cbs = {}; - var entityMap = {}; - var xmlMap = {}; - var legacyMap = {}; - var decodeCodePoint = function (arg) { return ''; }; - _parse(); - function whitespace(c) { - return c === " " || c === "\n" || c === "\t" || c === "\f" || c === "\r"; - } - function ifElseState(upper, SUCCESS, FAILURE) { - var lower = upper.toLowerCase(); - if (upper === lower) { - return function (c) { - if (c === lower) { - _state = SUCCESS; - } - else { - _state = FAILURE; - _index--; - } - }; - } - else { - 
return function (c) { - if (c === lower || c === upper) { - _state = SUCCESS; - } - else { - _state = FAILURE; - _index--; - } - }; - } - } - function consumeSpecialNameChar(upper, NEXT_STATE) { - var lower = upper.toLowerCase(); - return function (c) { - if (c === lower || c === upper) { - _state = NEXT_STATE; - } - else { - _state = IN_TAG_NAME; - _index--; //consume the token again - } - }; - } - function _stateText(c) { - if (c === "<") { - if (_index > _sectionStart) { - ontext(_getSection()); - } - _state = BEFORE_TAG_NAME; - _sectionStart = _index; - } - else if (_decodeEntities && - _special === SPECIAL_NONE && - c === "&") { - if (_index > _sectionStart) { - ontext(_getSection()); - } - _baseState = TEXT; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateBeforeTagName(c) { - if (c === "/") { - _state = BEFORE_CLOSING_TAG_NAME; - } - else if (c === "<") { - ontext(_getSection()); - _sectionStart = _index; - } - else if (c === ">" || _special !== SPECIAL_NONE || whitespace(c)) { - _state = TEXT; - } - else if (c === "!") { - _state = BEFORE_DECLARATION; - _sectionStart = _index + 1; - } - else if (c === "?") { - _state = IN_PROCESSING_INSTRUCTION; - _sectionStart = _index + 1; - } - else { - _state = - !_xmlMode && (c === "s" || c === "S") - ? 
BEFORE_SPECIAL - : IN_TAG_NAME; - _sectionStart = _index; - } - } - ; - function _stateInTagName(c) { - if (c === "/" || c === ">" || whitespace(c)) { - _emitToken("onopentagname"); - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - } - ; - function _stateBeforeCloseingTagName(c) { - if (whitespace(c)) { } - else if (c === ">") { - _state = TEXT; - } - else if (_special !== SPECIAL_NONE) { - if (c === "s" || c === "S") { - _state = BEFORE_SPECIAL_END; - } - else { - _state = TEXT; - _index--; - } - } - else { - _state = IN_CLOSING_TAG_NAME; - _sectionStart = _index; - } - } - ; - function _stateInCloseingTagName(c) { - if (c === ">" || whitespace(c)) { - _emitToken("onclosetag"); - _state = AFTER_CLOSING_TAG_NAME; - _index--; - } - } - ; - function _stateAfterCloseingTagName(c) { - //skip everything until ">" - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - } - ; - function _stateBeforeAttributeName(c) { - if (c === ">") { - _cbs.onopentagend(); - _state = TEXT; - _sectionStart = _index + 1; - } - else if (c === "/") { - _state = IN_SELF_CLOSING_TAG; - } - else if (!whitespace(c)) { - _state = IN_ATTRIBUTE_NAME; - _sectionStart = _index; - } - } - ; - function _stateInSelfClosingTag(c) { - if (c === ">") { - _cbs.onselfclosingtag(); - _state = TEXT; - _sectionStart = _index + 1; - } - else if (!whitespace(c)) { - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - } - ; - function _stateInAttributeName(c) { - if (c === "=" || c === "/" || c === ">" || whitespace(c)) { - _sectionStart = -1; - _state = AFTER_ATTRIBUTE_NAME; - _index--; - } - } - ; - function _stateAfterAttributeName(c) { - if (c === "=") { - _state = BEFORE_ATTRIBUTE_VALUE; - } - else if (c === "/" || c === ">") { - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - else if (!whitespace(c)) { - _state = IN_ATTRIBUTE_NAME; - _sectionStart = _index; - } - } - ; - function _stateBeforeAttributeValue(c) { - if (c === '"') { - _state = IN_ATTRIBUTE_VALUE_DQ; - _sectionStart = _index + 
1; - } - else if (c === "'") { - _state = IN_ATTRIBUTE_VALUE_SQ; - _sectionStart = _index + 1; - } - else if (!whitespace(c)) { - _state = IN_ATTRIBUTE_VALUE_NQ; - _sectionStart = _index; - _index--; //reconsume token - } - } - ; - function _stateInAttributeValueDoubleQuotes(c) { - if (c === '"') { - _emitToken("onattribdata"); - _state = BEFORE_ATTRIBUTE_NAME; - } - else if (_decodeEntities && c === "&") { - _emitToken("onattribdata"); - _baseState = _state; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateInAttributeValueSingleQuotes(c) { - if (c === "'") { - _emitToken("onattribdata"); - _state = BEFORE_ATTRIBUTE_NAME; - } - else if (_decodeEntities && c === "&") { - _emitToken("onattribdata"); - _baseState = _state; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateInAttributeValueNoQuotes(c) { - if (whitespace(c) || c === ">") { - _emitToken("onattribdata"); - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - else if (_decodeEntities && c === "&") { - _emitToken("onattribdata"); - _baseState = _state; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateBeforeDeclaration(c) { - _state = - c === "[" - ? BEFORE_CDATA_1 - : c === "-" - ? 
BEFORE_COMMENT - : IN_DECLARATION; - } - ; - function _stateInDeclaration(c) { - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - } - ; - function _stateInProcessingInstruction(c) { - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - } - ; - function _stateBeforeComment(c) { - if (c === "-") { - _state = IN_COMMENT; - _sectionStart = _index + 1; - } - else { - _state = IN_DECLARATION; - } - } - ; - function _stateInComment(c) { - if (c === "-") - _state = AFTER_COMMENT_1; - } - ; - function _stateAfterComment1(c) { - if (c === "-") { - _state = AFTER_COMMENT_2; - } - else { - _state = IN_COMMENT; - } - } - ; - function _stateAfterComment2(c) { - if (c === ">") { - //remove 2 trailing chars - _cbs.oncomment(_buffer.substring(_sectionStart, _index - 2)); - _state = TEXT; - _sectionStart = _index + 1; - } - else if (c !== "-") { - _state = IN_COMMENT; - } - // else: stay in AFTER_COMMENT_2 (`--->`) - } - ; - var _stateBeforeCdata1 = ifElseState("C", BEFORE_CDATA_2, IN_DECLARATION); - var _stateBeforeCdata2 = ifElseState("D", BEFORE_CDATA_3, IN_DECLARATION); - var _stateBeforeCdata3 = ifElseState("A", BEFORE_CDATA_4, IN_DECLARATION); - var _stateBeforeCdata4 = ifElseState("T", BEFORE_CDATA_5, IN_DECLARATION); - var _stateBeforeCdata5 = ifElseState("A", BEFORE_CDATA_6, IN_DECLARATION); - function _stateBeforeCdata6(c) { - if (c === "[") { - _state = IN_CDATA; - _sectionStart = _index + 1; - } - else { - _state = IN_DECLARATION; - _index--; - } - } - ; - function _stateInCdata(c) { - if (c === "]") - _state = AFTER_CDATA_1; - } - ; - function _stateAfterCdata1(c) { - if (c === "]") - _state = AFTER_CDATA_2; - else - _state = IN_CDATA; - } - ; - function _stateAfterCdata2(c) { - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - else if (c !== "]") { - _state = IN_CDATA; - } - //else: stay in AFTER_CDATA_2 (`]]]>`) - } - ; - function _stateBeforeSpecial(c) { - if (c === "c" || c === "C") { - _state = 
BEFORE_SCRIPT_1; - } - else if (c === "t" || c === "T") { - _state = BEFORE_STYLE_1; - } - else { - _state = IN_TAG_NAME; - _index--; //consume the token again - } - } - ; - function _stateBeforeSpecialEnd(c) { - if (_special === SPECIAL_SCRIPT && (c === "c" || c === "C")) { - _state = AFTER_SCRIPT_1; - } - else if (_special === SPECIAL_STYLE && (c === "t" || c === "T")) { - _state = AFTER_STYLE_1; - } - else - _state = TEXT; - } - ; - var _stateBeforeScript1 = consumeSpecialNameChar("R", BEFORE_SCRIPT_2); - var _stateBeforeScript2 = consumeSpecialNameChar("I", BEFORE_SCRIPT_3); - var _stateBeforeScript3 = consumeSpecialNameChar("P", BEFORE_SCRIPT_4); - var _stateBeforeScript4 = consumeSpecialNameChar("T", BEFORE_SCRIPT_5); - function _stateBeforeScript5(c) { - if (c === "/" || c === ">" || whitespace(c)) { - _special = SPECIAL_SCRIPT; - } - _state = IN_TAG_NAME; - _index--; //consume the token again - } - ; - var _stateAfterScript1 = ifElseState("R", AFTER_SCRIPT_2, TEXT); - var _stateAfterScript2 = ifElseState("I", AFTER_SCRIPT_3, TEXT); - var _stateAfterScript3 = ifElseState("P", AFTER_SCRIPT_4, TEXT); - var _stateAfterScript4 = ifElseState("T", AFTER_SCRIPT_5, TEXT); - function _stateAfterScript5(c) { - if (c === ">" || whitespace(c)) { - _special = SPECIAL_NONE; - _state = IN_CLOSING_TAG_NAME; - _sectionStart = _index - 6; - _index--; //reconsume the token - } - else - _state = TEXT; - } - ; - var _stateBeforeStyle1 = consumeSpecialNameChar("Y", BEFORE_STYLE_2); - var _stateBeforeStyle2 = consumeSpecialNameChar("L", BEFORE_STYLE_3); - var _stateBeforeStyle3 = consumeSpecialNameChar("E", BEFORE_STYLE_4); - function _stateBeforeStyle4(c) { - if (c === "/" || c === ">" || whitespace(c)) { - _special = SPECIAL_STYLE; - } - _state = IN_TAG_NAME; - _index--; //consume the token again - } - ; - var _stateAfterStyle1 = ifElseState("Y", AFTER_STYLE_2, TEXT); - var _stateAfterStyle2 = ifElseState("L", AFTER_STYLE_3, TEXT); - var _stateAfterStyle3 = ifElseState("E", 
AFTER_STYLE_4, TEXT); - function _stateAfterStyle4(c) { - if (c === ">" || whitespace(c)) { - _special = SPECIAL_NONE; - _state = IN_CLOSING_TAG_NAME; - _sectionStart = _index - 5; - _index--; //reconsume the token - } - else - _state = TEXT; - } - ; - var _stateBeforeEntity = ifElseState("#", BEFORE_NUMERIC_ENTITY, IN_NAMED_ENTITY); - var _stateBeforeNumericEntity = ifElseState("X", IN_HEX_ENTITY, IN_NUMERIC_ENTITY); - //for entities terminated with a semicolon - function _parseNamedEntityStrict() { - // TODO: For this section, use the regex /(&nbsp;|&#160;|&lt;|&#60;|&gt;|&#62;|&quot;|&#34;|&#39;)/ ??? - //offset = 1 - if (_sectionStart + 1 < _index) { - var entity = _buffer.substring(_sectionStart + 1, _index), map = _xmlMode ? xmlMap : entityMap; - if (map.hasOwnProperty(entity)) { - _emitPartial(map[entity]); - _sectionStart = _index + 1; - } - } - } - ; - //parses legacy entities (without trailing semicolon) - function _parseLegacyEntity() { - var start = _sectionStart + 1, limit = _index - start; - if (limit > 6) - limit = 6; //the max length of legacy entities is 6 - while (limit >= 2) { - //the min length of legacy entities is 2 - var entity = _buffer.substr(start, limit); - if (legacyMap.hasOwnProperty(entity)) { - _emitPartial(legacyMap[entity]); - _sectionStart += limit + 1; - return; - } - else { - limit--; - } - } - } - ; - function _stateInNamedEntity(c) { - if (c === ";") { - _parseNamedEntityStrict(); - if (_sectionStart + 1 < _index && !_xmlMode) { - _parseLegacyEntity(); - } - _state = _baseState; - } - else if ((c < "a" || c > "z") && - (c < "A" || c > "Z") && - (c < "0" || c > "9")) { - if (_xmlMode) { } - else if (_sectionStart + 1 === _index) { } - else if (_baseState !== TEXT) { - if (c !== "=") { - _parseNamedEntityStrict(); - } - } - else { - _parseLegacyEntity(); - } - _state = _baseState; - _index--; - } - } - ; - function _decodeNumericEntity(offset, base) { - var sectionStart = _sectionStart + offset; - if (sectionStart !== _index) { - //parse entity - var entity = 
_buffer.substring(sectionStart, _index); - var parsed = parseInt(entity, base); - _emitPartial(decodeCodePoint(parsed)); - _sectionStart = _index; - } - else { - _sectionStart--; - } - _state = _baseState; - } - ; - function _stateInNumericEntity(c) { - if (c === ";") { - _decodeNumericEntity(2, 10); - _sectionStart++; - } - else if (c < "0" || c > "9") { - if (!_xmlMode) { - _decodeNumericEntity(2, 10); - } - else { - _state = _baseState; - } - _index--; - } - } - ; - function _stateInHexEntity(c) { - if (c === ";") { - _decodeNumericEntity(3, 16); - _sectionStart++; - } - else if ((c < "a" || c > "f") && - (c < "A" || c > "F") && - (c < "0" || c > "9")) { - if (!_xmlMode) { - _decodeNumericEntity(3, 16); - } - else { - _state = _baseState; - } - _index--; - } - } - ; - function _cleanup() { - if (_sectionStart < 0) { - _buffer = ""; - _index = 0; - } - else { - if (_state === TEXT) { - if (_sectionStart !== _index) { - ontext(_buffer.substr(_sectionStart)); - } - _buffer = ""; - _index = 0; - } - else if (_sectionStart === _index) { - //the section just started - _buffer = ""; - _index = 0; - } - else { - //remove everything unnecessary - _buffer = _buffer.substr(_sectionStart); - _index -= _sectionStart; - } - _sectionStart = 0; - } - } - ; - function _parse() { - while (_index < _buffer.length) { - var c = _buffer.charAt(_index); - if (_state === TEXT) { - _stateText(c); - } - else if (_state === BEFORE_TAG_NAME) { - _stateBeforeTagName(c); - } - else if (_state === IN_TAG_NAME) { - _stateInTagName(c); - } - else if (_state === BEFORE_CLOSING_TAG_NAME) { - _stateBeforeCloseingTagName(c); - } - else if (_state === IN_CLOSING_TAG_NAME) { - _stateInCloseingTagName(c); - } - else if (_state === AFTER_CLOSING_TAG_NAME) { - _stateAfterCloseingTagName(c); - } - else if (_state === IN_SELF_CLOSING_TAG) { - _stateInSelfClosingTag(c); - } - else if (_state === BEFORE_ATTRIBUTE_NAME) { - /* - * attributes - */ - _stateBeforeAttributeName(c); - } - else if (_state === 
IN_ATTRIBUTE_NAME) { - _stateInAttributeName(c); - } - else if (_state === AFTER_ATTRIBUTE_NAME) { - _stateAfterAttributeName(c); - } - else if (_state === BEFORE_ATTRIBUTE_VALUE) { - _stateBeforeAttributeValue(c); - } - else if (_state === IN_ATTRIBUTE_VALUE_DQ) { - _stateInAttributeValueDoubleQuotes(c); - } - else if (_state === IN_ATTRIBUTE_VALUE_SQ) { - _stateInAttributeValueSingleQuotes(c); - } - else if (_state === IN_ATTRIBUTE_VALUE_NQ) { - _stateInAttributeValueNoQuotes(c); - } - else if (_state === BEFORE_DECLARATION) { - /* - * declarations - */ - _stateBeforeDeclaration(c); - } - else if (_state === IN_DECLARATION) { - _stateInDeclaration(c); - } - else if (_state === IN_PROCESSING_INSTRUCTION) { - /* - * processing instructions - */ - _stateInProcessingInstruction(c); - } - else if (_state === BEFORE_COMMENT) { - /* - * comments - */ - _stateBeforeComment(c); - } - else if (_state === IN_COMMENT) { - _stateInComment(c); - } - else if (_state === AFTER_COMMENT_1) { - _stateAfterComment1(c); - } - else if (_state === AFTER_COMMENT_2) { - _stateAfterComment2(c); - } - else if (_state === BEFORE_CDATA_1) { - /* - * cdata - */ - _stateBeforeCdata1(c); - } - else if (_state === BEFORE_CDATA_2) { - _stateBeforeCdata2(c); - } - else if (_state === BEFORE_CDATA_3) { - _stateBeforeCdata3(c); - } - else if (_state === BEFORE_CDATA_4) { - _stateBeforeCdata4(c); - } - else if (_state === BEFORE_CDATA_5) { - _stateBeforeCdata5(c); - } - else if (_state === BEFORE_CDATA_6) { - _stateBeforeCdata6(c); - } - else if (_state === IN_CDATA) { - _stateInCdata(c); - } - else if (_state === AFTER_CDATA_1) { - _stateAfterCdata1(c); - } - else if (_state === AFTER_CDATA_2) { - _stateAfterCdata2(c); - } - else if (_state === BEFORE_SPECIAL) { - /* - * special tags - */ - _stateBeforeSpecial(c); - } - else if (_state === BEFORE_SPECIAL_END) { - _stateBeforeSpecialEnd(c); - } - else if (_state === BEFORE_SCRIPT_1) { - /* - * script - */ - _stateBeforeScript1(c); - } - else if 
(_state === BEFORE_SCRIPT_2) { - _stateBeforeScript2(c); - } - else if (_state === BEFORE_SCRIPT_3) { - _stateBeforeScript3(c); - } - else if (_state === BEFORE_SCRIPT_4) { - _stateBeforeScript4(c); - } - else if (_state === BEFORE_SCRIPT_5) { - _stateBeforeScript5(c); - } - else if (_state === AFTER_SCRIPT_1) { - _stateAfterScript1(c); - } - else if (_state === AFTER_SCRIPT_2) { - _stateAfterScript2(c); - } - else if (_state === AFTER_SCRIPT_3) { - _stateAfterScript3(c); - } - else if (_state === AFTER_SCRIPT_4) { - _stateAfterScript4(c); - } - else if (_state === AFTER_SCRIPT_5) { - _stateAfterScript5(c); - } - else if (_state === BEFORE_STYLE_1) { - /* - * style - */ - _stateBeforeStyle1(c); - } - else if (_state === BEFORE_STYLE_2) { - _stateBeforeStyle2(c); - } - else if (_state === BEFORE_STYLE_3) { - _stateBeforeStyle3(c); - } - else if (_state === BEFORE_STYLE_4) { - _stateBeforeStyle4(c); - } - else if (_state === AFTER_STYLE_1) { - _stateAfterStyle1(c); - } - else if (_state === AFTER_STYLE_2) { - _stateAfterStyle2(c); - } - else if (_state === AFTER_STYLE_3) { - _stateAfterStyle3(c); - } - else if (_state === AFTER_STYLE_4) { - _stateAfterStyle4(c); - } - else if (_state === BEFORE_ENTITY) { - /* - * entities - */ - _stateBeforeEntity(c); - } - else if (_state === BEFORE_NUMERIC_ENTITY) { - _stateBeforeNumericEntity(c); - } - else if (_state === IN_NAMED_ENTITY) { - _stateInNamedEntity(c); - } - else if (_state === IN_NUMERIC_ENTITY) { - _stateInNumericEntity(c); - } - else if (_state === IN_HEX_ENTITY) { - _stateInHexEntity(c); - } - else { - _cbs.onerror(Error("unknown _state"), _state); - } - _index++; - } - _cleanup(); - } - ; - function _getSection() { - return _buffer.substring(_sectionStart, _index); - } - ; - function _emitToken(name) { - _cbs[name](_getSection()); - _sectionStart = -1; - } - ; - function _emitPartial(value) { - if (_baseState !== TEXT) { - _cbs.onattribdata(value); //TODO implement the new event - } - else { - 
_cbs.ontext(value); - } - } - ; -} -exports.parseHtml = parseHtml; - -//# sourceMappingURL=html-parser.js.map -- - diff --git a/docs/api/source/parser-old.html b/docs/api/source/parser-old.html deleted file mode 100644 index 99941057..00000000 --- a/docs/api/source/parser-old.html +++ /dev/null @@ -1,303 +0,0 @@ - - - - -
"use strict"; -// /* -// * Modified version of htmlparser2 which has been stripped down to only provide -// * the functionality needed by Autolinker in order to make the final bundle as -// * small as possible. -// * -// * See license in tokenizer.ts -// */ -// import { tokenizeHtml } from './tokenizer'; -// export function parseHtml( html: string, { -// } ) { -// /* -// Callbacks: -// oncdataend, -// oncdatastart, -// onclosetag, -// oncomment, -// oncommentend, -// onerror, -// onopentag, -// onprocessinginstruction, -// onreset, -// ontext -// */ -// var formTags = { -// input: true, -// option: true, -// optgroup: true, -// select: true, -// button: true, -// datalist: true, -// textarea: true -// }; -// var openImpliesClose = { -// tr: { tr: true, th: true, td: true }, -// th: { th: true }, -// td: { thead: true, th: true, td: true }, -// body: { head: true, link: true, script: true }, -// li: { li: true }, -// p: { p: true }, -// h1: { p: true }, -// h2: { p: true }, -// h3: { p: true }, -// h4: { p: true }, -// h5: { p: true }, -// h6: { p: true }, -// select: formTags, -// input: formTags, -// output: formTags, -// button: formTags, -// datalist: formTags, -// textarea: formTags, -// option: { option: true }, -// optgroup: { optgroup: true } -// }; -// var voidElements = { -// __proto__: null, -// area: true, -// base: true, -// basefont: true, -// br: true, -// col: true, -// command: true, -// embed: true, -// frame: true, -// hr: true, -// img: true, -// input: true, -// isindex: true, -// keygen: true, -// link: true, -// meta: true, -// param: true, -// source: true, -// track: true, -// wbr: true -// }; -// var foreignContextElements = { -// __proto__: null, -// math: true, -// svg: true -// }; -// var htmlIntegrationElements = { -// __proto__: null, -// mi: true, -// mo: true, -// mn: true, -// ms: true, -// mtext: true, -// "annotation-xml": true, -// foreignObject: true, -// desc: true, -// title: true -// }; -// var re_nameEnd = /\s|\//; -// let 
_options = options || {}; -// let _cbs = cbs || {}; -// let _tagname = ""; -// let _attribname = ""; -// let _attribvalue = ""; -// let _attribs = null; -// let _stack = []; -// let _foreignContext = []; -// let _lowerCaseTagNames = -// "lowerCaseTags" in _options -// ? !!_options.lowerCaseTags -// : !_options.xmlMode; -// let _lowerCaseAttributeNames = -// "lowerCaseAttributeNames" in _options -// ? !!_options.lowerCaseAttributeNames -// : !_options.xmlMode; -// tokenizeHtml( html, { -// ontext, -// onopentagname, -// onopentagend, -// onclosetag, -// onselfclosingtag, -// oncomment, -// onerror -// } ); -// //Tokenizer event handlers -// function ontext(data) { -// if (_cbs.ontext) _cbs.ontext(data); -// }; -// function onopentagname(name) { -// if (_lowerCaseTagNames) { -// name = name.toLowerCase(); -// } -// _tagname = name; -// if (!_options.xmlMode && name in openImpliesClose) { -// for ( -// var el; -// (el = _stack[_stack.length - 1]) in -// openImpliesClose[name]; -// onclosetag(el) -// ); -// } -// if (_options.xmlMode || !(name in voidElements)) { -// _stack.push(name); -// if (name in foreignContextElements) _foreignContext.push(true); -// else if (name in htmlIntegrationElements) -// _foreignContext.push(false); -// } -// if (_cbs.onopentagname) _cbs.onopentagname(name); -// if (_cbs.onopentag) _attribs = {}; -// }; -// function onopentagend() { -// if (_attribs) { -// if (_cbs.onopentag) -// _cbs.onopentag(_tagname, _attribs); -// _attribs = null; -// } -// if ( -// !_options.xmlMode && -// _cbs.onclosetag && -// _tagname in voidElements -// ) { -// _cbs.onclosetag(_tagname); -// } -// _tagname = ""; -// }; -// function onclosetag(name) { -// _updatePosition(1); -// if (_lowerCaseTagNames) { -// name = name.toLowerCase(); -// } -// if ( -// _stack.length && -// (!(name in voidElements) || _options.xmlMode) -// ) { -// var pos = _stack.lastIndexOf(name); -// if (pos !== -1) { -// if (_cbs.onclosetag) { -// pos = _stack.length - pos; -// while (pos--) 
_cbs.onclosetag(_stack.pop()); -// } else _stack.length = pos; -// } else if (name === "p" && !_options.xmlMode) { -// onopentagname(name); -// _closeCurrentTag(); -// } -// } else if (!_options.xmlMode && (name === "br" || name === "p")) { -// onopentagname(name); -// _closeCurrentTag(); -// } -// }; -// function onselfclosingtag() { -// if ( -// _options.xmlMode || -// _options.recognizeSelfClosing || -// _foreignContext[_foreignContext.length - 1] -// ) { -// _closeCurrentTag(); -// } else { -// onopentagend(); -// } -// }; -// function _closeCurrentTag() { -// var name = _tagname; -// onopentagend(); -// //self-closing tags will be on the top of the stack -// //(cheaper check than in onclosetag) -// if (_stack[_stack.length - 1] === name) { -// if (_cbs.onclosetag) { -// _cbs.onclosetag(name); -// } -// _stack.pop(); -// if (name in foreignContextElements || name in htmlIntegrationElements) { -// _foreignContext.pop(); -// } -// } -// }; -// function onattribname(name) { -// if (_lowerCaseAttributeNames) { -// name = name.toLowerCase(); -// } -// _attribname = name; -// }; -// function onattribdata(value) { -// _attribvalue += value; -// }; -// function onattribend() { -// if (_cbs.onattribute) -// _cbs.onattribute(_attribname, _attribvalue); -// if ( -// _attribs && -// !Object.prototype.hasOwnProperty.call(_attribs, _attribname) -// ) { -// _attribs[_attribname] = _attribvalue; -// } -// _attribname = ""; -// _attribvalue = ""; -// }; -// function _getInstructionName(value) { -// var idx = value.search(re_nameEnd), -// name = idx < 0 ? value : value.substr(0, idx); -// if (_lowerCaseTagNames) { -// name = name.toLowerCase(); -// } -// return name; -// }; -// function ondeclaration(value) { -// if (_cbs.onprocessinginstruction) { -// var name = _getInstructionName(value); -// _cbs.onprocessinginstruction("!" + name, "!" 
+ value); -// } -// }; -// function onprocessinginstruction(value) { -// if (_cbs.onprocessinginstruction) { -// var name = _getInstructionName(value); -// _cbs.onprocessinginstruction("?" + name, "?" + value); -// } -// }; -// function oncomment(value) { -// _updatePosition(4); -// if (_cbs.oncomment) _cbs.oncomment(value); -// if (_cbs.oncommentend) _cbs.oncommentend(); -// }; -// function oncdata(value) { -// _updatePosition(1); -// if (_options.xmlMode || _options.recognizeCDATA) { -// if (_cbs.oncdatastart) _cbs.oncdatastart(); -// if (_cbs.ontext) _cbs.ontext(value); -// if (_cbs.oncdataend) _cbs.oncdataend(); -// } else { -// oncomment("[CDATA[" + value + "]]"); -// } -// }; -// function onerror(err) { -// if (_cbs.onerror) _cbs.onerror(err); -// }; -// function onend() { -// if (_cbs.onclosetag) { -// for ( -// var i = _stack.length; -// i > 0; -// _cbs.onclosetag(_stack[--i]) -// ); -// } -// if (_cbs.onend) _cbs.onend(); -// }; -// } - -//# sourceMappingURL=parser-old.js.map -- - diff --git a/docs/api/source/parser.html b/docs/api/source/parser.html deleted file mode 100644 index 58bcf1bc..00000000 --- a/docs/api/source/parser.html +++ /dev/null @@ -1,303 +0,0 @@ - - - - -
"use strict"; -// /* -// * Modified version of htmlparser2 which has been stripped down to only provide -// * the functionality needed by Autolinker in order to make the final bundle as -// * small as possible. -// * -// * See license in tokenizer.ts -// */ -// import { tokenizeHtml } from './tokenizer'; -// export function parseHtml( html: string, { -// } ) { -// /* -// Callbacks: -// oncdataend, -// oncdatastart, -// onclosetag, -// oncomment, -// oncommentend, -// onerror, -// onopentag, -// onprocessinginstruction, -// onreset, -// ontext -// */ -// var formTags = { -// input: true, -// option: true, -// optgroup: true, -// select: true, -// button: true, -// datalist: true, -// textarea: true -// }; -// var openImpliesClose = { -// tr: { tr: true, th: true, td: true }, -// th: { th: true }, -// td: { thead: true, th: true, td: true }, -// body: { head: true, link: true, script: true }, -// li: { li: true }, -// p: { p: true }, -// h1: { p: true }, -// h2: { p: true }, -// h3: { p: true }, -// h4: { p: true }, -// h5: { p: true }, -// h6: { p: true }, -// select: formTags, -// input: formTags, -// output: formTags, -// button: formTags, -// datalist: formTags, -// textarea: formTags, -// option: { option: true }, -// optgroup: { optgroup: true } -// }; -// var voidElements = { -// __proto__: null, -// area: true, -// base: true, -// basefont: true, -// br: true, -// col: true, -// command: true, -// embed: true, -// frame: true, -// hr: true, -// img: true, -// input: true, -// isindex: true, -// keygen: true, -// link: true, -// meta: true, -// param: true, -// source: true, -// track: true, -// wbr: true -// }; -// var foreignContextElements = { -// __proto__: null, -// math: true, -// svg: true -// }; -// var htmlIntegrationElements = { -// __proto__: null, -// mi: true, -// mo: true, -// mn: true, -// ms: true, -// mtext: true, -// "annotation-xml": true, -// foreignObject: true, -// desc: true, -// title: true -// }; -// var re_nameEnd = /\s|\//; -// let 
_options = options || {}; -// let _cbs = cbs || {}; -// let _tagname = ""; -// let _attribname = ""; -// let _attribvalue = ""; -// let _attribs = null; -// let _stack = []; -// let _foreignContext = []; -// let _lowerCaseTagNames = -// "lowerCaseTags" in _options -// ? !!_options.lowerCaseTags -// : !_options.xmlMode; -// let _lowerCaseAttributeNames = -// "lowerCaseAttributeNames" in _options -// ? !!_options.lowerCaseAttributeNames -// : !_options.xmlMode; -// tokenizeHtml( html, { -// ontext, -// onopentagname, -// onopentagend, -// onclosetag, -// onselfclosingtag, -// oncomment, -// onerror -// } ); -// //Tokenizer event handlers -// function ontext(data) { -// if (_cbs.ontext) _cbs.ontext(data); -// }; -// function onopentagname(name) { -// if (_lowerCaseTagNames) { -// name = name.toLowerCase(); -// } -// _tagname = name; -// if (!_options.xmlMode && name in openImpliesClose) { -// for ( -// var el; -// (el = _stack[_stack.length - 1]) in -// openImpliesClose[name]; -// onclosetag(el) -// ); -// } -// if (_options.xmlMode || !(name in voidElements)) { -// _stack.push(name); -// if (name in foreignContextElements) _foreignContext.push(true); -// else if (name in htmlIntegrationElements) -// _foreignContext.push(false); -// } -// if (_cbs.onopentagname) _cbs.onopentagname(name); -// if (_cbs.onopentag) _attribs = {}; -// }; -// function onopentagend() { -// if (_attribs) { -// if (_cbs.onopentag) -// _cbs.onopentag(_tagname, _attribs); -// _attribs = null; -// } -// if ( -// !_options.xmlMode && -// _cbs.onclosetag && -// _tagname in voidElements -// ) { -// _cbs.onclosetag(_tagname); -// } -// _tagname = ""; -// }; -// function onclosetag(name) { -// _updatePosition(1); -// if (_lowerCaseTagNames) { -// name = name.toLowerCase(); -// } -// if ( -// _stack.length && -// (!(name in voidElements) || _options.xmlMode) -// ) { -// var pos = _stack.lastIndexOf(name); -// if (pos !== -1) { -// if (_cbs.onclosetag) { -// pos = _stack.length - pos; -// while (pos--) 
_cbs.onclosetag(_stack.pop()); -// } else _stack.length = pos; -// } else if (name === "p" && !_options.xmlMode) { -// onopentagname(name); -// _closeCurrentTag(); -// } -// } else if (!_options.xmlMode && (name === "br" || name === "p")) { -// onopentagname(name); -// _closeCurrentTag(); -// } -// }; -// function onselfclosingtag() { -// if ( -// _options.xmlMode || -// _options.recognizeSelfClosing || -// _foreignContext[_foreignContext.length - 1] -// ) { -// _closeCurrentTag(); -// } else { -// onopentagend(); -// } -// }; -// function _closeCurrentTag() { -// var name = _tagname; -// onopentagend(); -// //self-closing tags will be on the top of the stack -// //(cheaper check than in onclosetag) -// if (_stack[_stack.length - 1] === name) { -// if (_cbs.onclosetag) { -// _cbs.onclosetag(name); -// } -// _stack.pop(); -// if (name in foreignContextElements || name in htmlIntegrationElements) { -// _foreignContext.pop(); -// } -// } -// }; -// function onattribname(name) { -// if (_lowerCaseAttributeNames) { -// name = name.toLowerCase(); -// } -// _attribname = name; -// }; -// function onattribdata(value) { -// _attribvalue += value; -// }; -// function onattribend() { -// if (_cbs.onattribute) -// _cbs.onattribute(_attribname, _attribvalue); -// if ( -// _attribs && -// !Object.prototype.hasOwnProperty.call(_attribs, _attribname) -// ) { -// _attribs[_attribname] = _attribvalue; -// } -// _attribname = ""; -// _attribvalue = ""; -// }; -// function _getInstructionName(value) { -// var idx = value.search(re_nameEnd), -// name = idx < 0 ? value : value.substr(0, idx); -// if (_lowerCaseTagNames) { -// name = name.toLowerCase(); -// } -// return name; -// }; -// function ondeclaration(value) { -// if (_cbs.onprocessinginstruction) { -// var name = _getInstructionName(value); -// _cbs.onprocessinginstruction("!" + name, "!" 
+ value); -// } -// }; -// function onprocessinginstruction(value) { -// if (_cbs.onprocessinginstruction) { -// var name = _getInstructionName(value); -// _cbs.onprocessinginstruction("?" + name, "?" + value); -// } -// }; -// function oncomment(value) { -// _updatePosition(4); -// if (_cbs.oncomment) _cbs.oncomment(value); -// if (_cbs.oncommentend) _cbs.oncommentend(); -// }; -// function oncdata(value) { -// _updatePosition(1); -// if (_options.xmlMode || _options.recognizeCDATA) { -// if (_cbs.oncdatastart) _cbs.oncdatastart(); -// if (_cbs.ontext) _cbs.ontext(value); -// if (_cbs.oncdataend) _cbs.oncdataend(); -// } else { -// oncomment("[CDATA[" + value + "]]"); -// } -// }; -// function onerror(err) { -// if (_cbs.onerror) _cbs.onerror(err); -// }; -// function onend() { -// if (_cbs.onclosetag) { -// for ( -// var i = _stack.length; -// i > 0; -// _cbs.onclosetag(_stack[--i]) -// ); -// } -// if (_cbs.onend) _cbs.onend(); -// }; -// } - -//# sourceMappingURL=parser.js.map -- - diff --git a/docs/api/source/text-node.html b/docs/api/source/text-node.html deleted file mode 100644 index 5be7a6df..00000000 --- a/docs/api/source/text-node.html +++ /dev/null @@ -1,51 +0,0 @@ - - - - -
"use strict";
-Object.defineProperty(exports, "__esModule", { value: true });
-var tslib_1 = require("tslib");
-var html_node_1 = require("./html-node");
-/**
- * @class Autolinker.htmlParser.TextNode
- * @extends Autolinker.htmlParser.HtmlNode
- *
- * Represents a text node that has been parsed by the {@link Autolinker.htmlParser.HtmlParser}.
- *
- * See this class's superclass ({@link Autolinker.htmlParser.HtmlNode}) for more
- * details.
- */
-var TextNode = (function (_super) {
-    tslib_1.__extends(TextNode, _super);
-    function TextNode() {
-        return _super !== null && _super.apply(this, arguments) || this;
-    }
-    /**
-     * Returns a string name for the type of node that this class represents.
-     *
-     * @return {String}
-     */
-    TextNode.prototype.getType = function () {
-        return 'text';
-    };
-    return TextNode;
-}(html_node_1.HtmlNode));
-exports.TextNode = TextNode;
-
-//# sourceMappingURL=text-node.js.map
--
-
diff --git a/docs/api/source/tokenizer-old.html b/docs/api/source/tokenizer-old.html
deleted file mode 100644
index 4e5bccb9..00000000
--- a/docs/api/source/tokenizer-old.html
+++ /dev/null
@@ -1,884 +0,0 @@
-
-
-
-
-
"use strict"; -/*! - * Modified version of htmlparser2 which has been stripped down to only provide - * the functionality needed by Autolinker in order to make the final bundle as - * small as possible. Original: https://github.com/fb55/htmlparser2 - * - * Original copyright: - * - * Copyright 2010, 2011, Chris Winberry <chris@winberry.net>. All rights reserved. - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: -*/ -Object.defineProperty(exports, "__esModule", { value: true }); -var i = 0; -var TEXT = i++; -var BEFORE_TAG_NAME = i++; //after < -var IN_TAG_NAME = i++; -var IN_SELF_CLOSING_TAG = i++; -var BEFORE_CLOSING_TAG_NAME = i++; -var IN_CLOSING_TAG_NAME = i++; -var AFTER_CLOSING_TAG_NAME = i++; -//attributes -var BEFORE_ATTRIBUTE_NAME = i++; -var IN_ATTRIBUTE_NAME = i++; -var AFTER_ATTRIBUTE_NAME = i++; -var BEFORE_ATTRIBUTE_VALUE = i++; -var IN_ATTRIBUTE_VALUE_DQ = i++; // " -var IN_ATTRIBUTE_VALUE_SQ = i++; // ' -var IN_ATTRIBUTE_VALUE_NQ = i++; -//declarations -var BEFORE_DECLARATION = i++; // ! -var IN_DECLARATION = i++; -//processing instructions -var IN_PROCESSING_INSTRUCTION = i++; // ? 
-//comments -var BEFORE_COMMENT = i++; -var IN_COMMENT = i++; -var AFTER_COMMENT_1 = i++; -var AFTER_COMMENT_2 = i++; -//cdata -var BEFORE_CDATA_1 = i++; // [ -var BEFORE_CDATA_2 = i++; // C -var BEFORE_CDATA_3 = i++; // D -var BEFORE_CDATA_4 = i++; // A -var BEFORE_CDATA_5 = i++; // T -var BEFORE_CDATA_6 = i++; // A -var IN_CDATA = i++; // [ -var AFTER_CDATA_1 = i++; // ] -var AFTER_CDATA_2 = i++; // ] -//special tags -var BEFORE_SPECIAL = i++; //S -var BEFORE_SPECIAL_END = i++; //S -var BEFORE_SCRIPT_1 = i++; //C -var BEFORE_SCRIPT_2 = i++; //R -var BEFORE_SCRIPT_3 = i++; //I -var BEFORE_SCRIPT_4 = i++; //P -var BEFORE_SCRIPT_5 = i++; //T -var AFTER_SCRIPT_1 = i++; //C -var AFTER_SCRIPT_2 = i++; //R -var AFTER_SCRIPT_3 = i++; //I -var AFTER_SCRIPT_4 = i++; //P -var AFTER_SCRIPT_5 = i++; //T -var BEFORE_STYLE_1 = i++; //T -var BEFORE_STYLE_2 = i++; //Y -var BEFORE_STYLE_3 = i++; //L -var BEFORE_STYLE_4 = i++; //E -var AFTER_STYLE_1 = i++; //T -var AFTER_STYLE_2 = i++; //Y -var AFTER_STYLE_3 = i++; //L -var AFTER_STYLE_4 = i++; //E -var BEFORE_ENTITY = i++; //& -var BEFORE_NUMERIC_ENTITY = i++; //# -var IN_NAMED_ENTITY = i++; -var IN_NUMERIC_ENTITY = i++; -var IN_HEX_ENTITY = i++; //X -var j = 0; -var SPECIAL_NONE = j++; -var SPECIAL_SCRIPT = j++; -var SPECIAL_STYLE = j++; -function tokenizeHtml(html, _a) { - var ontext = _a.ontext, onopentagname = _a.onopentagname, onopentagend = _a.onopentagend, onclosetag = _a.onclosetag, onselfclosingtag = _a.onselfclosingtag, oncomment = _a.oncomment, onerror = _a.onerror; - var _state = TEXT, _buffer = html, _sectionStart = 0, _index = 0, _baseState = TEXT, _special = SPECIAL_NONE, _decodeEntities = true, _xmlMode = false; - // TEMPORARY - var entityMap = {}; - var xmlMap = {}; - var legacyMap = {}; - var decodeCodePoint = function (arg) { return ''; }; - _parse(); - function whitespace(c) { - return c === " " || c === "\n" || c === "\t" || c === "\f" || c === "\r"; - } - function ifElseState(upper, SUCCESS, FAILURE) { - var 
lower = upper.toLowerCase(); - if (upper === lower) { - return function (c) { - if (c === lower) { - _state = SUCCESS; - } - else { - _state = FAILURE; - _index--; - } - }; - } - else { - return function (c) { - if (c === lower || c === upper) { - _state = SUCCESS; - } - else { - _state = FAILURE; - _index--; - } - }; - } - } - function consumeSpecialNameChar(upper, NEXT_STATE) { - var lower = upper.toLowerCase(); - return function (c) { - if (c === lower || c === upper) { - _state = NEXT_STATE; - } - else { - _state = IN_TAG_NAME; - _index--; //consume the token again - } - }; - } - function _stateText(c) { - if (c === "<") { - if (_index > _sectionStart) { - ontext(_getSection()); - } - _state = BEFORE_TAG_NAME; - _sectionStart = _index; - } - else if (_decodeEntities && - _special === SPECIAL_NONE && - c === "&") { - if (_index > _sectionStart) { - ontext(_getSection()); - } - _baseState = TEXT; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateBeforeTagName(c) { - if (c === "/") { - _state = BEFORE_CLOSING_TAG_NAME; - } - else if (c === "<") { - ontext(_getSection()); - _sectionStart = _index; - } - else if (c === ">" || _special !== SPECIAL_NONE || whitespace(c)) { - _state = TEXT; - } - else if (c === "!") { - _state = BEFORE_DECLARATION; - _sectionStart = _index + 1; - } - else if (c === "?") { - _state = IN_PROCESSING_INSTRUCTION; - _sectionStart = _index + 1; - } - else { - _state = - !_xmlMode && (c === "s" || c === "S") - ? 
BEFORE_SPECIAL - : IN_TAG_NAME; - _sectionStart = _index; - } - } - ; - function _stateInTagName(c) { - if (c === "/" || c === ">" || whitespace(c)) { - _emitToken("onopentagname"); - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - } - ; - function _stateBeforeCloseingTagName(c) { - if (whitespace(c)) { } - else if (c === ">") { - _state = TEXT; - } - else if (_special !== SPECIAL_NONE) { - if (c === "s" || c === "S") { - _state = BEFORE_SPECIAL_END; - } - else { - _state = TEXT; - _index--; - } - } - else { - _state = IN_CLOSING_TAG_NAME; - _sectionStart = _index; - } - } - ; - function _stateInCloseingTagName(c) { - if (c === ">" || whitespace(c)) { - _emitToken("onclosetag"); - _state = AFTER_CLOSING_TAG_NAME; - _index--; - } - } - ; - function _stateAfterCloseingTagName(c) { - //skip everything until ">" - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - } - ; - function _stateBeforeAttributeName(c) { - if (c === ">") { - onopentagend(); - _state = TEXT; - _sectionStart = _index + 1; - } - else if (c === "/") { - _state = IN_SELF_CLOSING_TAG; - } - else if (!whitespace(c)) { - _state = IN_ATTRIBUTE_NAME; - _sectionStart = _index; - } - } - ; - function _stateInSelfClosingTag(c) { - if (c === ">") { - onselfclosingtag(); - _state = TEXT; - _sectionStart = _index + 1; - } - else if (!whitespace(c)) { - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - } - ; - function _stateInAttributeName(c) { - if (c === "=" || c === "/" || c === ">" || whitespace(c)) { - _sectionStart = -1; - _state = AFTER_ATTRIBUTE_NAME; - _index--; - } - } - ; - function _stateAfterAttributeName(c) { - if (c === "=") { - _state = BEFORE_ATTRIBUTE_VALUE; - } - else if (c === "/" || c === ">") { - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - else if (!whitespace(c)) { - _state = IN_ATTRIBUTE_NAME; - _sectionStart = _index; - } - } - ; - function _stateBeforeAttributeValue(c) { - if (c === '"') { - _state = IN_ATTRIBUTE_VALUE_DQ; - _sectionStart = _index + 1; - } - 
else if (c === "'") { - _state = IN_ATTRIBUTE_VALUE_SQ; - _sectionStart = _index + 1; - } - else if (!whitespace(c)) { - _state = IN_ATTRIBUTE_VALUE_NQ; - _sectionStart = _index; - _index--; //reconsume token - } - } - ; - function _stateInAttributeValueDoubleQuotes(c) { - if (c === '"') { - _state = BEFORE_ATTRIBUTE_NAME; - } - else if (_decodeEntities && c === "&") { - _baseState = _state; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateInAttributeValueSingleQuotes(c) { - if (c === "'") { - _state = BEFORE_ATTRIBUTE_NAME; - } - else if (_decodeEntities && c === "&") { - _baseState = _state; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateInAttributeValueNoQuotes(c) { - if (whitespace(c) || c === ">") { - _state = BEFORE_ATTRIBUTE_NAME; - _index--; - } - else if (_decodeEntities && c === "&") { - _baseState = _state; - _state = BEFORE_ENTITY; - _sectionStart = _index; - } - } - ; - function _stateBeforeDeclaration(c) { - _state = - c === "[" - ? BEFORE_CDATA_1 - : c === "-" - ? 
BEFORE_COMMENT - : IN_DECLARATION; - } - ; - function _stateInDeclaration(c) { - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - } - ; - function _stateInProcessingInstruction(c) { - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - } - ; - function _stateBeforeComment(c) { - if (c === "-") { - _state = IN_COMMENT; - _sectionStart = _index + 1; - } - else { - _state = IN_DECLARATION; - } - } - ; - function _stateInComment(c) { - if (c === "-") - _state = AFTER_COMMENT_1; - } - ; - function _stateAfterComment1(c) { - if (c === "-") { - _state = AFTER_COMMENT_2; - } - else { - _state = IN_COMMENT; - } - } - ; - function _stateAfterComment2(c) { - if (c === ">") { - //remove 2 trailing chars - oncomment(_buffer.substring(_sectionStart, _index - 2)); - _state = TEXT; - _sectionStart = _index + 1; - } - else if (c !== "-") { - _state = IN_COMMENT; - } - // else: stay in AFTER_COMMENT_2 (`--->`) - } - ; - var _stateBeforeCdata1 = ifElseState("C", BEFORE_CDATA_2, IN_DECLARATION); - var _stateBeforeCdata2 = ifElseState("D", BEFORE_CDATA_3, IN_DECLARATION); - var _stateBeforeCdata3 = ifElseState("A", BEFORE_CDATA_4, IN_DECLARATION); - var _stateBeforeCdata4 = ifElseState("T", BEFORE_CDATA_5, IN_DECLARATION); - var _stateBeforeCdata5 = ifElseState("A", BEFORE_CDATA_6, IN_DECLARATION); - function _stateBeforeCdata6(c) { - if (c === "[") { - _state = IN_CDATA; - _sectionStart = _index + 1; - } - else { - _state = IN_DECLARATION; - _index--; - } - } - ; - function _stateInCdata(c) { - if (c === "]") - _state = AFTER_CDATA_1; - } - ; - function _stateAfterCdata1(c) { - if (c === "]") - _state = AFTER_CDATA_2; - else - _state = IN_CDATA; - } - ; - function _stateAfterCdata2(c) { - if (c === ">") { - _state = TEXT; - _sectionStart = _index + 1; - } - else if (c !== "]") { - _state = IN_CDATA; - } - //else: stay in AFTER_CDATA_2 (`]]]>`) - } - ; - function _stateBeforeSpecial(c) { - if (c === "c" || c === "C") { - _state = BEFORE_SCRIPT_1; 
- } - else if (c === "t" || c === "T") { - _state = BEFORE_STYLE_1; - } - else { - _state = IN_TAG_NAME; - _index--; //consume the token again - } - } - ; - function _stateBeforeSpecialEnd(c) { - if (_special === SPECIAL_SCRIPT && (c === "c" || c === "C")) { - _state = AFTER_SCRIPT_1; - } - else if (_special === SPECIAL_STYLE && (c === "t" || c === "T")) { - _state = AFTER_STYLE_1; - } - else - _state = TEXT; - } - ; - var _stateBeforeScript1 = consumeSpecialNameChar("R", BEFORE_SCRIPT_2); - var _stateBeforeScript2 = consumeSpecialNameChar("I", BEFORE_SCRIPT_3); - var _stateBeforeScript3 = consumeSpecialNameChar("P", BEFORE_SCRIPT_4); - var _stateBeforeScript4 = consumeSpecialNameChar("T", BEFORE_SCRIPT_5); - function _stateBeforeScript5(c) { - if (c === "/" || c === ">" || whitespace(c)) { - _special = SPECIAL_SCRIPT; - } - _state = IN_TAG_NAME; - _index--; //consume the token again - } - ; - var _stateAfterScript1 = ifElseState("R", AFTER_SCRIPT_2, TEXT); - var _stateAfterScript2 = ifElseState("I", AFTER_SCRIPT_3, TEXT); - var _stateAfterScript3 = ifElseState("P", AFTER_SCRIPT_4, TEXT); - var _stateAfterScript4 = ifElseState("T", AFTER_SCRIPT_5, TEXT); - function _stateAfterScript5(c) { - if (c === ">" || whitespace(c)) { - _special = SPECIAL_NONE; - _state = IN_CLOSING_TAG_NAME; - _sectionStart = _index - 6; - _index--; //reconsume the token - } - else - _state = TEXT; - } - ; - var _stateBeforeStyle1 = consumeSpecialNameChar("Y", BEFORE_STYLE_2); - var _stateBeforeStyle2 = consumeSpecialNameChar("L", BEFORE_STYLE_3); - var _stateBeforeStyle3 = consumeSpecialNameChar("E", BEFORE_STYLE_4); - function _stateBeforeStyle4(c) { - if (c === "/" || c === ">" || whitespace(c)) { - _special = SPECIAL_STYLE; - } - _state = IN_TAG_NAME; - _index--; //consume the token again - } - ; - var _stateAfterStyle1 = ifElseState("Y", AFTER_STYLE_2, TEXT); - var _stateAfterStyle2 = ifElseState("L", AFTER_STYLE_3, TEXT); - var _stateAfterStyle3 = ifElseState("E", AFTER_STYLE_4, TEXT); 
- function _stateAfterStyle4(c) { - if (c === ">" || whitespace(c)) { - _special = SPECIAL_NONE; - _state = IN_CLOSING_TAG_NAME; - _sectionStart = _index - 5; - _index--; //reconsume the token - } - else - _state = TEXT; - } - ; - var _stateBeforeEntity = ifElseState("#", BEFORE_NUMERIC_ENTITY, IN_NAMED_ENTITY); - var _stateBeforeNumericEntity = ifElseState("X", IN_HEX_ENTITY, IN_NUMERIC_ENTITY); - //for entities terminated with a semicolon - function _parseNamedEntityStrict() { - // TODO: For this section, use the regex /( | |<|<|>|>|"|"|')/ ??? - //offset = 1 - if (_sectionStart + 1 < _index) { - var entity = _buffer.substring(_sectionStart + 1, _index), map = _xmlMode ? xmlMap : entityMap; - if (map.hasOwnProperty(entity)) { - _emitPartial(map[entity]); - _sectionStart = _index + 1; - } - } - } - ; - //parses legacy entities (without trailing semicolon) - function _parseLegacyEntity() { - var start = _sectionStart + 1, limit = _index - start; - if (limit > 6) - limit = 6; //the max length of legacy entities is 6 - while (limit >= 2) { - //the min length of legacy entities is 2 - var entity = _buffer.substr(start, limit); - if (legacyMap.hasOwnProperty(entity)) { - _emitPartial(legacyMap[entity]); - _sectionStart += limit + 1; - return; - } - else { - limit--; - } - } - } - ; - function _stateInNamedEntity(c) { - if (c === ";") { - _parseNamedEntityStrict(); - if (_sectionStart + 1 < _index && !_xmlMode) { - _parseLegacyEntity(); - } - _state = _baseState; - } - else if ((c < "a" || c > "z") && - (c < "A" || c > "Z") && - (c < "0" || c > "9")) { - if (_xmlMode) { } - else if (_sectionStart + 1 === _index) { } - else if (_baseState !== TEXT) { - if (c !== "=") { - _parseNamedEntityStrict(); - } - } - else { - _parseLegacyEntity(); - } - _state = _baseState; - _index--; - } - } - ; - function _decodeNumericEntity(offset, base) { - var sectionStart = _sectionStart + offset; - if (sectionStart !== _index) { - //parse entity - var entity = 
_buffer.substring(sectionStart, _index); - var parsed = parseInt(entity, base); - _emitPartial(decodeCodePoint(parsed)); - _sectionStart = _index; - } - else { - _sectionStart--; - } - _state = _baseState; - } - ; - function _stateInNumericEntity(c) { - if (c === ";") { - _decodeNumericEntity(2, 10); - _sectionStart++; - } - else if (c < "0" || c > "9") { - if (!_xmlMode) { - _decodeNumericEntity(2, 10); - } - else { - _state = _baseState; - } - _index--; - } - } - ; - function _stateInHexEntity(c) { - if (c === ";") { - _decodeNumericEntity(3, 16); - _sectionStart++; - } - else if ((c < "a" || c > "f") && - (c < "A" || c > "F") && - (c < "0" || c > "9")) { - if (!_xmlMode) { - _decodeNumericEntity(3, 16); - } - else { - _state = _baseState; - } - _index--; - } - } - ; - function _cleanup() { - if (_sectionStart < 0) { - _buffer = ""; - _index = 0; - } - else { - if (_state === TEXT) { - if (_sectionStart !== _index) { - ontext(_buffer.substr(_sectionStart)); - } - _buffer = ""; - _index = 0; - } - else if (_sectionStart === _index) { - //the section just started - _buffer = ""; - _index = 0; - } - else { - //remove everything unnecessary - _buffer = _buffer.substr(_sectionStart); - _index -= _sectionStart; - } - _sectionStart = 0; - } - } - ; - function _parse() { - while (_index < _buffer.length) { - var c = _buffer.charAt(_index); - if (_state === TEXT) { - _stateText(c); - } - else if (_state === BEFORE_TAG_NAME) { - _stateBeforeTagName(c); - } - else if (_state === IN_TAG_NAME) { - _stateInTagName(c); - } - else if (_state === BEFORE_CLOSING_TAG_NAME) { - _stateBeforeCloseingTagName(c); - } - else if (_state === IN_CLOSING_TAG_NAME) { - _stateInCloseingTagName(c); - } - else if (_state === AFTER_CLOSING_TAG_NAME) { - _stateAfterCloseingTagName(c); - } - else if (_state === IN_SELF_CLOSING_TAG) { - _stateInSelfClosingTag(c); - } - else if (_state === BEFORE_ATTRIBUTE_NAME) { - /* - * attributes - */ - _stateBeforeAttributeName(c); - } - else if (_state === 
IN_ATTRIBUTE_NAME) { - _stateInAttributeName(c); - } - else if (_state === AFTER_ATTRIBUTE_NAME) { - _stateAfterAttributeName(c); - } - else if (_state === BEFORE_ATTRIBUTE_VALUE) { - _stateBeforeAttributeValue(c); - } - else if (_state === IN_ATTRIBUTE_VALUE_DQ) { - _stateInAttributeValueDoubleQuotes(c); - } - else if (_state === IN_ATTRIBUTE_VALUE_SQ) { - _stateInAttributeValueSingleQuotes(c); - } - else if (_state === IN_ATTRIBUTE_VALUE_NQ) { - _stateInAttributeValueNoQuotes(c); - } - else if (_state === BEFORE_DECLARATION) { - /* - * declarations - */ - _stateBeforeDeclaration(c); - } - else if (_state === IN_DECLARATION) { - _stateInDeclaration(c); - } - else if (_state === IN_PROCESSING_INSTRUCTION) { - /* - * processing instructions - */ - _stateInProcessingInstruction(c); - } - else if (_state === BEFORE_COMMENT) { - /* - * comments - */ - _stateBeforeComment(c); - } - else if (_state === IN_COMMENT) { - _stateInComment(c); - } - else if (_state === AFTER_COMMENT_1) { - _stateAfterComment1(c); - } - else if (_state === AFTER_COMMENT_2) { - _stateAfterComment2(c); - } - else if (_state === BEFORE_CDATA_1) { - /* - * cdata - */ - _stateBeforeCdata1(c); - } - else if (_state === BEFORE_CDATA_2) { - _stateBeforeCdata2(c); - } - else if (_state === BEFORE_CDATA_3) { - _stateBeforeCdata3(c); - } - else if (_state === BEFORE_CDATA_4) { - _stateBeforeCdata4(c); - } - else if (_state === BEFORE_CDATA_5) { - _stateBeforeCdata5(c); - } - else if (_state === BEFORE_CDATA_6) { - _stateBeforeCdata6(c); - } - else if (_state === IN_CDATA) { - _stateInCdata(c); - } - else if (_state === AFTER_CDATA_1) { - _stateAfterCdata1(c); - } - else if (_state === AFTER_CDATA_2) { - _stateAfterCdata2(c); - } - else if (_state === BEFORE_SPECIAL) { - /* - * special tags - */ - _stateBeforeSpecial(c); - } - else if (_state === BEFORE_SPECIAL_END) { - _stateBeforeSpecialEnd(c); - } - else if (_state === BEFORE_SCRIPT_1) { - /* - * script - */ - _stateBeforeScript1(c); - } - else if 
(_state === BEFORE_SCRIPT_2) { - _stateBeforeScript2(c); - } - else if (_state === BEFORE_SCRIPT_3) { - _stateBeforeScript3(c); - } - else if (_state === BEFORE_SCRIPT_4) { - _stateBeforeScript4(c); - } - else if (_state === BEFORE_SCRIPT_5) { - _stateBeforeScript5(c); - } - else if (_state === AFTER_SCRIPT_1) { - _stateAfterScript1(c); - } - else if (_state === AFTER_SCRIPT_2) { - _stateAfterScript2(c); - } - else if (_state === AFTER_SCRIPT_3) { - _stateAfterScript3(c); - } - else if (_state === AFTER_SCRIPT_4) { - _stateAfterScript4(c); - } - else if (_state === AFTER_SCRIPT_5) { - _stateAfterScript5(c); - } - else if (_state === BEFORE_STYLE_1) { - /* - * style - */ - _stateBeforeStyle1(c); - } - else if (_state === BEFORE_STYLE_2) { - _stateBeforeStyle2(c); - } - else if (_state === BEFORE_STYLE_3) { - _stateBeforeStyle3(c); - } - else if (_state === BEFORE_STYLE_4) { - _stateBeforeStyle4(c); - } - else if (_state === AFTER_STYLE_1) { - _stateAfterStyle1(c); - } - else if (_state === AFTER_STYLE_2) { - _stateAfterStyle2(c); - } - else if (_state === AFTER_STYLE_3) { - _stateAfterStyle3(c); - } - else if (_state === AFTER_STYLE_4) { - _stateAfterStyle4(c); - } - else if (_state === BEFORE_ENTITY) { - /* - * entities - */ - _stateBeforeEntity(c); - } - else if (_state === BEFORE_NUMERIC_ENTITY) { - _stateBeforeNumericEntity(c); - } - else if (_state === IN_NAMED_ENTITY) { - _stateInNamedEntity(c); - } - else if (_state === IN_NUMERIC_ENTITY) { - _stateInNumericEntity(c); - } - else if (_state === IN_HEX_ENTITY) { - _stateInHexEntity(c); - } - else { - onerror(new Error("unknown _state"), _state); - } - _index++; - } - _cleanup(); - } - ; - function _getSection() { - return _buffer.substring(_sectionStart, _index); - } - ; - function _emitToken(name) { - if (name === 'onopentagname') { - onopentagname(_getSection()); - } - else if (name === 'onclosetag') { - onclosetag(_getSection()); - } - _sectionStart = -1; - } - ; - function _emitPartial(value) { - if 
(_baseState !== TEXT) {
-            //_cbs.onattribdata(value); //TODO implement the new event
-        }
-        else {
-            ontext(value);
-        }
-    }
-    ;
-}
-exports.tokenizeHtml = tokenizeHtml;
-
-//# sourceMappingURL=tokenizer-old.js.map
--
-
diff --git a/docs/api/source/tokenizer.html b/docs/api/source/tokenizer.html
deleted file mode 100644
index 060a4474..00000000
--- a/docs/api/source/tokenizer.html
+++ /dev/null
@@ -1,809 +0,0 @@
-
-
-
-
-
"use strict"; -// /*! -// * Modified version of htmlparser2 which has been stripped down to only provide -// * the functionality needed by Autolinker in order to make the final bundle as -// * small as possible. Original: https://github.com/fb55/htmlparser2 -// * -// * Original copyright: -// * -// * Copyright 2010, 2011, Chris Winberry <chris@winberry.net>. All rights reserved. -// * Permission is hereby granted, free of charge, to any person obtaining a copy -// * of this software and associated documentation files (the "Software"), to -// * deal in the Software without restriction, including without limitation the -// * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or -// * sell copies of the Software, and to permit persons to whom the Software is -// * furnished to do so, subject to the following conditions: -// */ -// var i = 0; -// var TEXT = i++; -// var BEFORE_TAG_NAME = i++; //after < -// var IN_TAG_NAME = i++; -// var IN_SELF_CLOSING_TAG = i++; -// var BEFORE_CLOSING_TAG_NAME = i++; -// var IN_CLOSING_TAG_NAME = i++; -// var AFTER_CLOSING_TAG_NAME = i++; -// //attributes -// var BEFORE_ATTRIBUTE_NAME = i++; -// var IN_ATTRIBUTE_NAME = i++; -// var AFTER_ATTRIBUTE_NAME = i++; -// var BEFORE_ATTRIBUTE_VALUE = i++; -// var IN_ATTRIBUTE_VALUE_DQ = i++; // " -// var IN_ATTRIBUTE_VALUE_SQ = i++; // ' -// var IN_ATTRIBUTE_VALUE_NQ = i++; -// //declarations -// var BEFORE_DECLARATION = i++; // ! -// var IN_DECLARATION = i++; -// //processing instructions -// var IN_PROCESSING_INSTRUCTION = i++; // ? 
-// //comments -// var BEFORE_COMMENT = i++; -// var IN_COMMENT = i++; -// var AFTER_COMMENT_1 = i++; -// var AFTER_COMMENT_2 = i++; -// //cdata -// var BEFORE_CDATA_1 = i++; // [ -// var BEFORE_CDATA_2 = i++; // C -// var BEFORE_CDATA_3 = i++; // D -// var BEFORE_CDATA_4 = i++; // A -// var BEFORE_CDATA_5 = i++; // T -// var BEFORE_CDATA_6 = i++; // A -// var IN_CDATA = i++; // [ -// var AFTER_CDATA_1 = i++; // ] -// var AFTER_CDATA_2 = i++; // ] -// //special tags -// var BEFORE_SPECIAL = i++; //S -// var BEFORE_SPECIAL_END = i++; //S -// var BEFORE_SCRIPT_1 = i++; //C -// var BEFORE_SCRIPT_2 = i++; //R -// var BEFORE_SCRIPT_3 = i++; //I -// var BEFORE_SCRIPT_4 = i++; //P -// var BEFORE_SCRIPT_5 = i++; //T -// var AFTER_SCRIPT_1 = i++; //C -// var AFTER_SCRIPT_2 = i++; //R -// var AFTER_SCRIPT_3 = i++; //I -// var AFTER_SCRIPT_4 = i++; //P -// var AFTER_SCRIPT_5 = i++; //T -// var BEFORE_STYLE_1 = i++; //T -// var BEFORE_STYLE_2 = i++; //Y -// var BEFORE_STYLE_3 = i++; //L -// var BEFORE_STYLE_4 = i++; //E -// var AFTER_STYLE_1 = i++; //T -// var AFTER_STYLE_2 = i++; //Y -// var AFTER_STYLE_3 = i++; //L -// var AFTER_STYLE_4 = i++; //E -// var BEFORE_ENTITY = i++; //& -// var BEFORE_NUMERIC_ENTITY = i++; //# -// var IN_NAMED_ENTITY = i++; -// var IN_NUMERIC_ENTITY = i++; -// var IN_HEX_ENTITY = i++; //X -// var j = 0; -// var SPECIAL_NONE = j++; -// var SPECIAL_SCRIPT = j++; -// var SPECIAL_STYLE = j++; -// export function tokenizeHtml( -// html: string, -// { ontext, onopentagname, onopentagend, onclosetag, onselfclosingtag, oncomment, onerror }: { -// ontext: ( text: string ) => void; -// onopentagname: ( name: string ) => void; -// onopentagend: () => void; -// onclosetag: ( name: string ) => void; -// onselfclosingtag: () => void; -// oncomment: ( text: string ) => void; -// onerror: ( err: Error, state: number ) => void; -// } -// ) { -// let _state = TEXT, -// _buffer = html, -// _sectionStart = 0, -// _index = 0, -// _baseState = TEXT, -// _special = 
SPECIAL_NONE, -// _decodeEntities = true, -// _xmlMode = false; -// // TEMPORARY -// const entityMap: any = {}; -// const xmlMap: any = {}; -// const legacyMap: any = {}; -// const decodeCodePoint = (arg:any) => ''; -// _parse(); -// function whitespace(c: string) { -// return c === " " || c === "\n" || c === "\t" || c === "\f" || c === "\r"; -// } -// function ifElseState(upper: string, SUCCESS: number, FAILURE: number) { -// var lower = upper.toLowerCase(); -// if (upper === lower) { -// return function(c: string) { -// if (c === lower) { -// _state = SUCCESS; -// } else { -// _state = FAILURE; -// _index--; -// } -// }; -// } else { -// return function(c: string) { -// if (c === lower || c === upper) { -// _state = SUCCESS; -// } else { -// _state = FAILURE; -// _index--; -// } -// }; -// } -// } -// function consumeSpecialNameChar(upper: string, NEXT_STATE: number) { -// var lower = upper.toLowerCase(); -// return function(c: string) { -// if (c === lower || c === upper) { -// _state = NEXT_STATE; -// } else { -// _state = IN_TAG_NAME; -// _index--; //consume the token again -// } -// }; -// } -// function _stateText(c: string) { -// if (c === "<") { -// if (_index > _sectionStart) { -// ontext(_getSection()); -// } -// _state = BEFORE_TAG_NAME; -// _sectionStart = _index; -// } else if ( -// _decodeEntities && -// _special === SPECIAL_NONE && -// c === "&" -// ) { -// if (_index > _sectionStart) { -// ontext(_getSection()); -// } -// _baseState = TEXT; -// _state = BEFORE_ENTITY; -// _sectionStart = _index; -// } -// }; -// function _stateBeforeTagName(c: string) { -// if (c === "/") { -// _state = BEFORE_CLOSING_TAG_NAME; -// } else if (c === "<") { -// ontext(_getSection()); -// _sectionStart = _index; -// } else if (c === ">" || _special !== SPECIAL_NONE || whitespace(c)) { -// _state = TEXT; -// } else if (c === "!") { -// _state = BEFORE_DECLARATION; -// _sectionStart = _index + 1; -// } else if (c === "?") { -// _state = IN_PROCESSING_INSTRUCTION; -// 
_sectionStart = _index + 1; -// } else { -// _state = -// !_xmlMode && (c === "s" || c === "S") -// ? BEFORE_SPECIAL -// : IN_TAG_NAME; -// _sectionStart = _index; -// } -// }; -// function _stateInTagName(c: string) { -// if (c === "/" || c === ">" || whitespace(c)) { -// _emitToken("onopentagname"); -// _state = BEFORE_ATTRIBUTE_NAME; -// _index--; -// } -// }; -// function _stateBeforeCloseingTagName(c: string) { -// if (whitespace(c)) {} -// else if (c === ">") { -// _state = TEXT; -// } else if (_special !== SPECIAL_NONE) { -// if (c === "s" || c === "S") { -// _state = BEFORE_SPECIAL_END; -// } else { -// _state = TEXT; -// _index--; -// } -// } else { -// _state = IN_CLOSING_TAG_NAME; -// _sectionStart = _index; -// } -// }; -// function _stateInCloseingTagName(c: string) { -// if (c === ">" || whitespace(c)) { -// _emitToken("onclosetag"); -// _state = AFTER_CLOSING_TAG_NAME; -// _index--; -// } -// }; -// function _stateAfterCloseingTagName(c: string) { -// //skip everything until ">" -// if (c === ">") { -// _state = TEXT; -// _sectionStart = _index + 1; -// } -// }; -// function _stateBeforeAttributeName(c: string) { -// if (c === ">") { -// onopentagend(); -// _state = TEXT; -// _sectionStart = _index + 1; -// } else if (c === "/") { -// _state = IN_SELF_CLOSING_TAG; -// } else if (!whitespace(c)) { -// _state = IN_ATTRIBUTE_NAME; -// _sectionStart = _index; -// } -// }; -// function _stateInSelfClosingTag(c: string) { -// if (c === ">") { -// onselfclosingtag(); -// _state = TEXT; -// _sectionStart = _index + 1; -// } else if (!whitespace(c)) { -// _state = BEFORE_ATTRIBUTE_NAME; -// _index--; -// } -// }; -// function _stateInAttributeName(c: string) { -// if (c === "=" || c === "/" || c === ">" || whitespace(c)) { -// _sectionStart = -1; -// _state = AFTER_ATTRIBUTE_NAME; -// _index--; -// } -// }; -// function _stateAfterAttributeName(c: string) { -// if (c === "=") { -// _state = BEFORE_ATTRIBUTE_VALUE; -// } else if (c === "/" || c === ">") { -// 
_state = BEFORE_ATTRIBUTE_NAME; -// _index--; -// } else if (!whitespace(c)) { -// _state = IN_ATTRIBUTE_NAME; -// _sectionStart = _index; -// } -// }; -// function _stateBeforeAttributeValue(c: string) { -// if (c === '"') { -// _state = IN_ATTRIBUTE_VALUE_DQ; -// _sectionStart = _index + 1; -// } else if (c === "'") { -// _state = IN_ATTRIBUTE_VALUE_SQ; -// _sectionStart = _index + 1; -// } else if (!whitespace(c)) { -// _state = IN_ATTRIBUTE_VALUE_NQ; -// _sectionStart = _index; -// _index--; //reconsume token -// } -// }; -// function _stateInAttributeValueDoubleQuotes(c: string) { -// if (c === '"') { -// _state = BEFORE_ATTRIBUTE_NAME; -// } else if (_decodeEntities && c === "&") { -// _baseState = _state; -// _state = BEFORE_ENTITY; -// _sectionStart = _index; -// } -// }; -// function _stateInAttributeValueSingleQuotes(c: string) { -// if (c === "'") { -// _state = BEFORE_ATTRIBUTE_NAME; -// } else if (_decodeEntities && c === "&") { -// _baseState = _state; -// _state = BEFORE_ENTITY; -// _sectionStart = _index; -// } -// }; -// function _stateInAttributeValueNoQuotes(c: string) { -// if (whitespace(c) || c === ">") { -// _state = BEFORE_ATTRIBUTE_NAME; -// _index--; -// } else if (_decodeEntities && c === "&") { -// _baseState = _state; -// _state = BEFORE_ENTITY; -// _sectionStart = _index; -// } -// }; -// function _stateBeforeDeclaration(c: string) { -// _state = -// c === "[" -// ? BEFORE_CDATA_1 -// : c === "-" -// ? 
BEFORE_COMMENT -// : IN_DECLARATION; -// }; -// function _stateInDeclaration(c: string) { -// if (c === ">") { -// _state = TEXT; -// _sectionStart = _index + 1; -// } -// }; -// function _stateInProcessingInstruction(c: string) { -// if (c === ">") { -// _state = TEXT; -// _sectionStart = _index + 1; -// } -// }; -// function _stateBeforeComment(c: string) { -// if (c === "-") { -// _state = IN_COMMENT; -// _sectionStart = _index + 1; -// } else { -// _state = IN_DECLARATION; -// } -// }; -// function _stateInComment(c: string) { -// if (c === "-") _state = AFTER_COMMENT_1; -// }; -// function _stateAfterComment1(c: string) { -// if (c === "-") { -// _state = AFTER_COMMENT_2; -// } else { -// _state = IN_COMMENT; -// } -// }; -// function _stateAfterComment2(c: string) { -// if (c === ">") { -// //remove 2 trailing chars -// oncomment( -// _buffer.substring(_sectionStart, _index - 2) -// ); -// _state = TEXT; -// _sectionStart = _index + 1; -// } else if (c !== "-") { -// _state = IN_COMMENT; -// } -// // else: stay in AFTER_COMMENT_2 (`--->`) -// }; -// let _stateBeforeCdata1 = ifElseState( -// "C", -// BEFORE_CDATA_2, -// IN_DECLARATION -// ); -// let _stateBeforeCdata2 = ifElseState( -// "D", -// BEFORE_CDATA_3, -// IN_DECLARATION -// ); -// let _stateBeforeCdata3 = ifElseState( -// "A", -// BEFORE_CDATA_4, -// IN_DECLARATION -// ); -// let _stateBeforeCdata4 = ifElseState( -// "T", -// BEFORE_CDATA_5, -// IN_DECLARATION -// ); -// let _stateBeforeCdata5 = ifElseState( -// "A", -// BEFORE_CDATA_6, -// IN_DECLARATION -// ); -// function _stateBeforeCdata6(c: string) { -// if (c === "[") { -// _state = IN_CDATA; -// _sectionStart = _index + 1; -// } else { -// _state = IN_DECLARATION; -// _index--; -// } -// }; -// function _stateInCdata(c: string) { -// if (c === "]") _state = AFTER_CDATA_1; -// }; -// function _stateAfterCdata1(c: string) { -// if (c === "]") _state = AFTER_CDATA_2; -// else _state = IN_CDATA; -// }; -// function _stateAfterCdata2(c: string) { 
-// if (c === ">") { -// _state = TEXT; -// _sectionStart = _index + 1; -// } else if (c !== "]") { -// _state = IN_CDATA; -// } -// //else: stay in AFTER_CDATA_2 (`]]]>`) -// }; -// function _stateBeforeSpecial(c: string) { -// if (c === "c" || c === "C") { -// _state = BEFORE_SCRIPT_1; -// } else if (c === "t" || c === "T") { -// _state = BEFORE_STYLE_1; -// } else { -// _state = IN_TAG_NAME; -// _index--; //consume the token again -// } -// }; -// function _stateBeforeSpecialEnd(c: string) { -// if (_special === SPECIAL_SCRIPT && (c === "c" || c === "C")) { -// _state = AFTER_SCRIPT_1; -// } else if (_special === SPECIAL_STYLE && (c === "t" || c === "T")) { -// _state = AFTER_STYLE_1; -// } else _state = TEXT; -// }; -// let _stateBeforeScript1 = consumeSpecialNameChar( -// "R", -// BEFORE_SCRIPT_2 -// ); -// let _stateBeforeScript2 = consumeSpecialNameChar( -// "I", -// BEFORE_SCRIPT_3 -// ); -// let _stateBeforeScript3 = consumeSpecialNameChar( -// "P", -// BEFORE_SCRIPT_4 -// ); -// let _stateBeforeScript4 = consumeSpecialNameChar( -// "T", -// BEFORE_SCRIPT_5 -// ); -// function _stateBeforeScript5(c: string) { -// if (c === "/" || c === ">" || whitespace(c)) { -// _special = SPECIAL_SCRIPT; -// } -// _state = IN_TAG_NAME; -// _index--; //consume the token again -// }; -// let _stateAfterScript1 = ifElseState("R", AFTER_SCRIPT_2, TEXT); -// let _stateAfterScript2 = ifElseState("I", AFTER_SCRIPT_3, TEXT); -// let _stateAfterScript3 = ifElseState("P", AFTER_SCRIPT_4, TEXT); -// let _stateAfterScript4 = ifElseState("T", AFTER_SCRIPT_5, TEXT); -// function _stateAfterScript5(c: string) { -// if (c === ">" || whitespace(c)) { -// _special = SPECIAL_NONE; -// _state = IN_CLOSING_TAG_NAME; -// _sectionStart = _index - 6; -// _index--; //reconsume the token -// } else _state = TEXT; -// }; -// let _stateBeforeStyle1 = consumeSpecialNameChar( -// "Y", -// BEFORE_STYLE_2 -// ); -// let _stateBeforeStyle2 = consumeSpecialNameChar( -// "L", -// BEFORE_STYLE_3 -// ); -// 
let _stateBeforeStyle3 = consumeSpecialNameChar( -// "E", -// BEFORE_STYLE_4 -// ); -// function _stateBeforeStyle4(c: string) { -// if (c === "/" || c === ">" || whitespace(c)) { -// _special = SPECIAL_STYLE; -// } -// _state = IN_TAG_NAME; -// _index--; //consume the token again -// }; -// let _stateAfterStyle1 = ifElseState("Y", AFTER_STYLE_2, TEXT); -// let _stateAfterStyle2 = ifElseState("L", AFTER_STYLE_3, TEXT); -// let _stateAfterStyle3 = ifElseState("E", AFTER_STYLE_4, TEXT); -// function _stateAfterStyle4(c: string) { -// if (c === ">" || whitespace(c)) { -// _special = SPECIAL_NONE; -// _state = IN_CLOSING_TAG_NAME; -// _sectionStart = _index - 5; -// _index--; //reconsume the token -// } else _state = TEXT; -// }; -// let _stateBeforeEntity = ifElseState( -// "#", -// BEFORE_NUMERIC_ENTITY, -// IN_NAMED_ENTITY -// ); -// let _stateBeforeNumericEntity = ifElseState( -// "X", -// IN_HEX_ENTITY, -// IN_NUMERIC_ENTITY -// ); -// //for entities terminated with a semicolon -// function _parseNamedEntityStrict() { -// // TODO: For this section, use the regex /(&nbsp;|&#160;|&lt;|&#60;|&gt;|&#62;|&quot;|&#34;|&#39;)/ ??? -// //offset = 1 -// if (_sectionStart + 1 < _index) { -// var entity = _buffer.substring( -// _sectionStart + 1, -// _index -// ), -// map = _xmlMode ?
xmlMap : entityMap; -// if (map.hasOwnProperty(entity)) { -// _emitPartial(map[entity]); -// _sectionStart = _index + 1; -// } -// } -// }; -// //parses legacy entities (without trailing semicolon) -// function _parseLegacyEntity() { -// var start = _sectionStart + 1, -// limit = _index - start; -// if (limit > 6) limit = 6; //the max length of legacy entities is 6 -// while (limit >= 2) { -// //the min length of legacy entities is 2 -// var entity = _buffer.substr(start, limit); -// if (legacyMap.hasOwnProperty(entity)) { -// _emitPartial(legacyMap[entity]); -// _sectionStart += limit + 1; -// return; -// } else { -// limit--; -// } -// } -// }; -// function _stateInNamedEntity(c: string) { -// if (c === ";") { -// _parseNamedEntityStrict(); -// if (_sectionStart + 1 < _index && !_xmlMode) { -// _parseLegacyEntity(); -// } -// _state = _baseState; -// } else if ( -// (c < "a" || c > "z") && -// (c < "A" || c > "Z") && -// (c < "0" || c > "9") -// ) { -// if (_xmlMode) {} -// else if (_sectionStart + 1 === _index) {} -// else if (_baseState !== TEXT) { -// if (c !== "=") { -// _parseNamedEntityStrict(); -// } -// } else { -// _parseLegacyEntity(); -// } -// _state = _baseState; -// _index--; -// } -// }; -// function _decodeNumericEntity(offset: number, base: number) { -// var sectionStart = _sectionStart + offset; -// if (sectionStart !== _index) { -// //parse entity -// var entity = _buffer.substring(sectionStart, _index); -// var parsed = parseInt(entity, base); -// _emitPartial(decodeCodePoint(parsed)); -// _sectionStart = _index; -// } else { -// _sectionStart--; -// } -// _state = _baseState; -// }; -// function _stateInNumericEntity(c: string) { -// if (c === ";") { -// _decodeNumericEntity(2, 10); -// _sectionStart++; -// } else if (c < "0" || c > "9") { -// if (!_xmlMode) { -// _decodeNumericEntity(2, 10); -// } else { -// _state = _baseState; -// } -// _index--; -// } -// }; -// function _stateInHexEntity(c: string) { -// if (c === ";") { -// 
_decodeNumericEntity(3, 16); -// _sectionStart++; -// } else if ( -// (c < "a" || c > "f") && -// (c < "A" || c > "F") && -// (c < "0" || c > "9") -// ) { -// if (!_xmlMode) { -// _decodeNumericEntity(3, 16); -// } else { -// _state = _baseState; -// } -// _index--; -// } -// }; -// function _cleanup() { -// if (_sectionStart < 0) { -// _buffer = ""; -// _index = 0; -// } else { -// if (_state === TEXT) { -// if (_sectionStart !== _index) { -// ontext(_buffer.substr(_sectionStart)); -// } -// _buffer = ""; -// _index = 0; -// } else if (_sectionStart === _index) { -// //the section just started -// _buffer = ""; -// _index = 0; -// } else { -// //remove everything unnecessary -// _buffer = _buffer.substr(_sectionStart); -// _index -= _sectionStart; -// } -// _sectionStart = 0; -// } -// }; -// function _parse() { -// while (_index < _buffer.length) { -// var c = _buffer.charAt(_index); -// if (_state === TEXT) { -// _stateText(c); -// } else if (_state === BEFORE_TAG_NAME) { -// _stateBeforeTagName(c); -// } else if (_state === IN_TAG_NAME) { -// _stateInTagName(c); -// } else if (_state === BEFORE_CLOSING_TAG_NAME) { -// _stateBeforeCloseingTagName(c); -// } else if (_state === IN_CLOSING_TAG_NAME) { -// _stateInCloseingTagName(c); -// } else if (_state === AFTER_CLOSING_TAG_NAME) { -// _stateAfterCloseingTagName(c); -// } else if (_state === IN_SELF_CLOSING_TAG) { -// _stateInSelfClosingTag(c); -// } else if (_state === BEFORE_ATTRIBUTE_NAME) { -// /* -// * attributes -// */ -// _stateBeforeAttributeName(c); -// } else if (_state === IN_ATTRIBUTE_NAME) { -// _stateInAttributeName(c); -// } else if (_state === AFTER_ATTRIBUTE_NAME) { -// _stateAfterAttributeName(c); -// } else if (_state === BEFORE_ATTRIBUTE_VALUE) { -// _stateBeforeAttributeValue(c); -// } else if (_state === IN_ATTRIBUTE_VALUE_DQ) { -// _stateInAttributeValueDoubleQuotes(c); -// } else if (_state === IN_ATTRIBUTE_VALUE_SQ) { -// _stateInAttributeValueSingleQuotes(c); -// } else if (_state === 
IN_ATTRIBUTE_VALUE_NQ) { -// _stateInAttributeValueNoQuotes(c); -// } else if (_state === BEFORE_DECLARATION) { -// /* -// * declarations -// */ -// _stateBeforeDeclaration(c); -// } else if (_state === IN_DECLARATION) { -// _stateInDeclaration(c); -// } else if (_state === IN_PROCESSING_INSTRUCTION) { -// /* -// * processing instructions -// */ -// _stateInProcessingInstruction(c); -// } else if (_state === BEFORE_COMMENT) { -// /* -// * comments -// */ -// _stateBeforeComment(c); -// } else if (_state === IN_COMMENT) { -// _stateInComment(c); -// } else if (_state === AFTER_COMMENT_1) { -// _stateAfterComment1(c); -// } else if (_state === AFTER_COMMENT_2) { -// _stateAfterComment2(c); -// } else if (_state === BEFORE_CDATA_1) { -// /* -// * cdata -// */ -// _stateBeforeCdata1(c); -// } else if (_state === BEFORE_CDATA_2) { -// _stateBeforeCdata2(c); -// } else if (_state === BEFORE_CDATA_3) { -// _stateBeforeCdata3(c); -// } else if (_state === BEFORE_CDATA_4) { -// _stateBeforeCdata4(c); -// } else if (_state === BEFORE_CDATA_5) { -// _stateBeforeCdata5(c); -// } else if (_state === BEFORE_CDATA_6) { -// _stateBeforeCdata6(c); -// } else if (_state === IN_CDATA) { -// _stateInCdata(c); -// } else if (_state === AFTER_CDATA_1) { -// _stateAfterCdata1(c); -// } else if (_state === AFTER_CDATA_2) { -// _stateAfterCdata2(c); -// } else if (_state === BEFORE_SPECIAL) { -// /* -// * special tags -// */ -// _stateBeforeSpecial(c); -// } else if (_state === BEFORE_SPECIAL_END) { -// _stateBeforeSpecialEnd(c); -// } else if (_state === BEFORE_SCRIPT_1) { -// /* -// * script -// */ -// _stateBeforeScript1(c); -// } else if (_state === BEFORE_SCRIPT_2) { -// _stateBeforeScript2(c); -// } else if (_state === BEFORE_SCRIPT_3) { -// _stateBeforeScript3(c); -// } else if (_state === BEFORE_SCRIPT_4) { -// _stateBeforeScript4(c); -// } else if (_state === BEFORE_SCRIPT_5) { -// _stateBeforeScript5(c); -// } else if (_state === AFTER_SCRIPT_1) { -// _stateAfterScript1(c); -// } 
else if (_state === AFTER_SCRIPT_2) { -// _stateAfterScript2(c); -// } else if (_state === AFTER_SCRIPT_3) { -// _stateAfterScript3(c); -// } else if (_state === AFTER_SCRIPT_4) { -// _stateAfterScript4(c); -// } else if (_state === AFTER_SCRIPT_5) { -// _stateAfterScript5(c); -// } else if (_state === BEFORE_STYLE_1) { -// /* -// * style -// */ -// _stateBeforeStyle1(c); -// } else if (_state === BEFORE_STYLE_2) { -// _stateBeforeStyle2(c); -// } else if (_state === BEFORE_STYLE_3) { -// _stateBeforeStyle3(c); -// } else if (_state === BEFORE_STYLE_4) { -// _stateBeforeStyle4(c); -// } else if (_state === AFTER_STYLE_1) { -// _stateAfterStyle1(c); -// } else if (_state === AFTER_STYLE_2) { -// _stateAfterStyle2(c); -// } else if (_state === AFTER_STYLE_3) { -// _stateAfterStyle3(c); -// } else if (_state === AFTER_STYLE_4) { -// _stateAfterStyle4(c); -// } else if (_state === BEFORE_ENTITY) { -// /* -// * entities -// */ -// _stateBeforeEntity(c); -// } else if (_state === BEFORE_NUMERIC_ENTITY) { -// _stateBeforeNumericEntity(c); -// } else if (_state === IN_NAMED_ENTITY) { -// _stateInNamedEntity(c); -// } else if (_state === IN_NUMERIC_ENTITY) { -// _stateInNumericEntity(c); -// } else if (_state === IN_HEX_ENTITY) { -// _stateInHexEntity(c); -// } else { -// onerror(new Error("unknown _state"), _state); -// } -// _index++; -// } -// _cleanup(); -// }; -// function _getSection() { -// return _buffer.substring(_sectionStart, _index); -// }; -// function _emitToken(name: 'onopentagname' | 'onclosetag') { -// if( name === 'onopentagname' ) { -// onopentagname(_getSection()); -// } else if( name === 'onclosetag' ) { -// onclosetag(_getSection()); -// } -// _sectionStart = -1; -// }; -// function _emitPartial(value: string) { -// if (_baseState !== TEXT) { -// //_cbs.onattribdata(value); //TODO implement the new event -// } else { -// ontext(value); -// } -// }; -// } - -//# sourceMappingURL=tokenizer.js.map -- - diff --git a/docs/api/source/url-match-validator.html 
b/docs/api/source/url-match-validator.html index 44c18a83..2ae8d902 100644 --- a/docs/api/source/url-match-validator.html +++ b/docs/api/source/url-match-validator.html @@ -44,6 +44,7 @@ * However, URL matches with a protocol will be allowed (ex: 'http://localhost') * 2) URL matches which do not have at least one word character in the * domain name (effectively skipping over matches like "git:1.0"). + * However, URL matches with a protocol will be allowed (ex: 'intra-net://271219.76') * 3) A protocol-relative url match (a URL beginning with '//') whose * previous character is a word character (effectively skipping over * strings like "abc//google.com") @@ -120,8 +121,10 @@ return (!!urlMatch && (!protocolUrlMatch || !this.hasFullProtocolRegex.test(protocolUrlMatch)) && urlMatch.indexOf('.') === -1); }; /** - * Determines if a URL match does not have at least one word character after - * the protocol (i.e. in the domain name). + * Determines if a URL match does not have either: + * + * a) a full protocol (i.e. 'http://'), or + * b) at least one word character after the protocol (i.e. in the domain name) * * At least one letter character must exist in the domain name after a * protocol match. Ex: skip over something like "git:1.0" @@ -133,12 +136,12 @@ * match. Ex: 'http://yahoo.com'. This is used to know whether or not we * have a protocol in the URL string, in order to check for a word * character after the protocol separator (':'). - * @return {Boolean} `true` if the URL match does not have at least one word - * character in it after the protocol, `false` otherwise. + * @return {Boolean} `true` if the URL match does not have a full protocol, or + * at least one word character in it, `false` otherwise. 
*/ UrlMatchValidator.urlMatchDoesNotHaveAtLeastOneWordChar = function (urlMatch, protocolUrlMatch) { if (urlMatch && protocolUrlMatch) { - return !this.hasWordCharAfterProtocolRegex.test(urlMatch); + return !this.hasFullProtocolRegex.test(protocolUrlMatch) && !this.hasWordCharAfterProtocolRegex.test(urlMatch); } else { return false; diff --git a/docs/api/source/url-matcher-old.html b/docs/api/source/url-matcher-old.html deleted file mode 100644 index 1effaed8..00000000 --- a/docs/api/source/url-matcher-old.html +++ /dev/null @@ -1,294 +0,0 @@ - - - - -
"use strict"; -Object.defineProperty(exports, "__esModule", { value: true }); -var tslib_1 = require("tslib"); -var matcher_1 = require("./matcher"); -var regex_lib_1 = require("../regex-lib"); -var tld_regex_1 = require("./tld-regex"); -var url_match_1 = require("../match/url-match"); -var url_match_validator_1 = require("./url-match-validator"); -/** - * @class Autolinker.matcher.Url - * @extends Autolinker.matcher.Matcher - * - * Matcher to find URL matches in an input string. - * - * See this class's superclass ({@link Autolinker.matcher.Matcher}) for more details. - */ -var UrlMatcher = (function (_super) { - tslib_1.__extends(UrlMatcher, _super); - /** - * @method constructor - * @param {Object} cfg The configuration properties for the Match instance, - * specified in an Object (map). - */ - function UrlMatcher(cfg) { - var _this = _super.call(this, cfg) || this; - /** - * @cfg {Object} stripPrefix (required) - * - * The Object form of {@link Autolinker#cfg-stripPrefix}. - */ - _this.stripPrefix = { scheme: true, www: true }; // default value just to get the above doc comment in the ES5 output and documentation generator - /** - * @cfg {Boolean} stripTrailingSlash (required) - * @inheritdoc Autolinker#stripTrailingSlash - */ - _this.stripTrailingSlash = true; // default value just to get the above doc comment in the ES5 output and documentation generator - /** - * @cfg {Boolean} decodePercentEncoding (required) - * @inheritdoc Autolinker#decodePercentEncoding - */ - _this.decodePercentEncoding = true; // default value just to get the above doc comment in the ES5 output and documentation generator - /** - * @protected - * @property {RegExp} matcherRegex - * - * The regular expression to match URLs with an optional scheme, port - * number, path, query string, and hash anchor. 
- * - * Example matches: - * - * http://google.com - * www.google.com - * google.com/path/to/file?q1=1&q2=2#myAnchor - * - * - * This regular expression will have the following capturing groups: - * - * 1. Group that matches a scheme-prefixed URL (i.e. 'http://google.com'). - * This is used to match scheme URLs with just a single word, such as - * 'http://localhost', where we won't double check that the domain name - * has at least one dot ('.') in it. - * 2. Group that matches a 'www.' prefixed URL. This is only matched if the - * 'www.' text was not prefixed by a scheme (i.e.: not prefixed by - * 'http://', 'ftp:', etc.) - * 3. A protocol-relative ('//') match for the case of a 'www.' prefixed - * URL. Will be an empty string if it is not a protocol-relative match. - * We need to know the character before the '//' in order to determine - * if it is a valid match or the // was in a string we don't want to - * auto-link. - * 4. Group that matches a known TLD (top level domain), when a scheme - * or 'www.'-prefixed domain is not matched. - * 5. A protocol-relative ('//') match for the case of a known TLD prefixed - * URL. Will be an empty string if it is not a protocol-relative match. - * See #3 for more info. - */ - _this.matcherRegex = (function () { - var schemeRegex = /(?:[A-Za-z][-.+A-Za-z0-9]{0,63}:(?![A-Za-z][-.+A-Za-z0-9]{0,63}:\/\/)(?!\d+\/?)(?:\/\/)?)/, // match protocol, allow in format "http://" or "mailto:". However, do not match the first part of something like 'link:http://www.google.com' (i.e. don't match "link:"). Also, make sure we don't interpret 'google.com:8000' as if 'google.com' was a protocol here (i.e. ignore a trailing port number in this regex) - wwwRegex = /(?:www\.)/, // starting with 'www.' 
- // Allow optional path, query string, and hash anchor, not ending in the following characters: "?!:,.;" - // http://blog.codinghorror.com/the-problem-with-urls/ - urlSuffixRegex = new RegExp('[/?#](?:[' + regex_lib_1.alphaNumericAndMarksCharsStr + '\\-+&@#/%=~_()|\'$*\\[\\]?!:,.;\u2713]*[' + regex_lib_1.alphaNumericAndMarksCharsStr + '\\-+&@#/%=~_()|\'$*\\[\\]\u2713])?'); - return new RegExp([ - '(?:', - '(', - schemeRegex.source, - regex_lib_1.getDomainNameStr(2), - ')', - '|', - '(', - '(//)?', - wwwRegex.source, - regex_lib_1.getDomainNameStr(6), - ')', - '|', - '(', - '(//)?', - regex_lib_1.getDomainNameStr(10) + '\\.', - tld_regex_1.tldRegex.source, - '(?![-' + regex_lib_1.alphaNumericCharsStr + '])', - ')', - ')', - '(?::[0-9]+)?', - '(?:' + urlSuffixRegex.source + ')?' // match for path, query string, and/or hash anchor - optional - ].join(""), 'gi'); - })(); - /** - * A regular expression to use to check the character before a protocol-relative - * URL match. We don't want to match a protocol-relative URL if it is part - * of another word. - * - * For example, we want to match something like "Go to: //google.com", - * but we don't want to match something like "abc//google.com" - * - * This regular expression is used to test the character before the '//'. - * - * @protected - * @type {RegExp} wordCharRegExp - */ - _this.wordCharRegExp = new RegExp('[' + regex_lib_1.alphaNumericAndMarksCharsStr + ']'); - /** - * The regular expression to match opening parenthesis in a URL match. - * - * This is to determine if we have unbalanced parenthesis in the URL, and to - * drop the final parenthesis that was matched if so. - * - * Ex: The text "(check out: wikipedia.com/something_(disambiguation))" - * should only autolink the inner "wikipedia.com/something_(disambiguation)" - * part, so if we find that we have unbalanced parenthesis, we will drop the - * last one for the match. 
- * - * @protected - * @property {RegExp} - */ - _this.openParensRe = /\(/g; - /** - * The regular expression to match closing parenthesis in a URL match. See - * {@link #openParensRe} for more information. - * - * @protected - * @property {RegExp} - */ - _this.closeParensRe = /\)/g; - _this.stripPrefix = cfg.stripPrefix; - _this.stripTrailingSlash = cfg.stripTrailingSlash; - _this.decodePercentEncoding = cfg.decodePercentEncoding; - return _this; - } - /** - * @inheritdoc - */ - UrlMatcher.prototype.parseMatches = function (text) { - var matcherRegex = this.matcherRegex, stripPrefix = this.stripPrefix, stripTrailingSlash = this.stripTrailingSlash, decodePercentEncoding = this.decodePercentEncoding, tagBuilder = this.tagBuilder, matches = [], match; - console.log(matcherRegex); - while ((match = matcherRegex.exec(text)) !== null) { - var matchStr = match[0], schemeUrlMatch = match[1], wwwUrlMatch = match[4], wwwProtocolRelativeMatch = match[5], - //tldUrlMatch = match[ 8 ], -- not needed at the moment - tldProtocolRelativeMatch = match[9], offset = match.index, protocolRelativeMatch = wwwProtocolRelativeMatch || tldProtocolRelativeMatch, prevChar = text.charAt(offset - 1); - if (!url_match_validator_1.UrlMatchValidator.isValid(matchStr, schemeUrlMatch)) { - continue; - } - // If the match is preceded by an '@' character, then it is either - // an email address or a username. Skip these types of matches. - if (offset > 0 && prevChar === '@') { - continue; - } - // If it's a protocol-relative '//' match, but the character before the '//' - // was a word character (i.e. a letter/number), then we found the '//' in the - // middle of another word (such as "asdf//asdf.com"). In this case, skip the - // match. 
- if (offset > 0 && protocolRelativeMatch && this.wordCharRegExp.test(prevChar)) { - continue; - } - if (/\?$/.test(matchStr)) { - matchStr = matchStr.substr(0, matchStr.length - 1); - } - // Handle a closing parenthesis at the end of the match, and exclude - // it if there is not a matching open parenthesis in the match - // itself. - if (this.matchHasUnbalancedClosingParen(matchStr)) { - matchStr = matchStr.substr(0, matchStr.length - 1); // remove the trailing ")" - } - else { - // Handle an invalid character after the TLD - var pos = this.matchHasInvalidCharAfterTld(matchStr, schemeUrlMatch); - if (pos > -1) { - matchStr = matchStr.substr(0, pos); // remove the trailing invalid chars - } - } - var urlMatchType = schemeUrlMatch ? 'scheme' : (wwwUrlMatch ? 'www' : 'tld'), protocolUrlMatch = !!schemeUrlMatch; - matches.push(new url_match_1.UrlMatch({ - tagBuilder: tagBuilder, - matchedText: matchStr, - offset: offset, - urlMatchType: urlMatchType, - url: matchStr, - protocolUrlMatch: protocolUrlMatch, - protocolRelativeMatch: !!protocolRelativeMatch, - stripPrefix: stripPrefix, - stripTrailingSlash: stripTrailingSlash, - decodePercentEncoding: decodePercentEncoding, - })); - } - return matches; - }; - /** - * Determines if a match found has an unmatched closing parenthesis. If so, - * this parenthesis will be removed from the match itself, and appended - * after the generated anchor tag. - * - * A match may have an extra closing parenthesis at the end of the match - * because the regular expression must include parenthesis for URLs such as - * "wikipedia.com/something_(disambiguation)", which should be auto-linked. - * - * However, an extra parenthesis *will* be included when the URL itself is - * wrapped in parenthesis, such as in the case of "(wikipedia.com/something_(disambiguation))". - * In this case, the last closing parenthesis should *not* be part of the - * URL itself, and this method will return `true`. 
- * - * @protected - * @param {String} matchStr The full match string from the {@link #matcherRegex}. - * @return {Boolean} `true` if there is an unbalanced closing parenthesis at - * the end of the `matchStr`, `false` otherwise. - */ - UrlMatcher.prototype.matchHasUnbalancedClosingParen = function (matchStr) { - var lastChar = matchStr.charAt(matchStr.length - 1); - if (lastChar === ')') { - var openParensMatch = matchStr.match(this.openParensRe), closeParensMatch = matchStr.match(this.closeParensRe), numOpenParens = (openParensMatch && openParensMatch.length) || 0, numCloseParens = (closeParensMatch && closeParensMatch.length) || 0; - if (numOpenParens < numCloseParens) { - return true; - } - } - return false; - }; - /** - * Determine if there's an invalid character after the TLD in a URL. Valid - * characters after TLD are ':/?#'. Exclude scheme matched URLs from this - * check. - * - * @protected - * @param {String} urlMatch The matched URL, if there was one. Will be an - * empty string if the match is not a URL match. - * @param {String} schemeUrlMatch The match URL string for a scheme - * match. Ex: 'http://yahoo.com'. This is used to match something like - * 'http://localhost', where we won't double check that the domain name - * has at least one '.' in it. - * @return {Number} the position where the invalid character was found. If - * no such character was found, returns -1 - */ - UrlMatcher.prototype.matchHasInvalidCharAfterTld = function (urlMatch, schemeUrlMatch) { - if (!urlMatch) { - return -1; - } - var offset = 0; - if (schemeUrlMatch) { - offset = urlMatch.indexOf(':'); - urlMatch = urlMatch.slice(offset); - } - var re = new RegExp("^((.?\/\/)?[-." 
+ regex_lib_1.alphaNumericAndMarksCharsStr + "]*[-" + regex_lib_1.alphaNumericAndMarksCharsStr + "]\\.[-" + regex_lib_1.alphaNumericAndMarksCharsStr + "]+)"); - var res = re.exec(urlMatch); - if (res === null) { - return -1; - } - offset += res[1].length; - urlMatch = urlMatch.slice(res[1].length); - if (/^[^-.A-Za-z0-9:\/?#]/.test(urlMatch)) { - return offset; - } - return -1; - }; - return UrlMatcher; -}(matcher_1.Matcher)); -exports.UrlMatcher = UrlMatcher; - -//# sourceMappingURL=url-matcher-old.js.map -- - diff --git a/docs/dist/Autolinker.js b/docs/dist/Autolinker.js index e8975eb3..4bb8810e 100644 --- a/docs/dist/Autolinker.js +++ b/docs/dist/Autolinker.js @@ -2112,6 +2112,7 @@ * However, URL matches with a protocol will be allowed (ex: 'http://localhost') * 2) URL matches which do not have at least one word character in the * domain name (effectively skipping over matches like "git:1.0"). + * However, URL matches with a protocol will be allowed (ex: 'intra-net://271219.76') * 3) A protocol-relative url match (a URL beginning with '//') whose * previous character is a word character (effectively skipping over * strings like "abc//google.com") @@ -2188,8 +2189,10 @@ return (!!urlMatch && (!protocolUrlMatch || !this.hasFullProtocolRegex.test(protocolUrlMatch)) && urlMatch.indexOf('.') === -1); }; /** - * Determines if a URL match does not have at least one word character after - * the protocol (i.e. in the domain name). + * Determines if a URL match does not have either: + * + * a) a full protocol (i.e. 'http://'), or + * b) at least one word character after the protocol (i.e. in the domain name) * * At least one letter character must exist in the domain name after a * protocol match. Ex: skip over something like "git:1.0" @@ -2201,12 +2204,12 @@ * match. Ex: 'http://yahoo.com'. This is used to know whether or not we * have a protocol in the URL string, in order to check for a word * character after the protocol separator (':'). 
- * @return {Boolean} `true` if the URL match does not have at least one word - * character in it after the protocol, `false` otherwise. + * @return {Boolean} `true` if the URL match does not have a full protocol, or + * at least one word character in it, `false` otherwise. */ UrlMatchValidator.urlMatchDoesNotHaveAtLeastOneWordChar = function (urlMatch, protocolUrlMatch) { if (urlMatch && protocolUrlMatch) { - return !this.hasWordCharAfterProtocolRegex.test(urlMatch); + return !this.hasFullProtocolRegex.test(protocolUrlMatch) && !this.hasWordCharAfterProtocolRegex.test(urlMatch); } else { return false; diff --git a/docs/dist/Autolinker.min.js b/docs/dist/Autolinker.min.js index 40883392..e456105c 100644 --- a/docs/dist/Autolinker.min.js +++ b/docs/dist/Autolinker.min.js @@ -7,5 +7,5 @@ * * https://github.com/gregjacobs/Autolinker.js */ -!function(e,t){"object"==typeof exports&&"undefined"!=typeof module?module.exports=t():"function"==typeof define&&define.amd?define(t):(e=e||self).Autolinker=t()}(this,function(){"use strict";function s(e,t){if(Array.prototype.indexOf)return e.indexOf(t);for(var u=0,n=e.length;u