Ddoc
$(D_S Regular Expressions,
$(P Regular expressions are a powerful tool for
pattern matching on strings of text. They
are built into the core of languages like Perl,
Ruby, and JavaScript. Perl and Ruby are particularly
renowned for adroitly handling regular expressions.
So why aren't they part of the D core language?
Read on and see how they're done in D compared with Ruby.
)
$(P This article explains how to use regular expressions
in D. It doesn't explain regular expressions themselves;
after all, people have written entire books on that topic.
D's specific implementation of regular expressions
is entirely contained in the Phobos library module
$(LINK2 phobos/std_regexp.html, std.regexp).
For a more advanced treatment of using regular expressions
in conjunction with template metaprogramming, see
$(LINK2 templates-revisited.html, Templates Revisited).
)
$(P In Ruby a regular expression can be created
as a special literal:
)
$(RUBY
r = /pattern/
s = /p[1-5]\s*/
)
$(P D doesn't have special literals for regular expressions, but they can
be created by passing the pattern as a string to $(B RegExp):)
---
r = RegExp("pattern");
s = RegExp(r"p[1-5]\s*");
---
$(P If the $(I pattern) contains backslash characters, use a
wysiwyg string literal, which has the $(SINGLEQUOTE r) prefix
to the string, so the backslashes are not treated as escape sequences.
$(I r) and $(I s) are of type $(B RegExp), but
we can use type inference to declare and assign them automatically:
)
---
auto r = RegExp("pattern");
auto s = RegExp(r"p[1-5]\s*");
---
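$(P As a minimal sketch of how such an object might then be used,
assuming a Phobos release that still ships $(B std.regexp): the optional
second argument to $(B RegExp) is an attribute string, and $(B test)
reports whether the pattern matches anywhere in a string.
The helper name $(I hasPart) is hypothetical.
)
---
import std.regexp;
import std.stdio;

// Hypothetical helper: true if s contains 'p', a digit 1-5,
// then optional whitespace, ignoring case.
bool hasPart(string s)
{
    // "i" is the ignore-case attribute
    auto r = RegExp(r"p[1-5]\s*", "i");
    return r.test(s) != 0;
}

void main()
{
    writefln("%s", hasPart("part P3 ok")); // true
    writefln("%s", hasPart("p9"));         // false
}
---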
$(P To check for a match of a string $(I s) with a regular expression
in Ruby, use the =~ operator, which returns the index of the
first match:)
$(RUBY
s = "abcabcabab"
s =~ /b/   # match, returns 1
s =~ /f/   # no match, returns nil
)
$(P In D this looks like:
)
---
auto s = "abcabcabab";
std.regexp.find(s, "b"); /* match, returns 1 */
std.regexp.find(s, "f"); /* no match, returns -1 */
---
$(P Note the equivalence to std.string.find, which searches for
substring matches rather than regular expression matches.)
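$(P A short sketch of what the regular expression version adds over a
plain substring search, assuming a Phobos release that still ships
$(B std.regexp). The call is fully qualified, as in the example above,
so it cannot be confused with a $(B find) from another module.
The helper name $(I firstMatch) is hypothetical.
)
---
import std.regexp;
import std.stdio;

// Hypothetical helper: index of the first match of pattern in s, or -1.
int firstMatch(string s, string pattern)
{
    return cast(int) std.regexp.find(s, pattern);
}

void main()
{
    auto s = "abcabcabab";
    writefln("%s", firstMatch(s, "c.b"));   // 2: 'c', any character, 'b'
    writefln("%s", firstMatch(s, "[x-z]")); // -1: no match
}
---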
$(P The Ruby =~ operator sets some implicitly defined variables
based on the result:)
$(RUBY
s = "abcdef"
if s =~ /c/
    "#{$`}[#{$&}]#{$'}"   # generates string ab[c]def
end
)
$(P In D, the function std.regexp.search() returns a RegExp object
describing the match, or null if there is no match, so the same
pieces are available explicitly:
)
---
auto m = std.regexp.search("abcdef", "c");
if (m)
    writefln("%s[%s]%s", m.pre, m.match(0), m.post);
---
$(P Or even more concisely as:
)
---
if (auto m = std.regexp.search("abcdef", "c"))
    writefln("%s[%s]%s", m.pre, m.match(0), m.post); // writes ab[c]def
---
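$(P The same object also exposes parenthesized subexpressions. The
sketch below assumes a Phobos release that still ships $(B std.regexp),
where $(B match(0)) is the whole match and $(B match(n)) is the n-th group.
The helper name $(I yearOf) is hypothetical.
)
---
import std.regexp;
import std.stdio;

// Hypothetical helper: returns the first number of a
// "number-number-number" date found in text, or null if none is found.
string yearOf(string text)
{
    if (auto m = std.regexp.search(text, r"(\d+)-(\d+)-(\d+)"))
        return m.match(1);   // first parenthesized group
    return null;
}

void main()
{
    writefln("%s", yearOf("released 2024-05-17")); // 2024
}
---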
<h2>Search and Replace</h2>
$(P Search and replace gets more interesting. To replace
occurrences of "a" with "ZZ" in Ruby, first only the first occurrence
and then all of them:
)
$(RUBY
s = "Strap a rocket engine on a chicken."
s.sub(/a/, "ZZ")    # result: StrZZp a rocket engine on a chicken.
s.gsub(/a/, "ZZ")   # result: StrZZp ZZ rocket engine on ZZ chicken.
)
$(P In D:)
---
auto s = "Strap a rocket engine on a chicken.";
sub(s, "a", "ZZ"); // result: StrZZp a rocket engine on a chicken.
sub(s, "a", "ZZ", "g"); // result: StrZZp ZZ rocket engine on ZZ chicken.
---
$(P The replacement string can reference the matches using
the $&, $$, $', $`, $0 .. $99 notation:)
---
sub(s, "[ar]", "[$&]", "g"); // result: St[r][a]p [a] [r]ocket engine on [a] chicken.
---
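$(P As a small sketch of the numbered references, assuming a Phobos
release that still ships $(B std.regexp): $1 and $2 stand for the first
and second parenthesized groups, so they can be used to reorder parts of
the match. The helper name $(I lastFirst) is hypothetical.
)
---
import std.regexp;
import std.stdio;

// Hypothetical helper: turns "First Last" into "Last, First".
string lastFirst(string name)
{
    // $1 is the first group, $2 the second
    return std.regexp.sub(name, r"(\w+) (\w+)", "$2, $1");
}

void main()
{
    writefln("%s", lastFirst("John Smith")); // Smith, John
}
---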
$(P Or the replacement string can be provided by a delegate:)
---
sub(s, "[ar]",
    (RegExp m) { return toupper(m.match(0)); },
    "g");    // result: StRAp A Rocket engine on A chicken.
---
$(P $(TT toupper()) comes from $(LINK2 phobos/std_string.html, std.string).)
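$(P Because the delegate is an ordinary closure, it can also update
local state while it produces each replacement. The sketch below, which
assumes a Phobos release that still ships $(B std.regexp), uses that to
count matches and discards the rewritten string. The helper name
$(I countMatches) is hypothetical.
)
---
import std.regexp;
import std.stdio;

// Hypothetical helper: counts the matches of pattern in s.
int countMatches(string s, string pattern)
{
    int n = 0;
    // Replace each match with itself, counting as we go;
    // "g" makes sub visit every match, not just the first.
    std.regexp.sub(s, pattern,
        (RegExp m) { ++n; return m.match(0); },
        "g");
    return n;
}

void main()
{
    auto s = "Strap a rocket engine on a chicken.";
    writefln("%s", countMatches(s, "a"));    // 3
    writefln("%s", countMatches(s, "[ar]")); // 5
}
---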
<h2>Looping</h2>
$(P It's possible to iterate over all matches within
a string with $(B foreach):)
---
import std.stdio;
import std.regexp;
void main()
{
    foreach(m; RegExp("ab").search("abcabcabab"))
    {
        writefln("%s[%s]%s", m.pre, m.match(0), m.post);
    }
}
// Prints:
// [ab]cabcabab
// abc[ab]cabab
// abcabc[ab]ab
// abcabcab[ab]
---
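$(P The same loop can gather the matched text instead of printing it.
This is a minimal sketch, assuming a Phobos release that still ships
$(B std.regexp); the helper name $(I allMatches) is hypothetical.
)
---
import std.regexp;
import std.stdio;

// Hypothetical helper: returns the text of every match of pattern in s.
string[] allMatches(string s, string pattern)
{
    string[] result;
    foreach (m; RegExp(pattern).search(s))
        result ~= m.match(0);   // match(0) is the whole matched text
    return result;
}

void main()
{
    // Prints each run of digits on its own line: 1, 22, 333
    foreach (text; allMatches("a1b22c333", r"\d+"))
        writefln("%s", text);
}
---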
<h2>Conclusion</h2>
$(P D regular expression handling is as powerful as Ruby's, but
its syntax isn't as concise. D leaves out two Ruby features:)
$(UL
$(LI Regular expression literal syntax - supporting it would
make it impossible to perform lexical analysis without also
doing syntactic or semantic analysis.)
$(LI Implicit naming of match variables - this causes problems
with name collisions, and just doesn't
fit with the rest of the way D works.)
)
$(P But it is just as powerful.
)
)
Macros:
TITLE=Regular Expressions
WIKI=RegularExpression
CATEGORY_ARTICLES=$0
RUBY=$(CCODE $0)