Università della Svizzera italiana
Facoltà di scienze della comunicazione
MSc in Communication Sciences 2010-11
Program in Technologies for Human Communication
Davide Eynard
Software Technology 2
02 - Regular expressions 2
Javascript basics
To test regular expressions in JavaScript, you need to know at
least some basic notions of the language:
Printing information
Variable (scalar and arrays) declaration and assignment
Conditions (if)
Loops (while, for)
Objects and method/function calls
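A minimal sketch of these basics in one place. It uses console.log for printing, on the assumption that you run it in a browser console or Node.js (the jsenv shell used in the later examples offers print instead); all names are purely illustrative.

```javascript
// Assumption: browser console or Node.js, so console.log replaces print
// Variables: a scalar and an array
var favourite = "Lisa";
var children = ["Bart", "Lisa", "Maggie"];

// Condition (if)
if (children.length > 2) {
  console.log("A big family");
} else {
  console.log("A small family");
}

// Loops (for, while)
for (var i = 0; i < children.length; i++) {
  console.log(i + ": " + children[i]);
}
var count = 0;
while (count < 3) {
  count++;
}

// Objects and method calls: strings and arrays are objects with methods
var upper = favourite.toUpperCase();
var position = children.indexOf(favourite);
console.log(upper + " is at position " + position);
```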
How can you test your code?
Very simple examples: use the browser address bar or a
bookmarklet
More complex ones (with many lines of code): use a
development environment such as the one at
https://www.squarefree.com/bookmarklets/webdevel.html
(search for the “jsenv” bookmarklet)
Javascript Regular Expressions
Using regular expressions in Javascript usually means
performing the following steps:
Choose which text you want to parse (the regexp is always
applied to a text string!)
Define a regular expression to match/extract/substitute
text within the chosen string (see previous lesson)
Apply the correct methods to perform the desired
operation (whether it is matching, extraction, or
substitution):
Methods connected to the “RegExp” object
Methods connected to the “String” object
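The three steps can be sketched as follows, assuming a plain string as input instead of a Web page and console.log instead of print; the sample text is made up.

```javascript
// Assumption: console.log replaces the jsenv print function
// Step 1: choose the text to parse (a regexp is always applied to a string)
var text = "Bart Simpson, Lisa Simpson and Ned Flanders";

// Step 2: define the regular expressions
var re = /(Bart|Lisa|Maggie) Simpson/;     // first match only
var reAll = /(Bart|Lisa|Maggie) Simpson/g; // "g": every match

// Step 3: apply the methods
var found = re.test(text);                // RegExp method
var all = text.match(reAll);              // String method
var replaced = text.replace(reAll, "$1"); // String method: keep the first name

console.log(found);    // true
console.log(all);      // the two "... Simpson" substrings
console.log(replaced); // "Bart, Lisa and Ned Flanders"
```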
Defining a regular expression/1
To define a regular expression you can simply assign it to a
variable:
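var varName = /PATTERN/[g|i|m];
(the flags g = global, i = ignore case, m = multiline are optional and can be combined, e.g. /PATTERN/gi)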
Examples:
var re = /ab+c/;
var homerschild = /(Bart|Lisa|Maggie) Simpson/i;
var divcontent = /<div>(.*?)<\/div>/gi;
Note the backslash in “<\/div>”: inside a literal, “/” must be escaped as “\/”, otherwise it would end the regexp!
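A quick check of the three literals above, assuming a browser console or Node.js; the test strings are invented for illustration. The flags can also be read back as properties of the RegExp object.

```javascript
var re = /ab+c/;
var homerschild = /(Bart|Lisa|Maggie) Simpson/i;
var divcontent = /<div>(.*?)<\/div>/gi;

var r1 = re.test("xxabbbcxx");             // true: matches "abbbc"
var r2 = homerschild.test("LISA SIMPSON"); // true, thanks to the "i" flag

// The flags are exposed as properties of the object
console.log(divcontent.global, divcontent.ignoreCase, divcontent.multiline); // true true false

// Thanks to "\/", the literal can contain the closing tag
var html = "<div>one</div><DIV>two</DIV>";
var divs = html.match(divcontent); // both divs, whatever their case
console.log(r1, r2, divs);
```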
Defining a regular expression/2
Or, you can explicitly define it as an instance of the RegExp
object:
var varName = new RegExp("PATTERN", "[g|i|m]");
Examples:
var re = new RegExp("ab+c");
var homerschild = new RegExp("(Bart|Lisa|Maggie) Simpson", "i");
var txt = new RegExp("<div>(.*?)</div>", "gi");
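Note that “/” does not need to be escaped in this case, since the pattern is delimited by quotes rather than slashes.
However, every backslash in the pattern must be doubled, because the string literal itself consumes one level of escaping: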
re = /\w+\s/g;
becomes
re = new RegExp("\\w+\\s", "g");
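One way to verify that the two notations are equivalent is to compare their source and flags properties (flags exists in ES2015 and later engines, which is an assumption about your environment). The last lines show what goes wrong when the backslashes are not doubled.

```javascript
var literal = /\w+\s/g;
var fromString = new RegExp("\\w+\\s", "g");

// Same pattern text and same flags
var sameSource = literal.source === fromString.source; // true
var sameFlags = literal.flags === fromString.flags;    // true

// With single backslashes, the string literal drops them: "\w" is just "w"
var wrong = new RegExp("\w+\s", "g");
console.log(sameSource, sameFlags, wrong.source); // true true w+s
```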
And now?
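Which notation should we use?
Implicit (literal notation, simple):
  when you know the regexp in advance (the literal is compiled once, when the script is loaded)
  when you don't know how to deal with objects
Explicit (RegExp object declaration):
  when you define the regexp at runtime (e.g. from a string typed by the user)
  when you know how to deal with objects
Performance is not a good reason to prefer the explicit form: new RegExp compiles the pattern at runtime, each time it is called.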
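A sketch of the typical runtime case: the word to look for arrives as a string, so the pattern has to be built with new RegExp. The helpers escapeRegExp and countOccurrences are hypothetical names introduced here; escaping the user's text keeps characters such as “.” from acting as metacharacters.

```javascript
// Hypothetical helper: put a backslash before every regexp metacharacter
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the word is only known at runtime (e.g. typed by a user)
function countOccurrences(text, word) {
  var re = new RegExp(escapeRegExp(word), "gi");
  var matches = text.match(re); // null when there is no match
  return matches ? matches.length : 0;
}

var n1 = countOccurrences("Bart, bart and BART", "bart"); // 3
var n2 = countOccurrences("costs 1.50 or 1x50", "1.50");  // 1: the dot is literal
console.log(n1, n2);
```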
RegExp and String methods
RegExp: test, exec, compile
String: match, search, replace, split
RegExp “test” method
What does it do?
The “test” method just checks if a pattern exists within a
string. It returns true if so, and false otherwise
Usage:
regexp.test(str);
Where:
regexp is the name of a regular expression variable
str is the string against which we want to match the
regular expression
Example (run it on Google News...):
var re = /Grande Fratello/i;
var s = document.documentElement.innerHTML;
if (re.test(s)) {
  alert("This is a Big Brother day!");
} else {
  alert("No Big Brother today!");
}
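Outside a Web page, the same check can be run on any string, as in this sketch (isBigBrotherDay is a hypothetical name and the headlines are made up; console.log replaces alert). It also shows a pitfall: with the “g” flag, test remembers where the previous match ended (in lastIndex), so calling it twice on the same string can give different answers.

```javascript
// Assumption: console.log replaces alert/print; the headlines are made up
var re = /Grande Fratello/i;

function isBigBrotherDay(page) {
  return re.test(page);
}
console.log(isBigBrotherDay("Stasera: GRANDE FRATELLO in diretta")); // true
console.log(isBigBrotherDay("Nothing to see here"));                 // false

// Pitfall: with "g", test() resumes from lastIndex
var reGlobal = /simpson/gi;
var s = "Lisa Simpson";
var first = reGlobal.test(s);  // true, and lastIndex becomes 12
var second = reGlobal.test(s); // false: the search starts at index 12
// after a failed search, lastIndex goes back to 0
```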
RegExp “exec” method/1
What does it do?
The “exec” method searches for a match inside a given
string. If a match is found, it is returned as an
array (otherwise the method returns null)
Usage: array = regexp.exec(str);
Where:
regexp is the name of a regular expression
str is the string against which to match the regular
expression
Example (on Facebook friends phone list):
var re = new RegExp("<div class=\"fsl fwb fcb\">.*?<a href=\"[^\"]+\">([^<]+)<.*?<div class=\"fsl\">([^<]+)<span class=\"pls fss fcg\">([^<]+)</span>", "gi");
var content = document.documentElement.innerHTML;
while (array = re.exec(content)) {
  print(array[1] + ";" + array[2] + ";" + "\n");
}
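Since real page markup changes over time, here is a self-contained sketch of the same loop on a small, hypothetical HTML snippet (the class names “name” and “phone” are invented). It collects one “name;phone” line per match.

```javascript
// Hypothetical markup standing in for a real contact list page
var content =
  '<div class="name">Bart</div><div class="phone">555-0101</div>' +
  '<div class="name">Lisa</div><div class="phone">555-0102</div>';

// Group 1 captures the name, group 2 the phone number
var re = /<div class="name">([^<]+)<\/div><div class="phone">([^<]+)<\/div>/g;

var lines = [];
var array;
while ((array = re.exec(content)) !== null) {
  lines.push(array[1] + ";" + array[2]);
}
console.log(lines.join("\n")); // Bart;555-0101 and Lisa;555-0102, one per line
```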
RegExp “exec” method/2
The returned array has a particular format
index is the zero-based index of the match in the string
input is the original string
[0] is the full text of the match
[1], [2], ..., [n] are the parenthesized substring matches
(if they exist)
Example:
var re = /a(b*)c/;
var str = "ccabcabbcbac";
var array = re.exec(str);
print(array.index); // prints “2”
print(array.input); // prints str
print(array[0]);    // prints “abc”
print(array[1]);    // prints “b”
RegExp “exec” method/3
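Given that the exec method returns null when no further match is
found, it can be used as a loop condition to find every match of a regexp
in a string. This requires the global (“g”) flag: a global regexp stores where
the last match ended (in its lastIndex property) and resumes from there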
Example:
var re = /a(b*)c/g; // note the “g” here
var str = "ccabcabbcbac";
while (array = re.exec(str)) {
  print(array.index); // prints “2”, “5”, “10”
  print(array.input); // prints str (every time)
  print(array[0]);    // prints “abc”, “abbc”, “ac”
  print(array[1]);    // prints “b”, “bb”, “”
}
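The loop can be wrapped in a reusable helper, sketched below under the name allMatches (a hypothetical function, not a built-in). It refuses non-global regexps, which would make the loop run forever, and steps past empty matches, which would otherwise leave lastIndex where it is.

```javascript
// Hypothetical helper: collect every match of a global regexp
function allMatches(re, str) {
  if (!re.global) {
    // Without "g", exec() always restarts from index 0 and returns the
    // same first match, so the while loop below would never end
    throw new Error("allMatches needs a regexp with the g flag");
  }
  var results = [];
  var array;
  re.lastIndex = 0; // start from the beginning of the string
  while ((array = re.exec(str)) !== null) {
    results.push({ index: array.index, match: array[0], group: array[1] });
    if (array[0] === "") {
      re.lastIndex++; // an empty match does not advance lastIndex by itself
    }
  }
  return results;
}

var found = allMatches(/a(b*)c/g, "ccabcabbcbac");
console.log(found); // indexes 2, 5, 10 with "abc", "abbc", "ac"
```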
RegExp “compile” method
What does it do?
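The “compile” method (re)compiles a RegExp object with the specified
pattern and flags. Older documentation presented this as a way to get
faster execution; today compile is deprecated, and creating a new
RegExp object does the same job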
Usage:
regexp.compile("PATTERN", "[g|i|m]");
Where:
regexp is the name of a regular expression
PATTERN is the text of the regular expression
Example:
var re = new RegExp();
re.compile("c*ba", "i"); // now matches c*ba
var str = "abcabcbac";
var array = re.exec(str);
print(array); // prints “cba”
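Because compile is deprecated, a sketch of the modern equivalent simply creates a new RegExp object whenever the pattern changes (console.log is assumed in place of print).

```javascript
// Instead of re.compile("c*ba", "i"), create the RegExp directly
var re = new RegExp("c*ba", "i");
var array = re.exec("abcabcbac");
console.log(array[0]); // "cba"

// To change the pattern later, assign a brand new RegExp
re = new RegExp("b+a", "i");
var later = re.exec("abcabcbac");
console.log(later[0]); // "ba"
```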
String “match” method
What does it do?
The “match” method works like exec, but it is called on the string
(and requires a regexp as a parameter)
NOTE: with the “g” flag, match returns an array of all the matched
substrings, but without index, input, or the parenthesized groups;
to loop over the matches and their groups, use exec instead
Usage:
str.match(regexp)
like: regexp.exec(str)
Where:
str is the string against which to match the regular
expression
regexp is the name of a regular expression
Example:
var re = /a(b*)c/;
var str = "ccabcabcbac";
var array = str.match(re); // only change here
print(array.index); // prints “2”
print(array.input); // prints str
print(array[0]);    // prints “abc”
print(array[1]);    // prints “b”
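A sketch contrasting match with and without the “g” flag, on the string used in the exec examples and assuming a browser console or Node.js.

```javascript
var str = "ccabcabbcbac";

// Without "g": like exec, the first match with its index and groups
var first = str.match(/a(b*)c/);
console.log(first.index, first[0], first[1]); // 2 abc b

// With "g": every matched substring, but no index and no groups
var all = str.match(/a(b*)c/g);
console.log(all); // the substrings "abc", "abbc", "ac"

// No match: null, so check the result before using it
var none = str.match(/xyz/g);
console.log(none === null ? "no match" : none.length);
```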
String “search” method
What does it do?
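The “search” method is similar to test, but it is called on the string
(and requires a regexp as a parameter); instead of true or false, it returns
the index of the first match, or -1 if there is none
Usage:
str.search(regexp)
compare with: regexp.test(str)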
Where:
str is the string against which to match the regular
expression
regexp is the name of a regular expression
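The slide has no example, so here is a minimal sketch (console.log assumed in place of print). Note that search always starts from the beginning of the string and ignores the “g” flag.

```javascript
var str = "ccabcabcbac";

var pos = str.search(/a(b*)c/); // 2: index of the first match
var missing = str.search(/xyz/); // -1: no match

// To use search() as a yes/no test, compare the result with -1
var hasBA = str.search(/b+a/) !== -1; // true ("ba" at index 8)
console.log(pos, missing, hasBA);
```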
String “replace” method/1
What does it do?
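The “replace” method returns a new string in which the text matched by
the regexp is replaced by replaceStr (the original string is not modified).
Without the “g” flag, only the first match is replaced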
Usage:
newstr = str.replace(regexp, replaceStr)
Where:
str is the string against which to match the regular
expression
regexp is the name of a regular expression
replaceStr is a string describing how the substitution has
to be made
Example:
var re = /a(b*)c/;
var str = "ccabcabcbac";
var newstr = str.replace(re, "xxx");
print(newstr); // prints "ccxxxabcbac";
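A sketch of the difference the “g” flag makes on the same string as the slide, and of the fact that the original string is left untouched (console.log assumed in place of print).

```javascript
var str = "ccabcabcbac";

var once = str.replace(/a(b*)c/, "xxx"); // only the first match
var all = str.replace(/a(b*)c/g, "xxx"); // every match

console.log(once); // "ccxxxabcbac"
console.log(all);  // "ccxxxxxxbxxx"
console.log(str);  // unchanged: "ccabcabcbac"
```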
String “replace” method/2
NOTE: replaceStr can contain placeholders to use the matched
substrings inside it
Example:
var re = /(\w+)\s(\w+)/g;
var str = "Jack Brown; Bob White; Jeff Green";
var newstr = str.replace(re, "$2,$1");
print(newstr); // prints “Brown,Jack; White,Bob; Green,Jeff”
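Besides placeholders like $1 and $2, replace also accepts a function as its second argument: it is called for each match with the whole match followed by each group, and its return value becomes the replacement. A minimal sketch (the “initials” format is just an example):

```javascript
var re = /(\w+)\s(\w+)/g;
var str = "Jack Brown; Bob White; Jeff Green";

// Placeholders: $1 and $2 are the first and second groups
var swapped = str.replace(re, "$2,$1");

// Function: called with the whole match, then each group
var initials = str.replace(re, function (match, first, last) {
  return first.charAt(0) + ". " + last;
});

console.log(swapped);  // "Brown,Jack; White,Bob; Green,Jeff"
console.log(initials); // "J. Brown; B. White; J. Green"
```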
String “split” method
What does it do?
The “split” method scans a string for delimiters and splits
the string into a list of substrings, returning the resulting
list in the form of an array
Usage:
str.split(regexp)
Where:
str is the string against which to match the regular
expression
regexp is the name of a regular expression
Example:
var re = /;/;
var str = "Jack Brown; Bob White; Jeff Green";
var array = str.split(re);
print(array[0]); // prints “Jack Brown”
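Splitting on /;/ keeps the space that follows each semicolon. A sketch of two alternative delimiters, assuming a browser console or Node.js:

```javascript
var str = "Jack Brown; Bob White; Jeff Green";

var parts = str.split(/;/);      // "Jack Brown", " Bob White", " Jeff Green"
var clean = str.split(/;\s*/);   // "Jack Brown", "Bob White", "Jeff Green"
var words = str.split(/[;\s]+/); // "Jack", "Brown", "Bob", "White", "Jeff", "Green"

console.log(parts[1]);     // " Bob White" (note the leading space)
console.log(clean[1]);     // "Bob White"
console.log(words.length); // 6
```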
References
Some Web references:
http://www.regular-expressions.info/javascript.html
https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Regular_Expressions
http://www.javascriptref.com/examples/ch08-ed2/index.htm
Some tools:
https://www.squarefree.com/bookmarklets, and in particular the “jsenv”
bookmarklet
Installation instructions:
Connect to https://www.squarefree.com/bookmarklets/webdevel.html
Drag the “jsenv” button from the Web page to your bookmarks bar/folder
Just click on the link within your bookmarks to open the environment
Note: the tool works on the current Web page, so if you want it to run on
another page just close it, open the new page, and then click on the
bookmarklet again.
Exercises
The following is a regular expression that we created and
tested during the lesson
Write a regexp which matches (and is able to extract) the
URL and the text connected with an anchor tag
Example string to parse:
<a href="http://blablabla">Click here</a>
RegExp:
<a href="([^"]+)">([^<]+)</a>
Wrong RegExp:
<a href="([^"]+)">(.[^<]+)</a>
(note the dot!)
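The version with the dot also matches the example string
correctly. However, since the dot matches any character, including “<”,
it also matches an empty anchor followed by more text, like:
<a href="http://blablabla"></a> (add whatever here)</a>
where the captured text becomes “</a> (add whatever here)”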
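A sketch that applies both regexps from the exercise with exec (written as literals, so the “/” in “</a>” is escaped), assuming a browser console or Node.js:

```javascript
var good = /<a href="([^"]+)">([^<]+)<\/a>/;
var bad = /<a href="([^"]+)">(.[^<]+)<\/a>/;

var link = '<a href="http://blablabla">Click here</a>';
var empty = '<a href="http://blablabla"></a> (add whatever here)</a>';

// Correct regexp: group 1 is the URL, group 2 the anchor text
var m = good.exec(link);
console.log(m[1], "|", m[2]); // http://blablabla | Click here

// On the empty anchor the correct regexp fails...
var goodOnEmpty = good.exec(empty); // null

// ...while the wrong one succeeds, because the dot matched the "<"
var badOnEmpty = bad.exec(empty);
console.log(badOnEmpty[2]); // "</a> (add whatever here)"
```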