Sometimes, you need to know how many times a particular token occurs in a given string.
For example, you may need to know how many times the word ‘California’ is used in a given
block of text. Or you may want to know how many e-mail addresses exist in a given string.
The hard way to find this out would be to set up some kind of loop that goes through the
string character by character or line by line, looking for (and tallying) occurrences of a
given token. A far easier thing to do is use the match() method of JavaScript’s
String object. The match() method is extremely powerful for this, because first of
all, it lets you search on a regular expression; and secondly, it returns all the ‘match
substrings’ found, in an array. An example will make this clear.
Suppose you have the following String and you want to count the number of e-mail
addresses in it:
var str = 'kt@acroforms.com, jon2456@aol.com, ed@ed.com';
var matches = str.match(/@/g); // try to match '@'
var count = matches.length; // this is how many!
If you can make the (sometimes safe) assumption that ‘at’ signs (@) occur only inside
e-mail addresses, then you can simply do a search on @, as in str.match(/@/g). Note
the lowercase ‘g’, which means to apply the search globally to the entire string. This is
important, because if you fail to include this flag, the search will end after the
first successful match. That’s not what we want. We want ALL matches to be found. Always
remember that when it finds something, the match() method returns an Array: the Array of matches!
The length of this Array is equal to the number of matches found. When nothing matches at
all, match() returns null rather than an empty Array, so check for null before reading length.
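Because match() gives back null when there are no matches, reading length directly can throw an error on text that contains no @ at all. A minimal sketch of one way to guard against that follows. The helper name countMatches and its parameters are hypothetical (they are not part of the String object), and the sketch assumes the caller always passes a regular expression that carries the g flag:

```javascript
// Hypothetical helper (not a built-in): counts the matches of a global
// regular expression in a string, treating "no match" as zero.
// Assumption: pattern was created with the g flag.
function countMatches(text, pattern) {
  var found = text.match(pattern); // Array of matches, or null if none
  return found ? found.length : 0;
}

var str = 'kt@acroforms.com, jon2456@aol.com, ed@ed.com';
var atCount = countMatches(str, /@/g);                   // 3
var noneCount = countMatches('no addresses here', /@/g); // 0
```

The same helper works for the word-counting case mentioned earlier, for example countMatches(text, /California/g).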
A purist would scoff at the sloppiness of this approach, complaining that @ signs do not
always equal e-mail addresses. To address this problem, you might want to try a regular
expression designed to home in on true e-mail addresses:
var str = 'kt@acroforms.com, jon2456@aol.com, ed@ed.com';
var m = str.match(/\w+[.]?\w*@\w+[.]\w+/g); // regex for e-mail addresses
var count = m.length; // the count
Things to note
Bear in mind, a dyed-in-the-alpaca regex dilettante would still scoff at this approach, since the above regular expression is hardly a failsafe regex for checking e-mail addresses. Nevertheless, it’s better than what we had. At least now we’re looking for one or more word characters (\w+) followed optionally by a period and zero or more word characters, followed by an @, followed by one or more word characters, a period, and one or more word characters.
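To see both the improvement and the limits of the pattern just described, here is a short sketch written with explicit \w escapes. The variable names and sample strings are made up for illustration:

```javascript
// The e-mail pattern described above, with explicit \w escapes.
var emailPattern = /\w+[.]?\w*@\w+[.]\w+/g;

// A stray @ that is not part of an address is no longer counted.
var sample = 'kt@acroforms.com, jon2456@aol.com, ed@ed.com, see you @ noon';
var found = sample.match(emailPattern) || []; // fall back to [] when match() returns null
var emailCount = found.length; // 3

// The pattern is still not failsafe: an address with several dots is counted
// once, but only part of it is captured.
var partial = 'a.b.c@x.co.uk'.match(emailPattern) || [];
var partialText = partial[0]; // 'b.c@x.co'
```

For simple counting the partial capture does no harm, but it matters as soon as you want to use the matched substrings themselves.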
The derivation of a truly failsafe regex for e-mail addresses (which works across
newlines) is left to the reader as an exercise. Bwa-ha-ha…