JavaScript – Counting Occurrences of a Particular Token

Sometimes, you need to know how many times a particular token occurs in a given string. For example, you may need to know how many times the word ‘California’ is used in a given block of text. Or you may want to know how many e-mail addresses exist in a given string. The hard way to find this out would be to set up some kind of loop that goes through the string character by character or line by line, looking for (and tallying) occurrences of a given token. A far easier thing to do is use the match() method of JavaScript’s String object. The match() method is extremely powerful for this because, first of all, it lets you search on a regular expression; and secondly, when that regular expression carries the global flag, it returns all the ‘match substrings’ found, in an array. An example will make this clear.

Suppose you have the following String and you want to count the number of e-mail addresses in it:

var str = ',,';
var matches = str.match(/@/g) || []; // try to match '@'; match() gives null when there are none
var count = matches.length; // this is how many!

If one can make the (sometimes safe) assumption that ‘at’ signs (@) only occur inside e-mail addresses, then we can simply do a search on @, as in str.match(/@/g). Note the lowercase ‘g’, a flag that means to apply the search globally to the entire string. This is important, because if you fail to include this flag, the search will end after the first successful match. That’s not what we want. We want ALL matches to be found. Always remember that a successful global match() returns an Array: the Array of matches!

Obviously, the length of this Array will be equal to the number of matches found. One caveat: when nothing matches, match() returns null rather than an empty Array, so check for that before reading the length.
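To see the effect of the ‘g’ flag and the no-match case side by side, here is a minimal sketch. The sample strings, the example.com addresses, and the countMatches helper are made up for illustration; none of them come from the example above.

```javascript
// Hypothetical sample text; the addresses are invented for this sketch.
var sample = 'Write to ann@example.com or bob@example.org today.';

// Without the g flag, match() stops at the first hit.
var first = sample.match(/@/);
console.log(first.length); // 1

// With the g flag, every match is returned.
var all = sample.match(/@/g);
console.log(all.length); // 2

// match() returns null when nothing matches, so fall back to an
// empty array before reading .length. (countMatches is a
// hypothetical helper name; pass it a regex that has the g flag.)
function countMatches(text, regex) {
  return (text.match(regex) || []).length;
}

console.log(countMatches('no addresses here', /@/g)); // 0
console.log(countMatches('California, here I come. California!', /\bCalifornia\b/g)); // 2
```

The helper only counts correctly if the caller remembers the ‘g’ flag; it makes no attempt to add the flag itself.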

A purist would scoff at the sloppiness of this approach, complaining that @ signs do not always equal e-mail addresses. To address this problem, you might want to try a regular expression designed to home in on true e-mail addresses:

var str = ',,';
var m = str.match(/\w+[.]?\w*@\w+[.]\w+/g) || []; // regex for e-mail addresses
var count = m.length; // the count

Things to note

Bear in mind, a dyed-in-the-alpaca regex dilettante would still scoff at this approach, since the above regular expression is hardly a failsafe regex for checking e-mail addresses. Nevertheless, it’s better than what we had. At least now we’re looking for one or more word characters (\w+) followed optionally by a period and zero or more word characters, followed by an @, followed by one or more word characters, a period, and one or more word characters.
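Here is a quick sketch of that pattern at work, compared against the bare @ search. The sample text and its example.com addresses are hypothetical, chosen only to include one stray @ that is not part of an address.

```javascript
// Hypothetical sample text; none of these addresses come from the article.
var text = 'Contact jane.doe@example.com, bob@example.org, or @nobody.';

// The pattern described above, with the \w escapes written out.
var emailLike = /\w+[.]?\w*@\w+[.]\w+/g;

var found = text.match(emailLike) || []; // [] when there are no matches
console.log(found.length); // 2

// The bare '@' search counts the stray sign as well.
console.log((text.match(/@/g) || []).length); // 3

// A longer address is still counted once, though only part of it
// is captured by this pattern.
var partial = 'first.middle.last@mail.example.co.uk'.match(emailLike) || [];
console.log(partial.length); // 1
```

For counting purposes the partial capture does no harm, but it is one more reason not to treat this pattern as an e-mail validator.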

The derivation of a truly failsafe regex for e-mail addresses (which works across
newlines) is left to the reader as an exercise. Bwa-ha-ha…


About the Author: Kas Thomas
