How to match and count keywords in text using JavaScript

July 4, 2023 4 minute read
Searching a dictionary laid on a map
Source: Unsplash

Introduction

Keywords play a crucial role in analysing and extracting information from text data. Whether you're building search functionality or conducting text analysis, being able to match and count keywords in JavaScript is a valuable skill. In this article, we will explore a step-by-step approach to achieving this using JavaScript. I used this approach whilst creating an interactive JavaScript tool, Job Application Keyword Checker. Be sure to check it out!

Define your keywords and text

The first step is to define the keywords you want to search for in the text. Create an array and populate it with the keywords you wish to match. Next, you need to obtain the text in which you want to search for the keywords. This can be any string of text you have or even user input. For demonstration purposes, let's assume we have the following:

const keywords = ["this", "where", "keywords", "none"];

const text = "This is the input text where we will search for keywords.";

Be sure to customise this array with your own set of keywords and your own text input.
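If the keywords come from user input rather than a hard-coded array, they usually need a little tidying first. The sketch below assumes the user types a comma-separated list; the helper name `parseKeywords` is made up for this example and is not part of the original tool.

```javascript
// Hypothetical helper: turn a comma-separated string into a clean keyword array.
function parseKeywords(rawInput) {
  const cleaned = rawInput
    .split(",")                                // one entry per comma
    .map(entry => entry.trim().toLowerCase())  // drop padding, normalise case
    .filter(entry => entry.length > 0);        // skip empty entries

  // A Set removes duplicates while keeping the original order.
  return [...new Set(cleaned)];
}

const parsedKeywords = parseKeywords("This, where, , Keywords, this");
console.log(parsedKeywords); // [ 'this', 'where', 'keywords' ]
```

Lowercasing at this stage means later comparisons only have to lowercase the text, not the keywords.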

Match and count keywords using regex

Now that we have our keywords and text ready, let's proceed with the matching and counting process. We will iterate over each keyword and utilise regular expressions to find matches in the text. We'll also count the occurrences of each keyword.

// Number of keywords that appear at least once in the text
let keywordCount = 0;
// Number of occurrences of each keyword
const keywordCounts = {};

keywords.forEach(keyword => {
  const regex = new RegExp(keyword, "gi");
  const matches = text.match(regex);

  if (matches) {
    keywordCounts[keyword] = matches.length;
    keywordCount++;
  } else {
    keywordCounts[keyword] = 0;
  }
});

console.log(keywordCounts);
console.log(keywordCount);

In the code above, we iterate over each keyword and create a regular expression using the keyword and the "gi" flags. The "g" flag enables a global search to find all occurrences of the keyword, while the "i" flag ensures case-insensitive matching.

Using the match method on the text with the regular expression, we find all the matches. If matches are found, we store the count in the keywordCounts object and increment keywordCount, which tracks how many of the keywords were found at least once; otherwise, we set the count to 0.

Finally, we log the keywordCounts object to the console, which displays the count of each keyword in the text, followed by keywordCount.
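One caveat with building a pattern directly from the keyword: it matches substrings, so the keyword "none" would also be counted inside "nonetheless", and a keyword containing regex special characters (for example "c++") is not treated as plain text. The sketch below is one way to handle both, assuming whole-word matches are what you want. The helper names `escapeRegExp` and `countWholeWord` are hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count whole-word, case-insensitive occurrences of a keyword.
function countWholeWord(keyword, inputText) {
  // \b marks a word boundary. It only behaves as expected when the keyword
  // starts and ends with a letter, digit or underscore.
  const regex = new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "gi");
  const matches = inputText.match(regex);
  return matches ? matches.length : 0;
}

const sample = "None of this is final. Nonetheless, this works.";
console.log(countWholeWord("none", sample)); // 1
console.log(countWholeWord("this", sample)); // 2
```

Whether substring or whole-word matching is the better fit depends on the use case: substring matching catches "keywords" when searching for "keyword", while whole-word matching avoids false positives.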

Match and count keywords using array.includes()

An alternative, more iterative approach is to split the input text into an array of words and then check each word against the keywords array. First, we need a function that transforms a string into an array of words.

/**
 * Parses an input string and transforms it into 
 * an array of words
 */
function getWords(str) {
    let words = str.toLowerCase().split(" ");
    let uniqueWords = [...new Set(words)];
    
    for (let i = 0; i < uniqueWords.length; i++) {
        uniqueWords[i] = uniqueWords[i].replace(/-/g, " ");
    } 

    return uniqueWords;
}

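We can then use this function to match the keywords. We call toLowerCase so that differences in case do not cause mismatches. Because getWords only splits on spaces, a word followed by punctuation (such as "keywords.") will not match in the first loop, so a second loop checks each remaining keyword against the full text.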

let textArray = getWords(text);
let matchedWords = [];

// Go over each word in the text array and find matches
for (let i = 0; i < textArray.length; i++) {
    let word = textArray[i].toLowerCase();

    if (!matchedWords.includes(word)) {
        if (keywords.includes(word)) {
            matchedWords.push(word);
        }
    }
}

// Then go over all keywords to cross check
for (let i = 0; i < keywords.length; i++) {
    let term = keywords[i];

    if (!matchedWords.includes(term)) {
        if (text.toLowerCase().includes(term)) {
            matchedWords.push(term);
        }
    }
} 

console.log(matchedWords.length);
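The loops above record which keywords appear, but not how often each one occurs. If per-keyword counts are also needed with this word-by-word approach, one option is to tokenise the text and tally each token. This is a minimal sketch: `countKeywordOccurrences` is a hypothetical name, and the tokenising rule (split on anything that is not a letter, digit or apostrophe) is an assumption that only suits single-word English keywords.

```javascript
// Hypothetical helper: count how often each keyword appears as a whole word.
function countKeywordOccurrences(keywordList, inputText) {
  // Start every keyword at zero so keywords that never appear are still reported.
  const counts = new Map();
  for (const keyword of keywordList) {
    counts.set(keyword.toLowerCase(), 0);
  }

  // Split on runs of characters that are not letters, digits or apostrophes,
  // which also strips punctuation such as the full stop in "keywords.".
  const tokens = inputText.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);

  for (const token of tokens) {
    if (counts.has(token)) {
      counts.set(token, counts.get(token) + 1);
    }
  }

  return Object.fromEntries(counts);
}

const exampleText = "This is the input text where we will search for keywords. Keywords matter.";
const occurrences = countKeywordOccurrences(["this", "where", "keywords", "none"], exampleText);
console.log(occurrences); // { this: 1, where: 1, keywords: 2, none: 0 }
```

Looking tokens up in a Map avoids scanning the keywords array for every word, which helps when either list grows large.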

Conclusion

Matching and counting keywords in text is a useful technique when working with JavaScript and textual data. By following the steps outlined in this article, you can easily implement this functionality into your own projects. Remember to customise the keywords and text variables to match your specific use case.

Feel free to experiment and enhance this code further by considering variations of keywords, such as plural forms or different tenses. Advanced techniques like stemming or lemmatisation can be employed to achieve more comprehensive keyword matching. Stemming reduces words to a root form by stripping endings such as plural or tense suffixes, and the result is not always a real word. Lemmatisation reduces words to their dictionary form, typically with the help of a vocabulary. For example, both techniques would reduce "running" to "run".
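To make the idea of stemming concrete, here is a deliberately naive sketch that strips a handful of common English suffixes. It is a toy for illustration only, not an established stemming algorithm, and `naiveStem` is a hypothetical name. It will get plenty of words wrong, so a maintained library is the better choice for real projects.

```javascript
// Toy stemmer (hypothetical helper): strips a few common English suffixes.
function naiveStem(word) {
  const lower = word.toLowerCase();

  for (const suffix of ["ing", "ed", "es", "s"]) {
    // Only strip the suffix when at least three characters remain.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      const stem = lower.slice(0, -suffix.length);
      // Collapse a doubled final consonant, e.g. "runn" becomes "run".
      return /([^aeiou])\1$/.test(stem) ? stem.slice(0, -1) : stem;
    }
  }

  return lower;
}

console.log(naiveStem("running"));  // run
console.log(naiveStem("Keywords")); // keyword
console.log(naiveStem("matched"));  // match
```

Applying the same function to both the keywords and the words in the text before comparing them lets "keyword" match "keywords" and "run" match "running".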

Harnessing the power of JavaScript and keyword matching opens up possibilities for creating powerful search engines, text analysis tools, and much more. Start exploring and leveraging this technique to unlock the potential within your own solutions!

As always, if you enjoyed this article, be sure to check out other articles on the site.

If you are interested in finding out how to search for keywords using Python, then check out Using PyPDF2 to score keywords in a job application.