Keyword Grouping Tool Guide

Edited November 17, 2022

    Keyword grouping (clustering) is the automated grouping of keywords based on a comparison of search engine results.

    The Rush Analytics clustering algorithm gathers the top 10 URLs from Google’s search results for each of your keywords, compares the results across keywords, and groups together the keywords that can be promoted successfully on the same page.

    There are two methods of clustering in Rush Analytics: Soft and Hard.

    Once the keywords have been processed, you will have a nearly finished site structure that is friendly to search engines. Based on the search volume for each keyword group, you can easily decide whether to create additional pages on your site.

    Clustering FAQ: The most frequent questions from our users

    • What is keyword clustering? How does it work?

      Clustering is the grouping of keywords based on a comparison of search engine results. The algorithm gathers the top 10 URLs for each of your keywords, compares the results across keywords, and groups the queries that can be promoted together, so that it is convenient and logical to create pages for them on your site.

    • What data do I need for keyword clustering?

      You need to upload a list of keywords and their search volume to Rush Analytics.

    • What is clustering accuracy?

      Clustering accuracy indicates how many common URLs are needed in the top 10 search results for keywords to fall into one cluster.

      In other words, the higher the clustering accuracy, the more closely related the phrases in each group (cluster) will be. For most topics, an accuracy of 5 will be sufficient.

    • Why are there several options for clustering accuracy?

      Each topic has its own necessary and sufficient threshold of search engine similarity to build a semantic core. For example, when promoting an online shop, it would be a big problem if the keywords “Redmond Multicooker RX500” and “Redmond Multicooker RX500-1” fell into one cluster, because they are different products and should be promoted on different product cards. Here we recommend using an accuracy of 5.

      For informational topics, such as sites for discounts or recipes, such accuracy is not needed. Here the task is to get the maximum number of grouped clusters for writing articles. For these sites, we recommend an accuracy of 3 or 4. For sites in very competitive topics, where the competition for top positions is mainly over high-volume keywords, we recommend using a higher clustering accuracy (6 or 7), or hard clustering, and creating separate pages for non-clustered keywords.

    • If I choose two or more accuracy settings in the clustering, will I be charged 2/3/4 times more credits?

      No. The clustering charge will be based on the number of keywords loaded into the task. You can select all types of accuracy at no extra charge.

    • Can two clusters be combined into one if logically their keywords should be promoted on the same page?

      Yes, they can. Sometimes it is even necessary. When can you combine two clusters into one? Often keywords like “buy Redmond multicooker” and “Redmond multicooker price” may fall into different clusters because of the low quality of Google’s search results for these keywords.

      In this case, you need to combine these clusters into one and promote the Redmond multicookers page. This is quite a normal situation.

    • When can two clusters not be combined into one?

      When one cluster consists of informational keywords and the other of commercial keywords. For example, the clusters “buy Redmond multicooker” and “review Redmond multicookers” cannot be combined because these keywords should in principle be promoted on different pages.

    • Why weren’t all the keywords clustered?

      Because the keywords in the Unclustered tab did not match any cluster. Unfortunately, not all keywords can be clustered, as they are not all related to each other.

      We are guided primarily by how keywords will be promoted (ranked) and we group them based on search engine similarity. For example, the keywords “mobile phone” and “mobile phones” should be promoted on different pages, because one is an information keyword and the other is a commercial keyword, and they will never be promoted on the same page.

    • What about unclustered keywords?

      If you find valuable keywords in the list of unclustered keywords, you can manually add them to existing groups (they may not be linked due to bad rankings) or create separate pages for those keywords on your site.

    • How do I increase the percentage of clustered keywords?

      Here we have two practical pieces of advice:

      a) Decrease the accuracy of the clustering. In this case, you will get wider clusters and more clusters of 2-3 keywords. This method is good for informational websites. For online shops, keywords may fall into the wrong groups when clustered by Search Volume.

      b) Use the combination of Search Volume and marker clustering:

      – First, manually select markers that will be the core of your site structure and which are easy to find. You can do this by using logical hypotheses and looking at keyword search volume.

      – Use stop words (for example, cities you don’t promote in, or negative keywords like “download free” or “no SMS”) when collecting keywords or when creating a clustering task.

      This will help raise the percentage of clustered keywords to 45-65%.

    Step-by-step guide:

    Create a task. To create a task and cluster your keywords, go to the clustering tab and click on “Create new task”.

    Step one: Search Engine and Region.

    Here you need to enter a task name (mandatory field). You can enter any name; it is often convenient to enter the name of a site so that you can easily find the desired task in the future.

    Next, specify the search engine whose results will be used for clustering.

    For Google, all regions and languages of the world are currently available.

    Step two: Task settings

    All about our clustering algorithms

    Clustering method:

    • Soft clustering: In this clustering method, the algorithm identifies the central (marker) keywords and compares all other keywords to them. This algorithm is ideal for clustering keywords for traffic projects: online shops, information sites, and service sites with little competition.
    • Hard clustering: Keywords are combined into a group only if there is a common set of URLs for all of them. This type of clustering groups fewer keywords, but with very high accuracy. Ideal for competitive high-volume keywords.
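
    The difference between the two methods can be sketched in a few lines of Python. This is only an illustration of the descriptions above, not the Rush Analytics implementation: the function names, the keywords, and the short strings standing in for URLs are all made up for the example.

```python
def shared_urls(serps, keywords):
    """Return the URLs found in the top 10 of every keyword in `keywords`."""
    url_sets = [set(serps[kw][:10]) for kw in keywords]
    return set.intersection(*url_sets)


def fits_soft(serps, marker, candidate, accuracy):
    """Soft: the candidate only has to match the marker keyword."""
    return len(shared_urls(serps, [marker, candidate])) >= accuracy


def fits_hard(serps, cluster, candidate, accuracy):
    """Hard: the candidate and every keyword already in the cluster must share URLs."""
    return len(shared_urls(serps, cluster + [candidate])) >= accuracy


# Made-up search results; short strings stand in for URLs.
serps = {
    "marker": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "first":  ["u1", "u2", "u3", "u7", "u8", "u9"],
    "second": ["u4", "u5", "u6", "u1", "u10", "u11"],
}

# Compared with the marker only.
print(fits_soft(serps, "marker", "second", accuracy=3))
# Compared with the whole group at once.
print(fits_hard(serps, ["marker", "first"], "second", accuracy=3))
```

    In this example the keyword "second" shares four URLs with the marker, so it would join the cluster under the soft rule, but only one URL is common to all three keywords, so the hard rule would leave it out. This is why hard clustering groups fewer keywords.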

    Type – Choose a clustering algorithm.

    We have 3 clustering algorithms:

    • Clustering with manually entered core keywords (markers)
    • Clustering by Search Volume
    • Combination of Search Volume and marker clustering.

    They work on the same basic principle, a comparison of how similar the top search results are for different keywords, but are designed to solve slightly different problems.

    Algorithm using manually entered core keywords (markers):

    This algorithm is most effective when your site already has an extensively branched semantic structure (directory) so you don’t need to expand it, you know all the markers, and you just need to understand which keywords you are going to use to promote existing pages. In this case, take your markers (category/page titles), collect the Google search suggestions for them, mark the markers as 1 and the collected keyword cloud as 0, and perform clustering. In the output, you will get a ready-made semantic core for your categories, while keywords that are not assigned to your structure will remain unclustered.

    Upload format: keyword | marker(1/0) – download sample input file.
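
    As a rough sketch, marker clustering could look like the Python below. It assumes that each non-marker keyword goes to the marker it shares the most top 10 URLs with, which is our simplification for the example and not a description of the actual algorithm; all names and data are hypothetical.

```python
def overlap(top_a, top_b):
    """Number of URLs shared by two top-10 result lists."""
    return len(set(top_a[:10]) & set(top_b[:10]))


def cluster_by_markers(rows, serps, accuracy):
    """rows: (keyword, marker_flag) pairs, as in the keyword | marker(1/0) layout."""
    markers = [kw for kw, flag in rows if flag == 1]
    clusters = {m: [m] for m in markers}
    unclustered = []
    for kw, flag in rows:
        if flag == 1:
            continue
        # Assumption: choose the marker with the largest overlap.
        best = max(markers, key=lambda m: overlap(serps[m], serps[kw]), default=None)
        if best is not None and overlap(serps[best], serps[kw]) >= accuracy:
            clusters[best].append(kw)
        else:
            unclustered.append(kw)
    return clusters, unclustered


# Made-up input: one marker (1) and its keyword cloud (0).
rows = [("multicookers", 1), ("buy multicooker", 0), ("multicooker repair", 0)]
serps = {
    "multicookers":       ["s1", "s2", "s3", "s4"],
    "buy multicooker":    ["s1", "s2", "s3", "s5"],
    "multicooker repair": ["x1", "x2"],
}
clusters, unclustered = cluster_by_markers(rows, serps, accuracy=3)
print(clusters)
print(unclustered)
```

    Keywords that do not reach the threshold for any marker end up in the unclustered list, which mirrors the behaviour described above.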

    Search Volume clustering algorithm

    This algorithm solves the inverse problem of the manual marker algorithm. You do not yet know the structure of your site and cannot identify markers: you have just collected keywords, search suggestions, and search volume, and now need to structure the semantic core. In this case, the search volume clustering algorithm is the best way to do it.

    The entire list of keywords is sorted by decreasing search volume. The algorithm tries to link as many keywords from the list as possible to the highest-volume keyword and forms a cluster, then repeats the process for the next highest-volume keyword that remains.

    Don’t worry that keywords might bind to the wrong cluster on the first pass of the algorithm; we use binary tree machine learning algorithms to prevent this 🙂

    Upload format: keyword | search volume – download sample input file
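
    The first pass of Search Volume clustering can be sketched as a simple greedy loop. The Python below is a simplified illustration under our own assumptions: it covers only the sorting and linking steps, leaves out the later correction step mentioned above, and uses made-up names and data.

```python
def overlap(top_a, top_b):
    """Number of URLs shared by two top-10 result lists."""
    return len(set(top_a[:10]) & set(top_b[:10]))


def cluster_by_volume(volume, serps, accuracy):
    """volume: keyword -> search volume; serps: keyword -> list of top URLs."""
    remaining = sorted(volume, key=volume.get, reverse=True)
    clusters, unclustered = {}, []
    while remaining:
        head = remaining.pop(0)  # highest-volume keyword still unassigned
        members = [kw for kw in remaining if overlap(serps[head], serps[kw]) >= accuracy]
        if members:
            clusters[head] = [head] + members
            remaining = [kw for kw in remaining if kw not in members]
        else:
            unclustered.append(head)
    return clusters, unclustered


# Made-up keywords, volumes, and results.
volume = {
    "buy multicooker": 1000,
    "multicooker price": 500,
    "multicooker recipes": 300,
    "what to cook in a multicooker": 100,
    "multicooker repair": 50,
}
serps = {
    "buy multicooker":               ["s1", "s2", "s3", "s4"],
    "multicooker price":             ["s1", "s2", "s3", "s5"],
    "multicooker recipes":           ["r1", "r2", "r3", "r4"],
    "what to cook in a multicooker": ["r1", "r2", "r3", "s1"],
    "multicooker repair":            ["x1", "x2"],
}
clusters, unclustered = cluster_by_volume(volume, serps, accuracy=3)
print(clusters)
print(unclustered)
```

    In this sketch the highest-volume keyword of each group becomes its marker, and a keyword that matches nothing stays unclustered.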

    Combined algorithm of Search Volume and marker clustering – combines the approaches of the previous two methods.

    This algorithm is suitable for the task of simultaneously selecting keywords for the existing site structure and expanding it. It works as follows: first, we try to link all possible keywords to your marker keywords and generate a ready-made structure linked to your markers. Next, all the keywords that have not been linked to your markers are sorted by decreasing search volume and grouped. As a result, you get:

    a) A ready-made semantic core for the existing categories of your site.

    b) Expansion of the semantic core for your site.

    We strongly recommend using the combined algorithm; it gives the best results.

    Upload format: keyword | marker(1/0) | search volume – download sample input file
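
    The two passes of the combined algorithm can be sketched as follows. As before, this Python is an illustration only: the best-overlap rule for choosing a marker, the function names, and the sample data are assumptions made for the example.

```python
def overlap(top_a, top_b):
    """Number of URLs shared by two top-10 result lists."""
    return len(set(top_a[:10]) & set(top_b[:10]))


def cluster_combined(rows, serps, accuracy):
    """rows: (keyword, marker_flag, search_volume) triples."""
    volume = {kw: vol for kw, _, vol in rows}
    markers = [kw for kw, flag, _ in rows if flag == 1]
    clusters = {m: [m] for m in markers}
    leftover = []
    # Pass 1: try to link every keyword to one of your markers.
    for kw, flag, _ in rows:
        if flag == 1:
            continue
        best = max(markers, key=lambda m: overlap(serps[m], serps[kw]), default=None)
        if best is not None and overlap(serps[best], serps[kw]) >= accuracy:
            clusters[best].append(kw)
        else:
            leftover.append(kw)
    # Pass 2: group what is left by decreasing search volume.
    leftover.sort(key=volume.get, reverse=True)
    unclustered = []
    while leftover:
        head = leftover.pop(0)
        members = [kw for kw in leftover if overlap(serps[head], serps[kw]) >= accuracy]
        if members:
            clusters[head] = [head] + members
            leftover = [kw for kw in leftover if kw not in members]
        else:
            unclustered.append(head)
    return clusters, unclustered


rows = [
    ("multicookers", 1, 5000),
    ("buy multicooker", 0, 1000),
    ("multicooker recipes", 0, 300),
    ("what to cook in a multicooker", 0, 100),
    ("multicooker repair", 0, 50),
]
serps = {
    "multicookers":                  ["s1", "s2", "s3", "s4"],
    "buy multicooker":               ["s1", "s2", "s3", "s5"],
    "multicooker recipes":           ["r1", "r2", "r3", "r4"],
    "what to cook in a multicooker": ["r1", "r2", "r3", "s1"],
    "multicooker repair":            ["x1", "x2"],
}
clusters, unclustered = cluster_combined(rows, serps, accuracy=3)
print(clusters)
print(unclustered)
```

    Here the "multicookers" cluster corresponds to result a) above, and the new "multicooker recipes" cluster corresponds to result b).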

    All you need to know about clustering accuracy

    The higher the clustering accuracy, the more closely related the phrases that fall into one group (cluster) will be.

    In other words, this option sets how many common URLs two keywords must have in the top 10 search results to be placed in the same cluster.

    Each topic has its own necessary and sufficient threshold of similarity to get a quality semantic core. For example, when optimizing an online shop, it would be a big problem if during clustering keywords “Redmond Multicooker RX500” and “Redmond Multicooker RX500-1” fell into one cluster, because they are different products and should be promoted on different product cards. Here we recommend using an accuracy of 5.

    For informational topics, such as sites for discounts or recipes, such accuracy is not needed. Here, the task is to get the maximum number of grouped clusters for writing articles. For these sites, we recommend an accuracy of 3 or 4. For sites in very competitive topics, where the competition for top positions is mainly over high-volume keywords, we recommend using a higher clustering accuracy (6 or 7) and creating separate pages for non-clustered keywords.

    We recommend you choose options 3-6 and see from the results which query clustering has sufficient completeness and accuracy for your semantics. The higher the accuracy, the smaller the keyword groups will be.
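
    To make the accuracy setting concrete, here is a minimal Python sketch of the check it describes: count the URLs two keywords share in their top 10 and compare that count with the chosen accuracy. The function names and URLs are made up for the example, and this is not the Rush Analytics code.

```python
def common_urls(top_a, top_b):
    """Count the URLs that appear in both top-10 result lists."""
    return len(set(top_a[:10]) & set(top_b[:10]))


def same_cluster(top_a, top_b, accuracy):
    """Keywords can share a cluster when they have at least `accuracy` common URLs."""
    return common_urls(top_a, top_b) >= accuracy


# Hypothetical top-10 results for two keywords.
serp_a = [f"https://example.com/page{i}" for i in range(10)]     # page0 .. page9
serp_b = [f"https://example.com/page{i}" for i in range(6, 16)]  # page6 .. page15

print(common_urls(serp_a, serp_b))
print(same_cluster(serp_a, serp_b, accuracy=3))
print(same_cluster(serp_a, serp_b, accuracy=5))
```

    The two result lists share four URLs, so the keywords would be grouped at an accuracy of 3 but kept apart at an accuracy of 5.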

    Other clustering settings

    Define relevant URLs for clusters of an existing site.

    Simply enter the domain and our algorithms will try to determine the relevant URLs for the resulting clusters.

    This option works in the following way: if your site is already in the top 10 of a search engine for the main (marker) keyword, we will show this URL and highlight it in green. If your site is not in the top 10, we will select the URL for the marker keyword using the site: operator.
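
    The selection logic can be sketched in Python as shown below. The function name and its arguments are hypothetical, and the results of the site: query are assumed to be passed in by the caller, since how they are fetched is outside the scope of this guide.

```python
from urllib.parse import urlparse


def relevant_url(domain, marker_top10, site_query_results):
    """Return (url, in_top10) for a cluster's marker keyword.

    marker_top10: top-10 URLs for the marker keyword.
    site_query_results: URLs returned by a `site:domain keyword` search
    (assumed to be collected elsewhere).
    """
    for url in marker_top10[:10]:
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            return url, True  # already ranking: this is the URL shown in green
    if site_query_results:
        return site_query_results[0], False  # fallback from the site: operator
    return None, False


top10 = ["https://other.com/a", "https://www.example.com/multicookers"]
print(relevant_url("example.com", top10, []))
print(relevant_url("example.com", ["https://other.com/a"], ["https://example.com/catalog"]))
```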

    Step three: Keywords and price

    Upload a file with the keywords.

    Supported formats: xls, xlsx. For Search Volume clustering, the input format is: keyword | search volume. For clustering with manually entered core keywords (markers), the input format is: keyword | marker(1/0). For the combination of Search Volume and marker clustering, the input format is: keyword | marker(1/0) | search volume.

    Enter negative words

    Phrases containing negative words will be excluded from the list before clustering. This functionality helps save clustering credits and removes the need to manually clear unwanted phrases from the results. It is especially useful if you are clustering a list of keywords that has not been cleaned beforehand.

    We suggest using the ready-made negative keyword lists for geographical names and different topics, or creating your own negative words list.
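
    The effect of negative words can be pictured with a short Python filter. It assumes case-insensitive, whole-word matching, which is our assumption for the example; the exact matching rules of the service may differ.

```python
import re


def drop_negative(phrases, negative_words):
    """Keep only the phrases that contain none of the negative words."""
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in negative_words
    ]
    return [p for p in phrases if not any(rx.search(p) for rx in patterns)]


phrases = [
    "buy multicooker",
    "multicooker manual download free",
    "multicooker price London",
    "Redmond multicooker review",
]
kept = drop_negative(phrases, ["download free", "london"])
print(kept)
```

    Only the phrases that survive the filter would be sent to clustering, which is where the saving in credits comes from.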

    Click “Create New Task” – your task has been sent for clustering!

    You can now track the status of the task in the Queue tab or the list of clustering tasks.

    A task in Rush Analytics can currently have one of 4 statuses:

    In Queue – no data is being collected yet; the task is waiting its turn to collect data.

    Parsing – the counter shows how many keywords are being collected.

    Clustering – the task data has already been collected; the system is calculating all the necessary metrics to provide you with the results.

    Ready – the task is ready; you can view the results in the web interface or download them in XLSX format.

    Clustering output file – column descriptions

    The output of the clustering in XLSX format is as follows:

    • Keywords highlighted in gray are marker keywords manually specified by you or defined by the system.
    • Cluster name – taken from the marker keyword.
    • Cluster size – the number of keywords in the cluster.
    • Keyword search volume – the search volume you have set in the “Keywords” step. 
    • Total cluster search volume – the sum of search volume of all keywords in the cluster.
    • Top matches – the number of common URLs in the search results for a given keyword with the results for the reference (marker) keyword.
    • Highlights – Highlights from search engine results collected for your keyword.
    • Highlights for a cluster – Highlights without duplicates for all words in a given cluster.
    • Top URL – the most visible URL of a competitor in the search engine results for all keywords in the cluster. Here we estimate the frequency of occurrence of competitor URLs in the output for each keyword and the position of each competitor URL in the search output.
    • Relevant URL – the relevant URL found for the cluster if the “Detect relevant URLs” option was selected.

    The option works in the following way: if your site is already in the top 10 for the main (marker) keyword, we will show this URL and highlight it in green. If your site is not in the top 10 of search results for any of the keywords, we will select the URL for the marker keyword using the site: operator.

    You can then combine logically related groups to build the structure of the site.
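
    To show how some of the numeric columns relate to each other, the Python sketch below derives Cluster size, Total cluster search volume, and Top matches for one cluster. The dictionary keys and sample data are made up for the illustration and do not reflect the exact layout of the XLSX file.

```python
def cluster_summary(marker, members, volume, serps):
    """Build per-keyword rows and cluster totals for one cluster."""
    rows = []
    for kw in members:
        rows.append({
            "keyword": kw,
            "search_volume": volume[kw],
            # URLs shared with the marker keyword's top 10
            "top_matches": len(set(serps[kw][:10]) & set(serps[marker][:10])),
        })
    return {
        "cluster_name": marker,
        "cluster_size": len(members),
        "total_search_volume": sum(volume[kw] for kw in members),
        "rows": rows,
    }


volume = {"buy multicooker": 1000, "multicooker price": 500}
serps = {
    "buy multicooker":   ["s1", "s2", "s3", "s4"],
    "multicooker price": ["s1", "s2", "s3", "s5"],
}
summary = cluster_summary("buy multicooker", list(volume), volume, serps)
print(summary["cluster_size"], summary["total_search_volume"])
print([row["top_matches"] for row in summary["rows"]])
```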
