2.1 How it works

At the basis of VOSviewer is the visualization of similarities (van Eck and Waltman 2007, 2010). The objects (articles, journals, organizations, authors or terms) are positioned relative to each other so that the distance between two objects approximately reflects their similarity within the set: the more similar two objects are, the closer together they appear.

For example, if we look at a data set of articles about food, a topic such as food processing will form a cluster of its own, while a topic such as nutritional value might form another cluster at some distance from the food processing cluster. However, there will also be articles in which the food processing method directly influences the nutritional value: these articles will be found between the two clusters or at the edge of either cluster, depending on the main topic of the article.

So how exactly does VOSviewer know whether two items are similar? Within the program we can choose from several options to base the similarity on, depending on the data set and the type of analysis we want to do:

· Co-authorship, to build collaboration networks and find partners for research projects

· Co-occurrence (based on terms occurring together in the same documents), to discover the best search keywords and determine the popularity, age and impact of topics

· Citation analysis (direct citation), to discover seminal papers that everyone refers to

· Bibliographic coupling (based on the number of shared references), to find articles with a common knowledge base

· Co-citation (the number of times two articles are cited together), to find complementary articles
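The counting behind bibliographic coupling and co-citation can be sketched with a small incidence matrix. This is a minimal illustration, not VOSviewer code: the toy matrix `cites` and the function names `bibliographic_coupling` and `co_citation` are hypothetical, and we assume each document cites each reference at most once.

```python
import numpy as np

# Hypothetical toy data: cites[i, k] == 1 if document i cites reference k.
cites = np.array([
    [1, 1, 0, 0],  # document 0
    [1, 1, 1, 0],  # document 1
    [0, 0, 1, 1],  # document 2
])


def bibliographic_coupling(a: np.ndarray) -> np.ndarray:
    """Number of references shared by each pair of citing documents."""
    counts = a @ a.T
    np.fill_diagonal(counts, 0)  # a document is not coupled with itself
    return counts


def co_citation(a: np.ndarray) -> np.ndarray:
    """Number of documents that cite each pair of references together."""
    counts = a.T @ a
    np.fill_diagonal(counts, 0)  # a reference is not co-cited with itself
    return counts


print(bibliographic_coupling(cites))
print(co_citation(cites))
```

Co-authorship and co-occurrence counts follow the same pattern as `co_citation`: replace the document-by-reference matrix with a document-by-author or document-by-term matrix and compute `a.T @ a`.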

The information VOSviewer uses is all contained within the set of articles you start from. Say we want to find the most popular articles within a topic. We start by downloading all of these articles, including their citation information. We consider two documents similar if they refer to, or are cited by, the same other documents; the direction of the citation links does not matter. This lets us calculate, for every pair of documents in our set, how similar document i is to document j. VOS uses the following formula for this:

\[ \text{similarity}_{ij}=\frac{2\,n_{\text{documents}}\,\text{citations}_{\text{shared},ij}}{\text{citations}_{\text{total},i}\,\text{citations}_{\text{total},j}}=\frac{\text{relation between items}}{\text{normalisation factor}} \]
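A direct transcription of the formula above, as a sketch: `shared` holds the shared-citation counts, `totals` the total counts per item, and `n` the constant n_documents. The function name and the choice, in the usage example, of taking `totals` as the row sums of `shared` are our assumptions; check which totals your own data calls for. Because `n` is the same for every pair, it rescales all similarities equally and does not change which pairs come out most similar.

```python
import numpy as np


def vos_similarity(shared: np.ndarray, totals: np.ndarray, n: float) -> np.ndarray:
    """similarity_ij = 2 * n * shared_ij / (totals_i * totals_j).

    shared: symmetric matrix of shared counts (e.g. shared citations)
    totals: total count per item (must be non-zero)
    n:      the constant n_documents from the formula
    """
    normalisation = np.outer(totals, totals)  # totals_i * totals_j
    return 2.0 * n * shared / normalisation


# Hypothetical shared-citation counts for three documents.
shared = np.array([
    [0, 2, 0],
    [2, 0, 1],
    [0, 1, 0],
])
totals = shared.sum(axis=1)  # assumption: total links per item, here [2, 3, 1]
sim = vos_similarity(shared, totals, n=3)
print(sim)
```

In this toy example the pair (1, 2) shares only one citation, yet it gets the same similarity as the pair (0, 1), which shares two, because item 2 has few links overall. Dividing by the normalisation factor is what keeps items with many citations from looking similar to everything.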

So how do we go from a high-dimensional similarity matrix to a visual representation? The really difficult task of VOS is to map all of these documents onto a two-dimensional chart so that the distance between two documents reflects the inverse of their similarity (more similar documents are displayed closer together), while at the same time the documents should not overlap in the visualization. As the sets get bigger, the problem becomes more complex and the exact positions become more of an estimate, but the map still gives a clear picture of how linked articles overlap. The distance between completely unlinked articles, or unlinked clusters of articles, carries less meaning: such items are simply placed as far as possible from the documents they are not (directly or indirectly) linked to.
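For the mapping step, van Eck and Waltman (2010) describe VOS as choosing positions x_i that minimize A = Σ_{i<j} s_ij d_ij², with d_ij = ‖x_i − x_j‖, subject to the average distance between items being 1. The sketch below is our own simplification, not the optimizer VOSviewer ships: it minimizes the unconstrained Σ_{i<j} (s_ij d_ij² − d_ij) by plain gradient descent. Scaling any layout by a factor t turns this into t²A − tB (with B = Σ_{i<j} d_ij), whose best t gives −B²/(4A), so it favours the same layout shapes as minimizing A/B², the scale-free form of the constrained problem. The function name, similarity matrix and parameter values are made up for illustration.

```python
import numpy as np


def vos_layout(sim: np.ndarray, steps: int = 3000, lr: float = 0.02, seed: int = 1) -> np.ndarray:
    """Sketch: minimise sum_{i<j} (s_ij * d_ij**2 - d_ij) by gradient descent.

    Assumes `sim` is symmetric and non-negative and that its similarity graph
    is connected; otherwise unlinked groups are pushed apart without limit.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(sim.shape[0], 2))  # random starting positions
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]  # diff[i, j] = x_i - x_j
        dist = np.sqrt((diff ** 2).sum(axis=-1)) + 1e-9
        # Gradient of each pair term w.r.t. x_i: (2 * s_ij - 1 / d_ij) * (x_i - x_j)
        coef = 2.0 * sim - 1.0 / dist
        np.fill_diagonal(coef, 0.0)  # an item does not interact with itself
        x = x - lr * (coef[:, :, None] * diff).sum(axis=1)
    return x


# Hypothetical similarities: items 0-2 and 3-5 form two tight groups,
# joined by one weaker link between items 2 and 3.
sim = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                sim[i, j] = 1.0
sim[2, 3] = sim[3, 2] = 0.5

x = vos_layout(sim)
d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
within = d[[0, 0, 1, 3, 3, 4], [1, 2, 2, 4, 5, 5]].mean()
between = d[:3, 3:].mean()
print(f"mean distance within groups: {within:.2f}, between groups: {between:.2f}")
```

The −d_ij repulsion term also matches the behaviour of unlinked items described above: for a pair with zero similarity only repulsion acts, so unlinked groups would drift apart without limit. That is why the sketch requires a connected similarity graph.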


TIP: The layout is created by optimization. The algorithm does not try every possible arrangement; it starts from random positions and repeatedly improves them until it reaches a good, though not necessarily optimal, arrangement. Because of this random start, the results may differ if you run the layout multiple times. To prevent this, set a specific number (not 0) for the random seed. This option can be found in the Analysis tab, under ‘Advanced parameters’ in the Layout section.
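To see why the seed matters, here is a toy stand-in, not VOSviewer code: a one-dimensional function with two valleys plays the role of the layout quality, and Python's `random.Random(seed)` plays the role of the random seed setting. The function names and parameters are hypothetical.

```python
import random


def f_grad(x: float) -> float:
    """Derivative of the toy objective f(x) = x**4 - 3*x**2 + x (two valleys)."""
    return 4 * x**3 - 6 * x + 1


def optimise_from_random_start(seed: int, steps: int = 2000, lr: float = 0.01) -> float:
    rng = random.Random(seed)  # seeded generator: same seed, same start
    x = rng.uniform(-2.0, 2.0)  # random starting position
    for _ in range(steps):
        x -= lr * f_grad(x)  # plain gradient descent towards the nearest valley
    return x


# Same seed: identical result on every run.
print(optimise_from_random_start(42), optimise_from_random_start(42))
# Different seeds: each search ends in one of the two valleys.
print(sorted({round(optimise_from_random_start(s), 1) for s in range(50)}))
```

Fixing the seed makes the result repeatable, not better: a different seed may well land in a better valley.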

After mapping the items to x,y-coordinates, VOSviewer goes one step further and groups highly related items into clusters, shown in different colours. These clusters make the maps easy to interpret (and to sell in presentations to funders).

A common way in AI to find clusters is density-based clustering: where many points lie close together, there is most likely a cluster of related points. This technique does not require a fixed number of clusters, but you do have to set thresholds (which you can adjust in the program) for document similarity, so the algorithm knows when a document is too dissimilar to belong to a particular cluster. It works as follows:

[Figure: how the clustering works]
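The density-based idea described above can be sketched in a few lines, DBSCAN-style. Treat this as an illustration of the general technique the text names, not as VOSviewer's own clustering routine: `eps` (the neighbourhood radius) and `min_pts` (how many neighbours make a region dense) are DBSCAN's parameters standing in for the adjustable thresholds, and the coordinates are made up.

```python
import numpy as np


def density_clusters(points: np.ndarray, eps: float, min_pts: int) -> np.ndarray:
    """Minimal DBSCAN-style clustering. Returns one label per point; -1 means noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes i itself
    is_core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster  # start a new cluster from an unlabelled core point
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not extend it
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


# Hypothetical map coordinates: two dense groups and one isolated item.
points = np.array([[0, 0], [0.1, 0], [0, 0.1],
                   [5, 5], [5.1, 5], [5, 5.1],
                   [10, 0]], dtype=float)
print(density_clusters(points, eps=0.5, min_pts=3))   # two clusters plus one noise point
print(density_clusters(points, eps=0.05, min_pts=3))  # threshold too strict: all noise
```

The second call shows why the thresholds matter: with a radius that is too small, no region counts as dense and no clusters appear at all.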

If the clustering seems off, you can adjust the thresholds to get a better view. Overall, clustering works best if the set is not completely uniform. If you see just one big circle, your set may consist of very uniform topics or of highly unrelated topics.

In VOSviewer you will notice that some items are displayed between items of other colours. The principles above still apply, but the actual clustering is based on the similarity network in the background rather than on the positions in the map. While the visualization is two-dimensional, this similarity network has many dimensions.
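The point that cluster membership comes from the similarity network rather than from map positions can be illustrated with a deliberately simple stand-in: link every pair whose similarity reaches a threshold and take the connected components. This is single-linkage grouping, not the algorithm VOSviewer uses; the function name, threshold values and similarity matrix are hypothetical. Note that the function never sees any x,y-coordinates.

```python
import numpy as np


def network_clusters(sim: np.ndarray, threshold: float) -> list[int]:
    """Connected components of the graph with an edge wherever sim[i, j] >= threshold."""
    n = sim.shape[0]
    parent = list(range(n))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(i)] = find(j)  # merge the two components
    relabel: dict[int, int] = {}
    # number the components 0, 1, 2, ... in order of first appearance
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Hypothetical similarities: 0-1 and 3-4 are strong pairs; item 2 links to both sides.
sim = np.array([
    [0.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.6, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.4, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
])
for t in (0.95, 0.5, 0.3):
    print(t, network_clusters(sim, t))
```

At threshold 0.5, item 2 joins the group it is more strongly linked to, wherever a 2D map happens to draw it. Lowering the threshold to 0.3 merges everything into a single cluster, while raising it to 0.95 leaves every item on its own.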

References

van Eck, Nees Jan, and Ludo Waltman. 2007. “VOS: A New Method for Visualizing Similarities Between Objects.” Studies in Classification, Data Analysis, and Knowledge Organization, 299–306. https://doi.org/10.1007/978-3-540-70981-7_34.
van Eck, Nees Jan, and Ludo Waltman. 2010. “Software Survey: VOSviewer, a Computer Program for Bibliometric Mapping.” Scientometrics 84 (August): 523–38. https://doi.org/10.1007/s11192-009-0146-3.