Embeddings and similarity
Rek.ai includes functionality to detect which pages and Q&A items are about related topics. This is done using embeddings and similarity.
What are embeddings?
An embedding is a numerical representation of text, such as a webpage or a Question and Answer pair. Instead of storing the text itself, the system converts it into a vector (a list of numbers) that captures the meaning of the content.
This allows rek.ai to compare and find semantically similar pages or questions, even if they are worded differently.
How can I control embeddings?
Rek.ai automatically uses embeddings when generating recommendations and keyword suggestions. However, there may be cases, such as section pages, where this function does not affect the recommendations. In these situations, you can force rek.ai to use embeddings by adding data-forceembedding.
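As a minimal sketch, forcing embeddings on a page where they would otherwise not apply could look like this. The rek-prediction class and the "true" value follow the final example on this page; a real page will usually carry other attributes as well.

```html
<!-- Force rek.ai to use embeddings for this block, even on a section page -->
<div class="rek-prediction" data-forceembedding="true"></div>
```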
By default, recommendations are generated using a mix of statistical patterns and related pages (via embeddings). You can instruct the system to return only recommendations based on embeddings by using data-onlyembeddings.
If you want to control the mix between similar pages (via embeddings) and pages generated through statistics, you can do so with data-nrofembeddedhits.
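A possible sketch of such a mix is shown here. It assumes that data-nrofembeddedhits takes the number of embedding-based hits to include out of the total set by data-nrofhits. That is an assumption based on the attribute name, and the values 5 and 2 are only illustrative.

```html
<!-- Assumption: 5 hits in total, of which 2 come from embeddings and the rest from statistics -->
<div class="rek-prediction" data-nrofhits="5" data-nrofembeddedhits="2"></div>
```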
How can we measure whether two texts are about the same topic?
Similarity measures how closely two texts match. The value ranges from 0 (exactly the same text) to above 10 (no similarity at all), so a lower value means a closer match. By default, rek.ai only recommends pages with a similarity of around 1.15 or lower.
If you want a looser or stricter match, you can control the required level of precision using the data-maxsimilarity parameter.
For example, data-maxsimilarity="0.7" would result in a very narrow selection of pages.
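Going the other way, a value above the usual level of around 1.15 should give a looser match, since higher values mean less similar texts. The value 1.5 in this sketch is purely illustrative and not a recommended setting.

```html
<!-- Looser match: also accept pages that are less similar than rek.ai normally allows -->
<div class="rek-prediction" data-maxsimilarity="1.5"></div>
```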
Final example
This example shows how to get 5 recommendations based only on embeddings with a maximum similarity of 0.7.
Even if the page doesn't use embeddings under normal circumstances, it will do so here because of the data-forceembedding="true" parameter.
<div class="rek-prediction" data-nrofhits="5" data-forceembedding="true" data-onlyembeddings="true" data-maxsimilarity="0.7"></div>