PUT /blogposts/post/1
{
"title": "About popularity",
"content": "In this post we will talk about...",
"votes": 6
}
Boosting by Popularity
Imagine that we have a website that hosts blog posts and enables users to vote for the blog posts that they like. We would like more-popular posts to appear higher in the results list, but still have the full-text score as the main relevance driver. We can do this easily by storing the number of votes with each blog post:
At search time, we can use the function_score query with the
field_value_factor function to combine the number of votes with the full-text relevance score:
GET /blogposts/post/_search
{
"query": {
"function_score": { (1)
"query": { (2)
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": { (3)
"field": "votes" (4)
}
}
}
}
-
The
function_scorequery wraps the main query and the function we would like to apply. -
The main query is executed first.
-
The
field_value_factorfunction is applied to every document matching the mainquery. -
Every document must have a number in the
votesfield for thefunction_scoreto work. If every document does not have a number in thevotesfield, then you must use the {ref}/query-dsl-function-score-query.html#function-field-value-factor[missingproperty] to provide a default value for the score calculation.
In the preceding example, the final _score for each document has been altered as
follows:
new_score = old_score * number_of_votes
This will not give us great results. The full-text _score range
usually falls somewhere between 0 and 10. As can be seen in Linear popularity based on an original _score of 2.0, a blog post with 10 votes will
completely swamp the effect of the full-text score, and a blog post with 0
votes will reset the score to zero.
_score of 2.0modifier
A better way to incorporate popularity is to smooth out the votes value
with some modifier. In other words, we want the first few votes to count a
lot, but for each subsequent vote to count less. The difference between 0
votes and 1 vote should be much bigger than the difference between 10 votes
and 11 votes.
A typical modifier for this use case is log1p, which changes the formula
to the following:
new_score = old_score * log(1 + number_of_votes)
The log function smooths out the effect of the votes field to provide a
curve like the one in Logarithmic popularity based on an original _score of 2.0.
_score of 2.0The request with the modifier parameter looks like the following:
GET /blogposts/post/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p" (1)
}
}
}
}
-
Set the
modifiertolog1p.
The available modifiers are none (the default), log, log1p, log2p,
ln, ln1p, ln2p, square, sqrt, and reciprocal. You can read more
about them in the
{ref}/query-dsl-function-score-query.html#function-field-value-factor[field_value_factor documentation].
factor
The strength of the popularity effect can be increased or decreased by
multiplying the value in the votes field by some number, called the
factor:
GET /blogposts/post/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p",
"factor": 2 (1)
}
}
}
}
-
Doubles the popularity effect
Adding in a factor changes the formula to this:
new_score = old_score * log(1 + factor * number_of_votes)
A factor greater than 1 increases the effect, and a factor less than 1
decreases the effect, as shown in Logarithmic popularity with different factors.
boost_mode
Perhaps multiplying the full-text score by the result of the
field_value_factor function still has too large an effect. We can control
how the result of a function is combined with the _score from the query by
using the boost_mode parameter, which accepts the following values:
multiply-
Multiply the
_scorewith the function result (default) sum-
Add the function result to the
_score min-
The lower of the
_scoreand the function result max-
The higher of the
_scoreand the function result replace-
Replace the
_scorewith the function result
If, instead of multiplying, we add the function result to the _score, we can
achieve a much smaller effect, especially if we use a low factor:
GET /blogposts/post/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p",
"factor": 0.1
},
"boost_mode": "sum" (1)
}
}
}
-
Add the function result to the
_score.
The formula for the preceding request now looks like this (see Combining popularity with sum):
new_score = old_score + log(1 + 0.1 * number_of_votes)
summax_boost
Finally, we can cap the maximum effect that the function can have by using the
max_boost parameter:
GET /blogposts/post/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p",
"factor": 0.1
},
"boost_mode": "sum",
"max_boost": 1.5 (1)
}
}
}
-
Whatever the result of the
field_value_factorfunction, it will never be greater than1.5.
|
Note
|
The max_boost applies a limit to the result of the function only, not
to the final _score.
|