
NAME

Elastic::Manual::QueryDSL::Queries - Overview of the queries available in Elasticsearch

VERSION

version 0.52

INTRODUCTION

Queries should be used instead of filters where "relevance scoring" is appropriate.

While a filter can be used for:

    Give me docs that include the tag "perl" or "python"

... a query can do:

    Give me docs that include the tag "perl" or "python", sorted by relevance

This is particularly useful for full text search, where there isn't a simple binary Yes/No answer. Instead, we're looking for the most relevant results which match a complex phrase like "perl unicode cookbook".

QUERY TYPES

There are 5 main query types:

Analyzed queries

These are used for full text search on unstructured text. The search keywords are analyzed into terms before being searched on. For instance: WHERE matches(content, 'perl unicode')

Exact queries

These are used for exact matching. For instance: WHERE tags IN ('perl','python').

Combining queries

These combine multiple queries together, eg using "and" or "or" logic.

Scoring queries

These can be used to alter how the relevance score is calculated.

Joining queries

These work on parent-child relationships, or on "nested" docs.

BOOST

"Boost" is a way of increasing the relevance of part of a query. For instance, if I'm searching for the words "perl unicode" in either the title or content field of a post, I could do:

    $view->queryb([
        content => 'perl unicode',
        title   => 'perl unicode',
    ]);

But it is likely that documents with those words in the title are more relevant than if those words appear only in the content, so we can boost the title field:

    $view->queryb([
        content => 'perl unicode',
        title   => {
            '=' => {
                query => 'perl unicode',
                boost => 2
            }
        },
    ]);

Or in the native Query DSL:

    $view->queryb(
        bool => {
            should => [
                { match => { content => 'perl unicode' } },
                { match => {
                    title => {
                        query => 'perl unicode',
                        boost => 2
                    }
                }}
            ]
        }
    );

The boost is multiplied with the _score, so a boost less than 1 will decrease relevance. Also see "explain" in Elastic::Model::Result for help when debugging relevance scoring.
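As a rough illustration, the native Query DSL form above can be generated for any set of fields. The boosted_match() helper below is hypothetical. It is not part of Elastic::Model or ElasticSearch::SearchBuilder, and it only builds the Perl data structure that you might then pass to $view->query().

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Elastic::Model: run the same match
# query against several fields, each with its own boost. Fields with a
# boost of 1 (the default) get the short form.
sub boosted_match {
    my ( $text, %boosts ) = @_;
    my @clauses;
    for my $field ( sort keys %boosts ) {
        my $boost = $boosts{$field};
        push @clauses, $boost == 1
            ? { match => { $field => $text } }
            : { match => { $field => { query => $text, boost => $boost } } };
    }
    return { bool => { should => \@clauses } };
}

my $query = boosted_match( 'perl unicode', title => 2, content => 1 );

# With Elastic::Model, this could then be passed on as:
#   $view->query( %$query );
```

Fields are sorted only to make the generated clause order predictable.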

ANALYZED QUERIES

The search keywords are analyzed before being searched on. The analyzer is chosen from the first item in this list which is set:

  • The analyzer specified in the query

  • The search_analyzer specified on the field being searched

  • The analyzer specified on the field being searched

  • The default analyzer for the type being searched on
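The precedence rule above can be sketched as a first-defined-wins lookup. The key names used here (query_analyzer, field_search_analyzer, field_analyzer, type_default) are made up for this illustration and are not Elasticsearch settings.

```perl
use strict;
use warnings;

# Sketch of the precedence list: return the first analyzer that is set.
# The argument names are illustrative only, not an Elasticsearch API.
sub choose_analyzer {
    my (%args) = @_;
    for my $key (qw(query_analyzer field_search_analyzer field_analyzer type_default)) {
        return $args{$key} if defined $args{$key};
    }
    return;    # nothing set
}

my $analyzer = choose_analyzer(
    field_search_analyzer => 'english',
    field_analyzer        => 'standard',
    type_default          => 'standard',
);
print "$analyzer\n";    # english
```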

Simple text queries

SearchBuilder
    # where title matches "perl unicode"
    $view->queryb( title => 'perl unicode' );
    $view->queryb( title => { '=' => 'perl unicode' });

    # where the _all field matches "perl unicode"
    $view->queryb( 'perl unicode' );
    $view->queryb( _all => 'perl unicode');

See "= | -text | != | <> | -not_text" in ElasticSearch::SearchBuilder.

QueryDSL
    # where title matches "perl unicode"
    $view->query( match => { title => 'perl unicode' } );

    # where the _all field matches "perl unicode"
    $view->query( match => { _all => 'perl unicode' });

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html

Phrase queries

Phrase queries match all words in the phrase, in the same order.

SearchBuilder
    # where title matches the phrase "perl unicode"
    $view->queryb( title => { '==' => 'perl unicode' });

    # where 'unicode' precedes 'perl' within 5 words of each other
    $view->queryb(
        title => {
            '==' => {
                query => 'perl unicode',
                slop  => 5
            }
        }
    );

    # where title contains a phrase starting with "perl unic"
    $view->queryb( title => { '^' => 'perl unic' });

See "== | -phrase | -not_phrase" in ElasticSearch::SearchBuilder and "^ | -phrase_prefix | -not_phrase_prefix" in ElasticSearch::SearchBuilder.

QueryDSL
    # where title matches the phrase "perl unicode"
    $view->query(
        match_phrase => {
            title   => 'perl unicode'
        }
    );

    # where 'unicode' precedes 'perl' within 5 words of each other
    $view->query(
        match_phrase => {
            title   => {
                query => 'perl unicode',
                slop  => 5
            }
        }
    );

    # where title contains a phrase starting with "perl unic"
    $view->query(
        match_phrase_prefix => {
            title => 'perl unic'
        }
    );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_match_phrase_prefix.

Lucene query parser syntax

The query_string and field queries use the Lucene query parser syntax allowing complex queries like (amongst other features):

Logic

'mac AND big NOT apple' or '+mac +big -apple'

Phrases

'these words and "exactly this phrase"'

Wildcards

'test?ng wild*rd'

Fields

'title:(big mac) content:"this exact phrase"'

Boosting

'title:(perl unicode)^2 content:(perl unicode)'

Proximity

'"quick brown dog"~10' (within 10 words of each other)

The query_string query can also be used for searching across multiple fields.

There are two downsides to this query:

  • The syntax must be correct, otherwise your query will fail.

  • Users can search any field using the "field:" syntax.

You can use "filter_keywords()" in ElasticSearch::Util as a simple filter for user input, or ElasticSearch::QueryParser for a more flexible solution.
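If you would rather not depend on either module, one crude alternative is to backslash-escape every character that is special in the Lucene query syntax, so that user input is treated as plain keywords. The escape_query_string() helper below is hypothetical and is not how filter_keywords() works. Note that it also takes phrases, wildcards and boolean operators away from your users.

```perl
use strict;
use warnings;

# Hypothetical helper: backslash-escape each Lucene query parser special
# character (+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /), which also disables
# the "field:" syntax because ':' is escaped.
sub escape_query_string {
    my ($input) = @_;
    $input =~ s/([+\-&|!(){}\[\]^"~*?:\\\/])/\\$1/g;
    return $input;
}

my $safe = escape_query_string('title:(big mac) -apple');
print "$safe\n";    # title\:\(big mac\) \-apple
```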

SearchBuilder
    # where the title field matches '+big +mac -apple'
    $view->queryb( title => { -qs => '+big +mac -apple' });

    # where the _all field matches '+big +mac -apple'
    $view->queryb( _all => { -qs => '+big +mac -apple' });

    # where the title or content fields match '+big +mac -apple'
    $view->queryb(
        -qs => {
            query   => '+big +mac -apple',
            fields  => ['title^2','content']  # boost the title field
        }
    );

See "-qs | -query_string | -not_qs | -not_query_string" in ElasticSearch::SearchBuilder.

QueryDSL
    # where the title field matches '+big +mac -apple'
    $view->query(
        query_string => {
            query => '+big +mac -apple',
            fields => ['title'],
        }
    );

    # where the _all field matches '+big +mac -apple'
    $view->query( query_string => { query => '+big +mac -apple' });
    $view->query(
        query_string => {
            query => '+big +mac -apple',
        }
    );

    # where the title or content fields match '+big +mac -apple'
    $view->query(
        query_string => {
            query   => '+big +mac -apple',
            fields  => ['title^2','content']  # boost the title field
        }
    );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html.

More-like-this and Fuzzy-like-this

The more-like-this query tries to find documents similar to the search keywords, across multiple fields. It is useful for clustering related documents.

See "-mlt | -not_mlt" in ElasticSearch::SearchBuilder, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.

The fuzzy-like-this query is similar to more-like-this, but additionally "fuzzifies" all the search terms (finds all terms within a certain Levenshtein edit distance).

See "-flt | -not_flt" in ElasticSearch::SearchBuilder, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-flt-query.html.

EXACT QUERIES

These queries do not have an analysis phase. They try to match the actual terms stored in Elasticsearch. But unlike filters, the result of these queries is included in the relevance scoring.
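A toy sketch of the difference: an analyzed query (such as match) analyzes its keywords before looking them up, while an exact query (such as term) looks up the keyword exactly as given. The toy_analyze() function below only lowercases and splits on non-word characters. It is an assumption standing in for a real analyzer, which does considerably more.

```perl
use strict;
use warnings;

# Toy analyzer (an assumption, far simpler than real ones):
# lowercase the text and split it on non-word characters.
sub toy_analyze {
    my ($text) = @_;
    return grep { length } split /\W+/, lc $text;
}

# Terms that might be stored for a title of "Perl Unicode Cookbook"
my %stored = map { ( $_ => 1 ) } toy_analyze('Perl Unicode Cookbook');

# match-style: analyze the keywords, then look up each resulting term
my @analyzed_hits = grep { $stored{$_} } toy_analyze('Perl Unicode');

# term-style: look up the keyword exactly as given, with no analysis
my $exact_hit = $stored{'Perl Unicode'} ? 1 : 0;

printf "%d analyzed terms matched; exact match: %d\n",
    scalar @analyzed_hits, $exact_hit;
```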

Match all

Matches all docs.

SearchBuilder
    # All docs
    $view->queryb();
    $view->queryb( -all => 1 );

See "MATCH ALL" in ElasticSearch::SearchBuilder

QueryDSL
    # All docs
    $view->query();
    $view->query( match_all => {} );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html

Equality

SearchBuilder:
    # WHERE status = 'active'
    $view->queryb( status => 'active' );

    # WHERE count = 5
    $view->queryb( count  => 5 );

    # WHERE tags IN ('perl','python')
    $view->queryb( tags  => [ 'perl', 'python' ]);

See "EQUALITY (QUERIES)" in ElasticSearch::SearchBuilder.

QueryDSL:
    # WHERE status = 'active'
    $view->query( term  => { status => 'active' } );

    # WHERE count = 5
    $view->query( term  => { count => 5 } );

    # WHERE tags IN ('perl','python')
    $view->query( terms => { tags => [ 'perl', 'python' ] } );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html.

Range

SearchBuilder:
    # WHERE date BETWEEN '2012-01-01' AND '2013-01-01'
    $view->queryb(
        date   => {
            gte => '2012-01-01',
            lt  => '2013-01-01'
        }
    );

See "RANGES" in ElasticSearch::SearchBuilder

QueryDSL:
    # WHERE date BETWEEN '2012-01-01' AND '2013-01-01'
    $view->query(
        range => {
            date => {
                gte => '2012-01-01',
                lt  => '2013-01-01'
            }
        }
    );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

Prefix, wildcard and fuzzy

A "fuzzy" query matches terms within a certain Levenshtein edit distance of the search terms.
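To make "edit distance" concrete, here is the classic dynamic-programming Levenshtein algorithm: the minimum number of single-character insertions, deletions or substitutions needed to turn one term into another. It only illustrates the metric. Lucene uses its own, much more efficient implementation.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance between $s and $t.
# @prev holds the distances for the previous row of the DP table.
sub levenshtein {
    my ( $s, $t ) = @_;
    my @prev = ( 0 .. length $t );
    for my $i ( 1 .. length $s ) {
        my @curr = ($i);
        for my $j ( 1 .. length $t ) {
            my $cost = substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,             # deletion
                $curr[ $j - 1 ] + 1,       # insertion
                $prev[ $j - 1 ] + $cost,   # substitution (or match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print levenshtein( 'purl',    'perl' ),    "\n";    # 1
print levenshtein( 'unikode', 'unicode' ), "\n";    # 1
```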

Warning: These queries do not perform well. First they have to load all terms into memory to find those that match the prefix/wildcard/fuzzy conditions. Then they query all matching terms.

If you find yourself wanting to use any of these, consider analyzing your fields in a way that lets you use a simple query on them instead, for instance with the edge_ngram token filter or one of the phonetic token filters.
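A sketch of what the edge_ngram approach buys you: if every leading substring of a token is indexed (at index time only), then "code LIKE 'abc%'" becomes a lookup of the single term "abc" rather than a scan over all terms. The edge_ngrams() sub below is a hypothetical stand-in for the token filter, with $min and $max playing the role of its min_gram and max_gram settings.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an edge_ngram token filter: return every
# leading substring of $token that is between $min and $max characters long.
sub edge_ngrams {
    my ( $token, $min, $max ) = @_;
    my $limit = $max < length($token) ? $max : length($token);
    return map { substr( $token, 0, $_ ) } $min .. $limit;
}

# Grams that would be indexed for the code "abcdef" with min 1, max 4
print join( ',', edge_ngrams( 'abcdef', 1, 4 ) ), "\n";    # a,ab,abc,abcd
```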

SearchBuilder
    # WHERE code LIKE 'abc%'
    $view->queryb( code => { '^' => 'abc' });

    # WHERE code LIKE 'ab_c%'
    $view->queryb( code => { '*' => 'ab?c*' });

    # where code contains terms similar to "purl unikode"
    $view->queryb( code => { fuzzy => 'purl unikode' });

See "PREFIX (FILTERS)" in ElasticSearch::SearchBuilder and "WILDCARD AND FUZZY QUERIES" in ElasticSearch::SearchBuilder.

QueryDSL
    # WHERE code LIKE 'abc%'
    $view->query( prefix => { code => 'abc' });

    # WHERE code LIKE 'ab_c%'
    $view->query( wildcard => { code => 'ab?c*' });

    # where code contains terms similar to "purl unikode"
    $view->query( fuzzy => { code => 'purl unikode' });

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html.

COMBINING QUERIES

These queries allow you to combine multiple queries together.

Filtered query

By default, queries are run on all documents. You can use a filtered query to reduce which documents are queried. This is the same query that is used to combine the query and filter attributes of Elastic::Model::View.

For instance, if you only want to query documents where status = 'active', then you can filter your documents with that restriction. A filter does not affect the relevance score.

SearchBuilder
    # document where status = 'active', and title matches 'perl unicode'
    $view->queryb(
        title   => 'perl unicode',
        -filter => { status => 'active' }
    );

See "QUERY / FILTER CONTEXT" in ElasticSearch::SearchBuilder

QueryDSL
    # document where status = 'active', and title matches 'perl unicode'
    $view->query(
        filtered => {
            query => {
                match => { title => 'perl unicode'}
            },
            filter => {
                term => { status => 'active' }
            }
        }
    );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html.

Bool queries

bool queries are the equivalent of and, or and not, except that they use must, should and must_not instead. The difference is that you can specify the minimum number of should clauses that have to match (default 1).

In the SearchBuilder syntax, these use the same syntax as and, or and not but you can also use the -bool operator directly if you want to use minimum_number_should_match.

Note: the scores of all matching clauses are combined together.
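The matching rules (ignoring scoring) can be sketched as a toy evaluator over an in-memory doc. Everything here, including the { field => value } clause format and the bool_matches() name, is a simplified assumption for illustration, not how Elasticsearch executes a bool query. When minimum_number_should_match is omitted, the toy falls back to 1, as described above.

```perl
use strict;
use warnings;

# A clause { field => value } matches if the doc's field (a scalar or an
# array ref of values) contains exactly that value. No analysis is done.
sub clause_matches {
    my ( $doc, $clause ) = @_;
    my ( $field, $value ) = %$clause;
    my $got    = $doc->{$field};
    my @values = ref($got) eq 'ARRAY' ? @$got : defined($got) ? ($got) : ();
    return scalar grep { $_ eq $value } @values;
}

# Toy bool: every must clause, no must_not clause, and at least
# minimum_number_should_match should clauses (if any should clauses exist).
sub bool_matches {
    my ( $doc, %bool ) = @_;
    my @must     = @{ $bool{must}     || [] };
    my @must_not = @{ $bool{must_not} || [] };
    my @should   = @{ $bool{should}   || [] };

    return 0 if grep { !clause_matches( $doc, $_ ) } @must;
    return 0 if grep {  clause_matches( $doc, $_ ) } @must_not;
    return 1 unless @should;

    my $min  = $bool{minimum_number_should_match} // 1;
    my $hits = grep { clause_matches( $doc, $_ ) } @should;
    return $hits >= $min ? 1 : 0;
}

my $doc = { title => 'object oriented', status => 'active', tags => [qw(perl python)] };
my $ok  = bool_matches(
    $doc,
    must     => [ { title  => 'object oriented' } ],
    must_not => [ { status => 'inactive' } ],
    should   => [ { tags => 'perl' }, { tags => 'python' }, { tags => 'ruby' } ],
    minimum_number_should_match => 2,
);
print "matches: $ok\n";    # matches: 1
```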

SearchBuilder:

See "AND|OR LOGIC" in ElasticSearch::SearchBuilder and "-bool" in ElasticSearch::SearchBuilder

And
    # WHERE title matches 'perl unicode' AND status = 'active'
    $view->queryb( title => 'perl unicode', status => 'active' );
Or
    # WHERE title matches 'perl unicode' OR status = 'active'
    $view->queryb([ title => 'perl unicode', status => 'active' ]);
Not
    # WHERE status <> 'active'
    $view->queryb( status => { '!=' => 'active' });

    # WHERE tags NOT IN ('perl','python')
    $view->queryb( tags   => { '!=' => ['perl', 'python'] });

    # WHERE NOT ( x = 1 AND y = 2 )
    $view->queryb( -not   => { x => 1, y => 2 });

    # WHERE NOT ( x = 1 OR y = 2 )
    $view->queryb( -not   => [ x => 1, y => 2 ]);
minimum_number_should_match
    # where title matches 'object oriented'
    # and status <> 'inactive'
    # and tags contain 2 or more of 'perl','python','ruby'

    $view->queryb(
        -bool => {
            must          => [{ title  => 'object oriented' }],
            must_not      => [{ status => 'inactive' }],
            should        => [
                { tags => 'perl'   },
                { tags => 'python' },
                { tags => 'ruby'   },
            ],
            minimum_number_should_match => 2,
        }
    );
QueryDSL:

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html

And
    # WHERE title matches 'perl unicode' AND status = 'active'
    $view->query(
        bool => {
            must => [
                { match => { title  => 'perl unicode' }},
                { term => { status => 'active' }}
            ]
        }
    );
Or
    # WHERE title matches 'perl unicode' OR status = 'active'
    $view->query(
        bool => {
            should => [
                { match => { title  => 'perl unicode' }},
                { term => { status => 'active' }}
            ]
        }
    );
Not
    # WHERE status <> 'active'
    $view->query(
        bool => {
            must_not => [
                { term => { status => 'active' }}
            ]
        }
    );

    # WHERE tags NOT IN ('perl','python')
    $view->query(
        bool => {
            must_not => [
                { terms => { tags => [ 'perl','python' ] }}
            ]
        }
    );


    # WHERE NOT ( x = 1 AND y = 2 )
    $view->query(
        bool => {
            must_not => [
                {
                    bool => {
                        must => [
                            { term => { x => 1 }},
                            { term => { y => 2 }}
                        ]
                    }
                }
            ]
        }
    );

    # WHERE NOT ( x = 1 OR y = 2 )
    $view->query(
        bool => {
            must_not => [
                {
                    bool => {
                        should => [
                            { term => { x => 1 }},
                            { term => { y => 2 }}
                        ]
                    }
                }
            ]
        }
    );
minimum_number_should_match
    # where title matches 'object oriented'
    # and status <> 'inactive'
    # and tags contain 2 or more of 'perl','python','ruby'

    $view->query(
        bool => {
            must          => [{ match => { title  => 'object oriented' }}],
            must_not      => [{ term  => { status => 'inactive' }}],
            should        => [
                { term => { tags => 'perl'   }},
                { term => { tags => 'python' }},
                { term => { tags => 'ruby'   }},
            ],
            minimum_number_should_match => 2,
        }
    );

Dis_max / Disjunction max query

While the "Bool queries" combine the scores of each matching clause, the dis_max query uses the highest score of any matching clause. For instance, if we want to search for "perl unicode" in the title and content fields, we could do:

    $view->queryb(
        title   => 'perl unicode',
        content => 'perl unicode'
    );

But we could have a doc which matches 'perl' in both fields, and 'unicode' in neither. As a boolean query, these two matches for 'perl' would be added together. As a dis_max query, the higher score of the title or the content clause match would be used.

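The tie_breaker can be used to give a slight advantage to docs where more than one clause matches: the score of the best-matching clause is used in full, and the scores of the other matching clauses are multiplied by the tie_breaker and added to it.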
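The difference between the two ways of combining clause scores can be shown with made-up numbers. This sketch assumes the usual Lucene dis_max formula: the best clause score plus tie_breaker times the sum of the other matching clauses' scores. Real clause scores come from Lucene's similarity calculation, not from fixed values like these.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# bool-style: the scores of all matching clauses are added together
sub bool_score {
    my @scores = @_;
    return sum( 0, @scores );
}

# dis_max-style: the best clause counts in full, and every other matching
# clause adds tie_breaker * its own score
sub dis_max_score {
    my ( $tie_breaker, @scores ) = @_;
    return 0 unless @scores;
    my $best = max(@scores);
    return $best + $tie_breaker * ( sum(@scores) - $best );
}

# Made-up clause scores for one doc: title matched well, content less so
my @clause_scores = ( 1.0, 0.5 );
printf "bool:    %.2f\n", bool_score(@clause_scores);              # bool:    1.50
printf "dis_max: %.2f\n", dis_max_score( 0.7, @clause_scores );    # dis_max: 1.35
printf "no tie:  %.2f\n", dis_max_score( 0,   @clause_scores );    # no tie:  1.00
```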

SearchBuilder
    # without tie_breaker:
    $view->queryb(
        -dis_max => [
            { title   => 'perl unicode' },
            { content => 'perl unicode' }
        ]
    );

    # with tie_breaker:
    $view->queryb(
        -dis_max => {
            tie_breaker => 0.7,
            queries     => [
                { title   => 'perl unicode' },
                { content => 'perl unicode' }
            ]
        }
    );

See "-dis_max | -dismax" in ElasticSearch::SearchBuilder.

QueryDSL
    $view->query(
        dis_max => {
            tie_breaker => 0.7,
            queries     => [
                { match => { title   => 'perl unicode' }},
                { match => { content => 'perl unicode' }}
            ]
        }
    );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html.

Indices

The indices query can be used to execute different queries on different indices.

SearchBuilder
    # On index_one or index_two, only allow status = 'active'
    # On any other index, allow status IN ('active','pending')
    $view->queryb(
        -indices  => {
            indices       => [ 'index_one','index_two' ],
            query          => { status => 'active' },
            no_match_query => { status => [ 'active','pending' ]}
        }
    );

See "-indices" in ElasticSearch::SearchBuilder.

QueryDSL
    # On index_one or index_two, only allow status = 'active'
    # On any other index, allow status IN ('active','pending')
    $view->query(
        indices  => {
            indices        => [ 'index_one','index_two' ],
            query          => { term  => { status => 'active' }},
            no_match_query => { terms => { status => [ 'active','pending' ] }}
        }
    );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-indices-query.html.

SCORING QUERIES

These queries allow you to tweak the relevance _score, making certain docs more or less relevant.

IMPORTANT: The custom_score, custom_filters_score and custom_boost_factor queries have been removed in Elasticsearch 1.0 and replaced with the function_score query.

Support for the function_score query has not yet been added to the SearchBuilder.
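Until then, the native Query DSL can be used directly. The sketch below shows how the recency example from the custom_filters_score section might be expressed as a function_score query. It assumes Elasticsearch 1.x parameter names (score_mode => 'first', boost_factor, which later releases renamed to weight), so check them against the function_score reference for your version before relying on it.

```perl
use strict;
use warnings;
use JSON::PP;

# Assumed Elasticsearch 1.x function_score version of the recency example.
# Each function applies to docs matching its filter; with score_mode
# 'first', only the first matching function is used.
my %query = (
    function_score => {
        query      => { match => { title => 'perl unicode' } },
        score_mode => 'first',
        functions  => [
            {
                filter       => { range => { date => { gte => '2012-01-01' } } },
                boost_factor => 5,    # called "weight" in later versions
            },
            {
                filter       => { range => { date => { gte => '2011-01-01' } } },
                boost_factor => 3,
            },
        ],
    }
);

# With Elastic::Model this would be passed as: $view->query( %query );
print JSON::PP->new->canonical->pretty->encode( \%query );
```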

Scoring with filters

The custom_filters_score query allows you to boost documents that match a filter, either with a boost parameter, or with a custom script.

This is a very powerful and efficient way to boost results which depend on matching unanalyzed fields, eg a tag or a date. Because the filters can be cached, it performs very well.
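The effect of score_mode => 'first' in the examples below can be sketched as a toy model: the query score is multiplied by the boost of the first filter the doc matches, and docs that match no filter keep their original score. This is a simplification under stated assumptions (filters reduced to "date >= X" checks on ISO date strings, which compare correctly as plain strings), not Elasticsearch's implementation.

```perl
use strict;
use warnings;

# Toy filters, in order: "date >= gte" earns the given boost.
my @filters = (
    { gte => '2012-01-01', boost => 5 },
    { gte => '2011-01-01', boost => 3 },
);

# score_mode 'first': only the first matching filter's boost is applied
sub boosted_score {
    my ( $query_score, $date ) = @_;
    for my $f (@filters) {
        return $query_score * $f->{boost} if $date ge $f->{gte};
    }
    return $query_score;    # no filter matched: score unchanged
}

print boosted_score( 0.8, '2012-06-01' ), "\n";    # 4
print boosted_score( 0.8, '2011-06-01' ), "\n";    # 2.4
print boosted_score( 0.8, '2010-06-01' ), "\n";    # 0.8
```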

SearchBuilder
    # include recency in the relevance score
    $view->queryb(
        -custom_filters_score => {
            query       => { title => 'perl unicode' },
            score_mode  => 'first',
            filters     => [
                {
                    filter => { date => { gte => '2012-01-01' }},
                    boost  => 5
                },
                {
                    filter => { date => { gte => '2011-01-01' }},
                    boost  => 3
                },
            ]
        }
    );

See "-custom_filters_score" in ElasticSearch::SearchBuilder.

QueryDSL
    # include recency in the relevance score
    $view->query(
        custom_filters_score => {
            query       => { match => { title => 'perl unicode' }},
            score_mode  => 'first',
            filters     => [
                {
                    filter => { range => { date => { gte => '2012-01-01' }}},
                    boost  => 5
                },
                {
                    filter => { range => { date => { gte => '2011-01-01' }}},
                    boost  => 3
                },
            ]
        }
    );

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-custom-filters-score-query.html.

Other scoring queries

Boosting

Documents which match a query (eg "apple pear") can be "demoted" (made less relevant) if they also match a second query (eg "computer").

See "-boosting" in ElasticSearch::SearchBuilder or http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
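A minimal native Query DSL sketch of the apple/computer example. The content field name and the negative_boost value of 0.2 are illustrative assumptions: docs matching positive are returned, and those that also match negative have their score multiplied by negative_boost.

```perl
use strict;
use warnings;
use JSON::PP;

# Boosting query: demote "apple pear" docs that also mention "computer".
# Field name and boost value are examples only.
my %query = (
    boosting => {
        positive       => { match => { content => 'apple pear' } },
        negative       => { match => { content => 'computer' } },
        negative_boost => 0.2,    # below 1, so matching docs are demoted
    }
);

# With Elastic::Model this would be passed as: $view->query( %query );
print JSON::PP->new->canonical->encode( \%query ), "\n";
```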

Custom score

A custom_score query uses a script to calculate the _score for each matching doc.

See "-custom_score" in ElasticSearch::SearchBuilder or http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-custom-score-query.html

Custom boost factor

The custom_boost_factor query allows you to multiply the scores of another query by the specified boost factor. This differs from a standard boost parameter, which is normalized.

See "-custom_boost" in ElasticSearch::SearchBuilder or http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-custom-boost-factor-query.html

Constant score

The constant_score query does no relevance calculation - all docs are returned with the same score.

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html.
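For example, in the native Query DSL a constant_score query can wrap a filter so that every matching doc gets the same score, with the optional boost scaling that constant. The status field and the 1.2 boost are only examples.

```perl
use strict;
use warnings;
use JSON::PP;

# Every doc with status 'active' gets the same (boosted) score.
my %query = (
    constant_score => {
        filter => { term => { status => 'active' } },
        boost  => 1.2,
    }
);

# With Elastic::Model this would be passed as: $view->query( %query );
print JSON::PP->new->canonical->encode( \%query ), "\n";
```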

JOINING QUERIES

Parent-child queries

Parent-child relationships are not yet supported natively in Elastic::Model. They will be soon.

In the meantime, see

Nested queries

See Elastic::Manual::QueryDSL::Nested.

SEE ALSO

AUTHOR

Clinton Gormley <drtech@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2015 by Clinton Gormley.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.