Postgres Search With Location Awareness and JSON-Based Facets

Fri 17 November 2017

Postgres has good built-in search features (language-aware stemming, similarity ranking) that have been well covered in blog posts like Postgres Full Text Search Is Good Enough. In this post I'd like to add a couple of refinements to those techniques:

  • filtering results by facets
  • filtering results by proximity to a physical location

I'll provide examples in both raw SQL and Python code using SQLAlchemy. The examples will show how to build a simple search engine for local businesses. For certain kinds of businesses it will support filtering on specific facets, like whether a restaurant serves pizza. I won't go into detail on how to optimize search performance, though I'll include a couple of tips at the end.

Here's the SQL for creating our table:

BEGIN;
CREATE TABLE businesses (
  id SERIAL NOT NULL, 
  name TEXT NOT NULL, 
  description TEXT, 
  street_address TEXT[] NOT NULL, 
  city TEXT NOT NULL, 
  state TEXT NOT NULL, 
  postcode TEXT NOT NULL, 
  latitude FLOAT NOT NULL, 
  longitude FLOAT NOT NULL, 
  facets JSONB DEFAULT '{}' NOT NULL, 
  PRIMARY KEY (id)
);
COMMIT;

That was generated from this Python code:

from sqlalchemy import Column, Text, Float, Integer, ARRAY, create_engine
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Business(Base):
    __tablename__ = 'businesses'
    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=False)
    description = Column(Text)
    street_address = Column(ARRAY(Text), nullable=False)
    city = Column(Text, nullable=False)
    state = Column(Text, nullable=False)
    postcode = Column(Text, nullable=False)
    latitude = Column(Float, nullable=False)
    longitude = Column(Float, nullable=False)
    facets = Column(JSONB, nullable=False, server_default='{}')

def printdb():
    "A helper function to print out the statements for creating the tables."
    statements = []
    def dump(sql, *multiparams, **params):
        statements.append(str(sql.compile(dialect=engine.dialect)).strip())
    engine = create_engine('postgresql://', strategy='mock', executor=dump)
    Base.metadata.create_all(engine, checkfirst=False)
    print('BEGIN;')
    print(';\n\n'.join(statements) + ';')
    print('COMMIT;')

if __name__ == '__main__':
    printdb()

Note that each business is required to have a name, address, latitude, and longitude. A text description is optional. There's also a JSON-based facets field for storing key/value data about each business. This way we can have fields for certain types of businesses (like foods served at restaurants) without cluttering the table with columns that are irrelevant for all the other businesses.

Now let's insert some data. We'll put in the following fake businesses to test with:

  • 4 restaurants
  • 2 department stores
  • 2 car repair shops

Paste this into psql to insert these records:

INSERT INTO businesses VALUES (1, 'Asian Star', 'Chinese classics plus sushi rolls from a dedicated bar, all served in an airy, contemporary space.', '{"7588 Union Park Ave"}', 'Midvale', 'UT', '84047', 40.6134920000000008, -111.857440999999994, '{"food_types": ["asian", "chinese", "japanese"], "price_range": 2, "kid_friendly": true}');
INSERT INTO businesses VALUES (2, 'Saigon Sandwich', 'a bright and stylish kitchen serving Banh Mi, Pho, and other authentic Vietnamese rice and noodle favorites.', '{"8528 S 1300 E"}', 'Sandy', 'UT', '84094', 40.5967240000000018, -111.854409000000004, '{"food_types": ["asian", "vietnamese"], "price_range": 1, "kid_friendly": true}');
INSERT INTO businesses VALUES (3, 'La Caille', 'Upscale restaurant & event venue serving French-Belgian fare in an elegant, château-style setting.', '{"9565 Wasatch Blvd"}', 'Sandy', 'UT', '84092', 40.5759040000000013, -111.792124999999999, '{"food_types": ["french", "belgian"], "price_range": 3, "kid_friendly": false}');
INSERT INTO businesses VALUES (4, 'Mezquite Mexican Grill', NULL, '{"265 7200 S"}', 'Midvale', 'UT', '84047', 40.6204999999999998, -111.898919000000006, '{"food_types": ["mexican", "tacos", "burritos"], "price_range": 1, "kid_friendly": false}');
INSERT INTO businesses VALUES (5, 'Macy''s', 'Department store chain providing brand-name clothing, accessories, home furnishings & housewares.', '{"10600 S 110 W"}', 'Sandy', 'UT', '84070', 40.6364260000000002, -111.885600999999994, '{}');
INSERT INTO businesses VALUES (6, 'Target', 'Retail chain offering home goods, clothing, electronics & more, plus exclusive designer collections.', '{"7025 S Park Centre Dr"}', 'Salt Lake City', 'UT', '84121', 40.630391000000003, -111.849552000000003, '{}');
INSERT INTO businesses VALUES (7, 'Big O Tires', NULL, '{"2284 E Fort Union Blvd"}', 'Salt Lake City', 'UT', '84121', 40.6257689999999982, -111.825850000000003, '{"services": ["tires"]}');
INSERT INTO businesses VALUES (8, 'Master Muffler and Brake Complete Auto Care', NULL, '{"6790 State St"}', 'Murray', 'UT', '84107', 40.6301330000000007, -111.890652000000003, '{"services": ["mufflers", "brakes"]}');

-- fast forward our ID sequence to account for the rows inserted with explicit IDs
SELECT pg_catalog.setval('businesses_id_seq', 8, true);

We're going to start with a super simple search and then incrementally improve it.

Simple Name-Based Search

SELECT name, ts_rank_cd(
  to_tsvector(name),
  plainto_tsquery('asian')
) score FROM businesses
ORDER BY score DESC;

Here we're scoring each business record on how well its name matches the term 'asian', then ordering the results with the highest scores first. The name is converted to a tsvector (text search vector), and the query string is converted to a tsquery (text search query). plainto_tsquery combines multiple search terms with a logical AND. If you want more control over the boolean logic, you can use to_tsquery instead and provide your own logical operators. (See docs). The results should look like this:

                    name                     | score 
---------------------------------------------+-------
 Asian Star                                  |   0.1
 Saigon Sandwich                             |     0
 La Caille                                   |     0
 Mezquite Mexican Grill                      |     0
 Macy's                                      |     0
 Target                                      |     0
 Big O Tires                                 |     0
 Master Muffler and Brake Complete Auto Care |     0
(8 rows)

Here's a Python script that uses SQLAlchemy to do the same thing:

from sqlalchemy import Column, Text, Float, Integer, ARRAY, create_engine, func, select
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import sessionmaker

Base = declarative_base()


class Business(Base):
    __tablename__ = 'businesses'
    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=False)
    description = Column(Text)
    street_address = Column(ARRAY(Text), nullable=False)
    city = Column(Text, nullable=False)
    state = Column(Text, nullable=False)
    postcode = Column(Text, nullable=False)
    latitude = Column(Float, nullable=False)
    longitude = Column(Float, nullable=False)
    facets = Column(JSONB, nullable=False, server_default='{}')


def search_businesses(db, term):
    score = func.ts_rank_cd(func.to_tsvector(Business.name), func.plainto_tsquery(term))
    q = select([Business.name, score]).order_by(score.desc())
    return db.execute(q)

def main():
    # This assumes we've already done a "CREATE DATABASE sqlsearch;" in psql.
    db_url = 'postgresql://postgres@/sqlsearch'
    Session = sessionmaker(bind=create_engine(db_url))
    sess = Session()
    for result in search_businesses(sess, 'asian'):
        print(result)

if __name__ == '__main__':
    main()

Concatenating and Weighting Columns

The most obvious limitation of what we've done so far is that we're only searching names. A good search tool should be able to look at more metadata than that. Let's add descriptions and facets to our text search. We'll also weight the various fields so that names are most important, then descriptions, and then facets.

SELECT name, ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(description), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) score FROM businesses
ORDER BY score DESC;

Each of the fields we're searching is first converted to a tsvector, then weighted, and then they're all concatenated with the || operator. The resulting combined tsvector is scored against the query, and the results are returned.

(Note that we first had to cast the facets JSONB field to text. The to_tsvector function will automatically clean out all the JSON punctuation cruft. As of Postgres 10, you can call to_tsvector on a JSON field without needing to cast to text first.)
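To see why casting to text works, here's a rough pure-Python illustration (not Postgres's actual parser, which also does stemming) of how the JSON punctuation disappears once the string is tokenized into words:

```python
import json
import re

# A facets object like the ones stored in our JSONB column.
facets = {"food_types": ["asian", "chinese", "japanese"],
          "price_range": 2, "kid_friendly": True}

# Casting to text yields a JSON string full of braces, quotes, and colons...
as_text = json.dumps(facets)

# ...but a word tokenizer (roughly what to_tsvector does before stemming)
# keeps only the searchable word tokens and discards the punctuation.
tokens = re.findall(r"[a-z]+", as_text.lower())
print(tokens)
```

The surviving tokens ('asian', 'chinese', and so on) are exactly what we want the search to match on.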

Run that, and you should see this:

                    name                     | score 
---------------------------------------------+-------
 Mezquite Mexican Grill                      |      
 Master Muffler and Brake Complete Auto Care |      
 Big O Tires                                 |      
 Asian Star                                  |   1.2
 Saigon Sandwich                             |   0.2
 Macy's                                      |     0
 Target                                      |     0
 La Caille                                   |     0
(8 rows)

Uh oh. Some rows have no score. This happens when one of our fields is NULL. The only column that's allowed to be null in our table is the description column, so that must be the culprit. We can use the Postgres coalesce function to treat these descriptions as empty strings instead. The coalesce function accepts any number of arguments, and will return the first one that is not NULL:

SELECT name, ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) score FROM businesses
ORDER BY score DESC;

Now we get a score for each business:

                    name                     | score 
---------------------------------------------+-------
 Asian Star                                  |   1.2
 Saigon Sandwich                             |   0.2
 La Caille                                   |     0
 Mezquite Mexican Grill                      |     0
 Macy's                                      |     0
 Target                                      |     0
 Big O Tires                                 |     0
 Master Muffler and Brake Complete Auto Care |     0
(8 rows)
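The coalesce call we just added behaves like this small pure-Python analogue (illustrative only, not part of the query):

```python
def coalesce(*args):
    """Return the first argument that is not None, mirroring SQL's
    coalesce(), which returns its first non-NULL argument."""
    for arg in args:
        if arg is not None:
            return arg
    return None  # SQL's coalesce returns NULL if every argument is NULL

# A NULL description is treated as an empty string, so to_tsvector
# never sees a NULL and the score never becomes NULL:
print(coalesce(None, ''))                  # a missing description becomes ''
print(coalesce('Chinese classics...', ''))  # a real description passes through
```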

Let's translate that into Python/SQLAlchemy (showing just the search_businesses function for brevity; it also needs cast and Text imported from sqlalchemy):

def search_businesses(db, term):
    score = func.ts_rank_cd(
        func.setweight(
            func.to_tsvector(Business.name), 'A'
        ).op('||')(
            func.setweight(func.to_tsvector(func.coalesce(Business.description, '')), 'B')
        ).op('||')(
            func.setweight(func.to_tsvector(cast(Business.facets, Text)), 'C')
        ),
        func.plainto_tsquery(term)
    )
    q = select([Business.name, score]).order_by(score.desc())
    return db.execute(q)

If you run that, you'll get the same result as the raw SQL above.

Filtering By Facets

At this point we've got a pretty good text search engine of businesses in our table. But we want more features. I promised filtering based on facets and geographical distance. Let's do facets first.

Faceted search allows users to apply various filters to narrow down their search terms. The filters you want to support may vary by the kind of thing you're searching. If you're searching restaurants, then you may want to filter by cost, type of cuisine, and whether they're kid-friendly or not. If you're searching for cars, you might want to limit your search to sedans or minivans, or cars with gas mileage above a certain threshold.

If we just had restaurants in our table then we could have dedicated columns for food_types, kid_friendly, and price_range. Since our table also includes non-restaurant businesses like department stores and car repair shops, it makes sense to instead use a JSONB field that can store appropriate facets for each kind of business. Restaurants can have a food_types facet for the kinds of food they serve, while car repair shops can have a services facet for the kinds of service they offer. Creating your own taxonomy of categories and their supported facets can be a very big job, depending on how many different kinds of things you're searching.
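A taxonomy like that could start as something as simple as the sketch below. The category names and allowed keys here are hypothetical, not part of the schema above; they just show one way to keep facet data consistent before it reaches the JSONB column:

```python
# Hypothetical mapping of business categories to the facet keys they support.
TAXONOMY = {
    'restaurant': {'food_types', 'price_range', 'kid_friendly'},
    'car_repair': {'services'},
    'department_store': set(),
}

def validate_facets(category, facets):
    """Reject any facet keys that aren't defined for this business category."""
    allowed = TAXONOMY[category]
    unknown = set(facets) - allowed
    if unknown:
        raise ValueError('unknown facets for %s: %s' % (category, sorted(unknown)))
    return facets

# Valid facets pass through unchanged; unknown keys raise a ValueError.
validate_facets('restaurant', {'price_range': 1, 'kid_friendly': True})
```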

Let's search for all "asian" restaurants where the price_range facet is 1 (think of this like the number of dollar signs next to a restaurant name in a Google result).

SELECT name, ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) score FROM businesses
WHERE facets->>'price_range' = '1'
ORDER BY score DESC;

The results:

          name          | score 
------------------------+-------
 Saigon Sandwich        |   0.2
 Mezquite Mexican Grill |     0
(2 rows)

Notice that we're still getting a Mexican restaurant in there, though its score is 0. Let's update the WHERE clause to exclude all results with a score of 0.

SELECT name, ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) score FROM businesses
WHERE facets->>'price_range' = '1'
AND ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) > 0
ORDER BY score DESC;

This is where SQL gets a little annoying. Postgres won't let us simply say AND score > 0 to filter out the irrelevant results (if you try, it will tell you "column 'score' does not exist"), so we have to repeat all of the functions for concatenating and scoring. SQLAlchemy can make this code less verbose and error-prone by letting you store the score clause in a variable and reuse it in different parts of the query. Here we use score in the selected columns, the WHERE clause, and the ORDER BY clause.

def search_businesses(db, term, facets=None):
    score = func.ts_rank_cd(
        func.setweight(
            func.to_tsvector(Business.name), 'A'
        ).op('||')(
            func.setweight(func.to_tsvector(func.coalesce(Business.description, '')), 'B')
        ).op('||')(
            func.setweight(func.to_tsvector(cast(Business.facets, Text)), 'C')
        ),
        func.plainto_tsquery(term)
    )
    q = select([Business.name, score])

    # filter out all results with zero scores
    wheres = [
      score > 0
    ]
    if facets:
        # filter to businesses whose facets object contains all the key/value pairs passed to us.
        wheres.append(Business.facets.contains(facets))

    # join all our wheres with an AND, then feed them to the sqlalchemy query's where() method 
    q = q.where(and_(*wheres))

    # top scoring results first.
    q = q.order_by(score.desc())
    return db.execute(q)

In the raw SQL version you could do the scoring in a separate Common Table Expression (CTE) and JOIN to it to avoid repeating the scoring code, but I will not demonstrate that here.

Range-based faceted search

This is cool, but it's missing a common way people use facets. If someone's searching by price, for example, they're likely to want inequalities (less than, greater than) and not just equality to specific values. Let's update the query to support that.

SELECT name, ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) score FROM businesses
WHERE jsonb_typeof(facets->'price_range') = 'number'
AND (facets->>'price_range')::numeric <= 2
AND ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) > 0
ORDER BY score DESC;

In that example, we're now looking for all "asian" restaurants with a price_range facet of 2 or lower. Instead of treating the price_range as text as in earlier examples, we now cast it to a number explicitly. To do that safely, we also ignore any values that aren't numeric using jsonb_typeof.
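The jsonb_typeof guard in that WHERE clause follows the same pattern you'd use for untrusted JSON in any language. A pure-Python analogue (illustrative only) of the "only compare when it's actually a number" check:

```python
import json

def price_at_most(facets_json, limit):
    """Mirror the SQL pattern: only compare price_range when it's a number,
    the way jsonb_typeof(...) = 'number' guards the ::numeric cast."""
    value = json.loads(facets_json).get('price_range')
    # JSON booleans arrive as Python bools (a subclass of int), so exclude
    # them explicitly, just as jsonb_typeof reports them as 'boolean'.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value <= limit

print(price_at_most('{"price_range": 1}', 2))        # True
print(price_at_most('{"price_range": "cheap"}', 2))  # False: not a number
print(price_at_most('{}', 2))                        # False: key missing
```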

In the Python equivalent, we'll take one argument for facets that should be tested for equality, and separate arguments for the less than / greater than ones.

def search_businesses(db, term, facets_eq=None, facets_lte=None, facets_gte=None):
    score = func.ts_rank_cd(
        func.setweight(
            func.to_tsvector(Business.name), 'A'
        ).op('||')(
            func.setweight(func.to_tsvector(func.coalesce(Business.description, '')), 'B')
        ).op('||')(
            func.setweight(func.to_tsvector(cast(Business.facets, Text)), 'C')
        ),
        func.plainto_tsquery(term)
    )
    q = select([Business.name, score])

    # filter out all results with zero scores
    wheres = [
        score > 0
    ]
    if facets_eq:
        # filter to businesses whose facets object contains all the key/value pairs passed to us.
        wheres.append(Business.facets.contains(facets_eq))

    if facets_lte:
        for k, v in facets_lte.items():
            wheres.extend([
                func.jsonb_typeof(Business.facets.op('->')(k)) == 'number',
                cast(Business.facets.op('->>')(k), NUMERIC) <= v
            ])

    if facets_gte:
        for k, v in facets_gte.items():
            wheres.extend([
                func.jsonb_typeof(Business.facets.op('->')(k)) == 'number',
                cast(Business.facets.op('->>')(k), NUMERIC) >= v
            ])

    # join all our filters with an AND, then feed them to the sqlalchemy query's where() method 
    q = q.where(and_(*wheres))

    # top scoring results first.
    q = q.order_by(score.desc())
    return db.execute(q)

At this point, we've built the backend we'd need to support Amazon-style searching and filtering.

Filtering by Geographical Distance

Compared to the facet feature, filtering results by distance from a given point is relatively simple. We'll use the <@> operator from Postgres' earthdistance module, which returns the distance in statute miles between two points. Let's get asian restaurants with a price of 2 or lower that are within half a mile of where I'm sitting right now. Note that the point function somewhat unintuitively takes the longitude argument first and the latitude argument second.

CREATE EXTENSION IF NOT EXISTS earthdistance CASCADE;

SELECT name, ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) score FROM businesses
WHERE jsonb_typeof(facets->'price_range') = 'number'
AND (facets->>'price_range')::numeric <= 2
AND ts_rank_cd(
  setweight(to_tsvector(name), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(facets::text), 'C'),
  plainto_tsquery('asian')
) > 0
AND point(longitude, latitude) <@> point(-111.854952, 40.606536) < 0.5 
ORDER BY score DESC;

Now we're down to one result:

    name    | score 
------------+-------
 Asian Star |   1.2
(1 row)
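As a sanity check on that cutoff, we can approximate the <@> distances in pure Python with the haversine formula. This is not earthdistance's implementation (it uses its own Earth-radius constant, so the digits differ slightly), but it shows why only Asian Star survives the half-mile filter:

```python
from math import asin, cos, radians, sin, sqrt

def miles_between(lat1, lng1, lat2, lng2, radius=3959.0):
    """Great-circle distance in statute miles via the haversine formula,
    assuming a 3959-mile Earth radius. Approximates Postgres' <@>."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius * asin(sqrt(a))

here = (40.606536, -111.854952)
print(miles_between(*here, 40.613492, -111.857441))  # Asian Star: just under half a mile
print(miles_between(*here, 40.596724, -111.854409))  # Saigon Sandwich: over half a mile, filtered out
```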

In the Python function, we'll accept the center coordinates and distance limit as a tuple containing three floats:

def search_businesses(db, term, facets_eq=None, facets_lte=None, facets_gte=None,
                      miles_lat_long=None):
    score = func.ts_rank_cd(
        func.setweight(
            func.to_tsvector(Business.name), 'A'
        ).op('||')(
            func.setweight(func.to_tsvector(func.coalesce(Business.description, '')), 'B')
        ).op('||')(
            func.setweight(func.to_tsvector(cast(Business.facets, Text)), 'C')
        ),
        func.plainto_tsquery(term)
    )
    q = select([Business.name, score])

    # filter out all results with zero scores
    wheres = [
        score > 0
    ]
    if facets_eq:
        # filter to businesses whose facets object contains all the key/value pairs passed to us.
        wheres.append(Business.facets.contains(facets_eq))

    if facets_lte:
        for k, v in facets_lte.items():
            wheres.extend([
                func.jsonb_typeof(Business.facets.op('->')(k)) == 'number',
                cast(Business.facets.op('->>')(k), NUMERIC) <= v
            ])

    if facets_gte:
        for k, v in facets_gte.items():
            wheres.extend([
                func.jsonb_typeof(Business.facets.op('->')(k)) == 'number',
                cast(Business.facets.op('->>')(k), NUMERIC) >= v
            ])

    if miles_lat_long:
        assert len(miles_lat_long) == 3, 'miles_lat_long must contain three floats'
        for val in miles_lat_long:
            assert isinstance(val, float), 'miles_lat_long must contain three floats'
        miles, lat, lng = miles_lat_long
        distance = func.point(Business.longitude, Business.latitude).op('<@>')(func.point(lng, lat))
        wheres.append(distance <= miles)

    # join all our filters with an AND, then feed them to the sqlalchemy query's where() method 
    q = q.where(and_(*wheres))

    # top scoring results first.
    q = q.order_by(score.desc())
    return db.execute(q)

SQLAlchemy ORM Integration

Though we defined a Business model using SQLAlchemy's ORM, all our queries so far have returned just business names and text-match scores. In real-life code it's usually more useful to get back instances of our Business class. We can use SQLAlchemy's from_statement query method to do that. Our last code snippet is a complete working Python example, with the changes needed to play more nicely with the ORM:

from sqlalchemy import Column, Text, Float, Integer, ARRAY, create_engine, func, select, cast, and_
from sqlalchemy.dialects.postgresql import JSONB, NUMERIC
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import sessionmaker

Base = declarative_base()


class Business(Base):
    __tablename__ = 'businesses'
    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=False)
    description = Column(Text)
    street_address = Column(ARRAY(Text), nullable=False)
    city = Column(Text, nullable=False)
    state = Column(Text, nullable=False)
    postcode = Column(Text, nullable=False)
    latitude = Column(Float, nullable=False)
    longitude = Column(Float, nullable=False)
    facets = Column(JSONB, nullable=False, server_default='{}')


def build_search_query(term, facets_eq=None, facets_lte=None, facets_gte=None, miles_lat_long=None):
    score = func.ts_rank_cd(
        func.setweight(
            func.to_tsvector(Business.name), 'A'
        ).op('||')(
            func.setweight(func.to_tsvector(func.coalesce(Business.description, '')), 'B')
        ).op('||')(
            func.setweight(func.to_tsvector(cast(Business.facets, Text)), 'C')
        ),
        func.plainto_tsquery(term)
    )
    q = select([Business])

    # filter out all results with zero scores
    wheres = [
        score > 0
    ]
    if facets_eq:
        # filter to businesses whose facets object contains all the key/value pairs passed to us.
        wheres.append(Business.facets.contains(facets_eq))

    if facets_lte:
        for k, v in facets_lte.items():
            wheres.extend([
                func.jsonb_typeof(Business.facets.op('->')(k)) == 'number',
                cast(Business.facets.op('->>')(k), NUMERIC) <= v
            ])

    if facets_gte:
        for k, v in facets_gte.items():
            wheres.extend([
                func.jsonb_typeof(Business.facets.op('->')(k)) == 'number',
                cast(Business.facets.op('->>')(k), NUMERIC) >= v
            ])

    if miles_lat_long:
        assert len(miles_lat_long) == 3, 'miles_lat_long must contain three floats'
        for val in miles_lat_long:
            assert isinstance(val, float), 'miles_lat_long must contain three floats'
        miles, lat, lng = miles_lat_long
        distance = func.point(Business.longitude, Business.latitude).op('<@>')(func.point(lng, lat))
        wheres.append(distance <= miles)

    # join all our filters with an AND, then feed them to the sqlalchemy query's where() method 
    q = q.where(and_(*wheres))

    # top scoring results first.
    q = q.order_by(score.desc())
    return q


def main():
    # This assumes we've already done a "CREATE DATABASE sqlsearch;" in psql.
    db_url = 'postgresql://postgres@/sqlsearch'
    Session = sessionmaker(bind=create_engine(db_url))
    sess = Session()
    q = build_search_query(
        'asian',
        facets_lte={'price_range': 2},
        miles_lat_long=(0.5, 40.606536, -111.854952),
    )
    businesses = sess.query(Business).from_statement(q)
    for b in businesses:
        print(b.name)

if __name__ == '__main__':
    main()

Performance

I won't go into detailed performance optimization here, as the optimizations your data needs could be very different from mine. Some general rules still apply, though: indexes are the first thing to reach for if you see a performance problem with the kind of search implemented here. Postgres' GIN (Generalized Inverted Index) index type can store pre-calculated text search vectors and speed up the queries above. Using such an index requires changing our queries slightly: we must specify the language used when building tsvectors instead of pulling the default from the Postgres config. (An index must always return the same result for the same input, so it can't depend on a mutable configuration setting.) You could create a GIN index on the expression we're searching like this:

CREATE INDEX business_ts_idx
ON businesses
USING gin((
  setweight(to_tsvector('english', name), 'A') ||
  setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
  setweight(to_tsvector('english', facets::text), 'C')
));

If you are building a multi-lingual search tool, you'll probably want to store the language as a column on the table, and then refer to that column when doing your query instead of hardcoding the language as I've done with 'english' above.

GIN indexes can also be used to speed up the JSONB field querying that we're using for facet filtering:

CREATE INDEX business_facets_idx
ON businesses
USING gin(facets);

Wrapping Up

In this post we've covered how to do full text search across multiple weighted columns in a Postgres table, facet-based filtering on values in a JSONB field, and filtering based on distance from a geographical point. Along the way, we've also seen how using SQLAlchemy's expression layer can allow you to compose queries with any number of optional features. It's important to note that all the sorting and filtering work is being done inside Postgres, not in Python. All our SQLAlchemy code is just being used to produce a complex query that will be sent over to the database side. Finally, we saw how to use that expression-layer power from the higher level ORM.

My motivation in writing this post was to provide an alternative to a pattern I often see among developers: reaching for new tools instead of digging into the power of the ones they already have. In the case of search, developers are likely to say "Let's just use Elasticsearch." That may be the right solution in some cases, but I think the "just" in that sentence contains a false promise of simplicity. While setting up an Elasticsearch instance and initially copying some data into it may require less mental exertion than the code in this post, "just use Elasticsearch" hides the cost of maintaining another stateful service and the difficulty of guaranteeing that a separate search backend stays in sync with your database. I hope that as Postgres' features become better known, teams already using it for regular CRUD apps will be more likely to say "Let's just use Postgres" when they want to add search features.