It's what happens when there's not enough time during the day...

Using Beanie With FastAPI and MongoDB

Let's Do A Weighted Text Search with... PyMongo?

7 min read Dec 13, 2021 MongoDB Beanie

I decided to use MongoDB as the backend database solution for my blog, and as a result, there weren't quite as many resources to refer to when trying to find a specific solution. Ergo, how do I search weighted text fields in MongoDB with Python?

Background

I have a feeling that my articles for this blog may be all over the place. I previously wrote about dynamically loading Beanie Document models when initiating a database connection, but I have yet to write about creating those Document models to begin with.

I suppose it would make more sense if I went in chronological order, so those of you at home who wish to follow along could do so without so much cognitive dissonance.

There may come a time when I do something like that down the road. But for now, I think I need to write these things down before I forget.

Something I wish I had done at the very beginning of this blog process is to jot down all my lessons learned.

However, when I'm head down trying to figure out some Python code or delving into the database design process... Yikes! Who has time for prose?

Live and learn, I guess.

My ODM of Choice

As I mentioned, I'm using MongoDB as my application database, which means that I needed an asynchronous ODM (object-document mapper) to interface with the database.

Note: I switched from PostgreSQL to MongoDB halfway through development. I'll talk about that switch some other time.

When I originally switched to FastAPI, I tried ODMantic and it didn't quite click for me. Maybe it's because I didn't know what the hell an asynchronous ODM was supposed to do in the first place. (Not sure if I even know now.)

Anyway, at that point, I switched to Beanie to check it out and have been happy so far. (If you look at Beanie's GitHub history, you'll note less popularity, but a bit more activity than ODMantic.)

Both packages provide a Pythonic way to interact with your database, and both are based on the excellent pydantic library.

Though the packages are similar, I gravitated toward Beanie. It felt more accessible and inviting. Your mileage may vary.

Simple Beanie Stuff

Beanie makes it easy to do basic CRUD operations on a database (create, read, update, delete). It is also a cinch to get started.

Note: I used Beanie 1.8.6 when this was written. The API may have changed slightly since then.

For example, defining a database model is very easy. This is a simplified version of my Article class, which in turn represents the data that is stored in MongoDB for each one of my blog posts.

from beanie import Document


class Article(Document):  # This is the db model
    main_title: str
    alt_title: str
    summary: str
    content: str

And once you have stored the data, this is how easy it is to do a query:

result = await Article.find(search_criteria).to_list()

Yes, Beanie has its own API for defining the search criteria, but it's very intelligible. If I want to find an article with the title "Hello... World?", then I can simply write:

my_article = await Article.find_one(
    Article.main_title == "Hello... World?"
    )

I'm not going to go down the path of trying to demonstrate everything that you can do with Beanie. The documentation already does an excellent job of that.

But I do want to focus on a little item that was giving me a lot of trouble.

Searching

One of the things I knew that I wanted to add to the blog was the ability to do a search on all posts. It goes without saying that this was completely unnecessary (at the time). I think I had only written two or three articles.

But I found out that MongoDB supports text search queries on fields covered by a text index, and I knew I would want to search through my title, summary, and content fields using that feature.

Despite reading about it on the MongoDB website, I couldn't figure out how to replicate that kind of search with Beanie's syntax.

As a matter of fact, I first needed to figure out how to add the text index to the necessary fields.

Indexing A Field With Text Type

As it turns out, adding a text index to an individual field is pretty easy. As you see in the documentation, this is how I could add it to an article's content field.

import pymongo
from beanie import Document, Indexed


class Article(Document):
    main_title: str
    alt_title: str
    summary: str
    content: Indexed(str, index_type=pymongo.TEXT)   # the indexed field

See? Simple. Using my database model class (built on top of pydantic), Beanie uses the "Indexed" field type and ...

Wait, wait, wait. What's that?

PyMongo?

Dang. Something else to learn?

Indexing Multiple Fields

As I tend to do in these circumstances, I gave myself a crash course on the MongoDB text search, this time, straight from the MongoDB docs. As I read, I gathered that in addition to my desire to have multi-field indexes, I wanted each of the fields to have different "weights."

That is to say, search terms in my main_title field should be more important than those in my alt_title, and that weight should be distributed as I see fit. And of course, as a result, the weight should determine the sort order of the results (more relevant items listed first).
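To make the idea of weights concrete, here is a toy sketch in plain Python. It is not MongoDB's actual scoring formula, which is more involved; the toy_score function and the two sample articles are made up purely for illustration. All it shows is how a per-field weight can push a title match above a content match.

```python
# Toy illustration only: this is NOT MongoDB's real textScore algorithm.
WEIGHTS = {"main_title": 10, "alt_title": 5, "content": 1}


def toy_score(doc: dict, term: str, weights: dict = WEIGHTS) -> int:
    """Sum (occurrences of term in a field) * (that field's weight)."""
    term = term.lower()
    return sum(
        doc.get(field, "").lower().split().count(term) * weight
        for field, weight in weights.items()
    )


# Hypothetical sample data
articles = [
    {"main_title": "Cooking at home", "alt_title": "",
     "content": "I tried beanie once, and beanie again."},
    {"main_title": "Beanie basics", "alt_title": "",
     "content": "An introduction."},
]

# Higher score first, so the more relevant article leads the list
ranked = sorted(articles, key=lambda d: toy_score(d, "beanie"), reverse=True)
print([a["main_title"] for a in ranked])
```

Even though the first article mentions the term twice in its content, the single hit in the second article's main_title outweighs it.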

MongoDB allows for that. But I couldn't find anything about weights in the Beanie docs.

This Is Getting Heavy

I know, right?

There is one bit of nuance to Beanie that I had sort of missed when I started working with it. Namely, that it supports the aforementioned native PyMongo syntax.

For example, note how Beanie uses the PyMongo TEXT type to define indexed field types.

Look, I'm not trying to get you to follow me down this rabbit hole. It could go pretty deep. Needless to say, I don't really know much else about PyMongo, aside from how most of its syntax mimics the native MongoDB syntax.

Well that was good news. That meant that I could somehow insert (nearly) native MongoDB syntax straight into my Python application.

Indexes and Weights Revisited

My first task was to ensure my database model had the indexed fields properly defined and weighted. Here's what the code looks like in the mongo shell:

db.article.createIndex(
    {
        main_title: "text",
        alt_title: "text",
        content: "text"
    },
    {
        weights: {
            main_title: 10,
            alt_title: 5
        },
        name: "TextIndex"
    }
)

Yuck. What are those curly braces doing all over the place?

It's kind of like a Python dictionary, right? Well, let's observe how Beanie handles the same thing.

import pymongo
from beanie import Document
from pymongo import IndexModel


class Article(Document):  # previously defined db model
    ...

    class Collection:
        name = "articles"   # name of the collection in MongoDB
        indexes = [
            IndexModel(
                [
                    ("main_title", pymongo.TEXT),
                    ("alt_title", pymongo.TEXT),
                    ("content", pymongo.TEXT),
                ],
                weights={
                    "main_title": 10,
                    "alt_title": 5,
                },
                name="text_index",   # name of the index
            )
        ]

While IndexModel is a feature of PyMongo, Beanie allows you to insert it directly into your database model through the "Collection" inner class.

Note: The Collection inner class has since been renamed to Settings.

The above code accomplished the first piece of the puzzle. Namely, the main_title, alt_title, and content fields are all indexed. In addition, my main_title has a higher weight than alt_title, and both have a higher weight than content (which defaults to a weight value of one).
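Since the shell version and the Python version describe the same kind of index, the translation between them is mostly mechanical. The sketch below is only an illustration: shell_index_to_pymongo is a hypothetical helper of my own naming, not part of Beanie or PyMongo. It leans on the fact that pymongo.TEXT is just the string "text", so it runs without a database or any third-party package.

```python
def shell_index_to_pymongo(keys: dict, options: dict) -> tuple:
    """Convert a shell-style createIndex(keys, options) pair into the
    (list of (field, type) tuples, keyword arguments) shape that
    pymongo.IndexModel takes."""
    key_list = [(field, index_type) for field, index_type in keys.items()]
    return key_list, dict(options)


index_keys, index_kwargs = shell_index_to_pymongo(
    {"main_title": "text", "alt_title": "text", "content": "text"},
    {"weights": {"main_title": 10, "alt_title": 5}, "name": "text_index"},
)

# With PyMongo installed, the result would be used as:
#     IndexModel(index_keys, **index_kwargs)
print(index_keys)
print(index_kwargs)
```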

That's So $meta

So the last piece of the puzzle was how to properly write the query with Beanie.

As mentioned above, Beanie already has its own language for CRUD operations, but I was having a hard time finding the best way to use the indexed fields.

I needed a way to emulate the MongoDB command that queries and sorts based on the weights. So, here's what searching for a "random blog topic" might look like in the mongo shell:

db.article.find(
{ $text: { $search: "random blog topic" } },
{ score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )

After searching far and wide, I was able to find something similar in the PyMongo documentation (I told you, it was a rabbit hole), which meant that a similar statement was possible within Beanie. Here's what it looks like in Python (thanks to Beanie and PyMongo):

sorted_results = await Article.find(
        {'$text': {'$search': search_value}}).sort(
            [('score', {'$meta': 'textScore'})]).to_list()

Two quick things to note.

In the mongo shell, you see the score of the $meta variable twice. The first time you see it, it is "projecting" the score on your database models. I tried it this way a few times and couldn't get it working.

I believe that is because MongoDB (as of version 4.4) no longer needs the projection in order to sort on that $meta score, so I dropped that first portion in my Python code.

The second thing you'll note is that the sort is performed on a Python list containing the textScore, which is different than what you would do through the mongo shell.

But all things considered, it was a somewhat simple solution all along.
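The two pieces of that query (the filter and the sort specification) are plain Python data, so they can be built and inspected without a database. This is a minimal sketch; build_text_search is a name I made up for illustration and is not a Beanie or PyMongo function.

```python
def build_text_search(search_value: str):
    """Return the (filter, sort) pair for a weighted text search."""
    # $text runs the search against the collection's text index
    query_filter = {"$text": {"$search": search_value}}
    # Sorting on the textScore metadata puts the most relevant matches first
    sort_spec = [("score", {"$meta": "textScore"})]
    return query_filter, sort_spec


query_filter, sort_spec = build_text_search("random blog topic")
print(query_filter)
print(sort_spec)
```

With Beanie, the pair would then be handed over as Article.find(query_filter).sort(sort_spec), just like the snippet above.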

Lessons Learned

Thanks to writing these blog posts (and a hint from Roman himself), I learned that Beanie does have a layer for accessing MongoDB operators in a simpler way than going directly to PyMongo syntax.

Using Beanie's Text operator (its wrapper around $text) could have made things a little bit easier. For example, the above snippet could also have been written like this:

from beanie.operators import Text

sorted_results = await Article.find(
        Text(search_value)).sort(
            [('score', {'$meta': 'textScore'})]).to_list()

Sometimes when looking for a solution, you could get sucked into an XY Problem. I think that may have happened to me a couple times while working on this. In the end, at least in this case, my ultimate goal of using weighted search on selected fields was achieved.

Search and You Will Find

I know this is an overly long article with a very small and specific payoff. My hope is that if you were only looking for the piece of code that will make your multi-indexed model take advantage of weighted text searching, you scrolled to the bits that you needed and utilized the old trusty copy/paste method.

But I also thought it would be helpful to detail my thought process when looking for a not-so-obvious solution. Even when you're not very familiar with the packages you're working with, you're bound to find a solution if you keep at it. Hopefully this helped cut down on your own search.