Finding Python Models
Loading Document Objects to Beanie Dynamically
I have a feeling this post is going to be quite dense. The idea to write it came from an episode of Talk Python where Michael Kennedy interviewed Roman Right, the creator of Beanie, my Python object-document mapper (ODM) of choice for MongoDB.
Lightning-fast background here: Beanie is a library that provides a way to connect to and interact with MongoDB databases asynchronously. It does so primarily through a Document class that is used as the parent class of your database models. Beanie then allows you to perform basic CRUD operations (among other, more advanced features) by using your database models directly.
Phew!
Anyway, on the podcast mentioned above, the subject came up of how the init_beanie function (which initializes a connection to the MongoDB database) requires a list of Document models to be passed in as a parameter. (This is necessary to specify which models correspond to the active database connection, since one could potentially have different models for different databases.)
In any case, it's very likely that you will have multiple database models within a particular application that you would want to initialize.
Especially during development, it would be nice if you could ask Beanie to include all relevant models contained within a given directory.
And yes, Michael asked Roman that very same question.
While that feature hasn't been implemented (yet?), I thought it would be a good idea to write about my solution.
In Plain English
As I've previously mentioned, I started this project as a Flask application, and I was facing a similar issue. I wanted to be able to load my database models dynamically.
I came across this absolutely fantastic resource by Bob Waycott explaining precisely how to do that very thing. In Flask.
While I initially started building this site in Flask and SQLAlchemy, I eventually moved away from those libraries altogether, but the idea of a dynamic loader stayed with me.
To accomplish a similar task with Beanie, my basic solution needed to accomplish the following:
- Walk a given directory in search of all modules (my .py files)
- Look inside each of those modules in search of objects
- Determine if objects found were Beanie Document models
- Create an iterable of the models found
- Return a list with dot separated paths to each of those models
Where to Start
I don't want to rehash instructions on how to use Beanie. The documentation already does an awesome job at that.
Relevant to this topic, I'm focusing on how one goes about connecting to a database. It involves creating the client (using the Motor library) and initializing the connection with Beanie's built-in init_beanie function.
That part of the code looks like this:
await init_beanie(database=client.db_name, document_models=[Sample])
Note that the Beanie documentation also states:
init_beanie supports not only list of classes for the document_models parameter, but also strings with the dot separated paths
As such, my end goal is to provide a list for document_models that only includes my relevant Beanie models. Something like:
document_models=[
    "app.models.article.ArticleModel",
    "app.models.user.UserModel",
    "app.models.tags.TagModel",
]
So, to get that list of dot separated paths, I'll write a load_beanie_models function to find relevant models in a given directory.
Simple, right?
To Be or Not to Beanie
My solution that follows is not necessarily chronological in terms of the logic laid out above, but hopefully it makes sense as I start explaining it.
One of the primary tasks I will have to do is determine whether a given object is a Beanie Document to begin with.
To do so, I created a simple function that receives an item and determines if it is a Beanie Document.
It's pretty straightforward:
from inspect import isclass

from beanie import Document

def is_beanie_model(item: object) -> bool:
    """Determines if item is a Beanie Document object."""
    return isclass(item) and issubclass(item, Document)
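This uses the built-in inspect module to access the isclass function. It checks to make sure item is a class and, in addition, a subclass of the Beanie Document model. The isclass check has to come first because issubclass raises a TypeError when its first argument is not a class. The return value is a boolean True or False.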
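To see the check in isolation, here is a minimal sketch that swaps Beanie's Document for a stand-in class so it runs without Beanie or a database. The names FakeDocument, ArticleDB, ArticleCreate and is_document_model are assumptions made up for this illustration.

```python
from inspect import isclass


class FakeDocument:
    """Hypothetical stand-in for beanie.Document (assumption for this demo)."""


class ArticleDB(FakeDocument):
    """Plays the role of a real database model."""


class ArticleCreate:
    """Plays the role of a plain schema that is not a database model."""


def is_document_model(item: object) -> bool:
    """Same shape as is_beanie_model, but checks against the stand-in."""
    return isclass(item) and issubclass(item, FakeDocument)


checks = {
    "ArticleDB": is_document_model(ArticleDB),
    "ArticleCreate": is_document_model(ArticleCreate),
    "an instance": is_document_model(ArticleDB()),
    "a string": is_document_model("ArticleDB"),
    "the base class itself": is_document_model(FakeDocument),
}
for label, result in checks.items():
    print(f"{label}: {result}")
```

Note the last entry: issubclass treats a class as a subclass of itself, so the base class passes the check too. That only matters if the base class ends up among the objects being inspected.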
As an aside: All Beanie database objects are built as a subclass of Document, which itself is a subclass of pydantic's BaseModel class.
Let's Go Get Them
Before I jump into the more complex parts, I decided to create a simple function with no arguments. It calls my dynamic_loader function (which I'll talk about below), passing it a string value of the directory I want to traverse along with the is_beanie_model comparison function mentioned above.
This allows me to potentially use the loader for things other than Beanie models. (Bob Waycott—who I mentioned above—uses it as a dynamic way to register views in Flask.)
def get_beanie_models() -> list[type[Document]]:
    """Dynamic Beanie model finder."""
    return dynamic_loader("models", is_beanie_model)
This function will provide me with the end result: a list of my Beanie database models. You also see a peek of my dynamic_loader function, which I have not gotten to quite yet.
Stay tuned.
But First, Path to Enlightenment (or Models)
You'll note that the dynamic_loader function takes a string value of "models".
As mentioned above, this is the name of the directory that contains all my database models. (I could theoretically start searching at my root directory, in case my models were peppered around the app, though I wouldn't recommend that. But app structure is another topic altogether.)
For reference, this is a simplified version of my app structure. All of my functions are defined in the src/lib/util.py module.
src/
┣ api/
┣ core/
┃ ┣ config.py
┃ ┣ security.py
┃ ┗ __init__.py
┣ crud/
┃ ┣ article.py
┃ ┣ user.py
┃ ┗ __init__.py
┣ db/
┃ ┣ db.py
┃ ┗ __init__.py
┣ lib/
┃ ┣ util.py # my functions live here
┃ ┗ __init__.py
┣ models/
┃ ┣ mixins/
┃ ┣ article.py
┃ ┣ base.py
┃ ┣ tags.py
┃ ┣ user.py
┃ ┗ __init__.py
┣ main.py
┗ __init__.py
I decided to build all my app models in a "models" subdirectory. As a result, I want to do a recursive search in that directory for all existing modules (any .py files).
Here, I use the excellent, built-in pathlib library to help with all of this. Again, from my util.py module:
from pathlib import Path
from typing import Iterator

APPLICATION_DIR = Path(__file__).parent.parent  # App's root directory

def get_modules(module: str) -> Iterator[str]:
    """Returns all .py modules in given file_dir as
    a generator of dot separated string values.
    """
    file_dir = Path(APPLICATION_DIR / module)
    idx_app_root = len(APPLICATION_DIR.parts) - 1
    modules = [f for f in list(file_dir.rglob("*.py"))
               if not f.stem == "__init__"]
    for filepath in modules:
        yield (".".join(filepath.parts[idx_app_root:])[0:-3])
And as these things go, here's the breakdown.
Depending on where you are in your app directory when making the call, you can get a Path object by using Path(__file__) and some combination of .parent notation to specify your application root.
Once we have the APPLICATION_DIR (your root directory), you can extend the path with / and the name of your model directory (I call the variable module). This becomes your file_dir, which will be searched recursively.
Ultimately, once we find the paths to all our .py files, I will use pathlib's .parts to split each path into its individual components. That may sound more complicated than it is.
But first, I want to know the "index" of where my dot separated string paths need to start.
For example, let's say my APPLICATION_DIR is located in c:\project\app\src. After splitting that in parts, I will have a tuple of ("c:\\", "project", "app", "src") and a len value of four.
When using import statements, I want my dot separated paths to start with src. So later, let's say my Beanie Document is located in c:\project\app\src\models\article.py. I would want my dot separated path to say something like src.models.article.BeanieDocument.
So backing up a bit, if I split my APPLICATION_DIR into "parts", I want to know the index of my src directory. That will always be the equivalent of len(APPLICATION_DIR.parts) - 1.
For those of you counting at home, that means the idx_app_root value is 4 - 1... or 3... aka src.
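The arithmetic above can be checked on any operating system with PureWindowsPath, which parses Windows-style paths without touching the filesystem. This is a small sketch using the example location from the text, not code from my app.

```python
from pathlib import PureWindowsPath

# Hypothetical location, taken from the example in the text.
application_dir = PureWindowsPath(r"c:\project\app\src")

parts = application_dir.parts
idx_app_root = len(parts) - 1

print(parts)                # ('c:\\', 'project', 'app', 'src')
print(idx_app_root)         # 3
print(parts[idx_app_root])  # src
```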
Next, there's that scary looking list comprehension.
modules = [f for f in list(file_dir.rglob("*.py"))
           if not f.stem == "__init__"]
I'll unpack it and write it long form to see if it helps (maybe I should do this in my app too? Is this more pythonic?):
all_modules = list(file_dir.rglob("*.py"))
modules = []
for file in all_modules:
    if not file.stem == "__init__":
        modules.append(file)
pathlib allows us to rglob the given directory, which means it recursively yields all existing files that match the given pattern, including those in subdirectories. Next, we iterate over all_modules to ensure we are not pulling in any .py files named __init__ (if you've read this far, you are probably aware as to why we would want to exclude those files).
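To watch rglob and the __init__ filter at work without touching a real project, here is a minimal sketch that builds a throwaway directory. The file names (including mixins/timestamps.py and notes.txt) are made up for the demo.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"
    (models_dir / "mixins").mkdir(parents=True)

    # Hypothetical files loosely mirroring the layout shown earlier.
    for name in ("__init__.py", "article.py", "user.py",
                 "mixins/timestamps.py", "notes.txt"):
        (models_dir / name).touch()

    # Recursive search for .py files, skipping any __init__.py.
    found = sorted(
        f.relative_to(models_dir).as_posix()
        for f in models_dir.rglob("*.py")
        if f.stem != "__init__"
    )

print(found)  # ['article.py', 'mixins/timestamps.py', 'user.py']
```

The nested file is picked up because rglob descends into subdirectories, while notes.txt is skipped by the pattern and __init__.py by the stem check.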
Now that we have the paths to all the modules (less __init__.py) from the given directory, we can create a generator to return the dot separated paths to these modules.
for filepath in modules:
    yield (".".join(filepath.parts[idx_app_root:])[0:-3])
Remember that idx_app_root? Now I can use it on each of the paths to get the relevant, dot separated path.
So if you follow, you can see that c:\project\app\src\models\article.py becomes ("c:\\", "project", "app", "src", "models", "article.py") after splitting it in parts. Note that the file name and its extension stay together as the last part.
Since we know my index is 3, I can join all the values from that index forward using the slicing notation of [3:], or rather, [idx_app_root:].
By using the .join method on the path parts, I then get "src.models.article.py" as my string value. Lastly, I slice away the last three characters with [0:-3] to remove the ".py".
My resulting path looks like src.models.article.
Wow, that seems like way too much explanation for something relatively simple. To be honest, I could've really used something like this when figuring this out, so here it is for posterity.
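The slicing works because every file found ends in ".py". As a possible alternative (not what my code does), pathlib can drop the extension itself with with_suffix(""), and relative_to can stand in for the index arithmetic. A minimal sketch, using the example Windows paths from above and a helper name, to_dotted_path, that I made up for illustration:

```python
from pathlib import PurePath, PureWindowsPath


def to_dotted_path(filepath: PurePath, app_root: PurePath) -> str:
    """Convert a module's file path to a dot separated import path.

    The path is taken relative to the parent of app_root so that the
    result starts with the root directory's own name (e.g. "src").
    """
    relative = filepath.relative_to(app_root.parent)
    return ".".join(relative.with_suffix("").parts)


app_root = PureWindowsPath(r"c:\project\app\src")
article = PureWindowsPath(r"c:\project\app\src\models\article.py")

# The approach from the text: slice the parts, join, trim ".py".
idx_app_root = len(app_root.parts) - 1
sliced = ".".join(article.parts[idx_app_root:])[0:-3]

print(sliced)                             # src.models.article
print(to_dotted_path(article, app_root))  # src.models.article
```

Both routes give the same string here. The pathlib version just avoids depending on the extension being exactly three characters long.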
Almost There
Lastly, I need to check each one of those modules to see if there are any Beanie models in there to begin with. But a quick aside.
I want you to take a quick look at my article.py module, which holds my database model for my blog articles:
from beanie import Document
from pydantic import BaseModel

__all__ = (
    "ArticleBase",
    "Article",
    "ArticleCreate",
    "ArticleDB",
)

class ArticleBase(BaseModel):
    ...  # hidden for sake of brevity

class Article(ArticleBase):
    ...  # ditto

class ArticleCreate(ArticleBase):
    ...  # same

class ArticleDB(ArticleBase, Document):
    ...  # samesies
Notice two things.
First of all, I explicitly list the objects that I have defined in my module and assign those to the __all__ variable. I want to make sure that I control what is seen and not seen throughout my app's code.
Secondly, notice how most of my models are actually not Beanie models at all. The only one that matters to Beanie is the last one, ArticleDB. It is the only one that subclasses the Document class.
Talking about my database model design is beyond the scope here, but I wanted to point out that even though the __all__ variable is optional here, I do make use of it in my dynamic loader function.
DYNAMITE!!
We made it! Almost...
The last piece of the puzzle is taking all the dot separated paths we found with the get_modules function and extracting only the Beanie Document objects from the modules at those paths.
May I present:
from importlib import import_module

def dynamic_loader(module, compare) -> list:
    """Iterates over all .py files in `module` directory,
    finding all classes that match `compare` function.
    """
    items = []
    for mod in get_modules(module):
        # Use a separate name so the `module` argument is not shadowed
        imported = import_module(mod)
        if hasattr(imported, "__all__"):
            objs = [getattr(imported, obj)
                    for obj in imported.__all__]
            items += [o for o in objs
                      if compare(o) and o not in items]
    return items
This might look a little complicated, but it really isn't. The real hero here is the import_module function from the standard importlib library. This allows us to import a module within our function (just like you do at the top of your .py files).
So with the get_modules function we defined, we get a generator object containing all modules within a given directory.
Then, for each module (mod), we import it into our function and inspect it to see if it has the __all__ variable I mentioned above.
If so, for each name listed in __all__, we use the built-in getattr function to fetch the object with that name from the module. (Hint: we need the object itself, not just its name, to tell if it is a Beanie Document!)
Once we have a list of those objects, I use a list comprehension to filter out which objects are Beanie objects by using the comparator function that we started with (is_beanie_model).
The resulting items list contains only the relevant objects.
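To try the import_module, __all__ and getattr steps without a real app, here is a self-contained sketch that writes a throwaway package to a temporary directory and inspects it. The package name demo_models, the module source and the classes in it are all assumptions made up for the demo, and the comparison is inlined rather than passed in.

```python
import sys
from importlib import import_module, invalidate_caches
from inspect import isclass
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical module contents: a base class, two subclasses, one function.
MODULE_SOURCE = '''
__all__ = ("Base", "Article", "helper")

class Base: ...
class Article(Base): ...
class Hidden(Base): ...   # a subclass, but not listed in __all__

def helper(): ...
'''

with TemporaryDirectory() as tmp:
    package = Path(tmp) / "demo_models"
    package.mkdir()
    (package / "__init__.py").touch()
    (package / "article.py").write_text(MODULE_SOURCE)

    sys.path.insert(0, tmp)  # make the temporary package importable
    invalidate_caches()
    try:
        module = import_module("demo_models.article")
    finally:
        sys.path.remove(tmp)

    # Names in __all__ are strings; getattr turns them into objects.
    objs = [getattr(module, name) for name in module.__all__]
    base = module.Base
    matches = [o for o in objs
               if isclass(o) and issubclass(o, base) and o is not base]

print([o.__name__ for o in matches])  # ['Article']
```

Hidden is never considered because it is missing from __all__, which is the same control the loader above gives me over what gets registered.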
Python By ZZZ
I know this was a long one.
We are really done here, but I want to point out an oversight I made when finding this solution. I was so focused on getting the "strings with dot separated paths"—I missed the fact that I was already technically done.
Observe...
def get_beanie_models() -> list[type[Document]]:
    """Dynamic Beanie model finder."""
    return dynamic_loader("models", is_beanie_model)

def db_models() -> list[str]:
    """Create a list of Beanie models to include in db initialization."""
    objs = get_beanie_models()
    obj_list = [f"{o.__module__}.{o.__name__}" for o in objs]
    return obj_list  # this works
    # return objs  # this also works
To create the dot separated paths, I took the Beanie Document objects and created a new list, using the __module__ attribute to get the module path and the __name__ attribute to get the Document object's name.
This returns a nice, clean dot separated string path. In my case, it would look something like src.models.article.ArticleDB.
However, I glossed over the primary part that allows me to pass the actual Beanie objects directly to the document_models parameter.
Again, from the Beanie docs, now with my emphasis:
init_beanie supports not only *list of classes* for the document_models parameter, but also strings with the dot separated paths
By using objs directly, I am sending the actual class objects to the init_beanie function. As a result, instead of a path, I would be passing something like <class 'src.models.article.ArticleDB'>.
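The two forms carry the same information: a dotted path can be built from a class and resolved back to it. The sketch below shows that round trip with a standard library class standing in for a Document model. The helper names dotted_path and resolve are mine, and this illustrates the general idea rather than how Beanie resolves strings internally.

```python
from collections import OrderedDict
from importlib import import_module


def dotted_path(cls: type) -> str:
    """Build "package.module.ClassName" from a class object."""
    return f"{cls.__module__}.{cls.__name__}"


def resolve(path: str) -> type:
    """Turn a dot separated path back into the class it names."""
    module_name, _, class_name = path.rpartition(".")
    return getattr(import_module(module_name), class_name)


path = dotted_path(OrderedDict)
print(path)                          # collections.OrderedDict
print(resolve(path) is OrderedDict)  # True
```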
Anyway, both solutions work. If you're trying this at home, go for the second one. (Ignore the db_models function.)
Live and learn I guess.
I actually didn't know about the second solution until I wrote this article. I was so fixated on getting the dot separated paths that I missed the more obvious (and simpler) solution. The lesson here is: Write a blog article, learn something.
So now, all I have to do to load my Document models to Beanie is:
# Initialize Beanie with dynamically loaded Document models
await init_beanie(
    database=client.db_name,
    document_models=db_models()  # or just get_beanie_models()
)
And there you have it.
That's how I dynamically load my database models to Beanie. When I create new database objects, as long as they subclass Beanie's Document model and are included in the module's __all__ variable, I don't have to remember to add them to init_beanie.
Now, I'm going to get some sleep. It's late.
Sigh, wanted to mention one last thing. Note that you can use the dynamic_loader function to find just about any kind of Python object in any given directory. All it would take is a comparator function and the name of any directory. So you could technically create something similar to is_beanie_model, maybe like is_route or is_view, and end up with a similar end result.
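As a rough sketch of that idea, a small factory can build comparators for any base class. Everything here (subclass_of, View, HomeView, AboutView) is hypothetical and only illustrates the shape such a comparator could take.

```python
from inspect import isclass
from typing import Callable


def subclass_of(base: type) -> Callable[[object], bool]:
    """Build a comparator that matches strict subclasses of base."""
    def compare(item: object) -> bool:
        return isclass(item) and issubclass(item, base) and item is not base
    return compare


# Hypothetical stand-ins for a framework's view classes.
class View: ...
class HomeView(View): ...
class AboutView(View): ...


is_view = subclass_of(View)

candidates = [View, HomeView, AboutView, "HomeView", 42]
views = [c for c in candidates if is_view(c)]
print([v.__name__ for v in views])  # ['HomeView', 'AboutView']
```

With the loader from this post, a comparator like that would be passed along the lines of dynamic_loader("views", is_view), assuming a views directory exists in the app.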