Leveraging dbt and Python for Predictive Modeling
Exploring dbt’s Python entry point functionality in v1.5

Last week, I had the opportunity to attend a wonderful talk titled “Unleashing the Power of dbt and Python for Modern Data Stack” by Features and Labels (fal-ai) at EuroPython 2023. The talk focused on integrating dbt and Python to build and train ML pipelines, highlighting the untapped potential of combining the two for efficient data transformation and analysis.
While dbt currently supports Python integrations with Snowflake and Databricks, support for other adapters such as Redshift does not exist yet. In this Medium post, I would like to share my perspective on leveraging new dbt functionality in v1.5 for predictive modeling.
We will walk through the entire process, including fetching external data, storing it in a Postgres database, performing transformations and data cleansing with dbt, and finally, making inferences with a pre-trained LightGBM model to profit-score loans and storing the scores in the same database.
More information about how the model was trained and the data it was trained on is available in this LinkedIn post.
Getting Started
To follow along with this tutorial, make sure you have Python 3.11 installed and set up Docker for the Postgres database. We will use Python’s built-in venv package to create a virtual environment for our project. I am using macOS Monterey version 12.6.3 (M2 chip).
Setting up the Virtual Environment
- Install Python 3.11 using brew:
brew install python@3.11
- Download Docker for your machine from Docker’s website
First, you will need to create a project folder, which in my case is dbt-python.
Now, let’s set up the virtual environment and install the necessary packages:
python3.11 -m venv my_env
source my_env/bin/activate
pip install pandas requests scikit-learn lightgbm==3.3.5 psycopg2-binary sqlalchemy dbt-core dbt-postgres
Setting up the PostgreSQL Database
We need to build a Docker container with our PostgreSQL database. You can download Postgres by running:
docker pull postgres
To set up a local instance of Postgres, you can follow this guide from Week 1 of the Data Engineering Zoomcamp. Remember to mount the database to your local drive to avoid data loss if the Docker container is stopped. More guidance on using Postgres with Docker is available in this post.
Below is an example of spinning up a local Postgres database instance:
docker run -d \
--name postgres_db \
-e POSTGRES_USER="root" \
-e POSTGRES_PASSWORD="root" \
-e POSTGRES_DB="postgres_db" \
-v pg_data:/var/lib/postgresql/data \
-p 5432:5432 \
postgres
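Before moving on, it is worth confirming that the database accepts connections. Below is a minimal Python sanity check, assuming the credentials above (the file name check_connection.py is just an example):

# check_connection.py - optional check that Postgres is reachable
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://root:root@localhost:5432/postgres_db")

with engine.connect() as conn:
    # Print the server version string to confirm the connection works
    print(conn.execute(text("SELECT version();")).scalar())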
Creating the dbt Project
Now, let’s set up our dbt project. Initiate a dbt project folder within the same directory:
dbt init my_dbt_project --skip-profile-setup
Next, set up a profiles.yml file, which contains our credentials:
cd my_dbt_project
nano profiles.yml
Copy the following configuration into the yaml file:
dbt_playground:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: "root"
      pass: "root"
      port: 5432
      dbname: postgres_db
      schema: dbt_playground
      threads: 1
Verify your dbt setup:
dbt debug
If you see “All checks passed”, your dbt setup is complete.
Pipeline

The first module within our pipeline is a Python script, fetch_and_write_data.py, which fetches external data from Bondora and ingests it into Postgres. The resulting table will appear under the name dbt_playground.bondora_loan_dataset in our Postgres database.
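The exact implementation is not reproduced here, but a rough sketch of the idea looks like the following. The download URL is a placeholder, the column handling is simplified, and it assumes the dbt_playground schema already exists in the database:

# fetch_and_write_data.py (sketch): fetch the loan data and write it to Postgres
import io

import pandas as pd
import requests
from sqlalchemy import create_engine

DATA_URL = "https://example.com/bondora_loan_dataset.csv"  # placeholder, not the real endpoint
ENGINE = create_engine("postgresql+psycopg2://root:root@localhost:5432/postgres_db")


def fetch_and_write() -> None:
    # Download the raw CSV into a DataFrame
    response = requests.get(DATA_URL, timeout=60)
    response.raise_for_status()
    df = pd.read_csv(io.StringIO(response.text))

    # Write the raw data to Postgres so dbt can pick it up
    # (assumes the dbt_playground schema has already been created)
    df.to_sql(
        "bondora_loan_dataset",
        ENGINE,
        schema="dbt_playground",
        if_exists="replace",
        index=False,
    )


if __name__ == "__main__":
    fetch_and_write()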
The second part of the pipeline is dbt_transform_data.py, which transforms the raw data from the previous step into a dataset that we can feed to our scoring model.
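This is where dbt’s new programmatic entry point comes in: since v1.5, dbt-core exposes a dbtRunner class (importable from dbt.cli.main) that lets you invoke dbt commands directly from Python instead of shelling out to the CLI. A minimal sketch of what dbt_transform_data.py can look like, assuming the project layout above (the selection and directory flags are illustrative):

# dbt_transform_data.py (sketch): run the dbt model via the v1.5 Python entry point
from dbt.cli.main import dbtRunner, dbtRunnerResult


def run_transformation() -> None:
    dbt = dbtRunner()

    # Equivalent to `dbt run --select dbt_transformation` on the command line
    cli_args = [
        "run",
        "--select", "dbt_transformation",
        "--project-dir", "my_dbt_project",
        "--profiles-dir", "my_dbt_project",
    ]
    result: dbtRunnerResult = dbt.invoke(cli_args)

    # result.success is False if any model failed to build
    if not result.success:
        raise RuntimeError(f"dbt run failed: {result.exception}")


if __name__ == "__main__":
    run_transformation()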
To apply the transformations to the extracted dataset, we will need to create a script called dbt_transformation.sql inside the /models folder of my_dbt_project.
touch dbt_transformation.sql
nano dbt_transformation.sql
Copy the transformation script into dbt_transformation.sql from here.
After the second script executes, the data will become available in the database in the table dbt_playground.dbt_transformation.
The final step is to score loans using the module run_scoring.py. In this step, we download our pickled model and apply it to the transformed dataset. After running the script, we should see that the average predicted Annualized Rate of Return (ARR) in our sample is equal to 7%.
The scoring outputs are stored in the table dbt_playground.scoring_outputs, which contains the contract ID, the profit score, and a timestamp.
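A sketch of what run_scoring.py boils down to is shown below. The model file name and the contract_id column are assumptions for illustration, and it assumes the downloaded pickle holds a LightGBM Booster:

# run_scoring.py (sketch): score the transformed loans and persist the results
import pickle
from datetime import datetime, timezone

import lightgbm as lgb
import pandas as pd
from sqlalchemy import create_engine

ENGINE = create_engine("postgresql+psycopg2://root:root@localhost:5432/postgres_db")
MODEL_PATH = "profit_scoring_model.pkl"  # placeholder for the downloaded model file


def run_scoring() -> None:
    # Load the transformed features produced by dbt
    features = pd.read_sql_table("dbt_transformation", ENGINE, schema="dbt_playground")

    # Load the pre-trained model (assumed to be a pickled LightGBM Booster)
    with open(MODEL_PATH, "rb") as f:
        booster: lgb.Booster = pickle.load(f)

    # Score every loan using the feature columns the model was trained on
    scores = booster.predict(features[booster.feature_name()])

    # Persist the scores together with the contract identifier and a timestamp
    outputs = pd.DataFrame(
        {
            "contract_id": features["contract_id"],  # assumed identifier column
            "profit_score": scores,
            "scored_at": datetime.now(timezone.utc),
        }
    )
    outputs.to_sql(
        "scoring_outputs",
        ENGINE,
        schema="dbt_playground",
        if_exists="replace",
        index=False,
    )
    print(f"Average predicted ARR: {scores.mean():.2%}")


if __name__ == "__main__":
    run_scoring()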
Conclusion
And that’s about it. In this post, we have demonstrated a powerful and flexible approach to predictive modeling by combining dbt and Python.
Integration using dbt’s Python entry point enables smooth data transformation, analysis, and model inference, making it a valuable addition to any predictive modeling workflow. With the example of P2P scoring using Bondora’s loan dataset, we hope this guide inspires you to explore dbt + Python for your own data projects. Happy modeling!