Managing Snowflake's Procedure & UDF with Github
Discover how integrating GitHub with Snowflake empowers seamless version control and collaborative workflows for managing your UDFs and procedures.
Snowflake is an incredible data platform that my company and I have been leveraging for about a year and a half. It's robust, reliable, and feature-rich, with an intuitive UI that makes it easy to navigate. Notably, we haven't experienced any accidents caused by Snowflake, underscoring its stability and reliability.
Among Snowflake's myriad features, User-Defined Functions (UDFs) and Procedures stand out. UDFs are functions that you can call within a SELECT
(or other) query to format data to your specifications. Procedures, on the other hand, are functions that you can call to process tasks in the background, including creating, updating, and deleting records.
However, there was no way to manage, test, or review these functions before deploying them to production—until Snowflake implement integration with git. Although it's still in open preview, this integration has the potential to solve that problem effectively.
Prepare Codebase
Let's start by preparing the codebase. Here's the directory structure:
As you can see, I'm using Poetry to set up the virtual environment and manage packages. I also employ Ruff for linting, ensuring that my code remains clean and maintainable (well, one can hope (´ω`)).
The source code resides in the /src
directory, which I’ve divided into /src/procedures
and /src/udf
. For this example, I'll create a simple fetch procedure that, given a table name, returns ten rows from it.
import os
from snowflake.snowpark import Session, Table
def fetch(session: Session, table_name: str) -> Table:
return session.table(table_name).select(["id"]).limit(10)
if __name__== "__main__":
from snowflake.snowpark import Session
session = Session.builder.configs({
"account": os.getenv("SNOWFLAKE__ACCOUNT"),
"user": os.getenv("SNOWFLAKE__USER"),
"password": os.getenv("SNOWFLAKE__PASSWORD"),
"role": os.getenv("SNOWFLAKE__ROLE"),
"warehouse": os.getenv("SNOWFLAKE__WAREHOUSE"),
"database": os.getenv("SNOWFLAKE__DATABASE"),
"schema": os.getenv("SNOWFLAKE__SCHEMA"),
}).create()
table = fetch(
session=session,
table_name="TEST_TABLE",
)
print(table.collect())
This simple example demonstrates how to fetch data from a Snowflake table using a stored procedure, managed and versioned in a GitHub repository. This code enables you to test it locally by running poetry run python3 src/procedures/fetch/fetch.py
.
For the full code and additional examples, visit my GitHub repository.
Create Github Personal Access Token
We've written code for Snowflake to understand! Yay! But now we need Snowflake to read our code. To enable this, we need to generate a personal access token (PAT) that allows Snowflake to access our GitHub repository.
Here is the step:
Open the GitHub page and click on your profile picture in the top right corner.
-
Click on Settings.
-
In the sidebar, click on Developer settings.
-
Go to Personal access tokens → Fine-grained tokens.
-
Generate a new token. Select your repository and set the expiry date.
-
Grant it permission to read contents.
-
Done!
You should now have your GitHub PAT, which will look something like this:
github_pat_xxxxxxxxxxxx....
Create Needed Snowflake Resources
Okay, let's jump into the Snowflake console and get our hands dirty. First, we need to save our GitHub Personal Access Token in Snowflake as a Secret.
CREATE OR REPLACE SECRET git_secret
TYPE = password
USERNAME = 'dummy-username'
PASSWORD = 'github_pat_xxxxxxxxxxxx....';
The username doesn't really matter; you can input whatever you want. The password must be the generated GitHub PAT token.
Next, using that Secret, let's create our API Integration to integrate GitHub and Snowflake.
CREATE OR REPLACE API INTEGRATION git_api_integration
API_PROVIDER = git_https_api
API_ALLOWED_PREFIXES = ('https://github.com/alvinend')
ALLOWED_AUTHENTICATION_SECRETS = (git_secret)
ENABLED = TRUE;
Using that API Integration, let's create the Git repository.
CREATE OR REPLACE GIT REPOSITORY snowflake_extensions
API_INTEGRATION = git_api_integration
GIT_CREDENTIALS = git_secret
ORIGIN = 'https://github.com/alvinend/snowflake-extensions.git';
Invoke fetch to fetch the repository.
ALTER GIT REPOSITORY snowflake_extensions FETCH;
Check if our Snowflake Git repository can actually read it.
LS @snowflake_extensions/branches/main;
Now, let's declare our procedure. This simple procedure accepts a table name and returns ten rows from that table.
CREATE OR REPLACE PROCEDURE fetch(
table_name VARCHAR,
)
RETURNS TABLE(
ID VARCHAR,
NAME VARCHAR
)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('snowflake-snowpark-python')
IMPORTS = ('@temp_snowflake_extensions/branches/main/src/procedures/fetch/fetch.py')
HANDLER = 'fetch.fetch';
All that's left to do is call it!
CALL fetch(
'TEST_TABLE'
);
Closing
Managing Snowflake's UDFs and procedures with GitHub can significantly streamline your development process, enabling collaborative work and ensuring better code management. By leveraging Snowflake's powerful data platform and integrating it with GitHub, you can easily version-control your UDFs and procedures, reducing the risk of errors and making your workflows more efficient.
In this blog, we walked through the entire process—from creating a GitHub personal access token to integrating it with Snowflake. By following these steps, you can ensure that your Snowflake environment is robust, scalable, and easy to manage.
Happy coding, and may your data always be clean and your queries always be fast!