Managing Snowflake's Procedure & UDF with Github

Managing Snowflake's Procedure & UDF with Github

Discover how integrating GitHub with Snowflake empowers seamless version control and collaborative workflows for managing your UDFs and procedures.

Snowflake is an incredible data platform that my company and I have been leveraging for about a year and a half. It's robust, reliable, and feature-rich, with an intuitive UI that makes it easy to navigate. Notably, we haven't experienced any accidents caused by Snowflake, underscoring its stability and reliability.

Among Snowflake's myriad features, User-Defined Functions (UDFs) and Procedures stand out. UDFs are functions that you can call within a SELECT (or other) query to format data to your specifications. Procedures, on the other hand, are functions that you can call to process tasks in the background, including creating, updating, and deleting records.

However, there was no way to manage, test, or review these functions before deploying them to production—until Snowflake implement integration with git. Although it's still in open preview, this integration has the potential to solve that problem effectively.

Prepare Codebase

Let's start by preparing the codebase. Here's the directory structure:

As you can see, I'm using Poetry to set up the virtual environment and manage packages. I also employ Ruff for linting, ensuring that my code remains clean and maintainable (well, one can hope (´ω`)).

The source code resides in the /src directory, which I’ve divided into /src/procedures and /src/udf. For this example, I'll create a simple fetch procedure that, given a table name, returns ten rows from it.

import os

from snowflake.snowpark import Session, Table


def fetch(session: Session, table_name: str) -> Table:
    return session.table(table_name).select(["id"]).limit(10)

if __name__== "__main__":
    from snowflake.snowpark import Session

    session = Session.builder.configs({
        "account": os.getenv("SNOWFLAKE__ACCOUNT"),
        "user": os.getenv("SNOWFLAKE__USER"),
        "password": os.getenv("SNOWFLAKE__PASSWORD"),
        "role": os.getenv("SNOWFLAKE__ROLE"),
        "warehouse": os.getenv("SNOWFLAKE__WAREHOUSE"),
        "database": os.getenv("SNOWFLAKE__DATABASE"),
        "schema": os.getenv("SNOWFLAKE__SCHEMA"),
    }).create()

    table = fetch(
        session=session,
        table_name="TEST_TABLE",
    )

    print(table.collect())

This simple example demonstrates how to fetch data from a Snowflake table using a stored procedure, managed and versioned in a GitHub repository. This code enables you to test it locally by running poetry run python3 src/procedures/fetch/fetch.py.

For the full code and additional examples, visit my GitHub repository.

Create Github Personal Access Token

We've written code for Snowflake to understand! Yay! But now we need Snowflake to read our code. To enable this, we need to generate a personal access token (PAT) that allows Snowflake to access our GitHub repository.

Here is the step:

  1. Open the GitHub page and click on your profile picture in the top right corner.

  2. Click on Settings.

  3. In the sidebar, click on Developer settings.

  4. Go to Personal access tokensFine-grained tokens.

  5. Generate a new token. Select your repository and set the expiry date.

  6. Grant it permission to read contents.

  7. Done!

You should now have your GitHub PAT, which will look something like this:

github_pat_xxxxxxxxxxxx....

Create Needed Snowflake Resources

Okay, let's jump into the Snowflake console and get our hands dirty. First, we need to save our GitHub Personal Access Token in Snowflake as a Secret.

CREATE OR REPLACE SECRET git_secret
    TYPE = password
    USERNAME = 'dummy-username'
    PASSWORD = 'github_pat_xxxxxxxxxxxx....';

The username doesn't really matter; you can input whatever you want. The password must be the generated GitHub PAT token.

Next, using that Secret, let's create our API Integration to integrate GitHub and Snowflake.

CREATE OR REPLACE API INTEGRATION git_api_integration
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/alvinend')
  ALLOWED_AUTHENTICATION_SECRETS = (git_secret)
  ENABLED = TRUE;

Using that API Integration, let's create the Git repository.

CREATE OR REPLACE GIT REPOSITORY snowflake_extensions
  API_INTEGRATION = git_api_integration
  GIT_CREDENTIALS = git_secret
  ORIGIN = 'https://github.com/alvinend/snowflake-extensions.git';

Invoke fetch to fetch the repository.

ALTER GIT REPOSITORY snowflake_extensions FETCH;

Check if our Snowflake Git repository can actually read it.

LS @snowflake_extensions/branches/main;

Now, let's declare our procedure. This simple procedure accepts a table name and returns ten rows from that table.

CREATE OR REPLACE PROCEDURE fetch(
    table_name VARCHAR,
)
  RETURNS TABLE(
    ID VARCHAR,
    NAME VARCHAR
  )
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.11'
  PACKAGES = ('snowflake-snowpark-python')
  IMPORTS = ('@temp_snowflake_extensions/branches/main/src/procedures/fetch/fetch.py')
  HANDLER = 'fetch.fetch';

All that's left to do is call it!

CALL fetch(
    'TEST_TABLE'
);

Closing

Managing Snowflake's UDFs and procedures with GitHub can significantly streamline your development process, enabling collaborative work and ensuring better code management. By leveraging Snowflake's powerful data platform and integrating it with GitHub, you can easily version-control your UDFs and procedures, reducing the risk of errors and making your workflows more efficient.

In this blog, we walked through the entire process—from creating a GitHub personal access token to integrating it with Snowflake. By following these steps, you can ensure that your Snowflake environment is robust, scalable, and easy to manage.

Happy coding, and may your data always be clean and your queries always be fast!

Did you find this article valuable?

Support Alvin Endratno's Blog by becoming a sponsor. Any amount is appreciated!