Snowflake is a cloud data warehousing system, currently ranked as the #12 database system at DB Engines. Snowflake can easily scale up or down to match the workload, and since compute and storage are separated, those capabilities can be scaled independently to meet needs. A warehouse can be set to suspend after a set interval of inactivity and then start again once needed. Snowflake can run in AWS, Azure, or Google Cloud, so you’re not locked into a specific platform.
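As a rough illustration of that elasticity (the warehouse name MY_WH below is just a placeholder), resizing and the suspend/resume behavior can be controlled with a single ALTER statement:
-- Resize a warehouse and have it suspend after 5 minutes of inactivity,
-- resuming automatically when the next query arrives.
ALTER WAREHOUSE MY_WH SET
    WAREHOUSE_SIZE = 'LARGE'
    AUTO_SUSPEND = 300   -- seconds of idle time before suspending
    AUTO_RESUME = TRUE;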
Carnegie Mellon University has a video, Snowflake Data Warehouse Internals, that goes into depth on how the system works internally.
Free Trial:
At the top right of the home page is a ‘Start For Free’ button where you can sign up for a 30-day free trial. You select an edition (Standard, Enterprise, Business Critical) as well as your cloud provider (AWS, Azure, Google Cloud). You can also pick the languages you want to use to interact with the warehouse (I chose SQL and Python).
There’s already a test warehouse set up along with test data for tutorials, but we can also create our own objects.
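For a quick sanity check against that test data, something like the following should work in a worksheet (trial accounts typically include the SNOWFLAKE_SAMPLE_DATA share with TPC-H schemas; the exact names here are an assumption and may differ):
-- Peek at one of the TPC-H sample tables that ships with the trial account.
SELECT c_name, c_acctbal
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
LIMIT 10;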
Connecting To Snowflake:
Once we have an account set up, there are a few ways to connect.
1) We can write queries in the web GUI using Worksheets.
2) We can connect with a query tool. I like using DBeaver; here’s how to set up a connection with it (I’m using the Community edition, version 23.0.2).
Select ‘New Database Connection’, then select Snowflake.
Click on the ‘Driver Properties’ tab, then follow the prompts to install the Snowflake driver. Once completed, go back to the ‘Main’ tab.
Host: the snowflakecomputing.com address given to you at signup
Port: 443
Database: SNOWFLAKE (trial default)
Warehouse: COMPUTE_WH (trial default)
Username / Password: the credentials set at signup
3) SnowSQL – a command-line client for working with the warehouse.
4) Snowflake SQL REST API
Snowflake Console:
Warehouses:
A warehouse is Snowflake’s compute resource for running queries, roughly comparable to an instance in SQL Server.
In the left-hand menu, we can go to ‘Admin’ then ‘Warehouses’ to create a new warehouse (‘ + Warehouse’).
There are a few parameters we can set:
Sizes: X-Small up to 6XL
Warehouse Type: Standard or Snowpark-optimized (expanded memory for machine learning or other memory-intensive work)
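The same thing can be done in a worksheet instead of the GUI; here’s a sketch (TEST_WH is a placeholder name, and the property list isn’t exhaustive):
-- Create a warehouse in SQL rather than through the console.
-- WAREHOUSE_TYPE defaults to 'STANDARD'; 'SNOWPARK-OPTIMIZED' is the expanded-memory option.
CREATE WAREHOUSE TEST_WH WITH
    WAREHOUSE_SIZE = 'XSMALL'
    WAREHOUSE_TYPE = 'STANDARD'
    AUTO_SUSPEND = 300
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;  -- don't spin it up (or bill) until it's actually used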
Databases:
In the left-hand menu, go to ‘Data’ then ‘Databases’ to create a new database (‘+ Database’). Name is the only required parameter.
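Or in SQL, using a placeholder name of TEST_DB:
-- Name is the only required parameter here as well; COMMENT is optional.
CREATE DATABASE TEST_DB
    COMMENT = 'Scratch database for testing';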
Worksheets:
Worksheets can be used to execute code, and to save the code for re-use.
Selecting ‘Worksheets’ from the menu lists all of the test worksheets set up for the demo. We can click the plus sign in the top right corner to create a new worksheet. Once I click, I have the option to create a SQL or a Python worksheet (since those were the languages I selected at setup).
The default name for a worksheet is the current date/time, but it can be renamed from the menu next to the name.
In the next section, I’ll post some SQL I used for testing.
Other Items:
Dashboards – create visualizations
Marketplace – a data exchange for data sharing and access to external data sets.
I didn’t see a place in the web interface to upload data files (although the documentation mentions that the ‘classic’ interface can load files up to 50 MB).
Files can be placed in cloud storage (like AWS S3) and loaded into the database from there. The CLI can also handle loading files.
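Here’s a sketch of that bulk-load path (the bucket URL, credentials, stage, table, and file format are all placeholders):
-- Point an external stage at an S3 bucket, then COPY the staged files into a table.
CREATE STAGE my_s3_stage
    URL = 's3://my-bucket/exports/'
    CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***');

COPY INTO my_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);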
There’s also Snowpipe, which can automate loading data into the warehouse to create a data pipeline.
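A minimal Snowpipe sketch, reusing the placeholder stage and table from above (AUTO_INGEST also needs event notifications configured on the cloud-storage side):
-- The pipe wraps a COPY statement and loads new files as they land in the stage.
CREATE PIPE my_pipe
    AUTO_INGEST = TRUE
AS
COPY INTO my_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);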
SQL Worksheets:
At the top of the page, you’ll see a drop-down to select the warehouse. Right above the code window, you’ll see another drop-down to select the database.
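The same selections can also be made in the worksheet itself with USE statements (TEST_DB is the placeholder database from earlier):
USE WAREHOUSE COMPUTE_WH;   -- the trial's default warehouse
USE DATABASE TEST_DB;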
Here’s SQL to create a table with an auto-number primary key, as well as inserts for data. The SQL is mostly standard, although creating the auto-number column used a different syntax than I was accustomed to.
CREATE OR REPLACE TABLE TestTable(
    id int AUTOINCREMENT START 1 INCREMENT 1 PRIMARY KEY,
    name varchar(30) NOT NULL,
    modified_date datetime NOT NULL,
    is_active boolean NOT NULL
);
INSERT INTO TestTable(name, modified_date, is_active)
VALUES ('Record1', CURRENT_TIMESTAMP, TRUE);
INSERT INTO TestTable(name, modified_date, is_active)
VALUES ('Record2', CURRENT_TIMESTAMP, TRUE);
INSERT INTO TestTable(name, modified_date, is_active)
VALUES ('Record3', CURRENT_TIMESTAMP, TRUE);
UPDATE TestTable SET
is_active = FALSE,
modified_date = CURRENT_TIMESTAMP
WHERE name = 'Record2';
SELECT * FROM TestTable;
Links:
Eidebailly – 6 Things You Need to Know Before Using Snowflake Data Warehouse
Centric – How Snowflake Architecture Delivers a Modern Data Storage Solution: Part I, Part II