Snowflake Cortex: Bringing ML and AI Solutions to Your Data

Written by Ross Knutson, Manager

Published May 28, 2024


Snowflake functionality can be overwhelming. And when you factor in technology partners, marketplace apps, and APIs, the possibilities become seemingly endless. As an experienced Snowflake partner, we understand that customers need help sifting through the possibilities to identify the functionality that will bring them the most value.

Designed to help you digest all that’s possible, our Snowflake Panorama series shines a light on core areas that will ultimately give you a big-picture understanding of how Snowflake can help you access and enrich valuable data across the enterprise for innovation and competitive advantage.

What is Snowflake Cortex?

Snowflake is steadily releasing more and more functionality under its Cortex service. But what exactly is Cortex?

Cortex isn’t a specific AI feature, but rather an umbrella term for a wide range of AI-centric functionality within Snowflake’s data platform. The number of available services under Cortex is growing, and many of its core features are still in private preview and not generally available.

This blog seeks to break down the full picture of what Cortex can do. It’s focused heavily on what is available today, but also speaks to what’s coming down the road. Without a doubt, we will get a lot more new details on Cortex at Snowflake Data Cloud Summit on June 3-6. By the way, if you’ll be there, let’s meet up to chat all things data and AI.

ML Functions

Before Cortex became Cortex, Snowflake quietly released so-called “ML-Powered Functions,” which have since been rebranded as Cortex ML Functions. These functions offer an out-of-the-box approach for training and using common machine learning algorithms on your data in the Snowflake Data Cloud.

These ML functions primarily use gradient boosting machines (GBM) as their model training technique, and allow users to simply feed the appropriate parameters into the function to initiate training. After the model is trained, it can be called for inference independently or configured to store results directly into a SQL table.
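The train, call, and store flow described above can be sketched with the Forecasting function. This is a minimal sketch, not a definitive implementation: the table, view, column, and model names (`daily_sales`, `daily_sales_v`, `sale_date`, `revenue`, `revenue_forecast`) are hypothetical, and the exact options available may differ by Snowflake release.

```sql
-- Hypothetical source table: daily_sales(sale_date TIMESTAMP_NTZ, revenue FLOAT).
-- A view exposes only the timestamp and target columns used for training.
CREATE OR REPLACE VIEW daily_sales_v AS
    SELECT sale_date, revenue
    FROM daily_sales;

-- Training: creating the model object trains it on the referenced view.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST revenue_forecast(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'daily_sales_v'),
    TIMESTAMP_COLNAME => 'sale_date',
    TARGET_COLNAME => 'revenue'
);

-- Inference on demand: forecast the next 14 time steps.
CALL revenue_forecast!FORECAST(FORECASTING_PERIODS => 14);

-- Store the output of the previous CALL in a regular SQL table.
CREATE OR REPLACE TABLE revenue_forecast_results AS
    SELECT * FROM TABLE(RESULT_SCAN(-1));
```

Because the results land in an ordinary table, they can be joined to actuals or surfaced in a dashboard like any other data.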

As of May 2024, there are four available ML Functions:

Forecasting

Use this ML function to predict future values of time-series data for use-cases like revenue projection, risk management, resource utilization, or demand forecasting.

Anomaly Detection

This function automatically detects outlier data points in a time-series dataset for use-cases like fraud detection, network security monitoring, or quality control.

Contribution Explorer

The Contribution Explorer function ranks data points by their impact on a particular output and is best suited to use-cases like marketing effectiveness, program effectiveness, or financial performance.

Classification

Use this function to train a model that predicts a categorical value, such as a customer segment, a medical diagnosis, or a sentiment label.

In general, users should remember that these Cortex ML Functions are truly out-of-the-box. In a production state, ML use-cases may require a more custom model architecture. The Snowpark API, and eventually Container Services, allow users to import model files directly into the Snowflake Data Cloud when they outgrow the limitations of the Cortex ML Functions.
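As a rough illustration of two of the functions listed above, the sketch below trains an anomaly detection model and a classification model. All view, column, and model names are hypothetical placeholders, and the snippet assumes the views already exist with the columns noted in the comments.

```sql
-- Anomaly Detection: train on historical readings, then score newer ones.
-- Hypothetical views sensor_history_v and sensor_recent_v each expose
-- reading_ts (TIMESTAMP_NTZ) and temperature (FLOAT).
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION temperature_monitor(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'sensor_history_v'),
    TIMESTAMP_COLNAME => 'reading_ts',
    TARGET_COLNAME => 'temperature',
    LABEL_COLNAME => ''  -- empty string: no labeled anomalies in the training data
);

CALL temperature_monitor!DETECT_ANOMALIES(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'sensor_recent_v'),
    TIMESTAMP_COLNAME => 'reading_ts',
    TARGET_COLNAME => 'temperature'
);

-- Classification: train on labeled customers, then predict for unlabeled ones.
-- Hypothetical view customers_labeled_v contains feature columns plus a
-- 'segment' label; customers_unlabeled_v has the same features without it.
CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION segment_classifier(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'customers_labeled_v'),
    TARGET_COLNAME => 'segment'
);

SELECT
    *,
    segment_classifier!PREDICT(INPUT_DATA => OBJECT_CONSTRUCT(*)) AS prediction
FROM customers_unlabeled_v;
```

Both follow the same pattern as Forecasting: a `CREATE` statement trains the model, and a method on the model object runs inference.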

Overall, Cortex’s ML Functions provide a fast way for users to explore and test commonly used machine learning algorithms on their own data, securely within Snowflake.

LLM Functions / Arctic

Earlier this year, Snowflake made its Cortex LLM Functions generally available in select regions. These functions allow users to leverage LLMs directly within a Snowflake SQL query. In addition, Snowflake released Arctic, its open-source language model geared toward SQL code generation.

The example below, taken directly from the Snowflake documentation, shows how simple it is to call a language model within a SELECT statement with Cortex:

    SELECT SNOWFLAKE.CORTEX.COMPLETE('snowflake-arctic', 'What are large language models?');

In the first parameter, we define the language model we want to use (e.g. ‘snowflake-arctic’), and in the second parameter, we feed our prompt. This basic methodology opens up a ton of possibilities for layering the power of AI into your data pipelines, reporting/analytics, and ad-hoc research projects. For example, a data engineer could add an LLM function to standardize a free-text field during ETL. A BI developer could automatically synthesize text data from different Snowflake tables into a holistic 2-sentence summary for a weekly report. An analyst could build a lightweight RAG chatbot on Snowflake Streamlit to interrogate a large collection of PDFs.
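To make the ETL and reporting ideas above more concrete, here is a minimal sketch that applies LLM functions to every row of a table. The table `support_tickets` and its columns are hypothetical, the prompt wording is only an illustration, and model availability varies by region.

```sql
-- Hypothetical staging table: support_tickets(ticket_id, product_raw, body).

-- 1) ETL: map a messy free-text product field onto a fixed vocabulary.
SELECT
    ticket_id,
    TRIM(SNOWFLAKE.CORTEX.COMPLETE(
        'snowflake-arctic',
        CONCAT(
            'Classify the product named below as exactly one of: ',
            'Laptop, Phone, Tablet, Other. Reply with the category only. ',
            'Product: ', product_raw
        )
    )) AS product_category
FROM support_tickets;

-- 2) Reporting: condense each ticket and score its tone.
SELECT
    ticket_id,
    SNOWFLAKE.CORTEX.SUMMARIZE(body) AS summary,
    SNOWFLAKE.CORTEX.SENTIMENT(body) AS sentiment_score  -- float between -1 and 1
FROM support_tickets;
```

Since LLM output is free text, a production pipeline would likely validate `product_category` against the allowed values before loading it downstream.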

Arctic

Arctic is Snowflake’s recently released open-source LLM. It’s built to perform well in so-called ‘enterprise tasks’ like SQL coding and following instructions. It’s likely that Snowflake wants to position Arctic as the de facto base model for custom LLM business use-cases, particularly those that require fine-tuning.

Even more likely, the Arctic family of models will continue to grow. One example is Document AI, which will give users a UI to extract data from unstructured files, like a scanned PDF, directly into a structured SQL table. This feature is built on top of the language model ‘Arctic-TILT’.

Other Cortex / Future State

Naturally, Snowflake has joined the rest of the industry in offering a copilot: Snowflake Copilot assists developers while they work with Snowflake through its web UI. Universal Search promises to offer an ‘augmented analytics’ experience where users can run a query by describing the intended result in natural language. While these features are exciting on their own, they aren’t a major focus for this blog.

Snowflake Streamlit provides an easy way to quickly build simple data applications, integrated with the Snowflake platform. Container Services opens up the possibility of hybrid architectures that leverage Cortex within external business application architectures. The VECTOR data type puts vector embeddings in columns alongside your structured data warehouse data, allowing for techniques like RAG that don’t require a new vector database like Pinecone.
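As a rough sketch of how the VECTOR data type supports RAG without a separate vector database, the example below stores embeddings next to the text they describe, retrieves the closest chunk for a question, and passes it to an LLM. The tables `raw_doc_chunks` and `doc_chunks`, the sample question, and the choice of the `e5-base-v2` embedding model are assumptions for illustration; function and model availability depends on your region and Snowflake release.

```sql
-- Hypothetical source table: raw_doc_chunks(chunk_id INT, chunk_text STRING).

-- Store each chunk's embedding in a VECTOR column alongside the text.
CREATE OR REPLACE TABLE doc_chunks (
    chunk_id   INT,
    chunk_text STRING,
    embedding  VECTOR(FLOAT, 768)
);

INSERT INTO doc_chunks
    SELECT
        chunk_id,
        chunk_text,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', chunk_text)
    FROM raw_doc_chunks;

-- Retrieve the chunk most similar to the question, then generate an answer
-- grounded in that chunk.
WITH best_chunk AS (
    SELECT chunk_text
    FROM doc_chunks
    ORDER BY VECTOR_COSINE_SIMILARITY(
        embedding,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', 'How do I reset my password?')
    ) DESC
    LIMIT 1
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'snowflake-arctic',
    CONCAT(
        'Answer the question using only the context provided. ',
        'Context: ', chunk_text,
        ' Question: How do I reset my password?'
    )
) AS answer
FROM best_chunk;
```

A fuller implementation would typically retrieve several chunks and wrap the query in a Streamlit front end, but the retrieval and generation steps stay inside Snowflake either way.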

Snowflake Cortex is far from fully materializing as a product, but seeing the foundational building blocks today helps paint a picture of a future data platform that enables companies to quickly and safely build AI tools at scale.
