Snowflake fuels edtech vendor's data and AI initiatives

PowerSchool is creating personalized educational experiences for teachers, students, parents, administrators and others by using its data to train AI tools.

While PowerSchool is the engine that makes hundreds of school districts go, Snowflake's data management and AI development capabilities are the fuel that provides the vendor its energy for analysis and AI development.

PowerSchool is an educational technology company based in Folsom, Calif., whose software is used by school districts throughout the U.S. and abroad to collect and provide access to information.

It's the system teachers use to input grades, take attendance and post assignments. It's where students view schedules and grades. It's where administrators manage their districts' data, including faculty and student schedules, special education programs and individualized education programs for students with special needs. And it's where parents view information about their children's assessments and in-class performance, and data regarding the schools their children attend.

At its core is a massive amount of data from nearly 17,000 school districts in North America and 90 countries worldwide that must somehow be organized to be valuable.

Over the course of a single student's journey from kindergarten through their senior year of high school, districts collect an average of more than 170,000 data points about that individual student, according to Shivani Stumpf, PowerSchool's chief product and innovation officer.

But PowerSchool doesn't serve the needs of a single student. Instead, it serves the singular needs of 60 million students.

"Every piece of software that could be used by a student, parent, educator, administrator, counselor, superintendent is provided by PowerSchool," Stumpf said.

Snowflake, a data management vendor that now also provides AI application development capabilities, is PowerSchool's means of managing all the data needed to serve the needs of students, teachers, parents and administrators.

But it wasn't always.

Before 2021, PowerSchool used different data management tools to oversee its data. Eventually, as the educational technology vendor's data volume grew and plans for operationalizing its data grew more ambitious, those tools failed.

"We moved away from [our previous data management provider] primarily for performance reasons," Stumpf said, declining to identify the provider because PowerSchool still uses some of its capabilities for other purposes. "We were really not happy, as we wanted to scale."

PowerSchool needed something else, something with more … power.

Problem, meet solution

PowerSchool collects and manages about 780 TB of data and 35 billion monthly data changes. That data is critical to every aspect of a school district's operations, from meeting the needs of each student to making broad, district-wide plans.

If a student is struggling, data informs how to address those problems and devise a plan to help that student succeed. If a student is excelling, data informs how to provide a plan that will keep the student engaged. If a district needs to determine a budget for the next year, school officials use that data to make key decisions.

If a district wants to take advantage of the latest technology and use AI to assist students, teachers and administrators, the district uses that data to train the AI applications that provide the assistance.

In 2021, as schools found ways to deal with the ongoing COVID-19 pandemic and students were back in school following shutdowns during spring 2020, data was perhaps more important than ever.

"Data and analytics, especially after COVID, was one of the most important … pieces of an education system, just like in any other industry," Stumpf said. "We started to see a tremendous amount of demand and growth."

PowerSchool's existing data infrastructure, however, was unable to deal with the increased demand for data and the company's plans to scale its use of data, she continued.

One significant issue was its inability to separate compute from storage, which forced system administrators to prioritize between loading data and loading dashboards, rather than doing both simultaneously. A resulting problem, as data initiatives increased and more compute power was needed to run those initiatives, was that school administrators would arrive in the morning, log in to their dashboards, and the data from the previous day would still be loading.

"We were really unhappy with the performance latency we were seeing," Stumpf said. "That was a very frustrating experience."

When PowerSchool had finally had enough of its old data management system, it listed about 50 criteria it wanted in a new one and began its search.

One of the key criteria PowerSchool wanted in a new data management platform was the separation of compute and storage. Another was data governance, particularly capabilities to protect sensitive data.

Because PowerSchool is an education technology provider, much of the data it collects is protected personally identifiable information. If that data is exposed, PowerSchool and the districts it serves could be subject to stiff penalties for running afoul of regulations. A platform that provides strong data governance capabilities and enables system administrators to limit what data is exposed to each user is, therefore, critical to PowerSchool.
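As a rough illustration of what limiting the data exposed to each user can look like, the Python sketch below masks any field a given role is not cleared to see. The roles, field names and masking rule are hypothetical assumptions made for this example; they do not describe PowerSchool's schema or Snowflake's governance features.

```python
# Hypothetical roles and the student-record fields each one may see.
# These names are illustrative assumptions, not a real schema.
VISIBLE_FIELDS = {
    "teacher": {"student_id", "name", "grade_level", "math_grade"},
    "analyst": {"grade_level", "math_grade"},  # no direct identifiers
}


def redact(record: dict, role: str) -> dict:
    """Return a copy of the record with fields the role may not see masked."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {key: (value if key in allowed else "***") for key, value in record.items()}


record = {"student_id": 1001, "name": "Jane Doe", "grade_level": 7, "math_grade": "B"}
analyst_view = redact(record, "analyst")
teacher_view = redact(record, "teacher")
```

In this sketch an analyst sees grade level and math grade but not the student's name or ID, while a role that is not listed sees only masked values.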

"The No. 1 priority for us across all our applications is stringent data governance, security and privacy, and protecting that for our customers," Stumpf said.

Data governance, meanwhile, is one of Snowflake's points of emphasis, according to Josh Klahr, the vendor's head of data warehousing.

Snowflake's architecture, including its Horizon data catalog, is designed to eliminate isolation across different data governance models, data types and data sharing ecosystems, Klahr noted.

"That makes it an ideal choice for organizations with unique compliance requirements and needs," he said. "Customers get a single data governance model with comprehensive compliance, security, privacy and collaboration controls that are universally enforced to protect PII."

Beyond its data governance capabilities, Snowflake enables customers to separate compute and storage workloads, Klahr continued.

"Snowflake's elastic compute layer can scale to meet even the most demanding and unpredictable analytical workloads," he said.

However, despite Snowflake's focus on data governance and privacy protection, and its ability to separate compute and storage, PowerSchool didn't immediately choose Snowflake upon determining that it needed a new data management provider.

PowerSchool did an extensive search for a new data management provider. It was already a partner of AWS, Google Cloud and Microsoft, so it looked at each of their data management ecosystems in addition to those from other vendors.

Ultimately, despite not having a previous history with Snowflake, it chose the vendor in October 2021 to be the backbone of its data management and analytics initiatives.

"We did a very formal, deep-dive evaluation and were really happy with what we saw from Snowflake in our initial proof of concept and then decided that was the platform for us," Stumpf said. "And we're very happy with that decision."

Evolving with Snowflake

When PowerSchool started using Snowflake in late 2021, Snowflake was largely focused on data management. So, too, was PowerSchool, which simply needed a data management platform that could meet its growing data needs.

Snowflake's focus evolved, however, when OpenAI launched ChatGPT in November 2022, sparking a surge of interest in generative AI.

With data the foundation for any AI model or application, many data management vendors -- including Snowflake rival Databricks -- began creating environments for customers to develop AI tools such as natural language assistants. They began adding capabilities such as vector search and storage, retrieval-augmented generation and integrations with large language models (LLMs).

Snowflake was among them. Though perhaps slower than some to embrace AI development, the vendor has been aggressive in developing an environment aimed at enabling customers to build AI tools since changing CEOs in February.

Its current AI development suite, a managed service called Cortex AI, includes features and integrations aimed at enabling customers to easily create AI chatbots and agents -- AI assistants that go beyond question-and-answer capabilities to be proactive -- and other AI applications. In addition, it includes Snowpark Container Services, a managed service that enables the secure deployment of AI applications.

"We take care of the complex administration so that our customers can focus on quickly deploying AI and machine learning across their organizations to drive business value," Klahr said.

Most recently, Snowflake formed a partnership with AI vendor Anthropic to optimize its agents for Anthropic's LLMs and acquired Datavolo to improve data integration.

PowerSchool's primary goal is to make all of a customer's data available to them in a singular, governed fashion and to do so in real time so that the data being used to inform decisions is as complete and current as possible.

Snowflake, working in conjunction with other platforms that perform tasks such as data ingestion and data observability, enables PowerSchool to deliver on that goal, according to Stumpf. In particular, Snowflake's separation of compute and storage is key to enabling real-time data delivery.

Now, however, just as Snowflake has evolved beyond data management to include AI development capabilities, PowerSchool's goals have grown beyond real-time analytics to include using AI to enable more informed decision-making.

Enterprises in many industries have developed generative AI-powered assistants that enable users to ask questions of their data using natural language rather than code, thus expanding analytics use beyond a small percentage of technical experts. PowerSchool has done the same.

In January, using the data it has consolidated in Snowflake along with AI development tools from Snowflake -- Snowpark Container Services among them -- and Microsoft's Azure AI Studio, PowerSchool launched PowerBuddy, its version of an AI assistant.

"The data really provides our PowerBuddy its intelligence," Stumpf said. "I call it 'talk-to-your-data.' It allows a superintendent, a principal and other non-technical users to ask a natural language question."

For example, an administrator can use natural language to ask complex data-related questions, such as how many students in a given grade were absent over a certain period and received low grades in mathematics, and get an immediate answer.
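To make that example concrete, the Python sketch below shows the kind of structured query such a question corresponds to, run against a small in-memory SQLite database. The table names, columns, sample rows and the score threshold for a "low" grade are all hypothetical assumptions for illustration; the sketch does not show how PowerBuddy or Snowflake translate natural language into queries.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, grade_level INTEGER);
CREATE TABLE absences (student_id INTEGER, day TEXT);
CREATE TABLE marks (student_id INTEGER, subject TEXT, score REAL);
INSERT INTO students VALUES (1, 7), (2, 7), (3, 8);
INSERT INTO absences VALUES
    (1, '2024-10-02'), (1, '2024-10-03'), (2, '2024-10-02'), (3, '2024-10-04');
INSERT INTO marks VALUES (1, 'math', 58), (2, 'math', 91), (3, 'math', 55);
""")


def absent_with_low_math(conn, grade_level, start, end, threshold):
    """Count students in a grade who were absent at least once between
    start and end (inclusive) and whose math score is below threshold."""
    sql = """
        SELECT COUNT(DISTINCT s.id)
        FROM students s
        JOIN absences a ON a.student_id = s.id
        JOIN marks m ON m.student_id = s.id
        WHERE s.grade_level = ?
          AND a.day BETWEEN ? AND ?
          AND m.subject = 'math'
          AND m.score < ?
    """
    return conn.execute(sql, (grade_level, start, end, threshold)).fetchone()[0]


# "How many seventh graders were absent in October and have a math score below 60?"
count = absent_with_low_math(conn, 7, "2024-10-01", "2024-10-31", 60)
```

The query counts distinct students, so a student with several absences in the period is counted only once.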

Parents, meanwhile, can use PowerBuddy to ask whether their child submitted their homework or find out how their child performed on an exam. And students can use PowerBuddy to help them with homework and other tasks.

"We've done a ton of work [in the background] so that now any user can ask a natural language question," Stumpf said. "The notion is that everybody in education, whether you are a parent, a student, an administrator, a counselor, a principal, could use a buddy that provides them the information at their fingertips that's relevant to them."

Since January, PowerSchool has launched five PowerBuddy assistants, each trained with different data to respond to questions.

To date, PowerSchool has PowerBuddys for Assessment so educators can create student assessments, College and Career to assist students as they prepare to move on from high school, Data Analysis so analysts and IT teams can converse with data, Engagement to provide parents a means of asking questions, and Learning to help teachers create instructional content. Each can be personalized to meet the needs of its users.

PowerBuddy is being used in nearly 250 school districts representing more than 3 million students, according to Stumpf.

Meanwhile, given all the data points collected per student from the time they start kindergarten through high school, AI provides more complete and accurate analysis than is humanly possible.

"It's not possible for humans to analyze all those data points," Stumpf said. "It can tell these are the subjects someone has mastered, this is their proficiency, this is where they need additional help. It leverages all those data points to help a teacher understand all their students, help a student understand where they need support, help a parent understand where their child needs support."

Getting started and wanting more

While PowerSchool is reaping the benefits of its shift to Snowflake, starting with a new system is no small task. And while relatively smooth, PowerSchool's transition to Snowflake for its data management and analytics needs wasn't without some hitches.

PowerSchool had two initial priorities when changing data platforms, according to Stumpf. One was to build a secure data lake for its student data, covering kindergarten through 12th grade.

"That was a net-new offering we wanted to provide our customers," Stumpf said.

The other was to migrate all its existing analytical workloads from its old system to Snowflake.

The first initiative, building a data lake from scratch, was relatively straightforward, according to Stumpf. Migrating data from its old data management platform to Snowflake, however, was not as smooth.

PowerSchool couldn't simply lift its data from its old system and drop it into Snowflake. Instead, PowerSchool's data team had to re-engineer its data to make it work with Snowflake, a labor-intensive and time-consuming process given the volume of data involved and the unique privacy requirements around student data.

"That definitely took much, much longer than what we had originally anticipated," Stumpf said. "There was a lot of re-writing, re-jiggering of work that we had to do."

Eventually, the data migration task was done, and PowerSchool was able to complete its first initiatives. And now, three years later, it is building innovative AI applications to better serve the needs of tens of millions of students and educators.

Snowflake, meanwhile, prioritizes making the onboarding process as painless as possible and then aims to assist customers as they expand beyond initial projects, according to Klahr. PowerSchool's progression with Snowflake, going from one or two projects at the start to widespread use, is therefore somewhat typical, he continued.

"Typically, customers like PowerSchool will migrate an initial analytics use case to Snowflake from a legacy data platform," he said. "From there, they can start to build a strong data foundation [and] then use Snowflake to deploy advanced analytics … and expand into building sophisticated apps that tap into the power of AI and machine learning to democratize access to insights."

Now that PowerSchool has been using Snowflake for more than three years and has expanded its use beyond data management, it wants more from Snowflake's AI development environment, according to Stumpf.

With Snowflake slow to roll out certain features that are part of other vendors' development suites, PowerSchool isn't using Snowflake alone as it develops and improves PowerBuddy. Rather than use Cortex AI, much of which is still in the preview stage and not yet generally available, PowerSchool is instead using Microsoft's Azure AI Studio in connection with its Snowflake data.

One specific wish is that Snowflake provide access to more LLMs than it currently does. With Anthropic, developer of the Claude line of LLMs, now a partner, Snowflake provides integrations with seven LLM developers, including Google, Meta and Mistral AI.

One LLM provider not part of Snowflake's ecosystem is OpenAI, whose GPT models remain among the most popular and highest-performing.

"We would love to see more generative AI models available within Snowflake," Stumpf said. "That would allow us to better take advantage of Cortex and other AI capabilities. Right now, some of the models they have available are not on par with what we need in terms of accuracy and performance."

In addition, access to open source LLMs would be beneficial, she continued. Open source models require more fine-tuning than proprietary models, but they are more cost-effective.

"We're testing different models out in this moment, but OpenAI's GPT models are definitely at the top of the list at this point," Stumpf said.

The future

As PowerSchool continues to expand its use of data and develop more AI tools, Snowflake aims to continue providing the educational technology vendor with the tools to do so.

With numerous AI development capabilities in preview, Snowflake is continuing to address and improve AI development. Many of those capabilities were introduced Nov. 12, including tools designed specifically for developing agentic AI and security measures added after a hacker accessed Snowflake's environment through user passwords last spring.

"PowerSchool is an inspiring example of how the right data and AI strategies can transform enterprises," Klahr said. "We're proud to say that Snowflake is their trusted partner."

PowerSchool, meanwhile, has plans to use data to further improve the educational experience for everyone involved. After launching initial versions of PowerBuddy throughout 2024, the company wants to use its data and AI to provide more than just a conversational interface that provides instant access to intelligence.

PowerSchool wants to take all those data points about each student -- to be exact, an average of 171,955 -- to create personalized educational experiences, according to Stumpf.

"The nirvana for us is providing a personalized pathway for every student so they can maximize their potential in the way that's right for them," she said.

That includes taking into account everything from seemingly insignificant data such as a student's bus route to critical elements such as special education and language accommodations, Stumpf continued.

"If we are able to truly leverage those 171,955 data points to personalize the experience for someone to maximize their potential in a way that was never humanly possible before, that is what success would be for me and for PowerSchool," she said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
