Let’s revisit the 1980s, an era devoid of data clouds or data warehouses. The concept of data warehousing emerged to dismantle silos, letting data flow out of operational systems into efficient, cost-effective decision-support environments.
As time passed, data sets expanded and semi-structured data grew in significance. Data warehouses, unable to handle this data and its diverse schemas, left large enterprises stuck in siloed environments. Fast forward thirty years, and data lakes emerged: a solution for storing vast amounts of raw data in its native form, centralized in one repository.
More than a decade has elapsed, marked by numerous unsuccessful on-premise data lake ventures. Despite this, the demand for scalable data storage solutions has only intensified. The recognition of data’s boundless business potential, when managed effectively, drives the data analytics market. Spending on big data and business analytics (BDA) solutions rose significantly, with a 10.1% increase from 2020 to 2021 alone.
During the emergence of data lakes, QPR recognized the potential in clients’ existing data for enhancing business processes. Our experts understood the challenges clients faced in creating accurate process models efficiently and automatically.
QPR ProcessAnalyzer (PA), our process mining solution, was launched to provide enterprises with precise, objective insights into their processes, enabling them to unlock their full operational potential with pinpoint accuracy.
Process mining involves discovering, analyzing, and monitoring processes by examining the data traces left behind whenever employees or software interact with IT systems. These traces, known as event logs, are what process mining software uses to reconstruct and visualize how business processes actually run, and to draw valuable insights from them.
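To make the event log idea concrete, here is a minimal sketch of how a process flowchart can be derived from one. This is not QPR ProcessAnalyzer’s implementation; the column names and example activities are illustrative assumptions, and the logic simply counts which activity directly follows which within each case.

import pandas as pd

# A toy event log: one row per activity executed in a case (here, purchase orders).
# The column names are illustrative, not a fixed standard.
events = pd.DataFrame({
    "case_id": ["PO-1", "PO-1", "PO-1", "PO-2", "PO-2", "PO-2", "PO-2"],
    "activity": ["Create PO", "Approve PO", "Receive Goods",
                 "Create PO", "Change Price", "Approve PO", "Receive Goods"],
    "timestamp": pd.to_datetime([
        "2023-01-02 09:00", "2023-01-02 11:30", "2023-01-05 08:15",
        "2023-01-03 10:00", "2023-01-03 14:20", "2023-01-04 09:45",
        "2023-01-09 16:05"]),
})

# Order each case by time and pair every activity with the one that follows it.
events = events.sort_values(["case_id", "timestamp"])
events["next_activity"] = events.groupby("case_id")["activity"].shift(-1)

# Counting the directly-follows pairs yields the edges of a process flowchart.
edges = (events.dropna(subset=["next_activity"])
               .groupby(["activity", "next_activity"])
               .size()
               .reset_index(name="frequency"))
print(edges.sort_values("frequency", ascending=False))

Even this toy example reveals that case PO-2 took a detour through Change Price before approval – exactly the kind of deviation process mining surfaces at scale.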
Process mining puts an end to prolonged debates about how processes actually run and where costs can be saved, and it closes reporting and visibility gaps. Users gain access to dynamically generated flowcharts that detail process flows, performance, and compliance.
Introducing the Data Cloud
Many on-premise data lake projects failed due to the limitations of their core technology, particularly the Apache Hadoop ecosystem. Though central to these platforms, Hadoop demanded extensive system management and custom coding, making traditional data lakes inefficient to run. Without proper resources, these lakes often turned into stagnant pools, earning their moniker: data swamps.
Amidst rapid technological advancements, cloud environments flourished, offering vast storage and computing capabilities. While some data lake providers migrated to the cloud, Snowflake opted for a unique path, crafting a cloud-native solution. They developed the Snowflake Data Cloud, a cutting-edge cloud data warehouse-as-a-service (DWaaS) with a new SQL query engine and innovative architecture.
Dreams of process mining with unlimited scalability
Let’s rewind a bit to see how this connects with process mining. When process mining emerged, solutions like Snowflake didn’t exist. Process mining software was developed based on industry-standard database technology, where individual queries were executed on single nodes, meaning one query equaled one computer. Although parallel queries and powerful computers were employed, the fundamental limitation persisted – it boiled down to the capability of a single computer.
A few years ago, our product development team pondered a breakthrough. They explored a technology in which the nodes of a cluster each store a portion of the dataset locally and work on a single query together, instead of one node executing one query on its own. The daring idea emerged: could MPP (massively parallel processing) technology handle intensive process mining queries? While unconventional and untested in process mining, the team ventured into extensive testing, driven by curiosity and determination.
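To give a flavor of why this workload suits MPP engines, a core process mining question – which activity directly follows which – can be expressed as a single set-based query partitioned by case ID, so each compute node works on its own share of cases and the partial results are then combined. The sketch below is hypothetical: the table and column names are assumptions, and it is not the SQL that QPR ProcessAnalyzer actually generates.

# Hypothetical table and column names; a sketch, not QPR ProcessAnalyzer's generated SQL.
# Because the window is partitioned by case id, an MPP engine can distribute the
# cases across its nodes and merge the aggregated results afterwards.
DIRECTLY_FOLLOWS_SQL = """
SELECT activity, next_activity, COUNT(*) AS frequency
FROM (
    SELECT
        case_id,
        activity,
        LEAD(activity) OVER (PARTITION BY case_id ORDER BY event_time) AS next_activity
    FROM event_log
) AS pairs
WHERE next_activity IS NOT NULL
GROUP BY activity, next_activity
ORDER BY frequency DESC
"""

The same statement could be handed to any SQL engine; on a single-node database it is limited by that one machine’s capacity, whereas an MPP engine fans the work out across the cluster and merges the results.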
Unique architectural structure
In May 2022, we proudly revealed our partnership with Snowflake, marking us as the sole process mining software seamlessly integrated with Snowflake. Curious about the synergy, I interviewed Olli Vihervuori, QPR’s Product Manager, to uncover the compelling reasons behind the harmonious alliance between Snowflake Data Cloud and QPR ProcessAnalyzer.
“The short answer to why we chose Snowflake over other solutions and providers is its simplicity and performance. This performance is enabled by Snowflake’s unique architecture. Other factors also made our choice easy, such as the ability to work with the data in SQL and the ease of use – you can create a Snowflake account and start using it in a couple of minutes. Furthermore, Snowflake is cloud-based and cloud-based only, and the future is in the cloud,” Vihervuori explains.
Snowflake’s architecture cleverly combines shared-disk (SD) and shared-nothing (SN) designs: data is kept in a central repository, as in a shared-disk system, while queries are processed with the MPP approach of shared-nothing systems. By building the solution natively for the cloud, Snowflake has pushed these qualities further. Cloud advantages such as near-infinite data scaling and parallel, independent compute clusters let users process vast amounts of data swiftly. For more in-depth insights, I recommend exploring Snowflake’s detailed resources.
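As an illustration of how the compute side scales independently of storage, the sketch below creates and resizes a Snowflake virtual warehouse from Python. The account, credentials, and warehouse name are placeholders rather than QPR’s environment, and the snippet assumes the snowflake-connector-python package.

import snowflake.connector  # pip install snowflake-connector-python

# Placeholder credentials – replace with your own account details.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)
cur = conn.cursor()

# A virtual warehouse is a compute cluster; creating one touches no stored data.
cur.execute("CREATE WAREHOUSE IF NOT EXISTS MINING_WH "
            "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60")
cur.execute("USE WAREHOUSE MINING_WH")

# Scaling compute up for a heavy process mining run changes only the cluster size;
# the centrally stored data stays exactly where it is.
cur.execute("ALTER WAREHOUSE MINING_WH SET WAREHOUSE_SIZE = 'LARGE'")

cur.close()
conn.close()

Several such warehouses can run side by side against the same centrally stored data, which is how independent workloads avoid contending with each other.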
Snowflake’s architecture comprises three vital layers: 1) database storage, 2) query processing, and 3) cloud services, all running on the cloud platform of your choice. When utilizing QPR ProcessAnalyzer as a connected application, your Snowflake account can be hosted on AWS, GCP, and/or Azure; if your data is already on one of these platforms, you simply link your Snowflake account. As a managed application, your Snowflake queries run in QPR’s cloud environment on AWS Ireland. So, what’s the difference between connected and managed?
Connected and managed applications
When the decision to develop QPR ProcessAnalyzer Powered by Snowflake was made, it wasn’t just a minor change; it was a significant leap. Rather than a simple feature or module, it is an entirely new product. Existing customers must decide to switch from one product to the other to make use of Snowflake queries.
For end-users, the experience of logging into QPR ProcessAnalyzer as a connected or managed application appears almost identical. The user interface and features remain consistent. However, there are subtle distinctions, particularly concerning data governance.
Connected application
In the connected application model, the PA customer becomes a Snowflake customer, which requires a Snowflake account. PA is then enabled as a connected application and granted access to the customer’s Snowflake account. Unlike with other process mining tools, there is no need to duplicate data into a separate platform. Snowflake customers can connect multiple applications directly, eliminating data transfers and keeping everything in one unified data source that can be queried with familiar SQL tools – a single source of truth.
Furthermore, in terms of security, Snowflake offers an exciting capability: secure and selective data sharing with customers and business partners. In the connected application model, customers have control over their data. QPR maintains the application code, while customers manage their data on their own platform. PA accesses only the necessary information for specific actions as per the customer’s data governance policy. Process mining on Snowflake provides the best and simplest way to ensure compliance with data privacy, security, industry, and government regulations.
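To illustrate what “only the necessary information” can look like in practice, here is a sketch of the kind of scoped, read-only access a customer might grant a connected application inside their own Snowflake account. The role, database, schema, and table names are hypothetical, and the actual grants are defined by the customer’s own governance policy.

# Hypothetical object names; the real grants follow the customer's governance policy.
# Each statement could be executed through a Snowflake cursor, as in the earlier sketch.
SCOPED_ACCESS_SQL = [
    "CREATE ROLE IF NOT EXISTS PROCESS_MINING_APP",
    # The application may read only the event-log objects it needs ...
    "GRANT USAGE ON DATABASE ERP_DATA TO ROLE PROCESS_MINING_APP",
    "GRANT USAGE ON SCHEMA ERP_DATA.PURCHASING TO ROLE PROCESS_MINING_APP",
    "GRANT SELECT ON TABLE ERP_DATA.PURCHASING.EVENT_LOG TO ROLE PROCESS_MINING_APP",
    # ... and nothing more: the data itself never leaves the customer's account.
]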
Managed application
By contrast, the managed application model doesn’t require the customer to have a Snowflake account. Here, data and its governance are managed to some extent by QPR, much like in the regular version of QPR ProcessAnalyzer. To execute queries on Snowflake, customers choose to load their data into Snowflake when loading it into PA. The data is loaded into QPR’s multi-tenant Snowflake environment, hosted on AWS Ireland. Queries are processed in Snowflake, and the results are promptly displayed on the customer’s PA dashboard. This approach lets customers, even large companies with intricate processes and vast datasets, analyze their processes swiftly, leveraging Snowflake’s efficient scalability.