From single node technology to massively parallel processing: how QPR developed a process mining application powered by Snowflake

Let’s revisit the 1980s, an era devoid of data clouds or warehouses. The concept of data warehousing emerged to dismantle silos, enabling seamless data flow across operational systems for efficient, cost-effective decision support environments.

As time passed, data sets grew and semi-structured data became increasingly important. Data warehouses struggled to handle this data and its diverse schemas, leaving large enterprises in siloed environments. Fast forward thirty years, and data lakes emerged as a way to store vast amounts of raw data in its native form, centralized in one repository.

More than a decade has elapsed, marked by numerous unsuccessful on-premises data lake ventures. Despite this, the demand for scalable data storage solutions has only intensified. The recognition of data’s business potential, when managed effectively, drives the data analytics market. Spending on big data and business analytics (BDA) solutions rose significantly, with a 10.1% increase from 2020 to 2021 alone.

During the emergence of data lakes, QPR recognized the potential in clients’ existing data for enhancing business processes. Our experts understood the challenges clients faced in creating accurate process models efficiently and automatically.

QPR ProcessAnalyzer (PA), our process mining solution, was launched to provide enterprises with precise, objective insights into their processes, enabling them to unlock their full operational potential with pinpoint accuracy.

Process mining involves discovering, analyzing, and monitoring processes by examining data traces left when employees or software interact with IT systems. This data, known as event logs, is utilized by process mining software to visualize real-life business processes, offering valuable insights drawn from these logs.

Process mining eliminates prolonged debates over processes, cost-saving aspects, unclear reporting, and visibility gaps. Users gain access to dynamically generated flowcharts, detailing processes, performance, and compliance efficiently.
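As a minimal sketch of how an event log can be turned into the edges of a process flowchart, the Python snippet below counts how often one activity directly follows another within the same case. The log layout (case ID, activity, timestamp), the sample data, and the function name are assumptions made for illustration; this is not QPR ProcessAnalyzer's data model or algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity, timestamp).
# ISO 8601 timestamps sort correctly as plain strings.
EVENT_LOG = [
    ("order-1", "Create order", "2023-01-02T09:00"),
    ("order-1", "Approve order", "2023-01-02T11:30"),
    ("order-1", "Ship order", "2023-01-03T08:15"),
    ("order-2", "Create order", "2023-01-02T10:00"),
    ("order-2", "Ship order", "2023-01-04T14:00"),
    ("order-3", "Create order", "2023-01-05T09:45"),
    ("order-3", "Approve order", "2023-01-05T10:05"),
    ("order-3", "Ship order", "2023-01-06T16:20"),
]


def directly_follows(event_log):
    """Count how often one activity is immediately followed by another in a case."""
    traces = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))

    edges = Counter()
    for events in traces.values():
        # Order each case's events by time, then pair consecutive activities.
        ordered = [activity for _, activity in sorted(events)]
        edges.update(zip(ordered, ordered[1:]))
    return edges


if __name__ == "__main__":
    for (source, target), count in sorted(directly_follows(EVENT_LOG).items()):
        print(f"{source} -> {target}: {count}")
```

Each counted pair corresponds to one arrow in a flowchart, and the count shows how many cases took that path. In this sample, one order skips approval, which is the kind of deviation a process mining tool makes visible.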

Introducing the Data Cloud

Many on-premises data lake projects failed due to limitations in their core technology, particularly the Apache Hadoop ecosystem. While essential, Hadoop demanded extensive system management and custom coding, rendering traditional data lakes inefficient. Without proper resources, these lakes often turned into stagnant pools, leading to their moniker: data swamps.

Amidst rapid technological advancements, cloud environments flourished, offering vast storage and computing capabilities. While some data lake providers migrated to the cloud, Snowflake opted for a unique path, crafting a cloud-native solution. They developed the Snowflake Data Cloud, a cutting-edge cloud data warehouse-as-a-service (DWaaS) with a new SQL query engine and innovative architecture.

Dreams of process mining with unlimited scalability

Let’s rewind a bit to see how this connects with process mining. When process mining emerged, solutions like Snowflake didn’t exist. Process mining software was developed based on industry-standard database technology, where individual queries were executed on single nodes, meaning one query equaled one computer. Although parallel queries and powerful computers were employed, the fundamental limitation persisted – it boiled down to the capability of a single computer.

A few years ago, our product development team pondered a breakthrough. They explored a technology in which each node in a cluster stores a portion of the entire dataset locally, so that a single query is spread across all nodes instead of being executed on one. The daring idea emerged: could MPP (massively parallel processing) technology handle intensive process mining queries? Although the approach was unconventional and untested in process mining, the team ventured into extensive testing, driven by curiosity and determination.
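The difference between the two approaches can be sketched in a few lines of Python: the event log is partitioned so that each worker holds its own portion, every worker computes a partial result on its portion, and the partial results are merged. The threads here only stand in for cluster nodes, and all names and the sample data are hypothetical; this illustrates the general MPP pattern, not how Snowflake or QPR ProcessAnalyzer execute queries.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical event log: (case_id, activity).
EVENT_LOG = [
    ("order-1", "Create order"), ("order-1", "Approve order"), ("order-1", "Ship order"),
    ("order-2", "Create order"), ("order-2", "Ship order"),
    ("order-3", "Create order"), ("order-3", "Approve order"), ("order-3", "Ship order"),
]


def partition_by_case(event_log, n_partitions):
    """Place every event of a case in the same partition, using a stable hash."""
    partitions = [[] for _ in range(n_partitions)]
    for event in event_log:
        case_id = event[0]
        partitions[crc32(case_id.encode()) % n_partitions].append(event)
    return partitions


def count_activities(partition):
    """Work done locally by one worker: count events per activity."""
    return Counter(activity for _, activity in partition)


def parallel_activity_counts(event_log, n_partitions=4):
    """Fan the query out over all partitions, then merge the partial results."""
    partitions = partition_by_case(event_log, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_activities, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    print(parallel_activity_counts(EVENT_LOG))
```

Because each worker only touches its own partition, adding workers lets the same query cover more data without any single machine having to hold or scan the whole log.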

Unique architectural structure

In May 2022, we proudly revealed our partnership with Snowflake, marking us as the sole process mining software seamlessly integrated with Snowflake. Curious about the synergy, I interviewed Olli Vihervuori, QPR’s Product Manager, to uncover the compelling reasons behind the harmonious alliance between Snowflake Data Cloud and QPR ProcessAnalyzer.

“The short answer to why we chose Snowflake over other solutions and providers lies in its simplicity and performance. This performance is enabled by the unique architecture of Snowflake. Additionally, other factors made our choice easy, such as the ability to write SQL data as well as the ease of use – you can create and start using Snowflake in a couple of minutes. Furthermore, Snowflake is cloud-based and cloud-based only, and the future is in the cloud,” Vihervuori explains.

Snowflake’s architecture cleverly combines shared-disk (SD) and shared-nothing (SN) designs. They’ve harnessed the centralized data storage from SD and integrated it with SN’s MPP technology. By building a solution tailored for the cloud, they’ve enhanced these qualities further. Cloud advantages like near-infinite data scaling and parallel, independent compute clusters enable users to process vast amounts of data swiftly. For more in-depth insights, I recommend exploring Snowflake’s detailed resources.

Snowflake’s architecture comprises three vital layers: 1) database storage, 2) query processing, and 3) cloud services, all managed on chosen cloud platforms. When utilizing QPR ProcessAnalyzer as a connected application, your Snowflake account can be hosted on AWS, GCP, and/or Azure. If your data is already on these platforms, simply link your Snowflake account. As a managed application, your Snowflake queries run in QPR’s cloud environment on AWS Ireland. So, what’s the difference between connected and managed?

Connected and managed application

When the decision to develop QPR ProcessAnalyzer Powered by Snowflake was made, it wasn’t just a minor change; it was a significant leap. Unlike a simple feature or module, it’s an entirely new product. Existing customers must decide to switch from one software to another to utilize Snowflake queries.

For end-users, the experience of logging into QPR ProcessAnalyzer as a connected or managed application appears almost identical. The user interface and features remain consistent. However, there are subtle distinctions, particularly concerning data governance.

Connected application

In the connected application model, the PA customer becomes a Snowflake customer, requiring a Snowflake account. PA is then enabled as a connected application, granting it access to the customer’s Snowflake account. Unlike other process mining tools, there’s no need to duplicate data into a separate platform. Snowflake customers can connect multiple applications directly, eliminating data transfers and keeping a single source of truth that can be queried with familiar SQL tools.

Furthermore, in terms of security, Snowflake offers an exciting capability: secure and selective data sharing with customers and business partners. In the connected application model, customers have control over their data. QPR maintains the application code, while customers manage their data on their own platform. PA accesses only the necessary information for specific actions as per the customer’s data governance policy. Process mining on Snowflake provides the best and simplest way to ensure compliance with data privacy, security, industry, and government regulations.

Managed application

In contrast, the managed application model does not require the customer to have a Snowflake account. Here, data and its governance are managed to some extent by QPR, similar to the regular version of QPR ProcessAnalyzer. To execute queries on Snowflake, customers opt to load the data onto Snowflake when loading it onto PA. This data is loaded onto QPR’s multi-tenant Snowflake environment, hosted on AWS Ireland. Queries are processed in Snowflake, and the results are promptly displayed on the customer’s PA dashboard interface. This approach allows customers, even those from large companies with intricate processes and vast datasets, to analyze their processes swiftly, leveraging Snowflake’s efficient scalability.

Get a quick overview so you can spend your time improving your processes.

You can easily define and set up processes according to your own needs using a convenient layout tool. In addition, since all types of workflow processes can be managed, you get all case management in one single system. 

Analyses, statistics and reports give you an unbeatable overview with real-time status and division of responsibility. Nothing has to fall through the cracks anymore and you always have an accurate and up-to-date base for decision-making.

CANEA Process is a tool that allows you to model and share business processes in an easy-to-use graphical web interface.

Seeing is understanding

Visualisation gives all employees an understanding of the organisation’s processes, activities, responsibilities and information flows.

Living processes

Identifying working methods and making them easy to update is the basis for continuous improvement of processes.

Create a complete picture

Linking together documents, information and tools with clickable process maps creates an intuitive and comprehensive management system.

CANEA Workflow is an IT solution that automates, quality-assures and speeds up administrative workflow processes.

By creating executable processes in CANEA Workflow, you get both better control of the situation and smoother processing.

Streamlining the work

Ensure compliance and that handovers are done correctly and with the right information.

Correct decision support

You get an unbeatable overview of the processes in real time, with both clear reports and clear diagrams.

Improve processes continuously

Our process support can be constantly adapted to changing needs and requirements. In this way, we give you the best possible conditions for your daily work.

A document management system without complicated folder structures.

CANEA Document combines simple and intuitive search features with powerful features for managing documents from a life cycle perspective.

Maximum availability
Search and find information quickly based on what you need, not where it is stored.

High security
Ensure accuracy, changes and access to all information – with high traceability.

Fulfils requirements
Manage information according to standards, legislation and other requirements for document management.

CANEA Document supports everything from production, publication and modification to archiving and deletion – with full traceability and version management.

CANEA Document provides secure management of all types of documents – in one place. Tagging the information with metadata creates a virtual, multidimensional folder structure. This means that a document appears in multiple locations at the same time, with authorisation-controlled access. The right information in the right place for the right users at the right time!
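The idea of a metadata-driven virtual folder structure can be sketched as follows. Every metadata key and value pair acts as a "folder", so one document shows up in several places without being copied. The document IDs, metadata fields, and function names are hypothetical and chosen only for illustration; this is not CANEA Document's implementation, and authorisation control is left out.

```python
from collections import defaultdict

# Hypothetical documents with metadata tags.
DOCUMENTS = {
    "DOC-001": {"type": "instruction", "department": "production", "process": "assembly"},
    "DOC-002": {"type": "policy", "department": "quality", "process": "auditing"},
    "DOC-003": {"type": "instruction", "department": "quality", "process": "assembly"},
}


def build_virtual_folders(documents):
    """Map each (key, value) tag to the set of documents carrying it."""
    folders = defaultdict(set)
    for doc_id, metadata in documents.items():
        for key, value in metadata.items():
            folders[(key, value)].add(doc_id)
    return folders


def find(folders, **criteria):
    """Return the documents that match all given tags (intersection of folders)."""
    matches = [folders.get((key, value), set()) for key, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


if __name__ == "__main__":
    folders = build_virtual_folders(DOCUMENTS)
    print(sorted(find(folders, type="instruction")))
    print(sorted(find(folders, department="quality", process="assembly")))
```

Here DOC-003 is reachable under "instruction", under "quality" and under "assembly" at the same time, which is what lets users search by what they need rather than by where a file is stored.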

You can usefully add your company’s process-oriented management system to CANEA Document. All production and distribution of documents is quality-assured and streamlined. In addition, your employees always have access to the most up-to-date version of the documents – directly via intranet, tablets and mobile phones.

Improve the performance of your project activities

CANEA Project is a comprehensive project tool with integrated support for all types of projects and parties, such as management, resource owners, clients, project managers and project participants. CANEA Project gives you an excellent overview of your entire project portfolio, including profitability and status, making it easier to prioritize and make decisions.

CANEA Project shares all important project information with both internal and external members.

Let your project manager focus on management rather than administration and distribution of the information.

Gather all project information in one place and make it available to both internal and external members.

Helps you with prioritization of projects, resource management and analysis of portfolios and programs.

Make the strategy a reality

CANEA Strategy makes it possible for organisations of all sizes to create a unique common thread from the strategic work to the daily operations. We do not just provide performance management tools but rather a completely new generation of IT support for strategy activation. The system guarantees and provides support throughout the chain from strategy to execution. You get no results without action. CANEA Strategy makes it possible in practice!

CANEA Strategy ensures and provides support throughout the entire journey from strategy to execution.

Create a shared understanding of the strategy, the goal to achieve and how.

Gives management an unbeatable overview of what’s happening, how it’s progressing, and why.

Creates a clear common thread from the strategy to projects, initiatives and actions.