The Data Warehouse has been critical to business decision making for over two decades. As a Microsoft business intelligence expert and systems integrator, BlueGranite frequently encounters customers looking to maximize their existing SQL Server data warehouse's performance, or who want help planning a migration to a new version of SQL Server.
One of the common problems that BlueGranite sees working with customers in many different industries is an increase in data sources, data volumes, and lengthening ETL load times. Additionally, end users expect more functionality and split-second report/dashboard response times. The days of waiting two minutes for a report are over. In most cases, customers are looking for features, functionality, techniques, and even hardware that will allow them to meet increasing performance requirements.
As a nine-year leader in the Gartner Magic Quadrant for BI, Microsoft has a vast array of solutions and products to meet the growing needs of the data warehouse. While many customers are aware of SQL Server 2016 and its earlier versions, Microsoft also has three additional product offerings for data warehouse solutions: Data Warehouse Fast Track, Azure SQL Data Warehouse, and the Analytics Platform System (details below). While SQL Server contains many BI features like Integration Services, Analysis Services, and Reporting Services in a single installation of the product, here we are focusing on the Relational database being used for data warehouse solutions.
What is It: SQL Server 2016 – Best BI Platform for over a decade
Microsoft SQL Server Enterprise is the most common data warehouse solution of the four listed above. It is asymmetric multiprocessing (SMP) database engine that is NUMA aware, which means it relies on the CPUs, RAM, and storage shared within a single server. When using SQL Server for a data warehouse, the customer must build their own hardware solution, then install and “configure” SQL Server on top of the hardware. It has industry-leading features including Clustered Columnstore Index for extremely fast, low IO, in-memory query processing.
Key Considerations: Build Your Own Deployment – Lots of Options
Customers can use a mixture of physical or virtual hardware and direct attached or shared storage via a storage area network. Most customers will configure SQL Server for data warehouse workloads using specific considerations and configurations for scan centric workloads (opposite of OLTP). While SQL Server has traditionally been installed and run on premises, customers can also run SQL Server in the cloud using Infrastructure as a Service (IaaS). SQL Server Enterprise comes with a plethora of BI features that can be used to create a cohesive solution. The speed of storage available to the server is vital to any SQL Server warehouse deployment, so many built SQL Server solutions take advantage of current PCI and flash-based storage technologies. Any design should factor in not only the capacity of storage but also its IO performance to deliver results at the required speed.
BI Features: All in One or One for All
SQL Server doesn’t put constraints on a customer’s ability to install and use multiple features on one server; this is a popular practice due to Microsoft only imposing licensing on a server basis (per core pairs) and not by feature or instance.
Programming: Regular T-SQL
Customers program their data warehouse solution using the robust T-SQL programming language that is expanded and made more robust with each version release of SQL Server.
Scale Options: Scale Up Yourself
Customers can “scale up” their SQL Server data warehouse by adding more horsepower (RAM, CPU, and faster storage). By using SQL Server in a scaled up environment, customers can expect to grow their data warehouse up to 10TB before needing to analyze other solutions.
Key Features: CCIs & Much More
SQL Server has page and row compression which allow the data to be compressed on disk so data retrieval is faster overall. Starting with SQL Server 2014, customers can also create Clustered Columnstore Indexes (CCIs) which convert data to columnar storage from row storage allowing even greater compression. CCIs have the potential to speed up data warehousing queries 100x. In-memory OLTP tables can be used for Staging tables to speed up ETL workloads. Batch mode also helps with performance as the engine can iterate on “batches” of rows at a time.
For information on SQL Server 2016 performance features, Microsoft’s library of resources is a great starting point.
SQL Server Fast Track
What is It: High Performance SMP SQL Server Appliance
SQL Server Fast Track is a reference architecture methodology for data warehouse solutions where performance is balanced between all the components of software, hardware, and storage to remove bottlenecks. Microsoft and various hardware vendors have teamed up to use the architecture to create pre-configured, tested, and shipped appliance solutions for SQL Server data warehousing. This can eliminate most of the effort and expertise needed to engineer an optimal server solution from basic parts.
Key Considerations: Time Saver – Off the Shelf Hardware & Software Deployment
The reference architecture can be manually applied to a plain SMP SQL Server data warehouse solution, but customers get the most value in being able to pick an off-the-shelf solution for their desired performance level and price, as this can speed up solution deployments. When buying a Fast Track appliance, customers can expect the vendor to ship and configure the hardware and software to the customer’s on-premises data center.
BI Features: Only SQL on the Appliance
Because Fast Track is an appliance-based approach, there are some underlying restrictions. First, Fast Track appliances are only currently available on-premises. Second, while SMP SQL Server allows the customer to install features for BI, this is typically not supported on the appliance.
Programming: Regular T-SQL Compatible
Since Fast Track is just SQL Server Enterprise SMP, you can use regular T-SQL.
Scale Options: Scale Up – Predetermined Scale Size Limit (5TB to 145TB)
Fast Track is a scale up solution. Appliances are available in many different sizes (5TB to 145T) from many vendors. Since the appliance is already scaled up, they typically can’t be scaled up any further. If a customer exceeds their Fast Track size threshold, they typically must buy another one.
Key Features: Fast Storage, All SQL Server Features Available
The main benefits of Fast Track are screaming-fast performance (up to 10 GB/second read/write throughput), usage of plain T-SQL, and access to all the performance enhancements to SQL Server like CCIs. This means that customers can migrate their data warehouse solutions to Fast Track, leave their code unchanged, and obtain tremendous performance benefits.
For more information on Fast Track, try checking out Microsoft’s website.
Analytics Platform System
What is It: SQL Server MPP Appliance
The Microsoft Analytics Platform System (APS) is a SQL Server Enterprise-based data warehouse appliance created by Microsoft and hardware vendors that can scale to up to 6PB of data. APS is an Massively Parallel Processing (MPP) database engine, meaning that it is a set of multiple SQL Servers that are running in concert together.
Key Considerations: Scale Out & Distribute Your Data
APS is a scale out solution. APS fundamentally works by taking a big problem and breaking it up into smaller pieces. APS does this by implementing a control/compute topology, where one SQL Server functions as the brains of the solution (control), and other SQL Servers function as the brawn (compute) to perform parallel operations on data that has been distributed among the compute SQL Servers.
BI Features: Only SQL on the Appliance
Similar to Fast Track, APS does have some limitations, and it’s only available on-premises. APS is only on SQL Server; additional BI features cannot be installed on the appliance, though all BI features can use the appliance.
Programming: DSQL is a little different and may require conversion
APS uses Distributed SQL (DSQL). DSQL is slightly different than T-SQL and customers typically have some code conversion (DDL & DML) that is needed before they can move their databases to APS. This conversion can be minimal or extensive depending on the customer’s coding practices.
Scale Options: Scale Out in Scale Units
Customers should look at APS when they have around 10TB of data and expect data growth. APS can scale out to as large as 6PB. APS scales with linear cost and performance because it can be expanded by adding additional compute nodes to the appliance (called adding Scale Units).
Key Features: It’s Extremely Fast
When using APS, customers can expect significant performance improvements in querying and loading data. Customer reporting queries can often be sped up by a factor of 100 and data can be loaded as fast as several TBs an hour. Because APS is SQL Server under the covers, there is nothing special needed to connect to it, so customers can expect a seamless transition of their reporting and semantic layer tools.
For more information on APS, see Microsoft’s website.
Azure SQL Data Warehouse
What is It: SQL Server MPP in the Cloud
Azure SQL Data Warehouse (SQL DW) is a Platform as a Service (PaaS) data warehouse service in the Microsoft Azure cloud. SQL DW is an MPP database engine relying on a control/compute topology using Azure SQL Database and Azure Blog Storage. SQL DW is not APS, but it is very similar.
Key Considerations: SQL DW is Scale Out PaaS – Infrastructure is managed by Microsoft
SQL DW is a scale out solution like APS, and like APS, data is distributed to compute nodes for faster performance and scale. Since SQL DW is an Azure offering, it only runs in Azure and is not available on-premises. The offering is also PaaS, so customers don’t have to worry about on-premises or even cloud infrastructure as it is all managed by Microsoft.
BI Features: Only SQL On the Appliance
SQL DW is only SQL Server like Fast Track and APS. While you can’t add BI features to the cluster, SQL DW can be used with all the SQL Server and Azure BI features.
Programming: DSQL is a little different and may require conversion
Like APS, SQL DW uses DSQL and has similar programming limitations.
Scale Options: Elastically Scale in Minutes
Customers should look at SQL DW when they have TBs of data. Using SQL DW, they can expect to scale out to as large as PBs. SQL DW can be elastically scaled out in seconds by changing the Data Warehouse Units (DWU – unit of scale performance) setting in the Azure portal.
Key Features: Separation of Compute & Storage
Customers can create a SQL DW cluster in minutes. Just like other Azure offerings, it is a pay-as-you-go pricing model so you are charged by the hour. Not only can SQL DW be elastically scaled out, Customers can even pause their instance which eliminates compute charges. Storage and compute are completely decoupled so when customers pause their cluster, their data is persisted.
If you’d like more information about Azure SQL DW, please see Microsoft’s website.
Scale Your Company with BlueGranite
Whether you want to scale an existing data warehouse platform or migrate to a new one work, BlueGranite would love to help. Contact us today – we’re experts at helping organizations of all sizes navigate the many Modern Data Platform solutions and better understand the benefits of Big Data. If your curious how Modern Data Architectures can provide faster insights, please check out our free e-book, here.