Public Sector / Modern Data Platform

BlueGranite Infuses County with Data Lake Hydrator Enabling Rapid Data Transformation

SB_Census Data


A large county in Texas has been working with various departments to build a more data-informed government. A common challenge for government is finding a data platform that is stable enough to deliver consistent information while also being flexible enough to grow and scale based on the needs of the organization. In this county’s case, an initial goal was to start using Azure cloud data services to transform and deliver insights related to the criminal justice system more efficiently. Later, the County wanted to leverage the same platform to better understand revenue expectations for the tax department.

The Challenges

Slow Data Transformation
Many State and Municipal governments across the United States are working to become more data driven for purposes of greater efficiency, more effective policy making, and accountability, among other reasons. Demands on how to access data have only increased, sometimes putting pressure on the IT Department and the legacy systems that have held and managed data for decades.

An approach to resolving these issues is to move much of the work to the cloud where storage and processing power can scale to the demands of a specific government agency. Just saying, “move to the cloud” is easier said than done, but it is not impossible.

The County was struggling to transform its data on a nightly basis and as a result, their Power BI dashboards would often time out on refresh.

Criminal Justice System
The County’s IT team needed to deliver reports with metrics like daily inmate counts, costs per inmate, and financial impacts of prisoners being transported to County jails from other levels of government. Additionally, they had a vision of better understanding how to reduce incarceration to save the County money, inform judges of the consequences of their policies, and gauge the mental health impact of serving time in County jails. To do this, robust data pipelines and reporting infrastructure were needed.

Tax Revenue
Additionally, the population of the County has grown rapidly over the past decade, which impacts expected tax revenue. The County wanted to use US Census Bureau population estimates paired with historic tax payments to understand and predict tax revenue in the future. Ingesting data from the US

Census Bureau is not something the County IT team had done previously, nor had they paired data from a federal agency like the US Census Bureau with their own tax data. The County also did not want to build a new platform to manage this data. Instead, they wanted to leverage the platform built for the criminal justice project and tailor it to the tax project.

The Solution

After leading an Architectural Design workshop with the data team, BlueGranite was able to draw up a solution architecture leveraging Azure Data Services powered by BlueGranite’s Data Lake Hydration Framework.

Within the first week of the engagement, BlueGranite stood up the Data Lake Hydrator Framework in the client’s Azure environment and successfully landed most of the data required for the modern solution into Azure.

Criminal Justice
Prior to this project, the County was using an on-premises version of SQL Server to store and transform the data used for reporting. On a nightly basis, the data transformation process took upwards of eight hours, and often would fail or time out. BlueGranite, using its Catalyst Framework and Data Lake Hydrator, helped the County get up and running within Microsoft Azure in about two months.

The Data Lake Hydrator takes a snapshot of data in a source system and drops it into an Azure Data Lake Storage Gen2 resource. Then, using the code the County had already built, with minimal changes, BlueGranite was able to stage the data and ultimately put into an Azure SQL DB. This process, which previously took more than eight hours, now takes 2.5 hours and Power BI reports which used to time out, now take just 15 seconds to refresh.

The County now has the infrastructure set up in Azure so they can continue to incorporate data and more efficiently deliver reporting to their client agencies.

Tax Data
Because the Data Lake Hydrator had been deployed already for the criminal justice project, the BlueGranite team was able to leverage tools that already existed to kick off work for the tax project.

After meeting with the tax department and the City’s Geographic Information Services (GIS) team, BlueGranite was able to ingest tax payment history data and geographic data associating properties in the city with their respective census tracts. The BlueGranite team also recommended specific variables from the US Census Bureau’s American Community Survey that might be helpful in predicting future tax revenue. BlueGranite then ingested data from the US Census Bureau via its API through the Data Lake Hydrator.

From there, the team was able to integrate tax payment data, data from the GIS team, and US Census Bureau data into curated datasets ready for analytics and reporting, all living in Azure Synapse.

With datasets processed, BlueGranite was then able to move onto delivering a forecasting model to predict how much revenue the County was likely to collect in the future based on population growth. BlueGranite delivered a proof-of-concept machine learning model build in Azure ML, a reusable Jupyter notebook containing code for the model, and documentation for connecting the data in Azure Synapse to Azure ML.

An overview of the tools used in this solution are below:

Technology Business Outcome

Data Lake Hydrator

Setting up resources and moving data from source systems and REST APIs used to be challenging and would take days or weeks. With BlueGranite’s Data Lake Hydrator, it took hours.

Azure Data Lake Storage

The Data Lake centralized data from a variety of systems from different levels of government and freed up source system capacity. Previously, the County would build analyses and run queries from its production systems, saying that it could take minutes or sometimes hours to get a result.

Azure Synapse

Synapse set this County up with a scalable data warehouse solution that could be managed using SQL. In addition, using the serverless capability, there were no servers to worry about turning on or off and the cost was within their budget.

Azure ML

Azure ML allowed the County to quickly deploy and predict likely tax revenue based on population growth. Using Azure ML allows for the County to implement the model in their larger data pipelines so they can regularly see updates to the predictions.

Azure SQL Database


Azure SQL DB was used to both orchestrate stored procedures for the criminal justice part of the solution and then was also used as a final storage location for the tax data solution because it can manage geographic data types in a way that Azure Synapse currently cannot.

Power BI

The County was already using Power BI for its criminal justice work and was able to shift the data source to Azure SQL DB instead of their previous on-premises solution. Power BI was also used to visualize findings and metrics from the tax solution.

The Results

Prior to this project, the County was running all its processes against its production databases, potentially putting additional stress on legacy systems. By moving these processes to Azure, the bulk of the data transformation work no longer is completed against production databases, but instead uses the data lake as the source of the data. This ensures that data the County sees in its reports is both fresh and efficiently gathered and compiled.

Since this project focused on using data related to public safety and incarceration, it was critical that the information remain protected – any solution needed to follow Criminal Justice Information Services (CJIS) Security Policies. The project was completed in Azure Government Cloud which is in compliance with CJIS Security Policies, reducing any concern about moving data from on-premises to the cloud.

Government agencies often need to retain data for specific periods of time. Though not a specific use case that this project required, the use of the Azure Data Lake ensures a daily snapshot of data will remain in existence for as long as the County is required to keep it.

The County was also able to leverage the tools that had been deployed in phase 1 of the project to enable phase 2. This allowed for the tax data to be quickly ingested and built on, rather than spending additional time deploying a new set of tools. Going forward, the County will be able to continue to follow this pattern, easily ingesting data to the data lake, then curating and shaping the data for different use cases.

Interested in learning more about how BlueGranite can help accelerate your modern analytics solution? Contact us today.


Kickstart Your Digital Transformation With BlueGranite's Business Intelligence Solutions