This post discusses new features released in May 2019: shared datasets, as well as dataset endorsements which include certified and promoted datasets. These features are interrelated and can play an important role in a self-service BI implementation of Power BI. When you have self-service users responsible for report creation activities, allowing those report authors to easily locate trustworthy data is imperative. These new features can make that process easier.
What are Shared Datasets?
A shared dataset is a data model which is reused, or shared, across many different reports. The related reports can reside in various workspaces within the Power BI Service. This new ability to span workspaces provides additional flexibility we have not previously had.
The existence of a shared dataset can become more apparent to report authors via the use of endorsements, which are discussed next.
What are Dataset Endorsements?
Certified and Promoted datasets are considered dataset endorsements. Endorsements are a property of a dataset:
At the time of this writing, there are two badges (other than default): Promoted and Certified. The meaning of a promoted dataset or a certified dataset is not strictly defined, so it is up to your organization to determine the specific definitions.
We recommend that the certification process be formally structured, with specific subject matter experts designated to certify datasets. The promoted dataset badge may be less rigid, and is likely to be more user driven. In both cases, alignment with governance processes is very important to keep the endorsements from becoming meaningless, inconsistent, or confusing. Educating dataset authors to avoid overusing the badges is also important.
What is the Dataset Discovery Experience?
When a report creator is working in Power BI Desktop and searches for a Power BI dataset that exists in the Power BI Service, they are presented with a dialog which represents the dataset discovery experience:
Once a dataset has been assigned a Promoted or Certified endorsement, reuse of this dataset is encouraged because its existence is more prominently displayed at the top of the dataset discovery dialog.
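To make the effect of endorsements on discovery concrete, here is a minimal Python sketch of an endorsement-first ordering. It is only an illustration, not the Power BI Service's actual ranking logic: the assumption that Certified sorts ahead of Promoted, the alphabetical tie-break, and the dataset names "Budget Draft" and "Scratch Model" are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ranking (lower sorts first). The real dialog's ordering rules
# are an assumption here, beyond "endorsed datasets appear at the top".
ENDORSEMENT_RANK = {"Certified": 0, "Promoted": 1, "None": 2}


@dataclass(frozen=True)
class Dataset:
    name: str
    workspace: str
    endorsement: str = "None"  # "Certified", "Promoted", or "None"


def discovery_order(datasets):
    """Return datasets with endorsed ones first, then alphabetically by name."""
    return sorted(
        datasets,
        key=lambda d: (ENDORSEMENT_RANK[d.endorsement], d.name.lower()),
    )


datasets = [
    Dataset("Scratch Model", "Accounting Analytics"),
    Dataset("Accounting Transactions", "Accounting Data", "Certified"),
    Dataset("Budget Draft", "Accounting Data", "Promoted"),
]

for d in discovery_order(datasets):
    print(f"{d.endorsement:<9} {d.name} ({d.workspace})")
```

The point of the sketch is that an unendorsed dataset sinks below endorsed ones regardless of its name, which is why overusing the badges dilutes their value.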
Why is it Important to Reuse Existing Datasets?
As discussed above, dataset endorsements encourage greater use of shared datasets by making them easier to find in the dataset discovery dialog. In turn, wider use of shared datasets increases data reuse. The frequent reuse of data by self-service report authors is generally seen as a very positive characteristic of successful self-service BI deployments. Following are some additional considerations:
Utilizing Shared Datasets in Power BI
A shared dataset may represent multiple subject areas comprising a larger consolidated data model. A single semantic layer is what's commonly thought of when we start talking about data reusability (which certainly has many benefits – data is inherently more valuable once it's related to other data). However, we don't have to focus only on a consolidated data model in order to gain massive benefits from shared datasets – especially if self-service BI is decentralized throughout the organization.
Shared datasets can be utilized very effectively for a consolidated data model, or within a single subject area like accounting. For instance, the following diagram illustrates use of a shared dataset called Accounting Transactions:
In the above diagram, a few noteworthy things are happening:
- The Accounting Data workspace (top left) is for the purpose of data only. Segregation of the data into its own workspace allows permissions to be defined for data management separate from report management. In this illustration, only a small number of people who are responsible for the accounting data may edit the contents of the Accounting Data workspace.
- The Corporate Data workspace (top right) is also for the purpose of data only. It will contain data which can be used across the organization, such as Date, Employees, Geography, Organization, and so forth. In this type of workspace, usually a centralized IT or BI team has permission to edit the data.
- The two workspaces across the bottom, Accounting Monthly Close and Accounting Analytics, are report-driven workspaces owned by the functional area. Permissions for editing and viewing reports may be set with a focus on report authoring needs. Typically, only reports and dashboards reside in these workspaces, but it would be possible for a dataset to exist in one of these workspaces (not depicted above) if the dataset is highly specific with no potential for reuse.
- A linked Accounting Transactions dataset, which connects back to the original dataset, is displayed in each of the reporting workspaces. The workspace where the original dataset is located is where data processing will occur (which is particularly relevant when using Premium capacity).
- Report authors utilize a live connection to the Accounting Transactions dataset in order to take advantage of data reusability. As the name implies, a live connection causes the queries to go back to the original dataset and avoids re-importing the data into yet another dataset when it's not necessary to do so.
- Dataflows are depicted in the above diagram, but are not a requirement for the use of shared datasets. Since dataflows are another aspect of reusability, they are shown to illustrate what could be done. Dataflows become particularly useful if certain data will be utilized across many datasets.
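The layout described in the list above can be sketched as a small Python data model. This is a conceptual illustration only, not a Power BI API: the class names, helper functions, and the report names "Close Checklist" and "Expense Trends" are hypothetical, while the workspace and dataset names come from the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    workspace: str  # workspace that hosts (and processes) the dataset


@dataclass(frozen=True)
class Report:
    name: str
    workspace: str    # workspace where the report itself lives
    dataset: Dataset  # live connection target; no data is copied


def processing_workspace(report):
    """Queries from a live-connected report run where the original dataset lives."""
    return report.dataset.workspace


def is_cross_workspace(report):
    """True when the report uses a shared dataset from another workspace."""
    return report.workspace != report.dataset.workspace


accounting = Dataset("Accounting Transactions", "Accounting Data")
reports = [
    Report("Close Checklist", "Accounting Monthly Close", accounting),
    Report("Expense Trends", "Accounting Analytics", accounting),
]

for r in reports:
    print(r.name, "->", processing_workspace(r),
          "| cross-workspace:", is_cross_workspace(r))
```

Both reports resolve to the Accounting Data workspace for processing, which mirrors the separation of data management from report management in the diagram.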
Technical Steps for Taking Advantage of Shared Datasets
(A) New Workspace Experience. Shared datasets across workspaces (as depicted in the diagram above) are only supported with the new workspace experience. To be clear, in classic workspaces users can still use the live connection functionality in Power BI Desktop for referring to an existing dataset in Power BI; this technique still decouples reports from the dataset even if the objects all exist in the same workspace. Migration to the new workspace experience permits shared datasets to work across workspaces.
(B) Workspace Permissions. The workspace role assigned to each user (admin, member, contributor, viewer) is crucial for ensuring that the desired user experience across workspaces is achieved. Also, the build permission is required for any user who will be connecting to an existing dataset within Power BI Desktop via the live connection functionality.
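As a rough sketch of the permission rule above, the following Python function models whether a user could create a live connection to a dataset. It assumes that the admin, member, and contributor roles in the dataset's workspace carry the build permission implicitly, while viewers and users with no role need an explicit grant. Verify that assumption against the current Power BI documentation; the function and parameter names are hypothetical.

```python
# Assumption: these workspace roles include Build on datasets in the workspace.
ROLES_WITH_IMPLICIT_BUILD = {"admin", "member", "contributor"}


def can_live_connect(workspace_role, has_explicit_build):
    """Return True if the user may live-connect to the dataset.

    workspace_role: the user's role in the dataset's workspace
                    ("admin", "member", "contributor", "viewer"), or None.
    has_explicit_build: True if Build was granted directly on the dataset.
    """
    return workspace_role in ROLES_WITH_IMPLICIT_BUILD or has_explicit_build


print(can_live_connect("contributor", False))  # role-based access
print(can_live_connect("viewer", False))       # viewer without Build
print(can_live_connect(None, True))            # no role, explicit Build grant
```

Under this model, a report author in a separate reporting workspace needs only the explicit build grant on the shared dataset, not an editing role in the data workspace.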
(C) Tenant Setting: Use Datasets Across Workspaces.
Typically, we recommend the "Use datasets across workspaces" tenant setting be enabled for the entire organization. However, it can be enabled for selective groups if your implementation is more restrictive by design. See this post for the impact of restricting this functionality.
(D) Tenant Setting: Dataset Certification.
We recommend the ability to certify datasets be highly restricted to ensure that certified datasets are indeed trustworthy.
The selection of subject matter experts (SMEs) who are permitted to certify content should be handled in conjunction with other data governance guidelines. Those who have the data knowledge, and thus should be permitted to certify datasets, should be a different group of people than the Power BI administrators.
(E) Dataset Property for Endorsement.
The endorsement settings, including Promoted and Certified, are associated with each individual dataset.
This setting impacts how the dataset is displayed in the dataset discovery process discussed earlier in this post.
Final Recommendations for Using Shared Datasets
Ensure that the Power BI authors in your organization are aware of the benefits of dataset reuse and how to utilize the live connection to a Power BI dataset when creating a report in Power BI Desktop.
Define what a promoted dataset means within your organization, and how it differs from a certified dataset. Ensure that your report creators are very clear on the meaning of each. Integrate these definitions with your governance plan and data catalog when possible.
Create a meaningful process for dataset certification which involves both a subject matter expert who knows how to evaluate data accuracy and a Power BI expert who can attest to the quality of the data model. Ensure that when the certified badge is utilized, it truly conveys trustworthiness and data quality.
Audit your permissions assignments on a regular basis to ensure that security objectives are being met, and that workspaces are being used as intended in order to achieve the best (yet secure) experience for authors and consumers in Power BI.
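A recurring audit like this can be partly automated. The Python sketch below flags users who hold an editing role in a data workspace but are not on that workspace's approved list. It is a minimal illustration under stated assumptions: the role assignments are supplied as plain tuples (for example, from an export you already have), the set of editing roles is an assumption, and the user names and the approved-editor policy are hypothetical.

```python
# Assumption: these roles allow editing workspace content.
EDIT_ROLES = {"admin", "member", "contributor"}


def find_unexpected_editors(assignments, approved_editors):
    """Return (workspace, user, role) tuples that violate the policy.

    assignments: iterable of (workspace, user, role) tuples.
    approved_editors: dict mapping a data workspace name to the set of users
                      allowed to edit it. Workspaces not listed are skipped.
    """
    findings = []
    for workspace, user, role in assignments:
        if workspace not in approved_editors:
            continue  # not a workspace covered by this audit
        if role in EDIT_ROLES and user not in approved_editors[workspace]:
            findings.append((workspace, user, role))
    return findings


assignments = [
    ("Accounting Data", "dana", "admin"),
    ("Accounting Data", "lee", "contributor"),
    ("Accounting Data", "sam", "viewer"),
    ("Accounting Analytics", "lee", "member"),
]
approved_editors = {"Accounting Data": {"dana"}}

for finding in find_unexpected_editors(assignments, approved_editors):
    print("Review:", finding)
```

Here only the contributor assignment in the Accounting Data workspace is flagged, since viewers cannot edit and the reporting workspace is outside the audited policy.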
In addition to the considerations mentioned in this post, there are many more aspects of a successful Power BI implementation. Learn more about BlueGranite's Power BI deployment and adoption framework, or Power BI training with our team of experts.