3Cloud has enjoyed several opportunities recently working with state and local governments to help enable modernization of their systems with Power BI to support everyday operations. Like many, these agencies working to serve their communities are also seeking to realize the benefits of modernization: advanced capabilities, potentially delivered at much lower costs than those of existing, aging solutions.  Yet they face unique challenges, and this article takes a technical dive into how to meet one such challenge.

Usage Auditing – Logins

Power BI helps government agencies manage their data for increased operational efficiency, often at lower cost than existing solutions.  As with all systems that manage sensitive data, government agencies are required to track and capture system usage details to ensure that the security and integrity of citizens’ data is maintained.

Detailed analysis of internal and federal standards applicable to the government agencies reveals that information to be captured for auditable events should include the:

  1. Date and time of the event.
  2. Component of the information system (e.g. software or hardware) where the event occurred.
  3. Type of event.
  4. User/Subject identity.
  5. Outcome (success or failure) of the event.

Additionally, the standards direct that agencies shall retain audit records for at least one year, and that once the minimum retention time period has passed, agencies shall continue to retain audit records until it is determined they are no longer needed for administrative, legal, audit, or other operational purposes.

To ensure the integrity of systems and sensitive citizens’ data, user logins are among some of the most fundamental events to track.

Solutions

Power BI Desktop

Power BI Desktop is a complete data analysis and report development application that is used on workstations.  It includes functionality to connect to many different kinds of data sources and then shape the data into integrated models.  It also includes functionality to analyze, format, and display the data in many kinds of interactive visuals.  A separate Report Builder application extends Power BI to include pixel-perfect, paginated reports.

Power BI Cloud Service

Once data connections and reports are developed on workstations, Power BI software allows data modelers and report authors to publish and share their works in the Power BI service. Each user is required to log in to a cloud-enabled account to publish to the Power BI cloud service. Government agencies have the option to subscribe to an exclusive Government Community Cloud that is a comprehensive suite of advanced cloud hosting, cybersecurity, and information management technologies that caters to public sector needs for additional security and U.S. data residency, as required by federal regulation.

Power BI Logins

Power BI has its roots in Microsoft’s suite of cloud-enabled office productivity tools, Office 365.  To maintain identity integrity across its many components, user authentication with Office 365 is implemented as a centralized service. Therefore, each “login” is an instance where the user provides credentials to access an app or service within Office 365. Logins are each recorded in the Office 365 Audit Logs and are available for reporting. Depending on an organization’s security settings, user sessions resulting from a single login can span extended periods of time with users navigating into, away from, and back to apps and services without requiring re-authentication. This is especially true when “Keep me logged in” options are enabled.

Retrieving Login Events

Power BI is part of the Office 365 composite application consisting of numerous apps and services. Activities are tracked within each app or service and then centralized into a Unified Audit Log for all Office 365 apps and services. Activities are stored in the unified log for a limited time and must be extracted and preserved long-term by security administrators in an external repository using an available REST API.

Three REST APIs are available to retrieve Power BI activity events for long-term storage:

  1. Office 365 Management REST API

The Office 365 Management REST API can be used to retrieve login events across all Office 365 apps and services. As described earlier, these events pertain to users’ access to Power BI based on administrative settings. Events up to seven days old can be retrieved;   these might be delayed up to 48 hours before appearing in the unified log for retrieval. Access to the Office 365 API requires Office 365 Global Administrator privilege. The API is called from a scripting language (such as PowerShell) using an OAUTH authentication pattern. This API support a large volume of events while offering limited options for retrieving a subset of the data.

  1. Exchange Online Rest API

The Exchange Online REST API can be used to retrieve login events across all Office 365 apps and services. As described earlier, these events pertain to users’ access to Power BI based on administrative settings. Events up to 90 days old can be retrieved, these might be delayed up to 48 hours before appearing in the unified log for retrieval. Access to the Office 365 API requires O365 Global Administrator privilege. The API is called from a PowerShell script using a convenient cmdlet. This API offers many options for retrieving a subset of the data (such as only logins) but is less able than other API options to support a large volume of events.

  1. Power BI Management REST API

The Power BI Management REST API can be used to retrieve Power BI events solely but doesn’t include login events. Events up to 30 days old can be retrieved; these might be delayed up to 30 minutes before appearing in the activity log for retrieval.  Access to the Power BI REST API requires Power BI Administrator privilege.  The API is called from a PowerShell script using a convenient cmdlet.  This API retrieves only Power BI activity events and offers additional options for retrieving subsets of the data, while also supporting large volumes of events over other API options.

Due to the nature of Office 365 activity log processing, usage of an available REST API will require an approach to handle late-arriving events. These are events that can take up to 48 hours to appear in the unified audit logs from which the available REST APIs retrieve activity events. Given that the REST APIs do not offer a method to identify and select only these late-arriving events, only periods older than 48 hours are guaranteed to contain all events. Periods retrieved less than 48 hours old may need to be retrieved again to ensure that they contain any late-arriving events. For example, a daily process may choose to retrieve activity for the past three days, accepting that data within the past 48 hours may be missing some late-arriving events that will be included in subsequent daily process re-runs.

Implementation

To demonstrate how Power BI can meet these challenges, the Exchange Online REST API can be used to retrieve login events. This API has a convenient cmdlet (Search-UnifiedAuditLog) that can be called from a PowerShell script, making it easier to implement and maintain than the Office 365 REST API. A basic PowerShell script to retrieve today’s logins:

Install-Module -Name ExchangeOnlineManagement -Scope CurrentUser
$conn = New-PSSession -ConfigurationName Microsoft.Exchange `
-ConnectionUri “https://ps.outlook.com/powershell/” `
-Credential (Get-Credential) `
-Authentication Basic `
-AllowRedirection
Import-PSSession -Session $conn -CommandName Search-UnifiedAuditLog -AllowClobber
$todayUTC = [System.DateTime]::UtcNow.ToString(‘yyyy-MM-ddT00:00:00.000’)
$response = Search-UnifiedAuditLog `
-RecordType AzureActiveDirectoryStsLogon `
-SessionCommand ReturnLargeSet `
-StartDate ([datetime]$todayUTC) `
-EndDate   ([datetime]$todayUTC + [timespan]”23:59:59.9999999″) `
-SessionID $todayUTC
Remove-PSSession -Session $conn

Some comments on the basic script:

  • Office 365 Global Administrator credentials can be used for the connection to Exchange. More limiting options are possible but require custom configuration.
  • The StartDate and EndDate parameters must be on the same day UTC.
  • The ReturnLargeSet session command enables the cmdlet to return the maximum number of events (50,000) for the interval. For very large organizations where there may be even more events than this in a single day, then the data would be retrieved in smaller increments.

For example:

    -StartDate ([datetime]$todayUTC + [timespan]”00:00:00.0000000″) `
    -EndDate   ([datetime]$todayUTC + [timespan]”11:59:59.9999999″) `    
    -SessionID “$todayUTC 00:00:00.0000000”
 

and

    -StartDate ([datetime]$todayUTC + [timespan]”12:00:00.0000000″) `
    -EndDate   ([datetime]$todayUTC + [timespan]”23:59:59.9999999″) `
    -SessionID “$todayUTC 12:00:00.0000000”
  • Each call to the Search-UnifiedAuditLog cmdlet returns an array of objects with key fields:

AuditLogsKeyFieldsTable

The array of objects returned may contain one, some, or all of the events found in the search.  Because of this, the call to the API must be repeated using the same SessionID until all records have been retrieved (hence why the SessionID should uniquely identify the interval).  The records may be returned out of order.

To complicate matters, the API self-limits the span and pace of activity.  There are varying limits on the number of sessions that can be established in a short time frame, as well as varying limits on the number of records that can be retrieved in a short interval.  When these varying limits are exceeded, the API will “reset” the stream of activities returned.

These varying limits only rarely disrupt the smooth retrieval of events from the API because the limits are fairly generous.  One strategy to manage these disruptions would be to try to avoid them altogether through introduction of purposeful delays in the pace of calls to the API.  However, this strategy unnecessarily increases the overall time it takes to retrieve all records – this can be a problem for large government agencies where the total number of activities can be very large and take an extended time to retrieve.  Because the disruptions are rare, a better strategy is to carefully monitor the sequence of calls to the API to detect when disruptions occur and then restart the retrieval.

A sample script to accomplish the monitoring, detection, and recovery under API self-limits:

#retry loop
do {
  if ($retry) {
    Write-Host ‘   Service API throttled/timeout … retrying …’
    $retry = $false
  }
  $resultCount = 0 #the number of results to retrieve
  $results = @{}   #accumulator for results
  #accumulate results loop
  do {
    #get a batch
    $response = Search-UnifiedAuditLog …
    #check results
    if (!$response) {
      #should be $null only if search found no records; otherwise there was a problem
      $retry = ($resultCount -ne 0)
    } else {
      #should return index 1 only in the first batch; otherwise there was a problem
      $retry = (($response[0].ResultIndex -eq 1) -and ($resultCount -ne 0))
    }
    if ($response -and !$retry) {
      #so far so good, set target number of results and accumulate unique results
      $resultCount = $response[0].ResultCount
      foreach ($result in $response) {
        $results += @{ $result.ResultIndex = $result } #build hash table of unique results
      }
    }
  } until (($results.Keys.Count -eq $resultCount) -or $retry)
} while ($retry)

Finally, the API appears to sometimes duplicate activity events that happen to occur at the top of an hour.  For example, an activity that happens at 01:00:00 sometimes will be returned by the API twice, each with the same unique event ID!  While this duplication persists, the final step to retrieving events from the API is to eliminate duplicates based on the unique event id.

Summary

Technology modernization delivers greater capability for less cost than existing, aging solutions. State and local governments face unique challenges in tracking and capturing system usage details to ensure that the security and integrity of sensitive data is maintained. This is especially true for very large government agencies that have many users and much activity. Power BI can meet these challenges by providing functionality and features that help these agencies meet tracking challenges and realize modernization goals.