

eBOOK | October 2019

Discover applications that See, Hear, Speak, Understand, and Interpret data when applied to apps, websites, and bots.

Azure Cognitive Services are application programming interfaces (APIs), software development kits (SDKs), and services that enable developers to easily build smart applications. Applications that can see, hear, speak, understand, and even begin to reason are key to business transformation and growth. The Cognitive Services APIs are organized by need into five categories, which we cover in this eBook. In this preview, we discuss the Vision APIs and Cognitive Services Labs. Download the full eBook for insight into the other Cognitive Services APIs available through Microsoft.


“Because the Cognitive Services APIs harness the power of machine learning, we were able to bring advanced intelligence into our product without the need to have a team of data scientists on hand.”
- Aaron Edell, Chief Product Owner | GrayMeta

  • Search - A cloud search service with built-in AI capabilities, Azure Search allows for easy identification and exploration of relevant content.
  • Speech - Azure Speech Services combine speech-to-text, text-to-speech, and speech translation in a single Azure subscription to enable applications, tools, and devices to transcribe, identify, and train custom language models.
  • Language - Language APIs ensure that apps and services can understand the meaning of unstructured text and of speakers’ voices.
  • Vision - Azure Vision Services allow apps to analyze content within images, videos, and digital ink, capturing valuable information and real-time data.
  • Decision - Decision APIs build apps that surface recommendations for informed and efficient decision-making, giving your software human-like capabilities through machine learning processes.


Recognize, identify, caption, index, and moderate your pictures, videos, and digital ink content.

Analyze an Image
This tool retrieves data about visual content found in an image. By employing advanced optical character recognition in the Read operation, it can perceive embedded printed and handwritten text, extract recognized words into machine-readable character streams, and facilitate searching. The API recognizes over a million well-known figures spanning business, politics, sports, and entertainment, and nearly 10,000 landmarks worldwide. Object Detection can be used to get the locations of thousands of objects within an image, as well as to identify image types and color schemes within a picture. These features can also be used on video.
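As a rough sketch of how the Analyze Image operation is typically called over REST, the snippet below assembles and sends a request against a Computer Vision v3.0 `analyze` endpoint. The endpoint URL, subscription key, image URL, and chosen visual features are placeholders, not values from this eBook, and the exact API version may differ from your deployment.

```python
import json
import urllib.request


def build_analyze_request(endpoint, key, image_url,
                          features=("Description", "Objects", "Color")):
    """Assemble the URL, headers, and JSON body for an Analyze Image call."""
    url = (endpoint.rstrip("/") + "/vision/v3.0/analyze"
           + "?visualFeatures=" + ",".join(features))
    headers = {"Ocp-Apim-Subscription-Key": key,
               "Content-Type": "application/json"}
    body = json.dumps({"url": image_url}).encode("utf-8")
    return url, headers, body


def analyze_image(endpoint, key, image_url):
    """Send the request and return the parsed JSON analysis."""
    url, headers, body = build_analyze_request(endpoint, key, image_url)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response JSON contains sections matching the requested features, such as a generated caption under `description` and detected objects with bounding boxes under `objects`.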

Custom Vision
Use Custom Vision to tailor computer vision models to your own use cases with just a few uploaded, labeled images. These images teach the Custom Vision service the concepts that suit your business. Simple REST API calls then tag images with the new custom computer vision models, or the models can be exported to devices to run real-time image understanding.
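To illustrate the "simple REST API calls" mentioned above, the sketch below builds a classify-by-URL request for a trained Custom Vision model and picks the most likely tag from a prediction response. The project ID, iteration name, and key are hypothetical, and the URL path reflects the v3.0 prediction API shape, which may vary by service version.

```python
def build_prediction_request(endpoint, prediction_key, project_id,
                             iteration_name, image_url):
    """Assemble a classify-by-URL call for a published Custom Vision model."""
    url = (endpoint.rstrip("/")
           + f"/customvision/v3.0/Prediction/{project_id}"
           + f"/classify/iterations/{iteration_name}/url")
    headers = {"Prediction-Key": prediction_key,
               "Content-Type": "application/json"}
    body = {"Url": image_url}
    return url, headers, body


def top_tag(predictions):
    """Pick the highest-probability tag from a prediction response list."""
    best = max(predictions, key=lambda p: p["probability"])
    return best["tagName"], best["probability"]
```

A typical response lists every tag the model knows with a probability; `top_tag` reduces that list to the single best match for simple routing logic.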


Face Detection
Face detection and verification can be used for security purposes, emotion recognition, face search, and face grouping. Use face detection to perceive human faces in images and return facial attributes that contain machine learning-based predictions of facial features. The face attributes include Age, Emotion, Gender, Pose, Smile, and Facial Hair, along with 27 landmarks for each face in an image. Emotion recognition can also be used to identify feelings such as anger, contempt, disgust, fear, happiness, neutrality, sadness, and surprise. These emotions are understood to be communicated cross-culturally and universally by particular facial expressions.
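A minimal sketch of a Face API detect call requesting the attributes listed above, plus a helper that reduces the emotion scores to a single label. The endpoint and key are placeholders; the `v1.0/detect` path and query parameters reflect the Face API's common REST shape but should be checked against your service version.

```python
def build_detect_request(endpoint, key, image_url):
    """Assemble a Face API detect call asking for landmarks and attributes."""
    attributes = "age,gender,smile,facialHair,headPose,emotion"
    url = (endpoint.rstrip("/") + "/face/v1.0/detect"
           + "?returnFaceLandmarks=true"
           + "&returnFaceAttributes=" + attributes)
    headers = {"Ocp-Apim-Subscription-Key": key,
               "Content-Type": "application/json"}
    return url, headers, {"url": image_url}


def dominant_emotion(emotion_scores):
    """Return the highest-scoring emotion from a face's emotion score dict."""
    return max(emotion_scores, key=emotion_scores.get)
```

Each detected face in the response carries an `emotion` dictionary of scores summing to roughly 1.0, so `dominant_emotion` simply returns the key with the largest value.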

Video Indexer
Video Indexer extracts insights from videos right away, without requiring any code, and makes them more discoverable.


An impressive Microsoft case study from public consultancy group Black Radley and England’s Shrewsbury Museum & Art Gallery details using Vision APIs to create a device that tailors viewing experiences. An IoT device detects and photographs museum guests as they approach an exhibit. It uses Face API to determine a patron’s approximate age and gender, then plays audio meant to appeal to that individual’s age demographic. A viewer between the ages of 12 and 17 looking at a panoramic painting of Shrewsbury hears lively audio and generation-specific cultural references. Audio delivered to the 55- to 64-year age group is more formal and detailed. Motion sensors are used to stop the audio when a visitor walks away from the exhibit. The audio restarts after a new facial image has been captured. The device further allows the museum to track and analyze key data, including patron age, gender, and emotions, by storing the images captured each day.

[Infographic: Vision APIs case study at Shrewsbury Museum & Art Gallery]

The key steps in this complex solution involved creating a Universal Windows Platform (UWP) app that runs on a Raspberry Pi 3 device running the Windows IoT operating system. Captured photos are sent to a proxy Web API that passes images through the Face API, gathering data, logging it for future analysis, and passing it to the UWP app. The main entry point, WebcamFaceDetector.xaml inside App.xaml, initializes the FaceTracker object, starts a live webcam capture, executes face tracking, and retrieves results from FaceTracker. The Web API is then used to get demographic information from all frames that contain faces. An introductory audio file is played by the Voice Player while the images are posted to the proxy API using PostImageToApiAsync. The pause and duration of the introductory audio file give the API time to return the appropriate demographic information and determine which file to play. Images are stored as patron objects in Azure Table storage to be used in further research on how visitors interact with exhibits.
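The demographic-routing step in the museum solution can be sketched as a simple lookup from the Face API's age estimate to an audio clip. The age bands and clip names below are hypothetical: only the 12-17 and 55-64 bands are mentioned in the case study itself, and the real app runs as a UWP application rather than this Python illustration.

```python
# Hypothetical mapping from age band to narration clip. Only the 12-17
# (lively, generation-specific) and 55-64 (formal, detailed) bands appear
# in the case study; the rest are illustrative.
AUDIO_CLIPS = [
    (12, 17, "shrewsbury_youth.mp3"),
    (18, 34, "shrewsbury_young_adult.mp3"),
    (35, 54, "shrewsbury_midlife.mp3"),
    (55, 64, "shrewsbury_mature.mp3"),
]


def audio_for_age(age, clips=AUDIO_CLIPS, default="shrewsbury_general.mp3"):
    """Map a Face API age estimate to a demographic-specific audio clip."""
    for low, high, clip in clips:
        if low <= age <= high:
            return clip
    return default
```

Because the Face API returns age as an estimate (a float), an inclusive range check is enough; anything outside the configured bands falls back to a general-audience clip.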

“Using a device like this takes visitor data collection to a new level, providing museums with much more detailed information about visitor behavior. This would allow museums to identify hotspots within the museum and to spot trends as they develop.” -Joe Collins, CTO | Black Radley


Cognitive Services Labs give an early look at emerging technologies. These offerings change continually, graduating to general availability only after developers have tested them and found sufficient value. Following are a few examples:

Cognitive Services Labs Tools
  • Project Gesture uses cutting-edge technology to create more intuitive and natural experiences by allowing users to control and interact with technology through hand gestures. The SDK enables you to define desired hand poses using simple constraints built with plain language. Once a gesture is defined and registered in your code, you receive a notification whenever a user completes that gesture and can select an action to assign in response.
  • Project Local Insights is an API used to help ‘score’ the attractiveness of a location, based on how many of a particular amenity are within a specific distance. Users can search by time or distance, while also taking into consideration predicted traffic at any given time.
  • Project Event Tracking is designed to find events associated with Wikipedia entities. The API is backed by an engine that continually listens for new signals reporting events. The engine gathers structured evidence from a range of sources, both positive and negative, biased and impartial, and uses signal clusters to help characterize upcoming events.
  • Project URL Preview uses industry-leading speed to inform users’ clicks by enabling creation of web page previews from a given URL. The API can also flag adult content to suppress the preview. The API returns a page description, preview image, and family-friendly content flag.
  • Project Personality Chat enhances your bot’s conversational capabilities, by handling small talk, in line with a distinct, chosen personality. Using intent classifiers to identify common small talk intents and generating responses consistent with the personality used, the bot becomes more personable for target users.


Organizations that deploy Cognitive Services will work more efficiently, safely, and sustainably, and deliver more engaging and immersive experiences to their customers. BlueGranite can help your company leverage cognitive services to overcome your biggest workflow inefficiencies and bottlenecks, and transform your business.

We hope you have found this preview of the eBook useful. If you are looking to bring in new approaches, combined with proven techniques, to support decision making at all levels of your organization, download the full PDF version today!