Microsoft Cognitive Services eBook

Discover applications that See, Hear, Speak, Understand, and Interpret data when applied to apps, websites, and bots.

Use artificial intelligence to solve business problems

Azure Cognitive Services are application programming interfaces (APIs), software development kits (SDKs), and services that enable developers to easily build smart applications. Applications that can see, hear, speak, understand, and even begin to reason are key to business transformation and growth. The Cognitive Services APIs are grouped by use case into five categories, which we cover in this eBook. In this preview, we discuss the Vision APIs and Cognitive Services Labs; download the full eBook for insight into the other Cognitive Services APIs available through Microsoft.

  • Search - A cloud search service with built-in AI capabilities, Azure Search allows for easy identification and exploration of relevant content.

  • Speech - Azure Speech Services combine speech-to-text, text-to-speech, and speech translation in a single Azure subscription, enabling applications, tools, and devices to transcribe audio, identify speakers, and train custom language models.

  • Language - Language APIs ensure that apps and services can understand the meaning of unstructured text and speakers’ voices.

  • Vision - Azure Vision Services allow apps to analyze content within images, videos, and digital ink, capturing valuable information and real-time data.

  • Decision - Decision APIs build apps that surface recommendations for informed and efficient decision-making, giving your software human-like capabilities through machine learning processes.



Recognize, identify, caption, index, and moderate your pictures, videos, and digital ink content.



Cognitive Services Labs give an early look at emerging technologies. These projects are ever-changing, graduating to full services only after developers have tested them and found sufficient value.




  • Analyze an Image: This tool retrieves data about visual content found in an image. By employing advanced optical character recognition in the Read operation, it can perceive embedded printed and handwritten text, extract recognized words into machine-readable character streams, and facilitate searching. The API recognizes over a million well-known figures spanning business, politics, sports, and entertainment, and nearly 10,000 landmarks worldwide. Object Detection can be used to get locations of thousands of objects within an image, as well as identify image types and color schemes within a picture. These features can also be used on video.

  • Custom Vision: Use Custom Vision to tailor computer vision models to your unique use case with just a few uploaded, labeled images. These images teach the Custom Vision service the concepts that suit your business and use. Simple REST API calls then tag images with your new custom computer vision model, or the model can be exported to devices to run real-time image understanding.

  • Face Detection: Face detection and verification can be used for security purposes, emotion recognition, face search, and face grouping. Use face detection to perceive human faces in images and return facial attributes that contain machine-learning-based predictions of facial features. The face attribute features include age, emotion, gender, pose, smile, and facial hair, along with 27 landmarks for each face in an image. Emotion recognition can also be used to identify feelings such as anger, contempt, disgust, fear, happiness, neutrality, sadness, and surprise. These emotions are understood to be cross-culturally and universally communicated by particular facial expressions.

  • Video Indexer: Video Indexer can be used to get insights from videos right away, without writing code, and makes videos more discoverable.
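The Vision features above are exposed as REST endpoints. As a minimal offline sketch, the snippet below assembles the pieces of an Analyze Image call and reads emotion scores from a trimmed, hypothetical Face detection response; the endpoint, key, API version, and the sample response values are placeholders, not real resource details.

```python
from urllib.parse import urlencode

# Placeholder resource values -- substitute your own Azure endpoint and key.
ENDPOINT = "https://example-resource.cognitiveservices.azure.com"
SUBSCRIPTION_KEY = "<your-subscription-key>"

def build_analyze_request(visual_features):
    """Assemble the URL and headers for a Computer Vision 'analyze' call.

    The real call is an HTTP POST carrying an image URL or binary image;
    only the request pieces are built here so the sketch runs offline.
    """
    query = urlencode({"visualFeatures": ",".join(visual_features)})
    url = f"{ENDPOINT}/vision/v3.2/analyze?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        "Content-Type": "application/json",
    }
    return url, headers

# Trimmed, hypothetical slice of a Face detection response.
sample_face = {
    "faceAttributes": {
        "age": 31.0,
        "emotion": {"anger": 0.0, "contempt": 0.01, "disgust": 0.0,
                    "fear": 0.0, "happiness": 0.93, "neutral": 0.05,
                    "sadness": 0.01, "surprise": 0.0},
    }
}

def dominant_emotion(face):
    """Return the highest-scoring emotion from a face attributes payload."""
    scores = face["faceAttributes"]["emotion"]
    return max(scores, key=scores.get)

url, headers = build_analyze_request(["Objects", "Description", "Color"])
```

Sending the assembled request with any HTTP client would return JSON describing objects, captions, and colors; the emotion helper shows how a client might then pick the most likely feeling from a detected face.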


  • Project Gesture: uses cutting-edge technology to create more intuitive and natural experiences by allowing users to control and interact with technology through hand gestures. The SDK enables you to define desired hand poses using simple constraints built with plain language. Once a gesture is defined and registered in your code, you receive a notification whenever a user completes that gesture and can assign an action in response.
  • Project Local Insights: is an API used to help ‘score’ the attractiveness of a location, based on how many of a particular amenity are within a specific distance. Users can search by time or distance, while also taking into consideration predicted traffic at any given time.
  • Project Event Tracking: is designed to help find events associated with Wikipedia entities. The API is backed by an engine that listens to the world and collects new signals reporting events. The engine provides evidence with structural information from different sources, both positive and negative, biased and impartial, and uses signal clusters to help characterize upcoming events.
  • Project URL Preview: uses industry-leading speed to inform users’ clicks by enabling creation of web page previews from a given URL. The API returns a page description, a preview image, and a family-friendly content flag, and it can flag adult content so that the preview is suppressed.
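Because the Labs projects are experimental, their response shapes may change. As a hedged sketch of the URL Preview flow, the snippet below assumes hypothetical field names for the returned description, image, and family-friendly flag, and shows how a client might suppress a flagged preview:

```python
# Hypothetical shape of a URL Preview response -- field names are assumptions.
preview = {
    "name": "Contoso home page",
    "description": "Example landing page for the Contoso brand.",
    "imageUrl": "https://contoso.example/preview.png",
    "isFamilyFriendly": True,
}

def render_preview(p):
    """Return preview text, or a placeholder if the content is flagged."""
    if not p.get("isFamilyFriendly", False):
        return "[preview suppressed]"
    return f"{p['name']} - {p['description']}"
```

Treating a missing flag as not family-friendly keeps the default behavior conservative: a preview is only shown when the API explicitly marks the page safe.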


Download our eBook today!