Brief Introduction to AWS Rekognition

Michael Flores
4 min read · Mar 15, 2021

One of the last segments of the course mentioned in these previous blogs was about Amazon Rekognition. Even though it was discussed for merely half of this section, I was fascinated enough by this small tidbit that I wanted to delve deeper into how it works and how it is used.

Amazon Rekognition is a computer vision service that automatically analyzes images and video to identify things such as objects, faces, or text. The conventional way to identify objects of interest is to have people tediously go through the images one by one, which is error-prone and increasingly costly. It also scales poorly when we are dealing with large quantities of images.

This is where Amazon Rekognition excels: it provides a highly accurate and scalable deep learning API that is pre-trained and optimized for image and video recognition tasks. An important advantage of Rekognition is that its users do not need to be experts in computer vision or deep learning models. Rekognition has a wide variety of use cases, including facial search and recognition, text detection, and moderation of inappropriate content.

The course provides examples of how to use Rekognition for object detection and text detection within an image. In the first example, we are trying to identify bicycles as they pass by a camera. To use Rekognition through Boto3, we must first provide it with images. We can do this by initializing the S3 client and then uploading a file with its upload method (for example, upload_file).
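As a rough sketch of that upload step, assuming Boto3 is installed and AWS credentials are configured: the file name, bucket name, and object key are placeholders, and `upload_image` and `run_upload_example` are hypothetical helpers written for this post rather than code from the course.

```python
def upload_image(s3_client, local_path, bucket, key):
    """Upload a local image to S3 and return a reference to it.

    The returned dict has the {"S3Object": ...} shape that Rekognition's
    Image parameter accepts.
    """
    s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def run_upload_example():
    # Imported here so the helper above can be tried without AWS access.
    import boto3

    s3 = boto3.client("s3")
    # Placeholder names (assumptions): use a file and bucket you own.
    return upload_image(s3, "bicycle.jpg", "my-example-bucket", "images/bicycle.jpg")
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real bucket.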

After this, we construct the Rekognition client using Boto3.

For object detection, we call the detect_labels method. Within this method, we specify the image and two optional parameters: MaxLabels and MinConfidence. MaxLabels is the maximum number of labels (objects) that we want in the response, and MinConfidence is the minimum confidence Rekognition must have in a detected label for that label to be returned. With MaxLabels set to 10 and MinConfidence set to 95, Rekognition returns at most 10 labels, and only labels it detected with at least 95% confidence.
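The client construction and the detect_labels call might look roughly like this. It is a minimal sketch, assuming the image is already in S3 and that a default AWS region and credentials are configured; the bucket and key are placeholders, and the two function names are hypothetical.

```python
def detect_labels_in_s3_image(rekognition_client, bucket, key,
                              max_labels=10, min_confidence=95.0):
    """Ask Rekognition for up to max_labels labels at or above min_confidence."""
    return rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,          # cap on the number of labels returned
        MinConfidence=min_confidence,  # drop labels below this confidence (percent)
    )


def run_detect_labels_example():
    # Imported here so the helper above can be tried without AWS access.
    import boto3

    rekognition = boto3.client("rekognition")
    # Placeholder bucket and key (assumptions): point these at your own image.
    return detect_labels_in_s3_image(
        rekognition, "my-example-bucket", "images/bicycle.jpg"
    )
```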

The response from detect_labels contains an array of the labels detected in the image, each with its respective confidence. In addition, each label has a list of instances that contains bounding box information. The bounding box describes where that instance of the label is located within the image.

Sample of the detect_labels response for image below
Rekognition sees a bike, multiple cars, and a person in this image. Note the bounding boxes for each car, the bike, and the person.
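To make that response structure concrete, here is a small sketch that reads labels, instance counts, and bounding boxes out of a detect_labels response. The helper names are hypothetical, and the sample response is hand-written and heavily abridged for illustration, not real Rekognition output. Bounding box values are ratios of the overall image size, so they are scaled by the image's width and height to get pixels.

```python
def summarize_labels(response):
    """Return (name, confidence, instance_count) for each label in the response."""
    return [
        (label["Name"], label["Confidence"], len(label.get("Instances", [])))
        for label in response.get("Labels", [])
    ]


def count_instances(response, label_name):
    """Count the bounding-box instances reported for one label, e.g. "Bicycle"."""
    for label in response.get("Labels", []):
        if label["Name"] == label_name:
            return len(label.get("Instances", []))
    return 0


def box_to_pixels(bounding_box, image_width, image_height):
    """Convert a ratio-based BoundingBox into (left, top, width, height) pixels."""
    return (
        round(bounding_box["Left"] * image_width),
        round(bounding_box["Top"] * image_height),
        round(bounding_box["Width"] * image_width),
        round(bounding_box["Height"] * image_height),
    )


# Hand-written, abridged sample (an assumption, not actual service output).
SAMPLE_RESPONSE = {
    "Labels": [
        {
            "Name": "Bicycle",
            "Confidence": 99.1,
            "Instances": [
                {
                    "BoundingBox": {"Left": 0.25, "Top": 0.5,
                                    "Width": 0.5, "Height": 0.25},
                    "Confidence": 99.1,
                }
            ],
        },
        {"Name": "Road", "Confidence": 97.0, "Instances": []},
    ]
}

if __name__ == "__main__":
    for name, confidence, instances in summarize_labels(SAMPLE_RESPONSE):
        print(f"{name}: {confidence:.1f}% ({instances} instance(s))")
```

For the bicycle example, `count_instances(response, "Bicycle")` would give the number of bicycles Rekognition located in a single frame.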

Text detection works in a similar way, using the detect_text method and passing in the image. The response contains two types of detections: lines and words. A word is any sequence of characters that isn't separated by a space. A line is a continuous string of words that ends when no more text follows it. This can prove difficult if, for instance, we are detecting text on two adjacent signs, since it may be unclear where one line ends and the next begins. The response also contains the geometry of the bounding box that locates each piece of text in the image.

Image used for following examples below
All words detected in the image
All lines detected in the image
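The split between lines and words can be sketched as follows. Each entry in the response's TextDetections list carries a Type of either LINE or WORD along with its DetectedText. The function names, bucket, and key are hypothetical placeholders, and the real call assumes AWS credentials and a default region are configured.

```python
def detect_text_in_s3_image(rekognition_client, bucket, key):
    """Run text detection on an image that is already stored in S3."""
    return rekognition_client.detect_text(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )


def split_lines_and_words(response):
    """Separate a detect_text response into (lines, words) lists of strings."""
    lines, words = [], []
    for detection in response.get("TextDetections", []):
        if detection["Type"] == "LINE":
            lines.append(detection["DetectedText"])
        elif detection["Type"] == "WORD":
            words.append(detection["DetectedText"])
    return lines, words


def run_detect_text_example():
    # Imported here so the helpers above can be tried without AWS access.
    import boto3

    rekognition = boto3.client("rekognition")
    # Placeholder bucket and key (assumptions): point these at your own image.
    response = detect_text_in_s3_image(
        rekognition, "my-example-bucket", "images/sign.jpg"
    )
    return split_lines_and_words(response)
```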

It was interesting to see how the complex and intricate field of computer vision can be made so accessible for business use cases. The speed and accuracy of Rekognition allow large companies to greatly reduce the turnaround time for recognition tasks. It also provides a way to perform image detection without requiring much prior knowledge of how machine learning models work. The documentation for Rekognition is available on the AWS website. In the next blog, I will look to cover how Amazon Comprehend works for sentiment analysis. Thank you for reading!
