Grocery Item Detection using TensorFlow Object Detection API

Standing in a queue at the checkout of a retail shop is tedious. Scanning all the products one by one and then generating a bill takes a long time. Why should anyone waste their time when there is a better solution?


In this era of artificial intelligence, I propose a solution that can reduce checkout and billing time by 50%. What if all the products a customer bought could be scanned together in less than a minute? Sounds interesting? Let’s do it.

In this article, we will:

  • Perform object detection on custom images using the TensorFlow Object Detection API
  • Use the free Google Colab GPU for training and Google Drive to keep everything synced.
  • Walk through detailed steps to tune, train, monitor, and use the model for inference with your local webcam.

I have created this Colab notebook if you would like to start exploring. It walks through all the code step by step for single-class object detection. I suggest looking at it after reading this tutorial.

Let’s get started!

  1. Collecting Images and Labeling them.
  2. Environment Setup.
  3. Installing Requirements.
  4. Preprocessing Images and Labels.
  5. Downloading the TensorFlow Model.
  6. Generating TFRecords.
  7. Selecting a Pre-trained Model.
  8. Configuring the Training Pipeline.
  9. TensorBoard.
  10. Training.

I will be using pictures of soft drinks. The dataset contains 800 pictures of MUG root beer in various positions, rotations, and backgrounds. I used the LabelImg tool for annotations. You may use your own images or the dataset I am using here!

If you have your own images collected, great! If not, you can collect images from Google, or take pictures with your mobile phone, depending on your problem.

Three things to take care of while collecting your own images:

  1. At least 50 images for each class. The more, the better! Get even more if you are detecting only one class.
  2. Images with random objects in the background.
  3. Various background conditions; dark, light, in/outdoor, etc.

Save your images in a folder named images.

Once you have your images gathered, it’s time to label them. There are many tools that can help you label your images; LabelImg is perhaps the most popular and the easiest to use. Following the instructions in the GitHub repo, download and install it on your local machine.

Using LabelImg is easy, just remember to:

  1. Create a new directory for the labels; I will name it annotations.
  2. In LabelImg, click on Change Save Dir and select the annotations folder. This is where the labels/annotations will be saved.
  3. Click on Open Dir and select the images folder.
Annotations using LabelImg

Each image will have one .xml file containing its labels. If an image has more than one class or label, its .xml file will include them all.
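For reference, a LabelImg annotation in Pascal VOC format looks roughly like the trimmed example below (the file name, label, and coordinates are made up for illustration):

```xml
<annotation>
  <folder>images</folder>
  <filename>mug_001.jpg</filename>
  <size>
    <width>800</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <object>
    <name>mug_root_beer</name>
    <bndbox>
      <xmin>120</xmin>
      <ymin>85</ymin>
      <xmax>360</xmax>
      <ymax>420</ymax>
    </bndbox>
  </object>
</annotation>
```

One `<object>` element is written per labeled box, so images with several labels simply repeat that element.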

Set up your Google Colab notebook.

  1. Create a new Notebook.
  2. Change the runtime type to GPU under hardware accelerator.

Upload your dataset and annotations.

You will have to zip the images and annotations folders and simply move them to your notebook.
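Once the archives are uploaded, a minimal sketch for unpacking them in the notebook (assuming they are named images.zip and annotations.zip) could be:

```python
import zipfile

def extract_archives(archives, dest="."):
    """Unzip each uploaded archive into the destination directory."""
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
```

Calling `extract_archives(["images.zip", "annotations.zip"])` recreates the two folders in the working directory.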

Structure of directories:


Google Colab already has most of the packages pre-installed: Python, TensorFlow, pandas, etc.

The remaining packages we need do not come pre-installed by default. Install them by running:
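The exact list depends on your Colab image; for the TF1 Object Detection API, the extras usually look something like the following (adjust to whatever your imports complain about):

```shell
pip install -q pycocotools tf_slim lxml Cython contextlib2
```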

Importing Libraries:

Here, we need TensorFlow version 1.15.0 to run the pre-trained ssd_mobilenet_v2 model.

Splitting the images into training & testing:

Depending on how large your dataset is, you might want to split your data manually. If you have a lot of pictures, you can use a short script to split your data randomly.
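A minimal sketch of such a random split, assuming all pictures are .jpg files sitting directly in the images folder (the 80/20 ratio and folder names are my choice, not fixed by the API):

```python
import os
import random
import shutil

def split_dataset(images_dir="images", test_ratio=0.2, seed=42):
    """Randomly move ~test_ratio of the images into images/test
    and the rest into images/train."""
    random.seed(seed)
    files = [f for f in os.listdir(images_dir) if f.endswith(".jpg")]
    random.shuffle(files)
    n_test = int(len(files) * test_ratio)
    for subset, names in (("test", files[:n_test]), ("train", files[n_test:])):
        subset_dir = os.path.join(images_dir, subset)
        os.makedirs(subset_dir, exist_ok=True)
        for name in names:
            shutil.move(os.path.join(images_dir, name),
                        os.path.join(subset_dir, name))
```

Fixing the random seed makes the split reproducible if you re-run the notebook.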

Now we need to create two CSV files from the .xml files. Each will contain every image’s file name, label, box position, etc. If an image has more than one class or label, more than one row is created for it.
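A minimal sketch of that conversion using only the standard library (many tutorials use pandas instead; the column order matches the common convention for TFRecord generation scripts):

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

def xml_to_csv(xml_dir, csv_path):
    """Flatten Pascal VOC .xml annotations into one CSV row per bounding box."""
    columns = ["filename", "width", "height", "class",
               "xmin", "ymin", "xmax", "ymax"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for xml_file in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            root = ET.parse(xml_file).getroot()
            size = root.find("size")
            # One row per <object> element, so multi-label images get several rows
            for obj in root.findall("object"):
                box = obj.find("bndbox")
                writer.writerow([
                    root.find("filename").text,
                    size.find("width").text,
                    size.find("height").text,
                    obj.find("name").text,
                    box.find("xmin").text, box.find("ymin").text,
                    box.find("xmax").text, box.find("ymax").text,
                ])
```

Run it once per split, e.g. on the train annotations to produce train_labels.csv and on the test annotations to produce test_labels.csv.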

We need one .pbtxt file that contains the label map for each class. This file tells the model what each object is by defining a mapping of class names to class ID numbers.
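For a single class, the label map is tiny; a sketch (the class name is illustrative and must match the label you used in LabelImg, and IDs start at 1 because 0 is reserved for the background):

```
item {
  id: 1
  name: 'mug_root_beer'
}
```

With more classes, you add one `item { ... }` block per class with consecutive IDs.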

Make sure that all the images are in .jpg format.

Working directory at this point:

The TensorFlow models repository contains the Object Detection API we are interested in. We will get it from the official repo.

Next, we need to compile the protocol buffers:
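The API defines its model and pipeline configs as protocol buffers, which must be compiled to Python before anything imports. The documented command, run from inside models/research, is:

```shell
# Run from inside models/research
protoc object_detection/protos/*.proto --python_out=.
```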

Finally, run a quick test to confirm that the model builder is working properly:
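This sanity check assumes you are still inside models/research; the PYTHONPATH export makes the object_detection and slim packages importable:

```shell
export PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/slim
python object_detection/builders/model_builder_test.py
```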

If it gives you an “OK” after executing, then everything is going great!

TensorFlow accepts data as TFRecords (.record files). A TFRecord is a binary file that is fast to read with low memory usage. It contains all the images and labels in one file.

In our case, we will have two TFRecords; one for testing and another for training. To make this work, we need to make sure that:

  • The CSV file names match: train_labels.csv and test_labels.csv (or change them in the code below)
  • The current directory is object_detection/models/research
  • Add your custom object text in the function class_text_to_int below by changing the row_label variable (this is the text that will appear on the detected object). Add more labels if you have more than one object.
  • Check that the path to the data/ directory is the same as data_base_url below.
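To illustrate the third point, class_text_to_int is just a text-to-ID lookup that must agree with your label map; a sketch, using a label name of my own choosing:

```python
def class_text_to_int(row_label):
    """Map the label text from the CSV to the integer ID in the label map.
    Add more elif branches if you have more than one class."""
    if row_label == "mug_root_beer":  # hypothetical single-class label
        return 1
    else:
        return None
```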

A pre-trained model is simply a model that has already been trained on another dataset. Such a model has seen thousands or millions of images and objects.
COCO (Common Objects in Context) is a dataset of 330,000 images containing 1.5 million object instances across 80 classes, such as dogs, cats, cars, and bananas.

I will be using the ssd_mobilenet_v2_coco model for my project. You can use any pre-trained model you prefer.

Let’s start by selecting a pre-trained model:

Download the selected Pre-Trained Model:

While training, the model is autosaved every 600 seconds by default. The logs and graphs, such as the mAP, loss, and AR, are also saved constantly. Create a folder for all of them to be saved in during training:

  • Create a folder called training inside object_detection/models/research/

The TensorFlow Object Detection API comes with many sample config files. For each model, there is a config file that is ‘almost’ ready to be used.

Required edits to the config file:

  1. model {} > ssd {}: change num_classes to the number of classes you have.
  2. train_config {}: change fine_tune_checkpoint to the checkpoint file path.
  3. train_input_reader {}: set the path to the train_labels.record and the label map pbtxt file.
  4. eval_input_reader {}: set the path to the test_labels.record and the label map pbtxt file.
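After those edits, the relevant sections of the config look roughly like this sketch (the paths are illustrative and assume the data/ layout and record names used earlier; untouched fields are elided):

```
model {
  ssd {
    num_classes: 1   # one class: MUG root beer
    ...
  }
}
train_config {
  fine_tune_checkpoint: "ssd_mobilenet_v2_coco_2018_03_29/model.ckpt"
  ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "data/train_labels.record"
  }
  label_map_path: "data/label_map.pbtxt"
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "data/test_labels.record"
  }
  label_map_path: "data/label_map.pbtxt"
}
```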

Here you can visualize everything that’s happening during training. You can monitor the loss, mAP, AR, and much more.

Visualization in TensorBoard

To use TensorBoard on Colab, we need to run it through ngrok. Get it by running:

Next, we specify where the log files are stored and configure a link to view TensorBoard:
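One common Colab pattern, assuming the logs land in the training/ folder created earlier, is to launch TensorBoard in the background and tunnel port 6006 through ngrok (the download URL is the one ngrok published at the time of writing):

```shell
# Download and unpack ngrok
wget -q https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
unzip -o ngrok-stable-linux-amd64.zip

# Start TensorBoard on the training logs, then tunnel port 6006
tensorboard --logdir training/ --host 0.0.0.0 --port 6006 &
./ngrok http 6006 &

# Print the public TensorBoard URL
curl -s http://localhost:4040/api/tunnels | \
  python -c "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
```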

When you run the code above, the end of the output will contain a URL where you can access TensorBoard.

10. Finally… It’s Training!

It’s the simplest step, if you have done everything above correctly 😉. We just need to run the following few lines of code:
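With the TF1 Object Detection API, training is driven by model_main.py; a sketch, assuming your edited config was saved as training/ssd_mobilenet_v2_coco.config and you are in models/research:

```shell
python object_detection/model_main.py \
    --pipeline_config_path=training/ssd_mobilenet_v2_coco.config \
    --model_dir=training/ \
    --alsologtostderr
```

Checkpoints and event logs accumulate in training/, which is also what TensorBoard reads.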

Now sit back and watch your model’s performance on TensorBoard.

After the successful completion of training, you need to export and download the trained model.

By executing the following lines of code, you will be able to export your model and then download it.
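A sketch using the API’s export script; replace XXXX with the step number of your latest checkpoint in training/, and adjust the config path to wherever you saved your edited config:

```shell
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_mobilenet_v2_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-XXXX \
    --output_directory trained_inference_graph/
```

In Colab, you can then zip trained_inference_graph/ and download it with `files.download` from the google.colab module.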

Results & Conclusion:

Successfully detected soda bottle

In this tutorial, I have tried to cover all the steps required to detect a single class of object using the TensorFlow Object Detection API. Here I am able to detect one grocery item, MUG root beer soda. I look forward to increasing the number of classes and detecting as many objects as possible at a single time. I hope you find this article helpful.