In this ESP32-CAM tutorial, we’ll build an image classification project that uses a cloud machine learning service. The ESP32-CAM captures an image, which is then recognised by a pre-trained machine learning model. We’ll use Clarifai’s image recognition AI models for this project.

Connecting ESP32-CAM with FTDI programmer

We will require the following components for this project:

  1. ESP32-CAM development board
  2. FTDI programmer / USB to TTL serial converter
  3. Connecting Wires
  4. External 5V power supply (optional)

Unlike the ESP32 development board, the ESP32-CAM does not come with a USB port attached to it. So to upload a program sketch to the ESP32-CAM, we will need to use an FTDI programmer (USB to TTL Serial converter).

ESP32-CAM       FTDI Programmer
5V              VCC
U0R (GPIO3)     TX
U0T (GPIO1)     RX
GND             GND

To start, connect the ESP32-CAM’s 5V pin to the FTDI programmer’s VCC pin, and connect the grounds of the two devices together. The FTDI programmer’s TX pin connects to the ESP32-CAM’s U0R (GPIO3) pin, and its RX pin connects to the ESP32-CAM’s U0T (GPIO1) pin.

Some ESP32-CAM boards report a brown-out detector error because the FTDI cable cannot supply enough power. In that situation, power the ESP32-CAM from an external 5V supply instead.

Connecting with Clarifai

“Clarifai Inc. is a computer vision firm that employs machine learning and deep neural networks to detect and analyze images and videos,” says the company.

After setting up your free Clarifai account, obtain the API key that the ESP32-CAM will use to connect to the Clarifai platform.

To generate a new API key, go to the API Keys area and select ‘Create new API key.’ Make a note of it.

Setting up Arduino IDE

Before we continue, make sure you have the most recent version of the Arduino IDE installed on your computer. Additionally, in the Arduino IDE, you should install an ESP32 add-on. 

Installing ArduinoJSON Library

Since we’ll be working with JSON data, you’ll need to install the ArduinoJson library by Benoit Blanchon. Open the Arduino Library Manager via Sketch > Include Library > Manage Libraries, type ‘ArduinoJson’ in the search box, and press Enter. Install version 6.17.2 of the library.

ESP32-CAM Image Classification Arduino Sketch

Open your Arduino IDE and go to File > New to open a new file. Copy the code given below in that file. For this code to work with your ESP32-CAM board you will have to replace the Wi-Fi network credentials and the API key from Clarifai.

How Does the Code Work?

Now, let us understand how each part of the code works.

Including Libraries

To begin, we’ll incorporate all of the key libraries that are required for this project.

The WiFi.h library connects our ESP32 module to the local Wi-Fi network. ArduinoJson.h is used to parse the JSON response. Arduino.h and esp_camera.h are needed to initialise the ESP32-CAM. We’ll also need base64.h to encode the image in Base64 format and HTTPClient.h to send it to the machine learning platform.

#include "Arduino.h"
#include "esp_camera.h"
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include <base64.h>
#include <WiFi.h>

Next, we define two global variables, one for the SSID and the other for the password. They hold the network credentials used to connect to the wireless router. Replace both with your own network credentials.

const char* ssid = "YOUR_SSID";
const char* password = "YOUR_PASSWORD";

The following definitions are for OV2640 camera module pins. We are using CAMERA_MODEL_AI_THINKER.

#define CAMERA_MODEL_AI_THINKER // Has PSRAM

#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27
#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

setup()

Inside the setup() function, we will open a serial connection at a baud rate of 115200.

Serial.begin(115200);

The code below will connect our ESP32-CAM board to the local network using the network credentials we specified earlier. The WiFi.begin() function will be used. The SSID and password that we defined earlier in the code will be the parameters. “WiFi Connected!” appears on the serial monitor after a successful connection is established.

WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
  delay(500);
  Serial.print(".");
}
Serial.println("");
Serial.println("WiFi Connected!");
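The loop above waits indefinitely if the credentials are wrong or the router is out of range. As a minimal sketch of one alternative, the polling can be limited to a fixed number of attempts. The helper below is plain C++ so that it can be tried on a desktop; waitForConnection and its parameters are hypothetical names for illustration and are not part of the WiFi library.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: polls `isConnected` until it reports true or
// `maxAttempts` polls have failed, calling `pause` after each failed poll.
// Returns true if a connection was observed, false if the attempts ran out.
bool waitForConnection(const std::function<bool()>& isConnected,
                       const std::function<void()>& pause,
                       int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (isConnected()) {
            return true;
        }
        pause();
    }
    return false;
}
```

In the sketch, the first callback would wrap the WiFi.status() == WL_CONNECTED check and the second would wrap delay(500). With 40 attempts, the board would give up after roughly 20 seconds instead of hanging.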

The following code configures the OV2640 camera module and the settings used for capturing photos.

camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_JPEG;

// If PSRAM is present, use a higher JPEG quality (lower number) and two frame buffers.
// The frame size is QVGA in both cases, which keeps the Base64 payload small.
if (psramFound()) {
  config.frame_size = FRAMESIZE_QVGA;
  config.jpeg_quality = 10;
  config.fb_count = 2;
} else {
  config.frame_size = FRAMESIZE_QVGA;
  config.jpeg_quality = 12;
  config.fb_count = 1;
}

Initializing ESP32-CAM:

esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
  Serial.printf("Camera init failed with error 0x%x", err);
  return;
}

Next, we will call the classify() function, which is responsible for image classification.

classify();

Finally, we put the ESP32-CAM into deep sleep. No wake-up source is configured, so it stays asleep until it is reset.

Serial.println("\n Going to Sleep...");
esp_deep_sleep_start();

Image Classification

The classify() function will first capture the image, encode it in base64 and then apply image recognition to it.

void classify() {
  // Capture a frame from the camera
  camera_fb_t * fb = NULL;
  fb = esp_camera_fb_get();
  if (!fb) {
    Serial.println("Camera capture failed");
    return;
  }

  // Encode the JPEG in Base64 and wrap it in the JSON request body
  size_t size = fb->len;
  String buffer = base64::encode((uint8_t *) fb->buf, fb->len);
  String payload = "{\"inputs\": [{ \"data\": {\"image\": {\"base64\": \"" + buffer + "\"}}}]}";
  buffer = "";
  Serial.println(payload);
  esp_camera_fb_return(fb);

  String model_id = "aaa03c23b3724a16a56b629203edc62c";  // General Model
  //String model_id = "bd367be194cf45149e75f01d59f77ba7";  // Food Model

  // Send the request to Clarifai
  HTTPClient http;
  http.begin("https://api.clarifai.com/v2/models/" + model_id + "/outputs");
  http.addHeader("Content-Type", "application/json");
  http.addHeader("Authorization", "Key YOUR_API_KEY");  // replace with your Clarifai API key

  int response_code = http.POST(payload);
  String response;
  if (response_code > 0) {
    Serial.print(response_code);
    Serial.print("Returned String: ");
    response = http.getString();
    Serial.println(response);
  } else {
    Serial.print("POST Error: ");
    Serial.print(response_code);
    return;
  }

  // Parse the JSON response and print the first ten concepts
  const int jsonSize = 2*JSON_ARRAY_SIZE(0) + JSON_ARRAY_SIZE(1) + JSON_ARRAY_SIZE(20) + 4*JSON_OBJECT_SIZE(0) + 7*JSON_OBJECT_SIZE(1) + 5*JSON_OBJECT_SIZE(2) + JSON_OBJECT_SIZE(3) + 21*JSON_OBJECT_SIZE(4) + JSON_OBJECT_SIZE(5) + JSON_OBJECT_SIZE(6) + JSON_OBJECT_SIZE(7) + JSON_OBJECT_SIZE(18) + 3251;
  DynamicJsonDocument doc(jsonSize);
  deserializeJson(doc, response);

  for (int i = 0; i < 10; i++) {
    const String name = doc["outputs"][0]["data"]["concepts"][i]["name"];
    const float prob = doc["outputs"][0]["data"]["concepts"][i]["value"];

    Serial.println("________________________");
    Serial.print("Name:");
    Serial.println(name);
    Serial.print("Probability:");
    Serial.println(prob);
    Serial.println();
  }
}

We will first capture an image with ESP32-CAM by using the esp_camera_fb_get() method. The following lines enable us to do that.

camera_fb_t * fb = NULL;
fb = esp_camera_fb_get();
if (!fb) {
  Serial.println("Camera capture failed");
  return;
}

Then we encode the image in base64 format:

size_t size = fb->len;
String buffer = base64::encode((uint8_t *) fb->buf, fb->len);
String payload = "{\"inputs\": [{ \"data\": {\"image\": {\"base64\": \"" + buffer + "\"}}}]}";
buffer = "";
Serial.println(payload);
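To make the encoding step less of a black box, here is a minimal desktop C++ sketch of standard Base64 encoding and of the JSON body built around it. It is not the implementation inside base64.h; base64Encode and buildPayload are hypothetical names, and std::string stands in for the Arduino String type. Every 3 input bytes become 4 output characters, so a 9 KB JPEG grows to about 12 KB of text before the JSON wrapper is added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard Base64 encoder: each group of up to three bytes is packed into
// 24 bits and emitted as four characters, with '=' padding on a short final group.
std::string base64Encode(const uint8_t* data, size_t len) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(data != nullptr || len == 0);
    std::string out;
    out.reserve(4 * ((len + 2) / 3));
    for (size_t i = 0; i < len; i += 3) {
        uint32_t group = static_cast<uint32_t>(data[i]) << 16;
        if (i + 1 < len) group |= static_cast<uint32_t>(data[i + 1]) << 8;
        if (i + 2 < len) group |= static_cast<uint32_t>(data[i + 2]);
        out.push_back(kAlphabet[(group >> 18) & 0x3F]);
        out.push_back(kAlphabet[(group >> 12) & 0x3F]);
        out.push_back(i + 1 < len ? kAlphabet[(group >> 6) & 0x3F] : '=');
        out.push_back(i + 2 < len ? kAlphabet[group & 0x3F] : '=');
    }
    return out;
}

// Wraps an already encoded image in the same JSON body the sketch sends.
std::string buildPayload(const std::string& encodedImage) {
    return "{\"inputs\": [{ \"data\": {\"image\": {\"base64\": \"" +
           encodedImage + "\"}}}]}";
}
```

The size growth is the main reason the sketch captures at QVGA: the whole encoded image and the payload string must fit in RAM at the same time.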

After that, we will connect to the Clarifai platform to use its pre-trained model for image recognition.

Here we have specified the general model ID taken from Clarifai for Image Recognition AI (https://www.clarifai.com/models/image-recognition-ai). This will enable us to classify 11,000 different concepts including objects, themes, and much more.

String model_id = "aaa03c23b3724a16a56b629203edc62c";  // General Model

You can also use other pre-trained models provided by Clarifai, including food models, face detection models, and more (https://www.clarifai.com/developers/pre-trained-models). Just change the model ID.

For example, if classifying foods you can use the model ID associated with the Food Model:

//String model_id = "bd367be194cf45149e75f01d59f77ba7";  // Food Model

Now initiate the connection to the Clarifai platform by providing the model ID and your API key.

HTTPClient http;
http.begin("https://api.clarifai.com/v2/models/" + model_id + "/outputs");
http.addHeader("Content-Type", "application/json");
http.addHeader("Authorization", "Key YOUR_API_KEY");  // replace with your Clarifai API key
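Because the model ID and the API key are the only parts of the request that change, it can help to build the URL and the header value in one place. The following is a small desktop C++ sketch of that idea; clarifaiOutputsUrl and authorizationHeader are hypothetical helper names, and the URL pattern and the "Key " prefix are taken from the sketch above.

```cpp
#include <string>

// Builds the prediction endpoint used in the sketch for a given model ID.
std::string clarifaiOutputsUrl(const std::string& modelId) {
    return "https://api.clarifai.com/v2/models/" + modelId + "/outputs";
}

// Builds the Authorization header value in the "Key <api key>" form
// that the sketch passes to addHeader().
std::string authorizationHeader(const std::string& apiKey) {
    return "Key " + apiKey;
}
```

Switching from the General Model to the Food Model then only requires passing a different model ID.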

Next, send the Base64-encoded image, now stored in payload, to the cloud machine learning platform.

int response_code = http.POST(payload);
String response;
if (response_code > 0) {
  Serial.print(response_code);
  Serial.print("Returned String: ");
  response = http.getString();
  Serial.println(response);
} else {
  Serial.print("POST Error: ");
  Serial.print(response_code);
  return;
}
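Note that the response_code > 0 check only confirms that the server answered. HTTPClient returns the HTTP status code on a completed request and a negative value for client-side failures such as a refused connection, so a 401 caused by a wrong API key also takes the first branch. A minimal sketch of a stricter check, in plain C++ with the hypothetical names PostOutcome and classifyPostResult, might look like this:

```cpp
// Hypothetical classification of the value returned by HTTPClient::POST().
enum class PostOutcome { Success, HttpError, TransportError };

// Zero or negative: the request never completed.
// 200-299: the server accepted the request.
// Any other positive value: the server answered with an error status.
constexpr PostOutcome classifyPostResult(int responseCode) {
    return responseCode <= 0
               ? PostOutcome::TransportError
               : (responseCode >= 200 && responseCode < 300)
                     ? PostOutcome::Success
                     : PostOutcome::HttpError;
}
```

Parsing the response only when the outcome is Success avoids feeding an error body into the JSON loop that follows.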

Then, to process the concepts returned for the image, we’ll use the ArduinoJson library. The concepts are labels assigned to the image, each with a probability value. We print the first ten on the serial monitor so that the image can be categorised.

const int jsonSize = 2*JSON_ARRAY_SIZE(0) + JSON_ARRAY_SIZE(1) + JSON_ARRAY_SIZE(20) + 4*JSON_OBJECT_SIZE(0) + 7*JSON_OBJECT_SIZE(1) + 5*JSON_OBJECT_SIZE(2) + JSON_OBJECT_SIZE(3) + 21*JSON_OBJECT_SIZE(4) + JSON_OBJECT_SIZE(5) + JSON_OBJECT_SIZE(6) + JSON_OBJECT_SIZE(7) + JSON_OBJECT_SIZE(18) + 3251;
DynamicJsonDocument doc(jsonSize);
deserializeJson(doc, response);

for (int i = 0; i < 10; i++) {
  const String name = doc["outputs"][0]["data"]["concepts"][i]["name"];
  const float prob = doc["outputs"][0]["data"]["concepts"][i]["value"];

  Serial.println("________________________");
  Serial.print("Name:");
  Serial.println(name);
  Serial.print("Probability:");
  Serial.println(prob);
  Serial.println();
}
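Printing ten labels is useful for inspection, but an application usually wants only the labels it can trust. As a rough sketch, assuming the name and value pairs have already been copied out of the parsed document, the labels can be filtered by a probability threshold. The Concept struct, the confidentConcepts function, and the 0.9 cut-off mentioned afterwards are illustrative assumptions, not part of the Clarifai API or the ArduinoJson library.

```cpp
#include <string>
#include <vector>

// Hypothetical container for one entry of the "concepts" array.
struct Concept {
    std::string name;
    float value;  // probability reported by the model, between 0 and 1
};

// Returns the concepts whose probability is at least `threshold`,
// preserving the order in which they were received.
std::vector<Concept> confidentConcepts(const std::vector<Concept>& concepts,
                                       float threshold) {
    std::vector<Concept> kept;
    for (const Concept& c : concepts) {
        if (c.value >= threshold) {
            kept.push_back(c);
        }
    }
    return kept;
}
```

For example, with made-up results of "dog" at 0.98, "pet" at 0.91 and "grass" at 0.40, a threshold of 0.9 keeps only the first two labels.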

ESP32 CAM Image Classification Demo

We’re now ready to compile the code and upload it to our ESP32-CAM. Check that the FTDI programmer is properly connected to the module and that GPIO0 is connected to GND, which puts the board into flashing mode.

Before uploading your code to the ESP32-CAM board, make sure you’ve selected the correct board and COM port. Select AI Thinker ESP32-CAM from Tools > Board.

Next, go to Tools > Port and select the appropriate port through which your board is connected.

After the code has uploaded successfully, remove the wire connecting GPIO0 to GND and press the RESET button on the board so that the sketch starts running.

Now open the serial monitor. In a few moments, the Wi-Fi will get connected, and the image will get captured, encoded, and then sent to the Clarifai platform.

Aravind S S