Automatic Map Generation from High Resolution Images Applying Deep Learning Techniques

Introduction:

Automating building detection has become imperative for urban planning and management, with Unmanned Aerial Vehicle (UAV) imagery emerging as a valuable resource. The utilization of high-resolution UAV images allows for intricate details in urban landscapes to be captured with precision. Among the advanced methodologies, the U-net architecture, originally developed for biomedical image segmentation, has proven effective in semantic segmentation tasks. In the context of building detection, U-net excels in capturing contextual relationships and preserving spatial information. This approach significantly streamlines the mapping process, offering an efficient and accurate solution for identifying building structures. The integration of U-net with high-resolution UAV imagery enhances the potential for automating building detection, paving the way for robust applications in urban development, disaster response, and environmental monitoring.


What is a CNN?

Firstly, let’s understand the basics of Convolutional Neural Networks (CNN).

If you’re new to the world of neural networks, CNNs, and image classification, I recommend first going through some in-depth tutorials on these topics.


Motivation

•    Traditional image segmentation techniques often struggle with complex and variable object shapes, textures, and background conditions, such as the building patterns found in India.

•    They may require manual tuning of parameters and lack the ability to capture hierarchical features effectively.

•    Fully convolutional neural networks (FCNN) address these limitations by automatically learning hierarchical features from data, enabling them to handle more intricate patterns and achieve superior performance in image segmentation tasks.


Methodology Flow:

  • Surveying & Preprocessing of UAV images: DSMs and orthophotos have been generated from images captured by UAV, processed using photogrammetry techniques in the Pix4D and Agisoft Metashape software.
  • Training Procedure: To generate the training dataset, various features such as buildings, water bodies, and farms have been digitized as vector data in a GIS environment. Georeferenced images from high-resolution satellites and orthophotos captured by UAV are used for the digitization. The vector data is then converted into labeled raster data. To increase processing speed, large images will be broken into smaller patches using the Python library patchify before being fed into the neural network. It is proposed to use 70% of the data to train the network and 30% to check its accuracy.
  • Semantic Segmentation: The deep learning technique learns the patterns in the input data and predicts various object classes. The main deep learning architecture used for image processing is the Fully Convolutional Network (FCN) framework. The convolution layers extract features from the input data by convolving the image with learned filters, while pooling layers perform a dimensionality reduction on these convolved layers.
  • Noise removal: In general, segmented images contain a lot of noise. Morphological filters are applied to combat this noise, i.e., erosion followed by dilation, and thresholding will be applied to the images to remove noise from the output.
  • Splitting of individual features: To split connected features such as buildings, techniques such as the distance transformation (Jain, 1989) may be applied to the segmented image. We assume that connections are much smaller in area than the buildings; thus, connections are assigned less weight compared to buildings.
  • Vector map generation and smoothing: The segmented and cleaned image obtained from the above procedure will be converted to vector data using the GDAL/OGR library (GDAL/OGR, 2018). Georeferencing information will be preserved at each step for correct overlaying of vector features. The vector data may be simplified using the Douglas-Peucker algorithm (Douglas, 1973), which replaces a curve or line feature with a similar one with fewer points while preserving its shape and geometry. 3D vector maps will be generated by extracting elevation information from the DSMs.
  • Accuracy assessment & development of an open-source tool: Accuracy assessment will be carried out by comparing the area and elevation parameters of the ground-truth data and the model-derived vector features using linear cost parameters (RMSE, MAE, MAPE, etc.). Moreover, an open-source tool in the GIS environment will be developed for the entire workflow. This tool will be useful for researchers and industry to apply the methodology in their day-to-day applications.
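The patch-generation and 70/30 split described in the training procedure above can be sketched in plain NumPy (the project itself uses the patchify library; this tiling function is a minimal stand-in, and the 3000 × 4000 image size is the drone-image size mentioned later in this document):

```python
import numpy as np

def split_into_patches(image, patch=256):
    """Cut an image (H, W, C) into non-overlapping patch x patch tiles.
    Edge remainders that do not fill a full tile are dropped, mirroring
    patchify with step == patch size."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tiles.append(image[y:y + patch, x:x + patch])
    return np.stack(tiles)

# Stand-in for a 3000 x 4000 RGB orthophoto.
ortho = np.zeros((3000, 4000, 3), dtype=np.uint8)
patches = split_into_patches(ortho)   # 11 x 15 = 165 tiles

# 70/30 train/validation split, as proposed above.
rng = np.random.default_rng(0)
idx = rng.permutation(len(patches))
n_train = int(0.7 * len(patches))
train, val = patches[idx[:n_train]], patches[idx[n_train:]]
```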
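To make the convolution and pooling operations in the semantic-segmentation step concrete, here is a minimal NumPy illustration. The Sobel filter stands in for a learned filter (real FCNs learn their filter weights during training), and the max-pool shows the dimensionality reduction:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (CNN-style cross-correlation) of a
    single-channel image with one filter."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool(img, size=2):
    """2x2 max pooling: halves each spatial dimension."""
    h, w = img.shape
    return img[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.zeros((8, 8)); img[:, 4:] = 1.0              # vertical edge
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
feat = conv2d(img, sobel_x)    # (6, 6) feature map: strong edge response
pooled = max_pool(feat)        # (3, 3) after pooling
```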
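The noise-removal step above (erosion followed by dilation, i.e. morphological opening) can be sketched in plain NumPy; in practice a library such as OpenCV or scikit-image would be used, and the toy mask below is illustrative:

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 square: a pixel survives only if its
    entire 3x3 neighbourhood is foreground."""
    p = np.pad(mask, 1)
    out = np.ones_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy:1 + dy + mask.shape[0],
                     1 + dx:1 + dx + mask.shape[1]]
    return out

def dilate(mask):
    """Binary dilation with a 3x3 square: a pixel turns on if any
    neighbour is foreground."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy:1 + dy + mask.shape[0],
                     1 + dx:1 + dx + mask.shape[1]]
    return out

mask = np.zeros((10, 10), dtype=int)
mask[2:7, 2:7] = 1                 # a building
mask[0, 9] = 1                     # an isolated noise pixel
opened = dilate(erode(mask))       # erosion then dilation = opening
```

Opening removes the single-pixel speck while (approximately) restoring the building's footprint.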
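The distance-transform splitting step might look like the following sketch, assuming SciPy is available; the mask geometry and the distance threshold are illustrative only:

```python
import numpy as np
from scipy import ndimage

# Two 7x7 "buildings" joined by a 1-pixel-wide connection.
mask = np.zeros((9, 20), dtype=int)
mask[1:8, 1:8] = 1
mask[1:8, 12:19] = 1
mask[4, 8:12] = 1                            # thin connection

dist = ndimage.distance_transform_edt(mask)  # distance to background
cores = dist >= 2                            # thin connections drop out
labels, n = ndimage.label(cores)             # now two separate cores
```

Pixels inside a thin connection are close to the background on both sides, so they get small distance values and vanish under the threshold, while the building interiors survive.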
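The Douglas-Peucker simplification named in the vectorization step can be sketched as follows (in practice the OGR geometries' built-in simplification would be used; the polyline and tolerance below are illustrative):

```python
import numpy as np

def douglas_peucker(points, tol):
    """Simplify a polyline: keep the endpoints, and recursively keep the
    point farthest from the start-end chord if it deviates more than tol."""
    points = np.asarray(points, float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    dx, dy = end - start
    norm = np.hypot(dx, dy)
    if norm == 0:
        dists = np.hypot(points[:, 0] - start[0], points[:, 1] - start[1])
    else:
        # Perpendicular distance of every point to the start-end chord.
        dists = np.abs(dx * (points[:, 1] - start[1])
                       - dy * (points[:, 0] - start[0])) / norm
    i = int(np.argmax(dists))
    if dists[i] <= tol:
        return np.array([start, end])
    left = douglas_peucker(points[:i + 1], tol)
    right = douglas_peucker(points[i:], tol)
    return np.vstack([left[:-1], right])

# A noisy horizontal edge followed by one real corner.
line = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.03), (4, 0), (4, 3)]
simplified = douglas_peucker(line, tol=0.1)  # keeps only the 3 key points
```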
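The accuracy-assessment metrics named above (RMSE, MAE, MAPE) are straightforward to compute; the ground-truth and model-derived building areas below are hypothetical values for illustration:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error; y_true must be non-zero."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Hypothetical ground-truth vs. model-derived building areas (sq. m).
truth = np.array([120.0, 80.0, 200.0, 50.0])
model = np.array([110.0, 85.0, 190.0, 55.0])
```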

Steps before model training:

  • Annotating the images:

Annotation is the process of labeling the pixels in an image with a class or category. For semantic segmentation, annotation is done at the pixel level, meaning that each pixel in an image is assigned a label indicating the object or region it belongs to. Annotation is an important step in training a convolutional neural network (CNN) for building detection, as it provides the ground-truth labels that the CNN learns to predict.

However, annotation is also a challenging and time-consuming task, as it requires high accuracy and consistency across different images and classes. Therefore, annotation tools and methods must be carefully chosen and evaluated for building detection using a CNN.

I have explored various tools for annotating image data which include:

1. VGG Image Annotator

2. Make Sense

3. Labelme

4. LabelImg

5. Label Studio

Out of the above 5, I found Label Studio to be the best tool for annotation.


  • Installing Label Studio and Annotating images:

The Label Studio package can be installed in Anaconda or any supported virtual environment. Since the Anaconda environment is handier to use, I installed the package in an Anaconda environment.

To install Label Studio in a virtual Anaconda environment, run the following commands in Anaconda Prompt:

conda env list

conda create --name project_Name pip #Only once for installing

conda activate project_Name

pip install -U label-studio #Only once for installing

label-studio

Then sign in using your email ID and set a password. After logging in, you will be directed to an interface like this:

Now you need to create a project by clicking on the Create option.

Update the project name and description accordingly.

Now upload the photos and files selected for annotation by clicking on the Upload Files section.

After uploading the images, we can go for labeling setup.

As per the needs of our project, we will select the template for semantic segmentation. Two options are available here: annotation by masks and annotation by polygons. Creating mask annotations demands high concentration and time, so we will go with the polygons option.

After setting everything up, we can save and move on to annotating the images. We will be directed to an interface where we need to select images individually for annotating.

Annotations can be done this way by clicking on the desired labels:

After annotating the required images, we can export the annotations in the following formats: JSON, JSON-MIN, CSV, and COCO.

Masks are the desired format, but they are unavailable as a direct export, so we will export the annotations in JSON-MIN format and convert them to masks using a program.

  • Conversion of annotated JSON files to jpg files

To build the model, we need to convert the JSON-MIN annotations into .jpg or .png image files. Thus, we wrote a program that converts JSON to JPG format.

Here is the Colab notebook used to convert JSON-MIN to JPG format. For the conversion, we need to put the JSON-MIN file in the content directory and execute the code in the first section of the notebook.

The JPG files will be saved in the directory you specify in the last line of the code. You can change it according to your convenience.

cv2.imwrite('Directory/' + img_name, image.astype(np.uint8))

So the mask images are saved in a folder under the names we choose.
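A minimal sketch of such a conversion program is shown below, assuming Pillow is available. Label Studio's JSON-MIN export stores polygon points as percentages of the image width and height, so they are scaled back to pixels before rasterizing; the exact JSON field names depend on the labeling configuration, so only the rasterization core is shown here with a hypothetical square annotation:

```python
import numpy as np
from PIL import Image, ImageDraw

def polygons_to_mask(regions, width, height):
    """Rasterize polygon annotations into a binary mask image.
    `regions` is a list of point lists; each point is a percentage of
    the image width/height, as in Label Studio's JSON-MIN export."""
    mask = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(mask)
    for points in regions:
        pixels = [(x / 100 * width, y / 100 * height) for x, y in points]
        draw.polygon(pixels, fill=255)
    return np.array(mask)

# Hypothetical annotation: one square building in the image centre.
regions = [[(25, 25), (75, 25), (75, 75), (25, 75)]]
mask = polygons_to_mask(regions, width=200, height=100)
# The mask can then be written out with cv2.imwrite, as above.
```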

Why is this needed?

When the files are saved in JPG format, they end up in several different folders, and it becomes difficult to train the model while managing multiple pipelines and ensuring correct data flow. We therefore need a single path through which the data flows in the right order (each image with its related mask), i.e., a single directory.
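A small helper along these lines can flatten the scattered folders into one images/masks layout, pairing each image with its mask by filename; the folder and file names below are hypothetical:

```python
from pathlib import Path
import shutil, tempfile

def gather_pairs(image_dirs, mask_dirs, out_dir):
    """Copy images and their matching masks (same file stem) from
    scattered folders into one flat images/ + masks/ layout."""
    out = Path(out_dir)
    (out / "images").mkdir(parents=True, exist_ok=True)
    (out / "masks").mkdir(parents=True, exist_ok=True)
    masks = {p.stem: p for d in mask_dirs for p in Path(d).glob("*.jpg")}
    paired = 0
    for d in image_dirs:
        for img in Path(d).glob("*.jpg"):
            if img.stem in masks:                 # keep only paired data
                shutil.copy(img, out / "images" / img.name)
                shutil.copy(masks[img.stem], out / "masks" / img.name)
                paired += 1
    return paired

# Demo with hypothetical scattered folders (created here for illustration).
root = Path(tempfile.mkdtemp())
(root / "flight1").mkdir(); (root / "flight1_masks").mkdir()
(root / "flight1" / "tile_001.jpg").write_text("img")
(root / "flight1_masks" / "tile_001.jpg").write_text("mask")
(root / "flight1" / "tile_002.jpg").write_text("img")  # no matching mask
n = gather_pairs([root / "flight1"], [root / "flight1_masks"],
                 root / "dataset")
```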

  • Patchified Model:

Why do we need patchification?

The drone images are very large, i.e., 3000 × 4000 pixels. Training on images of this size is not possible on normal processing units and would need access to supercomputer-class hardware. To solve this problem, we have two options:

1. Reduce the dimension of image:

In this case, we reduce the size of the images from 3000 × 4000 to 576 × 576 and train the model on images of this size.

This involves a problem of data loss, which reduces the quality of the data. The buildings present in the image are very small compared to the overall image size, and compressing the image further would be very harmful for training the model.

For the above reason, we will not use this approach to build the model.

2. Patchify image to small units:

For training the model, we need to cut the images into small fragments of 256 × 256 pixels.

This also involves some challenges:

a. Missing the edges of images:

As the image size is 3000 × 4000 and the segments are 256 × 256, the remainders along the edges do not fit into full patches and are lost. We are ignoring this case for now for convenience in training the model.

This can be solved by rotating and flipping the images and patchifying those versions as well.

b. No labels in a large amount of images:

As we have seen, the foreground-to-background ratio of the labels is very low, which leaves many patchified images with no labels at all. Training on these empty patches would overfit the model and make the predictions worse.

Thus, to tackle this problem, we will add a filter that selects and saves only those patches which contain a sufficient amount of labeled pixels.
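The filter can be as simple as a foreground-ratio threshold on each mask patch; the 5% cut-off below is an assumed value, to be tuned for the dataset:

```python
import numpy as np

def keep_patch(mask_patch, min_fg=0.05):
    """Keep a patch only if at least `min_fg` of its pixels are labeled
    foreground; discards the mostly-empty patches described above."""
    return (mask_patch > 0).mean() >= min_fg

# An empty mask patch vs. one with 25% foreground.
empty = np.zeros((256, 256), dtype=np.uint8)
busy = np.zeros((256, 256), dtype=np.uint8)
busy[64:192, 64:192] = 255
```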

Results:

  • Loss: Cross Entropy Loss
  • Accuracy: Intersection over Union (IoU) score
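The IoU score used for accuracy can be computed directly from the binary masks; the two toy masks below are illustrative:

```python
import numpy as np

def iou_score(pred, target):
    """Intersection over Union for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

a = np.zeros((4, 4), dtype=int); a[:2, :] = 1   # top half foreground
b = np.zeros((4, 4), dtype=int); b[:, :2] = 1   # left half foreground
# Overlap is the 2x2 corner: IoU = 4 / (8 + 8 - 4) = 1/3.
```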



  • Binary Mask


  • 2D Vectorized buildings of BHU


Codes available at:

https://github.com/rajbhatt1302/CNN-based-segmentation.git
