Automatic Map Generation from High Resolution Images Applying Deep Learning Techniques
Introduction:
Automating
building detection has become imperative for urban planning and management,
with Unmanned Aerial Vehicle (UAV) imagery emerging as a valuable resource. The
utilization of high-resolution UAV images allows for intricate details in urban
landscapes to be captured with precision. Among the advanced methodologies, the
U-net architecture, originally developed for biomedical image segmentation, has
proven effective in semantic segmentation tasks. In the context of building
detection, U-net excels in capturing contextual relationships and preserving
spatial information. This approach significantly streamlines the mapping
process, offering an efficient and accurate solution for identifying building
structures. The integration of U-net with high-resolution UAV imagery enhances
the potential for automating building detection, paving the way for robust
applications in urban development, disaster response, and environmental
monitoring.
What is a CNN?
Firstly, let’s understand
the basics of Convolutional Neural Networks (CNN).
If you’re new to the
world of neural networks, CNNs, and image classification, I recommend going
through these excellent in-depth tutorials:
- Introduction to Neural Networks (Free Course!)
- Demystifying the Mathematics behind Convolutional Neural
Networks (CNNs)
- Build your First Image Classification Model in just 10 Minutes
Motivation
• Traditional image segmentation techniques often
struggle with complex and variable object shapes, textures, and background
conditions, such as the varied building patterns found in India.
• They may require manual tuning of parameters and lack
the ability to capture hierarchical features effectively.
• Fully convolutional neural networks (FCNNs) address
these limitations by automatically learning hierarchical features from data,
enabling them to handle more intricate patterns and achieve superior
performance in image segmentation tasks.
Methodology Flow:
- Surveying & Preprocessing of UAV images: DSMs and orthophotos have been generated from images captured by UAV, processed using photogrammetry techniques in Pix4D and Agisoft Metashape software.
- Training Procedure: To generate the training dataset, various features such as
buildings, water bodies, farms, etc. have been digitized as vector data in a
GIS environment. Georeferenced images from high-resolution satellites and
orthophotos captured by UAV are used for the digitization.
The vector data is then converted into labelled raster
data. To increase processing speed, large images will be
broken into smaller patches using the Python library patchify before being fed
into the neural network. It is proposed to use 70% of the data to train the
network and 30% to check its accuracy.
- Semantic Segmentation: The deep learning model learns patterns in the input
data and predicts various object classes. The main deep
learning architecture used for image processing is the
Fully Convolutional Network (FCN) framework. Convolution
layers extract features from the input by convolving the image
with learned filters, while pooling layers perform a dimensionality
reduction on the resulting feature maps.
- Noise removal: In general, segmented images contain a lot of noise.
Morphological filters, i.e. erosion followed by dilation, together with
thresholding, will be applied to remove the noise from the output.
- Splitting of individual features: To split connected features
such as adjoining buildings, techniques such as the
distance transform (Jain, 1989) may be applied to the
segmented image. We assume that connections are much
smaller in area than the buildings themselves; thus, connections are
assigned less weight than buildings.
- Vector map generation and smoothing: The segmented and cleaned image obtained
from the above procedure will be converted to vector data using the GDAL/OGR
library (GDAL/OGR, 2018). Georeferencing information will be preserved at each
step so that vector features overlay correctly. The vector data may be
simplified using the Douglas-Peucker algorithm (Douglas, 1973), which replaces
a curve or line feature with a similar one containing
fewer points while preserving its shape and geometry. 3D vector maps will
be generated by extracting elevation information from the DSMs.
- Accuracy assessment & development
of an open-source tool: Accuracy assessment will be
carried out by comparing the area and elevation
parameters of the ground truth data and the model-derived vector features
using error metrics (RMSE, MAE, MAPE, etc.). Moreover, an
open-source tool in the GIS environment will be developed for the entire
workflow. This tool will help researchers and industry use the
methodology in their day-to-day applications.
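The patch-splitting step in the training procedure can be sketched with plain NumPy (the patchify library wraps essentially this windowing; the 256-pixel tile size and the 70/30 split come from the text, while the helper name make_patches is ours):

```python
import numpy as np

def make_patches(image, size=256):
    """Cut an image into non-overlapping size x size tiles, dropping the
    ragged right/bottom margins (patchify with step=size behaves the same)."""
    h, w = image.shape[:2]
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

# a 3000 x 4000 drone frame yields (3000 // 256) * (4000 // 256) = 11 * 15 tiles
image = np.zeros((3000, 4000), dtype=np.uint8)
tiles = make_patches(image)
n_train = int(0.7 * len(tiles))  # 70/30 train-test split from the text
train, test = tiles[:n_train], tiles[n_train:]
```

In practice the split should shuffle tiles first so that train and test patches come from all parts of the survey area, not just the top and bottom strips.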
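To make the FCN description concrete, here is a minimal NumPy sketch of the two operations it names: a convolution layer sliding a filter over the image, and a max-pooling layer reducing spatial resolution. The edge-detecting kernel is a hand-picked illustration, not a learned filter:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as CNNs implement it)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keeps the strongest response per block."""
    h, w = x.shape[0] - x.shape[0] % size, x.shape[1] - x.shape[1] % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# a vertical-edge filter responds only where the step edge sits
image = np.zeros((4, 4))
image[:, 2:] = 1.0
features = conv2d(image, np.array([[-1.0, 1.0]]))  # shape (4, 3)
pooled = max_pool(features)                        # shape (2, 1)
```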
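The noise-removal step (erosion followed by dilation, i.e. morphological opening) can be sketched on a binary mask with NumPy alone; the 4-neighbour cross structuring element and the toy mask below are our simplifications of what OpenCV's cv2.erode/cv2.dilate would do:

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a cross-shaped (4-neighbour) structuring element."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]   # shift down
    out[:-1, :] |= mask[1:, :]   # shift up
    out[:, 1:] |= mask[:, :-1]   # shift right
    out[:, :-1] |= mask[:, 1:]   # shift left
    return out

def erode(mask):
    """Erosion as the dual of dilation on the inverted mask."""
    return 1 - dilate(1 - mask)

def open_mask(mask):
    """Morphological opening: erosion followed by dilation removes
    speckle noise smaller than the structuring element."""
    return dilate(erode(mask))

# a 2-pixel speckle vanishes while a solid 4x4 "building" block survives
mask = np.zeros((10, 10), dtype=np.uint8)
mask[1, 1] = mask[1, 2] = 1   # noise
mask[5:9, 5:9] = 1            # building
cleaned = open_mask(mask)
```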
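For the vectorization step, GDAL/OGR provides polygonization and simplification out of the box, but the Douglas-Peucker idea itself is compact enough to sketch in pure Python (the tolerance value and the toy "building edge" are illustrative):

```python
import math

def douglas_peucker(points, tol):
    """Simplify a polyline: keep the endpoints, recurse on the point
    farthest from the chord if it deviates by more than `tol`."""
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy) or 1.0
    # perpendicular distance of each interior point from the chord
    best_i, best_d = 0, 0.0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        d = abs(dx * (py - y1) - dy * (px - x1)) / length
        if d > best_d:
            best_i, best_d = i, d
    if best_d <= tol:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:best_i + 1], tol)
    right = douglas_peucker(points[best_i:], tol)
    return left[:-1] + right

# a jagged building edge collapses to its two endpoints at tol=0.5
edge = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
print(douglas_peucker(edge, 0.5))  # [(0, 0), (4, 0)]
```

OGR's `Geometry.Simplify` applies the same algorithm to each building polygon after `gdal.Polygonize`, so a hand-rolled version is only needed for understanding or custom tweaks.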
Steps before model training:
- Annotating the images:
Annotation is the process of labeling the pixels in an image with a class or
category. For semantic segmentation, annotation is done at the pixel level,
meaning that each pixel in an image is assigned a label indicating the object
or region it belongs to. Annotation is an important step in training a
convolutional neural network (CNN) for building detection, as it provides the
ground truth labels that the CNN learns to predict.
However, annotation is also a challenging and time-consuming task, as it
requires high accuracy and consistency across different images and classes.
Therefore, annotation tools and methods must be carefully chosen and evaluated
for building detection using CNNs.
I have explored various tools for annotating image data, including:
1. VGG Image Annotator
2. Make Sense
3. Labelme
4. LabelImg
5. Label Studio
Out of the above five, I found Label Studio to be the best tool for annotation.
- Installing Label Studio and Annotating images:
The Label Studio package can be installed in Anaconda or any supported virtual
environment. Since the Anaconda environment is handier to use, I installed the
package in an Anaconda environment.
To install Label Studio in a virtual Anaconda environment, run the following
commands in Anaconda Prompt:
conda env list
conda create --name project_Name pip   # only once, to create the environment
conda activate project_Name
pip install -U label-studio            # only once, to install
label-studio
Then sign in using your email ID and set a password. After logging in, you
will be directed to an interface like this:
Now you need to create a project by clicking on the Create option.
Update the project name and description accordingly.
Now upload the photos and files selected for annotation by clicking on the
Upload Files section.
After uploading the images, we can proceed to the labeling setup.
As per the needs of our project, we will select the template for semantic
segmentation. Two options are available here: segmentation by masks and
segmentation by polygons. Creating mask annotations demands a lot of
concentration and time, so we will go with the polygons option.
After setting everything up, we can save and move on to annotating the images.
We will be directed to an interface where we need to select images
individually for annotation.
Annotations can be done by clicking on the desired labels:
After annotating the required images, we can export the annotations in the
following formats: JSON, JSON-MIN, CSV, and COCO.
Masks are the desired format, but direct mask export is unavailable, so we
will export the annotations in JSON-MIN format and convert them to masks using
a program.
- Conversion of annotated JSON files to JPG files
For building the model, we need to convert the JSON-MIN format to x.jpg or
x.png format. Thus, we wrote a program that converts JSON to JPG format.
Here is the Colab notebook used to convert JSON-MIN to JPG format.
To convert, place the JSON-MIN file in the notebook's content directory and
execute the code in the first section of the notebook.
The JPG files will be saved in the directory mentioned in the last line of
code; you can change it to suit your needs:
cv2.imwrite(output_dir + img_name, image.astype(np.uint8))  # output_dir: your chosen folder
So the mask images were saved in a folder according to the name we wanted to give.
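A minimal, dependency-free sketch of that conversion is below. It assumes the usual JSON-MIN layout, where each record's "label" entries carry polygon "points" as percentages of "original_width"/"original_height" (check your own export, as field names can vary); a real script would then hand the array to cv2.imwrite as shown above:

```python
import json

def polygon_to_mask(points, width, height):
    """Rasterize one polygon (Label Studio stores points as percentages
    of the image size) into a binary mask via even-odd scanline filling."""
    poly = [(x * width / 100.0, y * height / 100.0) for x, y in points]
    mask = [[0] * width for _ in range(height)]
    n = len(poly)
    for row in range(height):
        y = row + 0.5
        # x-coordinates where polygon edges cross this scanline
        xs = []
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            if (y1 <= y < y2) or (y2 <= y < y1):
                xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        xs.sort()
        for j in range(0, len(xs) - 1, 2):
            for col in range(int(xs[j] + 0.5), int(xs[j + 1] + 0.5)):
                if 0 <= col < width:
                    mask[row][col] = 255
    return mask

# hypothetical JSON-MIN record: one polygon covering the left half
# of a 10 x 10 image (coordinates are percentages)
record = {"label": [{"points": [[0, 0], [50, 0], [50, 100], [0, 100]],
                     "original_width": 10, "original_height": 10}]}
ann = record["label"][0]
mask = polygon_to_mask(ann["points"], ann["original_width"], ann["original_height"])
```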
Why is this needed?
When we saved the files in JPG format, they ended up in several different
folders, which makes training the model difficult: it would require multiple
pipelines and careful handling of the data flow. Thus, we need a single path
through which the data flows in the right manner (each image together with its
related mask), i.e., everything flowing from a single directory.
- Patchified Model:
Why do we need patchification?
The drone images are very large, i.e., 3000 × 4000 pixels. Training on such
huge images is not possible on normal processing units; it would require
access to a supercomputer. To work around this, we have two options:
1. Reduce the dimensions of the image:
In this case, we would reduce the images from 3000 × 4000 to 576 × 576 and
train the model on images of this size.
This involves a problem of data loss, which reduces the quality of the data.
Since the buildings present in the image are small compared to the full image,
compressing it further would be very harmful for training the model.
For the above reason, we will not use this approach to build the model.
2. Patchify the image into small units:
For training the model, we cut the images into small fragments of 256 × 256
pixels.
This also involves some challenges:
a. Missing the edges of images:
Since the image size is 3000 × 4000, making 256 × 256 segments leaves the
edges uncovered, as the dimensions are not exact multiples of 256. We are
ignoring this case for now, for convenience in training the model.
It can be solved by rotating and flipping the images and patchifying those
images as well.
b. No labels in a large number of patches:
As we have seen, the foreground-to-background ratio of the labels is very low,
so many patchified images contain no labels at all. Training on patches that
have no labels would overfit the model and make the predictions worse.
Thus, to tackle this problem, we add a filter that selects and saves only
those patches containing a sufficient amount of labelled sample.
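That filter can be sketched as a simple foreground-ratio check (the 5% threshold is an assumption to tune on your data):

```python
import numpy as np

def keep_patch(mask_patch, min_fg_ratio=0.05):
    """Keep a patch only if at least `min_fg_ratio` of its pixels are
    labelled foreground (threshold is an assumption; tune per dataset)."""
    fg = np.count_nonzero(mask_patch)
    return fg / mask_patch.size >= min_fg_ratio

# toy 4x4 patches: one empty, one with 2/16 = 12.5% foreground
empty = np.zeros((4, 4), dtype=np.uint8)
sparse = np.zeros((4, 4), dtype=np.uint8)
sparse[0, 0] = sparse[1, 1] = 1

print(keep_patch(empty))   # False
print(keep_patch(sparse))  # True
```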
Results:
- Loss: Cross Entropy Loss
- Accuracy: Intersection over Union (IoU) score
- Binary Mask
- 2D Vectorized buildings of BHU
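The IoU score reported above reduces, for a single pair of binary masks, to the intersection of foreground pixels divided by their union; a tiny sketch, with flat 0/1 lists standing in for mask arrays:

```python
def iou(pred, truth):
    """Intersection over Union for binary masks given as flat 0/1 lists."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0

pred  = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
score = iou(pred, truth)  # 1 intersecting pixel, 3 in the union -> 1/3
```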
Codes available at:
https://github.com/rajbhatt1302/CNN-based-segmentation.git
