Learning to Count Buildings in Diverse Aerial Scenes

Counting buildings in aerial scenes is an important yet challenging task. We propose to learn the relationship between building counts and low-level image features, and to infer building counts directly from those features. Although deep-learning-based approaches show promising performance on object segmentation, our method does not require expensive training and can handle images in which individual objects are difficult to separate. The main contributions of this study are described below.

Learning from map data

Building footprints from GIS maps, combined with images, provide the data needed to learn the relationship between building counts and image features. However, images and maps are very often poorly aligned. We compute a cross-correlation between building footprints and image gradients, which greatly reduces misalignment. The figure below shows the alignment before (left) and after (right) correction.
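The alignment step can be illustrated with a minimal sketch. It assumes the misalignment is a pure translation of a few pixels and that a rasterized footprint mask is available. The function names, the `max_shift` parameter, and the exhaustive search are illustrative choices, not the implementation used in the paper.

```python
import numpy as np

def gradient_magnitude(image):
    """Gradient magnitude from central differences."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy)

def estimate_shift(footprint_mask, image, max_shift=10):
    """Find the (dy, dx) shift of the footprint mask that best matches the image.

    Assumption: translation only, searched exhaustively within +/- max_shift.
    The score is the cross-correlation between the footprint boundary
    (gradient of the mask) and the image gradient magnitude.
    """
    edges = gradient_magnitude(footprint_mask)
    grad = gradient_magnitude(image)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around, which is acceptable for small shifts
            shifted = np.roll(edges, shift=(dy, dx), axis=(0, 1))
            score = float(np.sum(shifted * grad))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Applying the returned shift to the footprint mask with `np.roll` brings it into register with the image. For large search ranges, an FFT-based correlation would be a faster alternative to the double loop.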

Straight line extraction

We use straight line segments to estimate building numbers, because straight edges are a major characteristic of buildings in aerial views. We follow the line support region framework proposed by Burns (1986), which identifies line support regions as spatially contiguous pixels with consistent gradient orientations and estimates line parameters (orientation, centroid, and length) from each region.
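The idea of a line support region can be sketched as follows. This is a simplified illustration: it quantizes gradient orientation with a single fixed partition and groups 8-connected pixels that share a bin. The function name and the parameters `n_bins`, `min_magnitude`, and `min_size` are assumptions made for the example, not values from the paper.

```python
from collections import deque
import numpy as np

def line_support_regions(image, n_bins=8, min_magnitude=0.1, min_size=5):
    """Group connected pixels with similar gradient orientation.

    Returns a list of (N, 2) arrays of (row, col) coordinates, one per region.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)
    bin_width = 2.0 * np.pi / n_bins
    # Offset by half a bin so axis-aligned orientations fall at bin centers.
    bins = np.floor((angle + np.pi + bin_width / 2.0) / bin_width).astype(int) % n_bins
    bins[magnitude < min_magnitude] = -1  # ignore weak gradients

    height, width = image.shape
    visited = np.zeros(image.shape, dtype=bool)
    regions = []
    for r0 in range(height):
        for c0 in range(width):
            if visited[r0, c0] or bins[r0, c0] < 0:
                continue
            # Breadth-first flood fill over 8-connected pixels in the same bin.
            target = bins[r0, c0]
            visited[r0, c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < height and 0 <= cc < width
                                and not visited[rr, cc]
                                and bins[rr, cc] == target):
                            visited[rr, cc] = True
                            queue.append((rr, cc))
            if len(pixels) >= min_size:
                regions.append(np.array(pixels))
    return regions
```

A single fixed partition can split a line whose orientation lies near a bin boundary, so a practical implementation would need some way to handle that case.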

Previous work estimates line parameters from the boundary shapes of line support regions. However, region boundaries do not always reflect the actual orientations and locations of lines. We instead determine line orientations from structure tensors and locate lines with the Hough transform. This method uses the gradients of all pixels in a region and thus produces more reliable results. In the figure below, the left panel shows the result of the Unsalan and Boyer method (2004) and the right panel shows our result.
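One way to read this step is sketched below, as an interpretation rather than the paper's exact procedure. The structure tensor summed over a region gives the dominant gradient direction, which is the line normal. A one-dimensional Hough accumulator over the offset `rho`, weighted by gradient magnitude, then locates the line. The function name, the `rho_step` parameter, and the reduction of the Hough transform to a single dimension are assumptions.

```python
import numpy as np

def region_line(gx, gy, rows, cols, rho_step=1.0):
    """Estimate a line (theta, rho) from the gradients of one region.

    gx, gy     : gradient components at the region's pixels (1-D arrays)
    rows, cols : pixel coordinates of the region (1-D arrays)
    The line satisfies  col * cos(theta) + row * sin(theta) = rho,
    where theta is the direction of the line normal.
    """
    # Structure tensor entries summed over all pixels in the region.
    jxx = np.sum(gx * gx)
    jyy = np.sum(gy * gy)
    jxy = np.sum(gx * gy)
    # Orientation of the dominant eigenvector (dominant gradient direction).
    theta = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)

    # With theta fixed, every pixel votes for its own offset rho,
    # weighted by gradient magnitude (a 1-D Hough accumulator).
    rho = cols * np.cos(theta) + rows * np.sin(theta)
    weights = np.hypot(gx, gy)
    rho_min = rho.min()
    idx = np.rint((rho - rho_min) / rho_step).astype(int)
    accumulator = np.bincount(idx, weights=weights)
    rho_peak = rho_min + np.argmax(accumulator) * rho_step
    return float(theta), float(rho_peak)
```

Because every pixel contributes to both the tensor and the accumulator, a ragged region boundary has little influence on the estimate. The line direction itself is perpendicular to the returned normal angle `theta`.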

Line-building relationship

We collect a large number of image tiles and their corresponding building counts. We find that, for similar buildings, there is a strong linear relationship between the number of lines and the number of buildings. Here are a few examples.

This observation leads to a simple approach for counting buildings with similar appearances. We fit a linear regression model between line counts and building counts using a few examples, then feed the total number of extracted lines to the model to obtain the total number of buildings. Below is an example of counting shelters in a refugee camp (within the blue polygon), where the line and building counts in the two red windows are used to fit the regression model.
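The regression step can be sketched in a few lines. The function names and all numbers in the usage note are hypothetical and serve only to show the mechanics of fitting on a few example windows and predicting from a total line count.

```python
import numpy as np

def fit_count_model(line_counts, building_counts):
    """Least-squares fit of building_count = slope * line_count + intercept."""
    slope, intercept = np.polyfit(line_counts, building_counts, deg=1)
    return float(slope), float(intercept)

def predict_building_count(total_lines, slope, intercept):
    """Apply the fitted linear model to a total line count."""
    return slope * total_lines + intercept
```

For instance, with made-up example windows containing 40 and 80 lines and 10 and 20 buildings respectively, `fit_count_model([40, 80], [10, 20])` yields a slope of about 0.25 and an intercept near zero, so a region with 400 extracted lines would be estimated at roughly 100 buildings.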

For the method that handles scenes containing different types of buildings, please see the following paper.

Jiangye Yuan and Anil Cheriyadat, Learning to count buildings in diverse aerial scenes, ACM SIGSPATIAL GIS, 2014. [pdf]