Texton Based Segmentation for Road Defect Detection from Aerial Imagery

Road defects, especially potholes and road cracks, are caused by insufficient pavement thickness, insufficient drainage, failures at utility trenches and castings, and pavement defects and cracks that are left unmaintained and unsealed [1]. Potholes grow from cracks on the road surface to widths or depths of several inches. Road defects can damage vehicle tires, wheels, and suspensions, and on a highway they can cause serious accidents. Therefore, the government has released an online complaint system that utilizes an information system and GPS technology.

the area of defect using a threshold on histogram based shape. From the segmentation result, they estimate the shape of the defect using morphological operations and elliptic regression. Texture in the defect area is compared to the surrounding area to determine the defect area. The experimental result shows good accuracy. Azhar et al. [3] detect potholes using a shape based method. Histogram of Oriented Gradients (HOG) is used to extract feature descriptors from the input image. Naïve Bayes is used to train and classify the features to detect potholes. To localize the pothole area after classification, a segmentation method using graph-cut is performed. The result shows 90% detection accuracy. Wang et al. [4] detect potholes in two steps. First, they extract wavelet energy from the image to detect potholes using morphological operations and geometric criteria. Second, the detected potholes are segmented using a Markov Random Field (MRF), and edges are extracted using an edge detection method. In the experiment, the method shows 86.7% accuracy for various road conditions.
Zhang et al. [6] detect potholes using stereo vision. They use a disparity map and distance measurements from the road surface. The proposed system gives the position, size, and volume of potholes, which helps the corresponding agencies to analyze the damage. Kamal et al. [7] use a Kinect sensor to detect potholes. The Kinect sensor is used to visualize the depth of the road surface and to detect curvature that is assumed to indicate a potential pothole. They use surface information and extract metrology to measure the level of damage.
In 2014, Google patented a pothole detection system [9]. The system is implemented on vehicles by installing sensors such as GPS and a vertical accelerometer, whose data are sent back to a Google server when the vehicle passes over a pothole. The received data are processed and the result is included in the Google map and navigation system, so that users are informed and can choose a safer route.
This work focuses on detecting road defects such as road cracks and potholes from aerial imagery. Our main contribution is to use texton based segmentation to segment the road area, SVM to detect road defects, and blob analysis to locate, measure, and determine the type of each defect. We use a UAV to capture the road aerial imagery. The rest of this paper is organized as follows: Section 2 presents the proposed road defect detection, Section 3 presents the results and analysis, and Section 4 concludes this work.

A. Road Defect Detection
In this research, road defects are detected based on texton. Texton features are used to segment the road area with the K-Nearest Neighbor (K-NN) algorithm and then to classify road defects with a Support Vector Machine (SVM). We use road aerial imagery taken from an Unmanned Aerial Vehicle (UAV) to cover a large area. Fig. 1 shows the general procedure of the proposed method.
The proposed method consists of three steps. In the first step, texton features are extracted from training images. The features are trained using K-NN to generate the road area detector and using SVM to generate the road defect detector. In the second step, the accumulated energy obtained from the road area detector is used to segment the road area. After the road area is found, edges are extracted using edge detection and a morphological operation is performed. SVM is then used to detect road defects in the edge area. Finally, blob analysis is performed to locate each road defect, measure its size, and determine its type. The methods used in this research are explained in the following subsections.

B. Texton
Texton is a representation of textures generated by clustering filter responses into a small set of prototype response vectors [10]. Filter responses are calculated by convolving a filter bank with the image [11,12]. In this research, we use the Leung-Malik (LM) filter bank [11]. The LM filter bank consists of 48 filters with multiple scales and orientations: first and second derivatives of Gaussian filters at 6 orientations and 3 scales (36 filters), 8 Laplacian of Gaussian (LoG) filters, and 4 Gaussian filters. An example of the LM filter bank is shown in Fig. 2. The LM filter bank generates 48 filter responses that are grouped using the k-means clustering method. K-means clustering has been widely used in image segmentation tasks, for example in [13]. The procedure of k-means clustering is explained below:
1. Initialize the number of clusters K and the cluster centers.
2. Calculate the distance from each data point to each cluster center using a distance metric. In this research, we use the squared Euclidean distance metric and the chi-squared distance metric.
3. Assign each data point to the nearest cluster center using (1) for squared Euclidean distance or (2) for chi-squared distance, where c(i) is the cluster assignment for the i-th data point, xi is the i-th data point, w is the weight associated with each dimension, and µj is the j-th cluster center.
4. After all data points are assigned to their nearest clusters, calculate the new cluster centers.
5. Repeat steps 2 to 4 until convergence, that is, until no cluster member changes its label.
6. Get the final cluster centers and cluster labels.
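The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the function names are hypothetical, the chi-squared distance is written in its common unweighted form, 0.5 * sum((x - µ)^2 / (x + µ)), which may differ from the weighted form in (2), and the centroid update for the chi-squared case is assumed to be the plain mean.

```python
import numpy as np

def sq_euclidean(x, centers):
    """Squared Euclidean distance from every row of x to every center, shape (n, k)."""
    diff = x[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2)

def chi_squared(x, centers, eps=1e-12):
    """Chi-squared distance (assumed standard form) for non-negative data such as histograms."""
    diff = x[:, None, :] - centers[None, :, :]
    total = x[:, None, :] + centers[None, :, :]
    return 0.5 * (diff ** 2 / (total + eps)).sum(axis=2)

def kmeans(x, init_centers, distance=sq_euclidean, max_iter=100):
    """Steps 1 to 6: assign to the nearest center, update centers, stop when labels are stable."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        new_labels = distance(x, centers).argmin(axis=1)        # steps 2 and 3
        if labels is not None and np.array_equal(new_labels, labels):
            break                                               # step 5: no label changed
        labels = new_labels
        for j in range(len(centers)):                           # step 4
            members = x[labels == j]
            if len(members) > 0:                                # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels                                      # step 6
```

Passing `max_iter=1` together with previously trained centers gives a single assignment-and-update pass, which is one way to read the "1 iteration" clustering used later in the segmentation procedure.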
The texton histogram is calculated from the cluster labels. For each pixel, the cluster label of that pixel and all cluster labels in its neighborhood within a certain radius are used to compute the texton histogram. The number of histogram bins is equal to the number of clusters. Texton features depend on the order of the cluster labels. Because k-means returns differently ordered cluster labels whenever the cluster centers are initialized randomly, textons would otherwise have to be computed from fixed initial centers. We instead sort the cluster labels by the number of members within each cluster, in ascending order. This keeps the cluster labels in a consistent order even if the initialization is random. Before textons are extracted from a grayscale image, the image needs to be normalized to zero mean and unit variance.
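A rough sketch of the normalization, label sorting, and per-pixel histogram described in this paragraph is given below. The function names are hypothetical, the neighborhood is assumed to be a square window of side 2 * radius + 1 (so radius 2 corresponds to a 5x5 texton kernel), and image borders are handled by edge replication, which the text does not specify.

```python
import numpy as np

def normalize(gray):
    """Normalize a grayscale image to zero mean and unit variance."""
    gray = np.asarray(gray, dtype=float)
    std = gray.std()
    return (gray - gray.mean()) / (std if std > 0 else 1.0)

def sort_labels_by_size(labels, k):
    """Relabel clusters so that label 0 is the smallest cluster and label k-1 the largest."""
    counts = np.bincount(labels.ravel(), minlength=k)
    order = np.argsort(counts, kind="stable")   # old labels, smallest cluster first
    remap = np.empty(k, dtype=int)
    remap[order] = np.arange(k)                 # old label -> new label
    return remap[labels]

def texton_histogram(labels, k, radius):
    """Per-pixel normalized histogram of cluster labels in a square neighborhood."""
    h, w = labels.shape
    size = 2 * radius + 1
    padded = np.pad(labels, radius, mode="edge")    # border handling is an assumption
    hist = np.zeros((h, w, k))
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    for dy in range(size):
        for dx in range(size):
            window = padded[dy:dy + h, dx:dx + w]   # label of one neighbor for every pixel
            hist[rows, cols, window] += 1
    return hist / (size * size)
```

Each pixel then carries a k-bin feature vector that sums to one, which is the form expected by a chi-squared comparison.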

C. Segmentation based on Texton
Image segmentation can be done based on color or texture. In this research, texton based segmentation is used to segment the road area. The procedure of road area segmentation is explained below:
1. Convolve the image with the LM filter bank to get the filter responses.
2. Perform k-means clustering on the filter responses with K1 clusters, using the squared Euclidean distance metric.
3. Sort the centroids by the number of members within each cluster, in ascending order.
4. Set the energy of each pixel to zero. A higher energy means a higher likelihood that the pixel belongs to the road area.
5. Train the K-NN model using texton features extracted from the training images. The model consists of several pairs of K1 centroids of filter responses, called C1, and the corresponding K2 centroids of texton, called C2, together with the label of each centroid.
6. Determine the number of samples for energy calculation. The samples are chosen among the C1 trained data that give the minimum distance from the centroids of the test data in step 3.
7. For each sample do:
a. Group the filter responses from step 1 using k-means with C1 as the initial centroids and 1 iteration.
b. For each pixel, use the cluster label of that pixel and all cluster labels in its neighborhood within a certain radius to compute the texton histogram.
c. Group the texton histograms using k-means with K2 clusters and the chi-squared distance metric. Chi-squared distance is used because it is well suited to comparing histograms.
d. Classify each centroid from step 7(c) as road area or non-road area using the K-NN model from step 5 with the C2 trained data.
e. If a centroid belongs to the road area, increase the energy of each member of that cluster by one.
8. If the final energy of a pixel is higher than the threshold, then that pixel belongs to the road area.
9. Remove small road areas and repair the shape of the road using morphological operations.
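The energy accumulation in steps 4, 7(e), and 8 amounts to a per-pixel vote over the chosen samples. The sketch below illustrates only that voting logic, assuming the cluster label maps from step 7(c) and the road or non-road decisions from step 7(d) are already available; the function and argument names are hypothetical.

```python
import numpy as np

def accumulate_energy(cluster_maps, road_flags):
    """Add one unit of energy to every pixel whose cluster was labelled as road.

    cluster_maps: one integer label map (H x W) per sample, from step 7(c).
    road_flags:   one boolean sequence of length K2 per sample, from step 7(d),
                  True where the corresponding centroid was classified as road.
    """
    energy = np.zeros(np.asarray(cluster_maps[0]).shape, dtype=int)   # step 4
    for labels, is_road in zip(cluster_maps, road_flags):
        is_road = np.asarray(is_road, dtype=bool)
        energy += is_road[np.asarray(labels)]                         # step 7(e)
    return energy

def segment_road(energy, threshold):
    """Step 8: a pixel belongs to the road area if its energy exceeds the threshold."""
    return energy > threshold
```

With 5 samples and a threshold of (number of samples) / 3, as used in the experiments, a pixel needs at least 2 of the 5 votes to be kept as road.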

D. Road Defect Classification
Classification of road defects is done using SVM. SVM is a machine learning method that separates classes with a hyperplane. The hyperplane is calculated by maximizing the distance, or margin, between data from two different classes. For classification, SVM uses a voting strategy to determine which class the data belong to. The classification procedure is as follows:
1. Compute the SVM kernel $K(s_i, x)$.
2. Compute the decision function using (3):
$$f(x) = \sum_{i} \alpha_i y_i K(s_i, x) + b \qquad (3)$$
where $y_i$ is the class label, $\alpha_i$ is a value with $0 \le \alpha_i \le C$ where $C$ is the regularization parameter, $K(s_i, x)$ is the kernel with input $x$, which is the input data, $s_i$ is a support vector of the model, and $b$ is the bias.
3. Repeat step 1 and step 2 for the other class.
4. Assign the data to the class whose decision function gave the maximum votes.
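As an illustration of steps 1 to 4, the sketch below evaluates a kernel SVM decision function of the standard form, the sum over support vectors of (coefficient * label * kernel) plus a bias, once per class and picks the class with the largest value. It assumes a linear kernel (the experiments use linear-SVM) and a one-versus-rest reading of the voting step; the function names and the toy model values are hypothetical, not trained parameters.

```python
import numpy as np

def linear_kernel(support_vectors, x):
    """Linear kernel between each support vector (rows) and the input vector x."""
    return np.asarray(support_vectors, dtype=float) @ np.asarray(x, dtype=float)

def decision_function(x, support_vectors, alphas, labels, bias, kernel=linear_kernel):
    """Standard SVM decision value for one class model."""
    k = kernel(support_vectors, x)                                   # step 1
    return float(np.sum(np.asarray(alphas) * np.asarray(labels) * k) + bias)  # step 2

def classify(x, models):
    """Steps 3 and 4: score every class model and return the best class and all scores."""
    scores = {name: decision_function(x, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Toy models: (support_vectors, alphas, labels, bias); values chosen by hand for illustration.
toy_models = {
    "defect": ([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [1, -1], 0.0),
    "normal": ([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [-1, 1], 0.0),
}
predicted, scores = classify([2.0, 0.5], toy_models)
```

In practice the support vectors, coefficients, and bias would come from the trained road defect model rather than being written by hand.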
The procedure to perform road defect detection on the road area is as follows:
1. Perform edge detection to extract edges, and calculate the mean of the grayscale pixels in the road area. Edges are important for finding road defects because road cracks and potholes both tend to have strong edges.
2. Apply morphological dilation to the edges to get the potential defect area.
3. Remove road lines by thresholding the grayscale pixels using the mean of the road area. This step minimizes false detections caused by non-defect edges from road lines.
4. Train the SVM model using texton features extracted from the training images.
5. Slide a window over the potential defect area and classify each window as road defect or normal road using the SVM road defect model trained in step 4.
6. Perform blob analysis to determine the defect location, measure the defect size, and determine the defect type. A defect is categorized as a road crack if the ratio of the defect area to the area of its bounding box is less than a threshold; otherwise it is categorized as a pothole.
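The blob analysis in the last step can be sketched as connected-component labelling followed by a bounding-box fill ratio test. The code below is a minimal pure-NumPy illustration, not the implementation used in this work: the function names are hypothetical, 8-connectivity is assumed, and the default threshold of 0.5 is the value reported in the experiments.

```python
import numpy as np
from collections import deque

def find_blobs(mask):
    """Return one (n, 2) array of (row, col) pixels per 8-connected blob in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    blobs = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue = deque([(y0, x0)])
        pixels = []
        while queue:                                   # breadth-first flood fill
            y, x = queue.popleft()
            pixels.append((y, x))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        blobs.append(np.array(pixels))
    return blobs

def describe_blob(pixels, ratio_threshold=0.5):
    """Location (bounding box), size (pixel area), and type of one blob."""
    ys, xs = pixels[:, 0], pixels[:, 1]
    top, left = int(ys.min()), int(xs.min())
    height = int(ys.max()) - top + 1
    width = int(xs.max()) - left + 1
    area = len(pixels)
    ratio = area / (height * width)                    # defect area / bounding-box area
    kind = "crack" if ratio < ratio_threshold else "pothole"
    return {"bbox": (left, top, width, height), "area": area, "ratio": ratio, "type": kind}
```

A thin diagonal crack fills only a small fraction of its bounding box and falls below the threshold, while a compact pothole fills most of it.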

E. Performance Evaluation
To measure the performance of the proposed method, we use the contour matching score, or boundary F1 score (BF score) [14]. The BF score is regarded as a good measure for semantic segmentation. It measures how closely the predicted boundary of an object matches the ground truth boundary using (4). The score ranges from 0 to 1, where 1 indicates a perfect match.
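For illustration, the BF score is commonly computed as the harmonic mean 2PR / (P + R), where the precision P is the fraction of predicted boundary pixels lying within a distance tolerance of the ground truth boundary and the recall R is the converse. The sketch below follows that common definition and may differ in detail from (4); the function names and the default tolerance in pixels are assumptions, and the brute-force distance computation is only suitable for small images.

```python
import numpy as np

def boundary(mask):
    """Coordinates of mask pixels that have at least one 4-neighbor outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def bf_score(pred, truth, tolerance=2.0):
    """Boundary F1 score between a predicted and a ground truth binary mask."""
    bp = boundary(pred).astype(float)
    bt = boundary(truth).astype(float)
    if len(bp) == 0 or len(bt) == 0:
        return 0.0
    # Pairwise Euclidean distances between predicted and ground truth boundary pixels.
    dist = np.sqrt(((bp[:, None, :] - bt[None, :, :]) ** 2).sum(axis=2))
    precision = (dist.min(axis=1) <= tolerance).mean()
    recall = (dist.min(axis=0) <= tolerance).mean()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```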

III. Results and Analysis
The proposed texton based segmentation for road defect detection on road aerial images is written in Matlab and runs on a laptop with an Intel Core i7-6700K processor, 16 GB of RAM, and an NVIDIA GTX 1050. We use the KITTI road dataset [15] as training data to segment the road area, and road defect images (potholes and road cracks) collected from the internet as training data to detect road defects.

A. Training Process
The training process produces the road area detector and the road defect detector. The road area detector is trained using K-NN with K = 5 and consists of several pairs of centroids from filter responses and several pairs of centroids from texton. The road defect detector is trained using linear-SVM. The clustering method used to extract texton features for both models is k-means with K = 40. The size of the LM filter bank kernel is 21x21 and the size of the texton kernel is 5x5. A smaller kernel tends to produce smooth segments. Fig. 3 shows the road defect samples used in this research.

B. Road Area Segmentation
Road area is segmented based on texture. The parameters used to extract texton features are an LM filter bank size of 21x21, a texton kernel size of 5x5, K = 40 clusters for k-means, 5 samples of centroid sets for energy calculation, and an energy threshold of (number of samples) / 3. We remove small areas of less than 32x32 pixels and apply morphological dilation with kernel size 32x32 to the potential road area. The largest area among the potential road areas is the final road area. Fig. 4 shows the result of road area segmentation: (a) the road aerial image, (b) the energy result from K-NN classification (the scale from lowest to highest is indicated by blue to red color), and (c) the result of road area segmentation (magenta marker). In Fig. 4(b), the road lines, grass, and worn out asphalt have lower energy (blue color) because their texture differs from the road texture in the training images. In Fig. 4(c), the segmentation covers most of the road area with a BF score of 0.9884. The BF score is high because the road aerial image has a fine road texture with almost no defects on the road surface. On the road side, the rough grass texture is completely different from the road texture, which makes the road shape easier to identify. Fig. 5 shows other samples of road segmentation and their corresponding BF scores. In Fig. 5(a), the road aerial image has a rough road texture with many cracks and a rough grass texture on the road side. The different textures make the road shape easier to identify, which results in a BF score of 0.9108. In Fig. 5(b), the road aerial image has rough road textures with some defects, various grass textures on the left side, and a soft soil texture on the right side. The result of road area segmentation has a BF score of 0.7567. The moderate score is caused by false segmentation of regions with similar textures, even though they have different colors, such as the soft grass texture on the left side and the soft soil texture on the right side of the road.

C. Road Defect Detection
Road defects are detected based on texture using linear-SVM. Assuming that a road defect has a different texture than normal road, a strong edge exists in the road area wherever a defect is present. Therefore, we perform edge detection on the road area using Canny edge detection. Because the Canny edge detector produces thin edges, morphological dilation with kernel size 21x21 is applied to expand the potential defect area. SVM is used to classify the potential defect area into the defect or normal category. Small areas are removed and blob analysis is performed to find the defect location, measure the defect size, and calculate the ratio of the defect area to the area of its bounding box to determine the defect type. A defect is categorized as a road crack if the ratio is lower than 0.5; otherwise it is categorized as a pothole. Fig. 6 shows the result of road defect detection. Fig. 6(a) is the road aerial image, (b) is the result of road area segmentation (magenta marker), (c) is the result of the morphological operation on the edges (cyan marker), and (d) is the result of road defect segmentation along with the defect location, defect size, and defect type (cyan marker with green text and rectangle). Fig. 6(d) shows that the defects detected in the image are several road cracks.
In Fig. 6(b), the proposed method fails to cover all of the road area, resulting in a BF score of 0.6217. The moderate score is mainly caused by large damage on the road surface (shown by the red circle marker). The damage is too large to be covered by morphological dilation with kernel size 21x21. The similar texture of the road and the gravel on the road side also contributes to the lower score. This poor segmentation of the road area affects the road defect detection in the next step.
In Fig. 6(c), edge detection followed by morphological dilation with kernel size 15x15 covers almost all of the road cracks, although there are some non-defect edges caused by the poor road segmentation. In Fig. 6(d), the result of road defect detection has a BF score of only 0.6024. The moderate score is caused by the large road damage shown by the red circle marker, which does not belong to the segmented road area and hence is not considered a road defect by the proposed method. Fig. 7 shows other samples of road defect detection on road aerial images. Fig. 7(a) shows a result of several road cracks with a BF score of 0.7379, while Fig. 7(b) shows a result of potholes with a BF score of 1.0000.
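The construction of the potential defect area used in this section (dilating the detected edges and discarding road lines) can be sketched as follows. This is an illustrative approximation only: the function names are hypothetical, the edge map is assumed to come from an existing Canny implementation, the kernel is assumed to be square with an odd side length, and road lines are assumed to be brighter than the mean road intensity so that only pixels at or below the mean are kept.

```python
import numpy as np

def dilate(mask, size):
    """Binary dilation with a size x size square kernel (size assumed odd)."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    r = size // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(size):
        for dx in range(size):
            out |= padded[dy:dy + h, dx:dx + w]    # OR over every kernel offset
    return out

def potential_defect_area(edges, gray, road_mask, kernel_size=21):
    """Dilated edges inside the road area, with pixels brighter than the road mean removed."""
    road_mask = np.asarray(road_mask, dtype=bool)
    gray = np.asarray(gray, dtype=float)
    road_mean = gray[road_mask].mean()             # assumes the road mask is not empty
    candidate = dilate(edges, kernel_size) & road_mask
    return candidate & (gray <= road_mean)         # assumption: road lines are brighter than the mean
```

The resulting mask is what the sliding-window SVM classifier would then scan.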

IV. Conclusion
In this research, a method to detect road defects from aerial imagery is proposed. From the experiments, we conclude that the proposed method can detect road defects at an early stage, such as road cracks and small potholes. The proposed method works well in detecting road defects inside the road area. Potholes can only be detected if the proposed method finds closed contours, and detection fails otherwise (see Fig. 6(b)). In several tests, false detections were also caused by shadows and objects on the road. The presence of an object on the road can cause a hole in the road area segmentation and disrupt the edges. Therefore, road aerial images should be acquired when there is less traffic and captured multiple times, so that a hole can be filled using other images of the same location taken at a different acquisition time. Shadows can be removed with a suitable algorithm. Overall, the proposed method can detect road cracks and potholes in road aerial images and gives the defect location, defect size, and defect type. This method is effective because it covers a large road area, does not disturb traffic, and produces good BF scores.