Implementation of Data Mining on Rice Imports by Major Country of Origin Using the K-Means Clustering Algorithm



I. Introduction
Indonesia is a country where most of the population relies on the agricultural sector for its livelihood. Even so, Indonesia's rice production is not high enough to meet the needs of its population, so the country still has to import rice from other food-producing countries. One of the main causes is the very large population: statistics put it at roughly 230-237 million people, all of whom eat rice as their staple food, so national demand for rice is clearly very high [1].
Cluster analysis is a multivariate technique whose primary objective is to group objects based on the characteristics they possess. Today, cluster analysis has been applied in many fields, as documented in various studies and journals [2]. The central concept of the clustering method is an iterative search for cluster centers, in which each data object is assigned to the cluster whose center is at the minimum distance from it [3]. The data used in this study are based on import documents produced by the Directorate General of Customs and Excise, published through the website https://www.bps.go.id. The researcher takes up the topic of rice imports by leading country of origin and processes the data with a clustering method [4]. The clustering results can serve as input for the Indonesian government in the form of a map of the leading countries of origin. The countries are grouped into 3 (three) clusters: high import, medium import, and low import [5].
Endang Sugiharti clustered lecturer data on their activities and their performance in carrying out their responsibilities, using the K-Means method [6]. In that research the lecturers were grouped into Networking, Software Engineering, and E-Learning clusters, again with the K-Means method [7].
K-Means can also be defined as a clustering method that belongs to the partitioning approach. The K-Means algorithm is a centroid model [8], that is, a model that uses centroids to form clusters. A centroid is the midpoint of a cluster, computed as the mean of the data objects assigned to it, and it is used to calculate the distance from each data object to that cluster [9].
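For reference, the centroid and the distance can be written as follows, assuming the standard K-Means formulation with Euclidean distance (the measure named in the Method section). The symbols are introduced here for illustration: $C_k$ is the set of data objects in cluster $k$, and each object $x_i$ has $p$ variables ($p = 2$ in this study: net weight and CIF value).

$$c_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i, \qquad d(x_i, c_k) = \sqrt{\sum_{j=1}^{p} \left(x_{ij} - c_{kj}\right)^2}$$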
A data object belongs to the cluster whose centroid is at the shortest distance from it. The K-Means algorithm can be interpreted as a simple learning algorithm for solving a grouping problem that aims to minimize the sum of squared errors, i.e., the squared distances between data objects and the centroids of their clusters [10]. The purpose of this research is to apply K-Means to cluster rice imports by leading country of origin [11].
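The assignment rule and the quantity K-Means drives down can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author's RapidMiner workflow: the function names `assign` and `sse` are hypothetical, and the points are toy values rather than import data.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))

# Toy 2-D points and centroids (hypothetical values).
pts = [(0, 0), (0, 2), (10, 10)]
cents = [(0, 1), (10, 10)]
labels = assign(pts, cents)
print(labels, sse(pts, cents, labels))  # [0, 0, 1] 2.0
```

Each K-Means iteration alternates this assignment with recomputing the centroids as cluster means; neither step can increase the squared-error total.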

II. Method
In this study, the data mining method consists of the following stages: (a) data collection, (b) data processing, (c) clustering, and (d) analysis. These stages are described as follows.
Applying data mining to rice imports by leading country of origin first requires the relevant data. The research data were obtained from import documents produced by the Directorate General of Customs and Excise, published through the site https://www.bps.go.id. The data used are rice imports by country of origin from 2000-2015 for 10 origins: Vietnam, Thailand, China, India, Pakistan, United States, Taiwan, Singapore, Myanmar, and Others. Two variables are used: (1) the amount of rice imported (net weight) and (2) the value of import purchases (CIF). The rice imports are clustered by leading country of origin into 3 clusters: a high import level cluster, a medium import level cluster, and a low import level cluster.
Before clustering, the data are preprocessed: for each country of origin, the rice import data are summed over all years for each variable, yielding the values processed in the clustering stage. Clustering is an unsupervised classification, a process of partitioning a set of data objects into multiple classes. It is carried out by applying a series of steps based on a distance measure, here the Euclidean distance. Cluster analysis thus divides the data set into groups based on predetermined similarities. To determine the clusters from the available data, a flowchart is used to lay out the sequence of calculations that leads to the clustering result, and the flow of determining the clusters with K-Means is summarized in such a flowchart.
At the analysis stage, the rice import data by country of origin are analyzed, primarily with the RapidMiner tool. RapidMiner is a software environment for machine learning, data mining, text mining, and predictive analytics. The data obtained are processed using the calculated weight of each indicator. As determined in the preceding stages, the data are clustered into 3 clusters: high import level, medium import level, and low import level. The results are then analyzed.
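The preprocessing step, summing each country's yearly figures into one total per variable, can be sketched as follows. The record layout `(country, year, net_weight, cif_value)` and the function name `accumulate` are assumptions for illustration, and the sample values are toy numbers, not BPS data.

```python
from collections import defaultdict

def accumulate(records):
    """Sum net weight and CIF value per country over all years.

    records: iterable of (country, year, net_weight, cif_value) tuples
    (a hypothetical layout). Returns {country: (total_net, total_cif)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for country, _year, net_weight, cif_value in records:
        totals[country][0] += net_weight
        totals[country][1] += cif_value
    # Convert the mutable accumulators to tuples for the clustering stage.
    return {country: (net, cif) for country, (net, cif) in totals.items()}

# Toy records for illustration only; these are NOT the BPS figures.
sample = [
    ("Country A", 2000, 100.0, 40.0),
    ("Country A", 2001, 150.0, 60.0),
    ("Country B", 2000, 5.0, 2.0),
]
print(accumulate(sample))  # {'Country A': (250.0, 100.0), 'Country B': (5.0, 2.0)}
```

The resulting one-row-per-country totals are what the clustering stage operates on.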

III. Result and Discussion
For clustering, the rice import data for 2000-2015 are first totaled by leading country of origin. The data for the two assessment criteria, net weight and CIF value, are shown in Table 1. The data are then accumulated for each criterion, (1) net weight and (2) CIF value, giving the total rice imports from each country of origin; the accumulated values are shown in Table 2. The accumulated data then enter the clustering stage, where they are loaded into RapidMiner and the K-Means algorithm is applied to group them into three clusters. In applying the K-Means algorithm, the initial midpoints (centroids) are obtained from the collected data. Since the desired number of clusters is 3, the clusters are the high import level cluster (C1), the medium import level cluster (C2), and the low import level cluster (C3), and there are likewise 3 initial centroids. The initial centroids are determined by taking the largest (maximum) value for the high import level cluster (C1), the average value for the medium import level cluster (C2), and the smallest (minimum) value for the low import level cluster (C3). These values are shown in Table 3. Using these centroids, the data are grouped into 3 clusters by assigning each data object to the centroid at the closest distance. In iteration 1, the rice import data of the leading countries of origin were grouped as follows: the high import level cluster (C1) contains Vietnam and Thailand; the medium import level cluster (C2) contains China, India, and Others; and the low import level cluster (C3) contains Pakistan, USA, Taiwan, Singapore, and Myanmar. The shortest-distance calculation and the data grouping for iteration 1 are shown in Table 4 and Table 5.
As Table 5 illustrates, K-Means keeps iterating until the data grouping is the same as in the previous iteration; in other words, the process stops once the grouping in the latest iteration matches that of the previous one. The clusters obtained in iteration 1 are shown in Figure 2. For iteration 2, the updated midpoints (centroids) are shown in Table 6. With these centroids, the same process of finding the closest distance is repeated; the distance calculation for iteration 2 is shown in Table 7. In iteration 3, the grouping of the data into 3 clusters is the same as in iteration 2, so the process stops. Of the 10 leading countries of origin of rice imports, 2 belong to the high import level cluster (C1), namely Vietnam and Thailand; 4 belong to the medium import level cluster (C2), namely China, India, Pakistan, and Others; and 4 belong to the low import level cluster (C3), namely the USA, Taiwan, Singapore, and Myanmar.
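The procedure described above can be sketched in Python as a minimal, Lloyd-style K-Means loop. It assumes that the max/mean/min rule for the initial centroids is applied per variable across the accumulated country totals (our reading of the text), and that an empty cluster keeps its previous centroid. RapidMiner's own K-Means operator may initialize and iterate differently. The function names are hypothetical and the six points are toy values, not the paper's data.

```python
import math

def initial_centroids(points):
    """Assumed reading of the paper's rule, applied per variable:
    C1 = maximum, C2 = mean, C3 = minimum."""
    columns = list(zip(*points))
    high = tuple(max(col) for col in columns)
    medium = tuple(sum(col) / len(col) for col in columns)
    low = tuple(min(col) for col in columns)
    return [high, medium, low]

def kmeans(points, centroids, max_iter=100):
    """Assign to the nearest centroid, recompute means, and stop when an
    iteration reproduces the previous grouping.

    Returns (labels, centroids, iterations_run)."""
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Convergence: grouping identical to the previous iteration.
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)
        centroids = updated
    return labels, centroids, max_iter

# Toy (net weight, CIF value) totals for six hypothetical countries;
# these are NOT the BPS figures.
points = [(100, 40), (90, 35), (20, 8), (15, 6), (2, 1), (1, 0.5)]
labels, final, n_iter = kmeans(points, initial_centroids(points))
print(labels, n_iter)  # [0, 0, 1, 1, 2, 2] 3
```

With these toy values, the fourth point starts in the low cluster and moves to the medium cluster in iteration 2. Iteration 3 then reproduces the grouping of iteration 2, which mirrors the stopping pattern described in the text.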

IV. Conclusion
Rice imports by leading country of origin can be assessed by applying the K-Means clustering method. The data are processed to obtain the value of imported rice from each leading country of origin and then processed with RapidMiner to determine the centroid values of 3 clusters: the high import level cluster (C1), the medium import level cluster (C2), and the low import level cluster (C3). The centroid values are 7429179.9 and 2735452.25 for the high import level cluster, 1046359.5 and 337703.05 for the medium import level cluster, and 185559.425 and 53089.225 for the low import level cluster. The resulting assessment based on the rice import index places 2 countries in the high import level cluster (C1), namely Vietnam and Thailand; 4 in the medium import level cluster (C2), namely China, India, Pakistan, and Others; and 4 in the low import level cluster (C3), namely the USA, Taiwan, Singapore, and Myanmar. The results can be used to map the leading countries of origin by the amount of rice imported from them. In future work, weighting the criteria during data processing could make the clustering results more accurate, and additional rules could be introduced to further improve the accuracy of the clustering process.
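Once the final centroids are known, a country-of-origin total can be placed in a cluster by nearest-centroid assignment. The sketch below uses the centroid values reported above. It rests on several assumptions: that each value pair is ordered as (net weight, CIF value), that the label strings and the `classify` name are illustrative, and that the example totals are hypothetical.

```python
import math

# Final centroids as reported in the conclusion; the (net weight, CIF value)
# ordering of the two coordinates is an assumption.
FINAL_CENTROIDS = {
    "C1 (high)": (7429179.9, 2735452.25),
    "C2 (medium)": (1046359.5, 337703.05),
    "C3 (low)": (185559.425, 53089.225),
}

def classify(totals):
    """Return the label of the final centroid nearest to a (net, cif) pair."""
    return min(FINAL_CENTROIDS, key=lambda label: math.dist(totals, FINAL_CENTROIDS[label]))

# Hypothetical accumulated totals for an unnamed country.
print(classify((1000000.0, 300000.0)))  # C2 (medium)
```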

Table 1.
Rice Import Data by Main Country of Origin, 2000-2015

Table 2.
Rice Import Accumulation Data

Table 3.
Initial Centroid Data

Table 4.
Calculation of Distances to Cluster Centers

Table 5.
Data Grouping in Iteration 1
Table 7.
Calculation of Distances to Cluster Centers in Iteration 2

Agus Perdana Windarto (Implementation of Data Mining on Rice Imports by Major Country of Origin Using the K-Means Clustering Algorithm)