The Analysis of Research Trends and Public Awareness of Smart Farms using Text Mining
Abstract
Background and objective
To deal with recent issues such as climate change, rural aging, food security, and the Fourth Industrial Revolution, there is a growing interest in smart farms that can efficiently produce food with ICT. In response to the international issues, this study analyzed articles on smart farms published in international journals and KCI journals as well as Instagram hashtags through text mining and identified relevant research trends and public awareness.
Methods
This study collected a total of 584 articles on smart farms published from 2010 to 2021 and hashtags from Instagram posts uploaded in 2021. To improve the reliability of the analysis results, only nouns were extracted from the abstracts and hashtags, and the data were preprocessed by removing customarily used nouns and merging synonyms. We then performed frequency analysis, degree centrality and betweenness centrality analysis, and topic modeling.
Results
The analysis results of words with high frequency and centrality by research data are as follows. KCI and international journal articles had a tendency to mainly focus on ICT system development for efficient operation of smart farms. However, KCI articles considered relevant policies to establish the technologies. On the other hand, international journal articles tended to conduct research on smart farms in a wider area of agricultural fields than KCI articles. The main topics on Instagram were diet food, rural migration, and urban agriculture. This result shows that healthy food, experiences, and education through smart farms are gaining public interest.
Conclusion
Currently, there is insufficient analysis of research trends in smart farms. In this vein, this study has significance as it included academic trends and public awareness by considering both research articles and Instagram posts. We expect the results of this study to be used as useful data for decision making to set the research and policy directions required to advance smart farms in the future.
Introduction
According to the United Nations (UN) World Population Prospects, the global population could grow to 9.7 billion in 2050, showing a 26% increase compared to 2022 (UN, 2022). The World Resources Institute (WRI) reported that the world will need to produce about 69% more food in 2050 compared to 2006 to prepare for the growing population (WRI, 2013). However, despite these prospects, the stable production of food worldwide is becoming uncertain due to changes in the cropping system brought by climate change, increase in pests and diseases, and the aging rural population (Lee, 2012a; 2020). Accordingly, there is an increasing global need for smart farms that automatically observe the growth environment of crops and efficiently promote optimal quality and production by applying information and communications technology (ICT) to the existing agricultural production methods.
Smart farming refers to agriculture that can maintain and manage an appropriate growth environment for crops such as temperature, humidity, and amount of light based on ICT and sensors (Choi and Jang, 2019; Kim, 2020; Oh et al., 2022). Since smart farms can check crop growth data in real time without space and time constraints, it is beneficial for increasing crop productivity, improving quality, and reducing labor costs. Despite the barren soil environment not suitable for crop production and labor shortage, the Netherlands has established itself as the world’s second largest agricultural exporter after the United States with advanced smart farming technology established since the late 1970s (Choi and Jang, 2019). Crop production per unit area in the Netherlands is reported to be 40% higher than South Korea (MAFRA, 2014; 2016; Yeo et al., 2016).
As of 2020, the size of the global agri-food market is $7.71 trillion (aT, 2022), which is 2.2 times that of the automobile market (IBISWorld, 2022). The global smart farm market size is estimated at $9 billion as of 2020 and is expected to increase by 20% on average each year (Juniper Research, 2020). With the recent increase in social demands for smart farms, various related studies have been conducted over the past few years (Ma et al., 2015; Quiroz and Alférez, 2020; Verdouw et al., 2021; Rodríguez et al., 2021). Recent studies focus on developing the ICT necessary for operations, such as deep learning image analysis and smart farm operation systems. Ma et al. (2015) identified whether diseases occur through image analysis of crops in a greenhouse. Quiroz and Alférez (2020) proposed a model that recognizes images of Legacy Blueberry, a variety that accounts for 80% of Chilean blueberry production, through deep learning based on the convolutional neural network (CNN). Verdouw et al. (2021) proposed a conceptual framework for implementing digital twins in smart farms and applied it to the European IoF 2020 project. Rodríguez et al. (2021) implemented the IoT-Agro platform based on a 3-tier architecture (Agriculture Perception, Edge Computing, Data Analytics) to provide optimal smart farm solutions for coffee farms in Colombia.
Meanwhile, identifying research trends and public awareness from the past to the present is useful for finding matters that are insufficient or must be supplemented in the relevant field and for setting future research and policy directions (Park and Na, 2016; Park and Bae, 2018). Most studies on the trends are literature analyses focused on researchers or topics, which have limitations in determining which specific topics are being studied. Text mining, which is a field of data mining, can be used to examine the topics and trends of a field from multiple angles using vast text data.
Accordingly, research trend analysis using text mining techniques is being conducted in various academic fields such as food, ecology, geography, and agriculture. Bae et al. (2013) analyzed the frequency of occurrence of words used in abstracts of articles on food in relation to climate change using text mining techniques and extracted keywords and trends of interest in related studies. Kim and Lee (2018) extracted the abstracts of KCI articles related to ecological restoration technology and explored future development plans for ecological restoration technology through analyses of time series trends, major keywords, and social network indicators such as degree centrality. Park and Bae (2018) extracted abstracts of KCI articles related to the DMZ published over the past 15 years and analyzed frequency, language network, and topics by applying text mining techniques. Oh et al. (2022) identified research trends using keywords, language network, and topic modeling with KCI articles related to smart farms. Kim et al. (2021) collected comments and hashtags related to SW education on Facebook and Instagram, and then identified public awareness using major keywords and topic modeling.
As such, there are very few research trend analyses in smart farms, and they have limitations in that they considered only KCI articles. To secure future food sources related to smart farms and secure competitiveness in the rapidly changing international society, it is necessary to identify and prepare for the paradigm shift in international R&D. Meanwhile, the main users of these study results are the nation and the local governments and ultimately the public, and it is desirable to consider public awareness of consumers to promote practical use of the study results. Therefore, the purpose of this study is to identify an integrated research trend using text mining techniques on international journal articles and KCI articles using smart farm, which is a matter of global concern, as a keyword. In addition, this study also collected hashtags of posts related to smart farms on Instagram, considering public awareness about smart farms in addition to academic research trends.
Research Methods
Data collection
To collect articles for this study, we searched the keyword ‘smart farm’ (Korean and English) on Web of Science and Korea Citation Index (KCI) and extracted 586 articles published from 2010 to 2021 (481 international journal articles, 105 KCI journal articles). The publication period starts in 2010 because no articles with smart farm as a keyword were published in South Korea before 2010. In South Korea, studies had already been conducted on facility agriculture in the 1990s, but the term ‘smart farm’ has come into full use only since the 2010s. Among the articles primarily collected through the procedures above, only the articles submitted to SCI-level international journals (hereinafter referred to as international journals) on Web of Science and the articles submitted to KCI journals were extracted. A total of 584 articles (479 international journal articles, 105 KCI journal articles) were finally selected after excluding articles without abstracts and keywords and articles not related to smart farms. The bibliographic information of articles that can be used in research trend analysis includes keywords and abstracts. This study used abstracts, which include more details about the research than keywords, as the data for text mining.
To identify public awareness about smart farms, we collected posts by entering ‘smart farm’ in the Instagram search bar from August 20 to 30, 2022. Approximately 37,000 posts were found when searching smart farm on Instagram, but only 17,786 posts were actually collected. This is because some posts are hidden on Instagram, and for older posts the related content is not provided or only partial data can be collected. The number of posts collected by year was 1 in 2017, 3,071 in 2020, 7,481 in 2021, and 7,233 in 2022, which indicates that there were almost no posts on smart farms in the 2010s. Moreover, posts in 2020 had almost no content uploaded before May, and posts in 2022 were limited in representing the content of that year since data was collected only up to August. Accordingly, this study used only the posts of 2021, the one year for which data from January to December was fully collected. A total of 7,196 posts were extracted after excluding posts with no information about smart farms. To increase the reliability of the study results, only one post was kept when the same post was uploaded repeatedly under the same ID. Information in Instagram posts that can be used to analyze public awareness includes the text, hashtags, and photos. This study used hashtags, which directly express the core content of the posts, as the data for analysis.
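The selection steps described above (keep one year, keep a single copy of posts repeated under the same ID) can be illustrated with a minimal sketch. The record layout, the account names, and the rule that a repeat means "same ID and same hashtags" are assumptions made for illustration; only the per-year counts are taken from the text.

```python
from collections import Counter

# Per-year counts reported in the text; they should add up to 17,786.
reported_counts = {2017: 1, 2020: 3071, 2021: 7481, 2022: 7233}

# Hypothetical records: (account_id, year, hashtags). Layout is an assumption.
posts = [
    ("farm_a", 2021, ("smartfarm", "strawberry")),
    ("farm_a", 2021, ("smartfarm", "strawberry")),  # same post uploaded twice
    ("farm_b", 2021, ("smartfarm", "hydroponics")),
    ("farm_c", 2020, ("smartfarm",)),
]


def select_posts(records, year):
    """Keep posts from `year`, dropping repeats with the same ID and hashtags."""
    seen = set()
    kept = []
    for account_id, post_year, hashtags in records:
        key = (account_id, hashtags)
        if post_year != year or key in seen:
            continue
        seen.add(key)
        kept.append((account_id, post_year, hashtags))
    return kept


selected = select_posts(posts, 2021)
posts_per_year = Counter(year for _, year, _ in posts)
```

Here `selected` holds two posts, because the repeated `farm_a` upload and the 2020 post are both dropped.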
Data preprocessing
Prior to text mining, we conducted data preprocessing as follows. Data preprocessing determines data quality and the reliability of analysis results, and it is the process that requires the most time and effort in text mining (Park and Bae, 2018). In this study, we only used nouns in the abstracts and hashtags for analysis. The extracted nouns were used after eliminating stop words through the following process. First, through inverse document frequency (IDF) analysis, we excluded nouns whose IDF value becomes 0 when rounded to one decimal place (Equation 1). IDF is obtained by dividing the total number of documents by the number of documents in which a specific word appears and then taking the log. The IDF values of words that appear in relatively many documents become smaller and closer to 0. Words with small IDF values are likely to be common words, so these words were excluded from the nouns to be studied. A typical noun with a small IDF value in this study is smart farm, which is the search word. Second, among the nouns that remained after the process above, we excluded nouns such as research, method, result, conclusion, contents, article, and Instagram that are used customarily even if their IDF value is not 0. Third, we excluded nouns such as suggestion or proposal, which are placed in front of the verb and merely play a descriptive function in the sentence. In addition, expressions with the same meaning, such as rural migration and migration to a rural area, were merged into one, and words with a length of 1 that are difficult to understand were removed.
$$\mathrm{IDF}(t_j) = \log \frac{D}{|\{d_j : t_j \in d_j\}|} \qquad (1)$$
$D$: Total number of documents
$|\{d_j : t_j \in d_j\}|$: Number of documents in which keyword $t_j$ appears
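The IDF filter described above can be sketched as follows. The text does not state the base of the logarithm, so base 10 is an assumption here, and the function names and toy abstracts are hypothetical.

```python
import math


def idf_scores(documents, base=10):
    """Return IDF per noun: log(D / number of documents containing the noun)."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for noun in set(doc):  # count each noun once per document
            doc_freq[noun] = doc_freq.get(noun, 0) + 1
    return {noun: math.log(n_docs / df, base) for noun, df in doc_freq.items()}


def drop_common_nouns(documents, base=10):
    """Remove nouns whose IDF rounds to 0 at one decimal place."""
    scores = idf_scores(documents, base)
    common = {noun for noun, score in scores.items() if round(score, 1) == 0}
    filtered = [[n for n in doc if n not in common] for doc in documents]
    return filtered, common


# Toy abstracts reduced to nouns (illustrative data, not from the study).
abstracts = [
    ["smart farm", "sensor"],
    ["smart farm", "crop"],
    ["smart farm", "sensor", "policy"],
]
filtered, common = drop_common_nouns(abstracts)
```

In this toy corpus the search term appears in every document, so its IDF is log(3/3) = 0 and it is removed, which mirrors the example given in the text.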
Text mining
After preprocessing was completed, we analyzed frequency, language network, and topic modeling of words extracted from international journal articles, KCI articles, and Instagram, and compared the results by research data. Tools that can do this include NetMiner, UCINET/NetDraw, NodeXL, Pajek, and Gephi (Lee, 2012b). This study used NetMiner 4.0, which has the highest utilization frequency and can perform all of the processes above.
For frequency analysis, we identified the words that appeared the most in the articles and on Instagram, which were visualized as a word cloud. Language network analysis refers to analyzing various characteristics and intentions of the text by extracting words with meanings from the text and identifying their connections (Lee, 2014; KISTEP, 2017). Here, words with meanings are generally referred to as keywords (nodes) as they are selected from the many words extracted from the text. The main types of language network analysis indicators include basic attribute (density), centrality (degree centrality, closeness centrality, betweenness centrality), and sub-network (clustering, wavelength, structural equivalence) (Lee, 2012b), and previous studies mostly adopted centrality (Lee, 2012b; Kim and Lee, 2018; Park and Bae, 2018; Oh et al., 2022). Centrality identifies the degree of influence and connectivity of each node (word or keyword) in the network, and this study analyzed degree centrality and betweenness centrality. Degree centrality is an indicator of how much a particular node is connected to other nodes around it; the more nodes it is connected to, the higher its centrality. Betweenness centrality is an indicator for finding nodes that serve as mediators and is mostly used to identify boundary spanners (Long et al., 2013). A boundary spanner is a subject that expands existing boundaries and can be described as a new business or convergence research with scalability (Kim and Lee, 2018). To analyze the relationships between words and their centrality as described above, this study converted the preprocessed data from a 2-mode network, such as articles × words and posts × words, into a 1-mode network of keywords × keywords.
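The conversion from a 2-mode to a 1-mode network and the two centrality indicators can be sketched in plain Python. This is a generic illustration, not NetMiner 4.0's implementation: linking two keywords whenever they share a document, normalizing degree by n − 1, and leaving betweenness unnormalized are all assumptions, and the toy documents are hypothetical.

```python
from collections import deque
from itertools import combinations


def one_mode_network(doc_keywords):
    """Project documents x keywords onto a keywords x keywords adjacency map."""
    adj = {}
    for keywords in doc_keywords:
        unique = sorted(set(keywords))
        for kw in unique:
            adj.setdefault(kw, set())
        for a, b in combinations(unique, 2):  # keywords sharing a document
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in score.items()}


docs = [["system", "sensor"], ["system", "crop"],
        ["system", "policy"], ["policy", "education"]]
network = one_mode_network(docs)
degree = degree_centrality(network)
between = betweenness_centrality(network)
```

In this toy network "system" has the highest degree centrality, while "policy" also scores on betweenness because it is the only bridge to "education", which is the mediator role the text attributes to boundary spanners.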
Topic modeling is a data analysis methodology for finding key topics in documents, and it has the advantage of being able to extract consistent topics from a large number of documents without prior knowledge or classification once the researcher sets the number of topics (Kim et al., 2021; Nam, 2016). There are various algorithms for topic modeling, such as NMF (Non-Negative Matrix Factorization), LSA (Latent Semantic Analysis), PLDA (Parallel Latent Dirichlet Allocation), and PAM (Pachinko Allocation Model), but LDA (Latent Dirichlet Allocation), which extracts latent topics according to the number of topics, is the most commonly used model (Kim and Lee, 2018; Park and Bae, 2018; Oh et al., 2022). LDA is a probabilistic graphical model proposed by Blei (2012) (Fig. 1). Based on the number of topics set by the researcher in advance, LDA estimates which topics make up the entire set of documents and in what proportion, and it also provides major keywords for each topic, which makes it effective in deriving insight from the documents.
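To make the mechanics of LDA concrete, the following is a minimal sketch of a collapsed Gibbs sampler, one standard way to fit the model. It is not the procedure NetMiner 4.0 uses internally (the text does not describe it). The defaults α = 2 and β = 0.1 follow the values this study reports for its LDA analysis; the iteration count, seed, toy documents, and the choice of 2 topics instead of 7 are assumptions for illustration.

```python
import random


def lda_gibbs(docs, n_topics, alpha=2.0, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # counts n_dk
    topic_word = [[0] * n_words for _ in range(n_topics)]   # counts n_kw
    topic_total = [0] * n_topics
    assignments = []
    for d, doc in enumerate(docs):          # random initial topic per token
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                doc_topic[d][k] -= 1        # remove the token's current topic
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k       # resample and restore the counts
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    theta = [  # topic proportions per document
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [  # word probabilities per topic
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


toy_docs = [
    ["sensor", "iot", "network", "sensor"],
    ["iot", "sensor", "system"],
    ["policy", "education", "farmer"],
    ["education", "policy", "government", "policy"],
]
theta, phi, vocab = lda_gibbs(toy_docs, n_topics=2)
```

Each row of `theta` gives the proportion of topics in one document, and each row of `phi` gives the word probabilities from which the major keywords of a topic would be read off.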
In LDA, the results vary greatly depending on the number of topics, which is why it is important to determine an adequate number of topics. If there are too many topics, a wide range of fields can be derived, but the meanings may be redundant. If there are too few topics, redundancy decreases and the topics are easy to interpret, but it is difficult to derive diverse keywords (Greene et al., 2014). Accordingly, previous studies (Park and Lee, 2019; Kwon and Kim, 2021; Lee and Yi, 2021; Park, 2021; Park et al., 2022) determined the number of topics through perplexity, topic coherence, silhouette coefficient, and expert decision making. This study decided on the number of topics considering the silhouette coefficient built into NetMiner 4.0 and the results of previous studies. The silhouette coefficient is a value that quantifies how well data are separated based on the distance between words belonging to a cluster, and it approaches 1 when there is a large difference between data groups and the topics are adequately classified. However, due to the nature of the coefficient, fewer topics lead to higher coefficients. Thus, rather than applying the number of topics with the highest coefficient, it is necessary to set the range in which the silhouette coefficient remains high as the appropriate range of topics. This study comparatively analyzed the extracted topics and silhouette coefficients while changing the number of topics from 4 to 20. As a result of analyzing the mean silhouette coefficients by number of topics for international journal articles, KCI articles, and Instagram, the highest coefficient was obtained with 4 topics, but the range in which a high coefficient was retained consistently while the topics remained easy to interpret by category was 6–8 (Fig. 2). A previous study on KCI articles about smart farms (Oh et al., 2022) set the number of topics to 7 based on expert opinions.
Accordingly, this study finally set the number of topics to 7 by reflecting the silhouette coefficients and results of previous studies. Dirichlet parameters such as α and β were set as 2 and 0.1, respectively, based on results of previous research (Oh et al., 2022) for LDA analysis.
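The silhouette coefficient used to narrow down the number of topics can be sketched as below. The text does not specify how NetMiner 4.0 represents words or measures distance, so Euclidean distance over arbitrary numeric vectors is an assumption, as are the function name and the toy points.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
well_separated = mean_silhouette(points, [0, 0, 1, 1])
badly_mixed = mean_silhouette(points, [0, 1, 0, 1])
```

The well-separated grouping scores close to 1 and the mixed grouping scores below 0, which matches the interpretation given in the text. Applied to topic selection, the same score would be computed for each candidate number of topics from 4 to 20 and the stable high range inspected.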
Results and Discussion
Frequency analysis
Table 1 and Fig. 3 show the results of analyzing the frequency of occurrence of words related to smart farms by research data such as KCI articles, international journal articles, and Instagram. The top 5 words that appeared the most in KCI articles were environment (133 times), farmhouse (112 times), technology (109 times), system (101 times), and agriculture (95 times). In international journal articles, agriculture (841 times) appeared most frequently, followed by system (650 times), crop (325 times), climate (287 times), and device (281 times). The top 5 words that appeared the most on Instagram were strawberry (4,726 times), agriculture (4,328 times), farmer (3,766 times), young people (3,138 times), and rural migration (2,548 times). Among the words ranked in the top 20 by research data, the words that appeared in common were system, agriculture, model, crop, information, energy, and device, with KCI and international journal articles showing somewhat similar tendencies. In other words, KCI and international journal articles shared 7 of their top 20 words, but only 1 word appeared in common between articles and Instagram. This may be because articles focus on developing systems and devices to efficiently operate smart farms, whereas Instagram focuses more on sharing daily life and information, publicity, and marketing. Meanwhile, previous research analyzing the research trends in KCI articles related to smart farms (Oh et al., 2022) reported that the frequency of occurrence was high in words such as environment, system, use, technology, and cultivation. The results of this study were generally similar, although there are some differences according to the words excluded.
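The frequency ranking and the overlap between top-word lists can be sketched as follows. The toy noun lists and function names are hypothetical; the only figure taken from the text is that 7 of the top 20 words were shared between KCI and international journal articles, which corresponds to 7 / 20 = 35%.

```python
from collections import Counter


def top_words(documents, n):
    """Return the n most frequent nouns across all documents."""
    counts = Counter(noun for doc in documents for noun in doc)
    return [word for word, _ in counts.most_common(n)]


def shared_share(top_a, top_b):
    """Words present in both rankings and their share of the first ranking."""
    shared = set(top_a) & set(top_b)
    return shared, len(shared) / len(top_a)


# Toy corpora (illustrative only).
kci = [["system", "crop", "policy"], ["system", "crop"], ["system"]]
intl = [["system", "irrigation", "crop"], ["system", "irrigation"], ["system"]]

kci_top = top_words(kci, 2)
intl_top = top_words(intl, 2)
shared, share = shared_share(kci_top, intl_top)

# Share implied by the figures reported in the text: 7 of the top 20 words.
reported_share = 7 / 20
```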
Language network
Through language network analysis, we analyzed and visualized the words with high degree centrality and betweenness centrality for each research data (Table 2) (Fig. 4). The top five words with high degree centrality in KCI articles were technology, environment, farmhouse, system, and agriculture. In international journal articles, words such as agriculture, system, crop, device, and information showed high degree centrality. On Instagram, words such as agriculture, farmer, young people, rural migration, and hydroponics showed strong degree centrality. Among the top 20 words by research data, words with high degree centrality were crop, system, agriculture, device, model, information, and effect, which showed similar results with frequency analysis. That is, in articles, words related to ICT systems and devices that can properly maintain and manage the growth environment of crops and livestock show high connectivity with other words. On the other hand, there was only one common word with high degree centrality between articles and Instagram. Meanwhile, words with high betweenness centrality in KCI articles were environment, technology, farmhouse, system, and agriculture, and those in international journal articles were agriculture, system, crop, device, and network. On Instagram, words such as agriculture, farmer, young people, plant, and hydroponics tended to show high betweenness centrality. In general, words with high degree centrality also showed high betweenness centrality. In previous research that analyzed degree centrality and betweenness centrality in KCI articles related to smart farms (Oh et al., 2022), words such as use, technology, environment, result, system, and application showed high centrality, which tended to be similar to the results of this study. 
The results are not completely consistent even though this study used the same KCI article data as previous studies because words such as result and application that are used commonly in articles were excluded from this study unlike previous studies.
As a result of language network visualization, the shape was similar to degree centrality and betweenness centrality. However, while the network of articles was generally concentrated, the network of Instagram was relatively scattered. In KCI articles, keywords such as environment, system, technology, and agriculture formed a network structure through communication, device, crop, advanced, method, and policy. Among the types of research data, international journal articles showed the most concentrated network, and system, which is one of the keywords, was connected together with words such as irrigation, efficiency, energy, temperature, and plant. This shows the direction and subject pursued by studies related to smart farm systems around the world. On Instagram, keywords such as farmer, young people, and farm were forming a network. Detailed words such as beginner, rural migration, and lifespan show that the public has much interest in agricultural life in general in association with new nature-friendly jobs or hobbies, such as rural migration, startup, and weekend farm. In addition, health, which is a recent trend (MAFRA, 2014; Jang, 2018) as well as one of the main keywords, formed a network with diet, shopping mall, sprout, and mushroom, which implies that the public focuses on safe and environment-friendly foods and where to buy them.
Topic modeling
Deriving main topics
As a result of the analysis with the number of topics set to 7 for each type of research data, the topics extracted from KCI articles were ICT system, environment control device, energy saving system, optimal growth environment for crops, 4th industry convergence, technology advancement, and education and policy (Table 3) (Fig. 5). Each topic name was inferred based on the words associated with the topic and the titles and abstracts of the articles. Among these topics, education and policy was the one that appeared the most in articles, followed by 4th industry convergence and ICT system. Previous research deriving 7 topics from KCI articles related to smart farms (Oh et al., 2022) also derived government policy related to smart farms and smart farm platform design as the main topics, which is similar to this study. In other words, KCI articles focused on advancing the optimal smart farm operating system in convergence with the 4th industries and tended to suggest education, policy, and the role of the government to increase dissemination on site.
In international journal articles, the 7 topics derived were energy operating system, value of smart farms, sensor node, climate smart agriculture, irrigation system, dairy farming, and IoT system. IoT system was the topic that appeared the most in articles, followed by climate smart agriculture and irrigation system. More specifically, unlike KCI articles that focused more on crops, international journal articles also considered climate change delay and sustainability in various fields of agriculture such as dairy farming and livestock industry and focused on developing an IoT system that can efficiently operate and manage smart farms.
On Instagram, the 7 topics derived were rural migration, urban agriculture, agricultural tourism, startup, sale and education, wellbeing, and diet food. Diet food was the topic that appeared the most in posts, followed by rural migration, urban agriculture, startup, and agricultural tourism. In other words, unlike researchers, who focus on developing and validating technology, the public are direct consumers of smart farms who prefer healthy diet foods centered on vegetables and fruits and are highly interested in direct and indirect experience of agriculture such as rural migration, urban agriculture, and rural experience. In addition, startups by young people have recently been actively supported, which has led to a trend of sharing startup information in association with the production, processing, selling, experience, and education of smart farms.
Time series analysis of topics
This study analyzed the number of articles published by topic in KCI and international journals by year to identify the dynamic changes in research trends (Fig. 6). The number of KCI and international journal articles published per year was zero or in single digits until 2015, gradually increased from 2016, and then rapidly increased from 2019. KCI and international journal articles published after 2019 accounted for about 75% and 76% of all articles, respectively. Trends by topic since 2016, when studies on smart farms began in earnest, show that education and policy (Topic 7) accounted for the biggest part of KCI articles and has continuously increased. Recently, the ratios of articles on the optimal growth environment of crops (Topic 4), the environment control device to control this (Topic 2), and ICT system (Topic 1) are rapidly increasing. In international journal articles, climate smart agriculture (Topic 4) has consistently accounted for the biggest part, but recently articles related to IoT system (Topic 7) are rapidly increasing. In other words, KCI articles had been focused on policy and education to stably establish smart farms, and international journal articles had been focused on climate change to prepare for the New Climatic Regime, but recently studies are actively conducted around the world to develop ICT and IoT to efficiently operate agriculture in line with the Fourth Industrial Revolution. Since ICT and IoT are receiving attention as promising technologies to lead the future of the world, this field will continue to account for a big part of research on smart farms in not only South Korea but the entire world.
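The yearly tally behind such a time series can be sketched as follows. The records, the assignment of one dominant topic per article, and the choice to count the start year itself in the recent share are assumptions for illustration; the text does not specify these details.

```python
from collections import Counter

# Hypothetical (publication year, dominant topic) pairs, one per article.
articles = [
    (2015, "education and policy"),
    (2018, "education and policy"),
    (2019, "ICT system"),
    (2020, "ICT system"),
    (2021, "environment control device"),
]


def counts_by_year_and_topic(records):
    """Number of articles for each (year, topic) combination."""
    return Counter(records)


def share_from(records, start_year):
    """Share of articles published in or after start_year."""
    recent = sum(1 for year, _ in records if year >= start_year)
    return recent / len(records)


table = counts_by_year_and_topic(articles)
recent_share = share_from(articles, 2019)
```

With these toy records, three of the five articles fall in 2019 or later, so `recent_share` is 0.6; the same calculation on the study's data would yield the shares of recent articles discussed in the text.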
Conclusion
To deal with recent issues such as climate change, rural aging, food security, and the Fourth Industrial Revolution, there is a growing interest worldwide in smart farms that can promote efficient food production and farmhouse income. To get ahead in securing future foods related to smart farms and to gain competitiveness in the rapidly changing international society, it is necessary to identify global research trends and the awareness of the public as the main consumers. Accordingly, this study analyzed the related research trends and public awareness in order to secure the background information necessary for the continuous advancement of smart farms and for future research, policy, business, and education.
As a result of performing text mining on data such as KCI articles, international journal articles, and Instagram related to smart farms, it was found that KCI and international journal articles had the same words appearing at about a 35% level, whereas the words and topics between Instagram and articles were very different. Words with high frequency of occurrence and centrality in KCI and international journal articles were system, agriculture, model, crop, information, energy, and device, and the articles mainly focused on developing ICT systems to efficiently operate and manage smart farms. However, according to the topic modeling results, KCI articles proposed not only smart farm technology advancement but also education and policy to disseminate the technology to the field, and international journal articles tended to encompass topics such as dairy farming, climate change, and sustainability in addition to crop. This is because South Korea has been conducting studies on smart farms in earnest since the 2010s, whereas major advanced countries have already been seeking ICT development throughout various fields of agriculture before that. On Instagram, words with high frequency and centrality were farmer, young people, agriculture, rural migration, and hydroponics. The main topics on Instagram were diet food, rural migration, urban agriculture, startup, and agricultural tourism, and the public tended to show much interest in healthy food, experience, and education as direct consumers of smart farms.
Meanwhile, as a result of analyzing the number of articles published by topic by year, KCI and international journal articles were relatively insufficient before 2015 and tended to show a rapid increase in 2019. In particular, articles on ICT and IoT in both KCI and international journal articles have increased rapidly since 2019, which is due to the intensifying competition and growing interest in the Fourth Industrial Revolution worldwide. Since ICT and IoT are receiving global attention as promising technologies to lead the future, they are expected to continue accounting for a high portion of research about smart farms.
This study could identify the global academic trends and public awareness of smart farms by considering KCI articles, international journal articles, and Instagram hashtags. By associating the major research trends and public awareness of smart farms according to the results above, it is possible to infer that the main interests in South Korea are producing wellbeing food in association with ICT, nurturing young farmers, and increasing urban agriculture opportunities. However, to further advance the trends in smart farms, it is necessary to additionally consider all kinds of social media data such as Facebook and Twitter as well as international patents and news. Moreover, since Instagram was limited to only posts in 2021 due to insufficient posts about smart farms by year, the results of this study must be compared and validated by continuously building related data in the future.
Awareness of smart farms will continue to increase with the development of the information society and technological advancements. Considering the study results presented thus far, smart farming is still considered a well-equipped system for automatically growing certain crops through efficient human activities by adopting advanced technology. However, this may change with the advancements of technology, and the definition of smart farms may also change depending on the changes of various human behaviors that will appear in a technology-intensive society. The results of this study can be useful background data for forecasting future trends in smart farms and setting the direction for related research, policies, and industries. In addition, they can be used in discovering new research and projects by identifying the importance and insufficiencies of topics over time or in developing publicity and educational materials. In the future, it is necessary to continue research on the awareness of smart farms and discuss the direction for development.