An Analysis on Research Trends in Rural Spaces using Text Mining
Abstract
Background and objective
As the significance and function of rural areas change, the value of rural space has become increasingly recognized in modern society. Consequently, while the concept of rural space has developed over time and led to modifications in relevant policies, there are insufficient studies that focus on these changes based on a broad definition of rural space. Therefore, the aim of this study is to analyze trends in rural space research over time by applying text mining to academic papers on the topic of domestic rural space.
Methods
First, for data collection, the search keyword was set as 'rural space' in the Korea Citation Index (KCI), and 1,581 academic papers from 2003 to 2024 were collected. After removing inappropriate ones, a total of 473 articles were selected. The analysis was conducted by dividing the data into five periods. Following that, preprocessing and word frequency, TF-IDF, and LDA topic modeling analysis were performed through Textom, a big data analytics platform.
Results
The analysis revealed several key trends over the five periods. First, the volume of research on rural space has increased over time, indicating a growing interest in this field. Second, shifts in the paradigm of rural space research were observed through changes in keywords, reflecting evolving research priorities. Third, it was discovered that rural space research is gradually developing into a comprehensive approach and expanding into empirical research. This expansion is characterized by three themes extracted from each period: 'village-level development,' 'landscape and environment,' and 'service development and value creation'.
Conclusion
The findings showed an increasing focus on preserving 'ruralness', which is a unique feature of rural space, as attention shifts toward landscape preservation, ecology, and the environment. This trend aligns with the expectations and values for rural space, as well as social demands that value sustainable development, indicating the potential for further research expansion and revitalization.
Introduction
Traditionally, agriculture has been understood as a primary-sector activity, such as cultivation and livestock farming. Accordingly, rural areas were perceived as communities where most of the population engaged in agriculture, in contrast to urban areas, where workers in secondary and tertiary sectors such as manufacturing and services are concentrated (Seo et al., 2007). In modern society, the significance and value of agriculture are being reexamined from various perspectives, including the natural environment and socioeconomic aspects, beyond its meaning as the primary sector. Furthermore, as agriculture, which had traditionally been the primary economic activity of rural areas, has also come to take place in cities, the meaning and roles of rural areas have changed (Kim, 2022).
In particular, the value of rural areas as living spaces, beyond the industrial activity of agriculture itself, is increasingly recognized, and positive perceptions of the public value of agriculture and rural areas have continued to increase since the onset of the COVID-19 pandemic in 2020 (Kim and Park, 2023). This suggests a social consensus on the importance of the public value of agriculture and rural areas. However, merely maintaining and protecting agriculture cannot ensure the sustainability of rural space, and there are no comprehensive spatial plans, including higher-level plans and long-term visions, that encompass rural space (Han et al., 2023).
In this context, the government enacted the Act on Support for Rural Spatial Restructuring and Regeneration (hereinafter referred to as the Rural Spatial Restructuring Act), which came into effect on March 29, 2024, laying the institutional foundation for addressing rural depopulation and creating pleasant rural space (Seong et al., 2024). Various academic studies had been conducted to promote agriculture and rural areas before the enactment of the Rural Spatial Restructuring Act. These studies included research classifying rural space into various types (Choi et al., 1985; KRIHS, 2002; Yim, 2005; Choi et al., 2010; Jang and Kim, 2023), establishing a classification system for rural amenity resources based on the structural characteristics of rural space (Choi and Kim, 2012), proposing policies for effective and sustainable rural development (Yoon, 2013), and presenting physical development plans for basic living infrastructure that reflect the unique characteristics of rural space and agriculture (Im et al., 2024).
Moreover, studies on agriculture and rural areas using big data were also conducted, such as research analyzing agricultural and rural research topics compared to social perceptions using text mining (Kim, 2018), analyzing trends related to the public functions of agriculture and rural areas (Kim, 2022), and analyzing rural research trends using topic modeling (Kim et al., 2023). Most studies on rural space merely separated urban and rural areas using administrative districts as the unit, analyzed the characteristics for single years, proposed policies, or analyzed trends in agriculture and rural areas using big data.
However, despite the evolving meaning and scope of rural space due to changing circumstances and related policies, there has been insufficient research that comprehensively understands and analyzes the changes in rural space. Accordingly, this study applies text mining to South Korean academic journal articles that use rural space as a keyword, identifies key issues and research trends in rural space research over time, and provides foundational data for policy establishment and research advancement for the sustainable development of rural space.
Research Methods
Data collection and preprocessing
Analyzing research trends with text mining techniques helps overcome the limitations of traditional literature reviews, which are centered on researchers and reflect their subjective interpretations (Goo and Kim, 2014). For this reason, text mining is increasingly applied in recent research trend analyses. This study used Textom 24 (The IMC Inc., Korea), a big data collection and analytics solution, as the text mining tool.
To collect research articles, the search keyword was set as “rural space” on the Korea Citation Index (KCI), a platform operated by the National Research Foundation of Korea. Research articles published in KCI-listed and candidate journals from 2003 to September 2024 were collected. A total of 1,581 articles were collected, and their titles, abstracts, and bibliographic information were saved in an Excel file. Two PhD-level experts participated as evaluators and excluded 1,108 articles that were either duplicates, lacked relevance to rural space, or did not provide a Korean abstract. As a result, 473 articles were selected for final analysis.
The Excel file containing the selected articles was uploaded to the big data analytics platform Textom for data cleaning. We used Mecab-IMC, a morphological analyzer provided by Textom, to remove stop words such as single-syllable words and particles as well as special characters and to extract nouns.
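The cleaning step described above can be sketched in a few lines. This is a minimal illustration, not the actual implementation of Textom or Mecab-IMC: it assumes the morphological analyzer has already produced (surface, tag) pairs, and it assumes Sejong-style part-of-speech tags in which NNG and NNP mark common and proper nouns. The function name `extract_nouns` and the sample tokens are hypothetical.

```python
import re

# Assumption: tags follow the Sejong-style tag set used by MeCab-based Korean
# analyzers, where NNG = common noun and NNP = proper noun.
NOUN_TAGS = {"NNG", "NNP"}

# Matches any character that is not a letter or digit (special characters).
SPECIAL = re.compile(r"[^\w]|_")

def extract_nouns(tagged_tokens):
    """Keep nouns; drop particles, special characters, and single-syllable words."""
    nouns = []
    for surface, tag in tagged_tokens:
        if tag not in NOUN_TAGS:
            continue  # removes particles (e.g. JKG, JKO), symbols, and other non-nouns
        cleaned = SPECIAL.sub("", surface)
        if len(cleaned) < 2:
            continue  # one Hangul character is one syllable
        nouns.append(cleaned)
    return nouns

# Hypothetical analyzer output, for illustration only.
sample = [("농촌", "NNG"), ("공간", "NNG"), ("의", "JKG"),
          ("경관", "NNG"), ("을", "JKO"), ("·", "SY"), ("길", "NNG")]
print(extract_nouns(sample))  # ['농촌', '공간', '경관']
```

In this sketch the single-syllable noun is dropped even though it is tagged as a noun, which mirrors the stop-word rule described in the text.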
Analysis methods
To analyze detailed keywords and major topics based on the preprocessed data, this study divided the articles according to the five periods of the Plan to Develop Agriculture, Rural Communities, and Food Industry, which is formulated under the Framework Act on Agriculture, Rural Community and Food Industry, starting with the first period in 2003. The specific periods were the first period (2003–2007), second period (2008–2012), third period (2013–2017), fourth period (2018–2022), and fifth period (2023–2027).
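As a minimal sketch of how articles could be assigned to these periods by publication year, the mapping can be written as a lookup over inclusive year ranges. The names `PERIODS` and `period_of` are hypothetical; only the year boundaries come from the text.

```python
# Period boundaries as listed in the text (inclusive years).
PERIODS = {
    1: (2003, 2007),
    2: (2008, 2012),
    3: (2013, 2017),
    4: (2018, 2022),
    5: (2023, 2027),
}

def period_of(year):
    """Return the plan period (1-5) for a publication year, or None if outside 2003-2027."""
    for period, (start, end) in PERIODS.items():
        if start <= year <= end:
            return period
    return None

print(period_of(2010))  # 2
print(period_of(2024))  # 5
```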
First, term frequency analysis was conducted to identify which terms occur frequently in rural space research, and term frequency-inverse document frequency (TF-IDF) analysis, which is widely used in text mining, was conducted to measure the relative importance of specific terms and derive major terms. Term frequency analysis simply counts how often terms occur, so commonly used words show high frequency across many documents, while words that carry important meaning in specific documents show low frequency, which makes it difficult to identify the importance of terms for specific topics. Inverse document frequency (IDF) weighting was proposed to address this by assigning higher weights to terms that are important in specific documents and lower weights to terms that appear commonly across multiple documents (Jones, 1972). Using both term frequency and TF-IDF analysis, this study examined the terms appearing in the documents from complementary perspectives.
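The weighting idea can be illustrated with a short sketch. The exact formula used by Textom is not stated here, so the following assumes one common variant: raw counts for term frequency, the natural logarithm of N/df for IDF, and a corpus-level score obtained by multiplying a term's total count by its IDF. The function name `tf_idf_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(docs):
    """Corpus-level TF-IDF per term.

    docs: list of token lists (e.g. one list of extracted nouns per abstract).
    Assumption: tf is the raw count, idf = ln(N / df), and a term's score is
    its total count across the corpus multiplied by its idf.
    """
    n_docs = len(docs)
    df = Counter()        # number of documents containing each term
    tf_total = Counter()  # total occurrences of each term
    for tokens in docs:
        tf_total.update(tokens)
        df.update(set(tokens))
    return {t: tf_total[t] * math.log(n_docs / df[t]) for t in tf_total}

# Toy corpus for illustration only.
docs = [
    ["rural", "space", "village"],
    ["rural", "space", "landscape", "landscape"],
    ["rural", "healing"],
]
scores = tf_idf_scores(docs)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

Under these assumptions, a term that appears in every document ("rural" in the toy corpus) receives a score of zero regardless of how often it occurs, while a term concentrated in a single document ("landscape") ranks highest.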
Subsequently, latent Dirichlet allocation (LDA) analysis, a topic modeling algorithm frequently used in research trend analysis within text mining techniques, was conducted to extract core themes. Through LDA analysis, significant topics can be automatically extracted from a text corpus based on the context of word usage and the relationships between words. This method has the advantage of ease in result interpretation, while also addressing the overfitting problem and reducing large volumes of information into various topics (Blei et al., 2003; Nam, 2016). To determine the optimal number of topics, perplexity and coherence were measured, and LDA topic modeling was conducted with the number of topics that showed low perplexity and high coherence (Park et al., 2022).
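The selection criterion (low perplexity, high coherence) can be sketched as a simple ranking over candidate topic counts. The exact rule for combining the two measures is not stated, so the sketch below assumes that candidates are ranked on each measure and the smallest rank sum wins, with ties going to the smaller topic count. The function name `pick_num_topics` is hypothetical, and the scores in the usage example are placeholders, not values from this study.

```python
def pick_num_topics(metrics):
    """Pick a topic count with low perplexity and high coherence.

    metrics: dict mapping candidate k -> (perplexity, coherence).
    Assumption: the two criteria are combined by summing ranks; ties go to
    the smaller k. This is one possible rule, not a documented procedure.
    """
    ks = list(metrics)
    by_perplexity = sorted(ks, key=lambda k: metrics[k][0])  # lower is better
    by_coherence = sorted(ks, key=lambda k: -metrics[k][1])  # higher is better
    rank_sum = {k: by_perplexity.index(k) + by_coherence.index(k) for k in ks}
    return min(ks, key=lambda k: (rank_sum[k], k))

# Placeholder scores for illustration only; these are not values from the study.
example = {2: (310.0, 0.38), 3: (295.0, 0.46), 4: (300.0, 0.41), 5: (320.0, 0.40)}
print(pick_num_topics(example))  # 3
```

In practice the perplexity and coherence values would come from fitting an LDA model once per candidate topic count on the preprocessed corpus.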
Results and Discussion
The number of articles published in each period (Table 1) shows that 59 articles were published during the first period, accounting for 12.47% of the total, which increased to 94 articles (19.87%) in the second period, 133 articles (28.12%) in the third period, and peaked at 146 articles (30.87%) in the fourth period. For the ongoing fifth period, 41 articles have been published so far, accounting for 8.67% of the total; however, this figure covers only articles published up to September 2024, as the period has not yet ended.
Overall, the number of articles steadily increased from the first to the fourth period. Notably, approximately 59% of all articles were concentrated in the third and fourth periods, indicating that related research has been actively conducted over the past decade. This trend confirms that research on rural space has increased quantitatively over time.
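The shares reported above follow directly from the per-period counts, as the short check below shows. The counts are taken from the text; the variable names are illustrative.

```python
# Article counts per period as reported in the text.
counts = {"first": 59, "second": 94, "third": 133, "fourth": 146, "fifth": 41}

total = sum(counts.values())
shares = {p: round(100 * n / total, 2) for p, n in counts.items()}
third_fourth = round(100 * (counts["third"] + counts["fourth"]) / total, 2)

print(total)         # 473
print(shares)        # per-period share of all articles, in percent
print(third_fourth)  # 58.99
```

The combined share of the third and fourth periods, 58.99%, corresponds to the "approximately 59%" cited above.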
The results of extracting the top 50 keywords based on word frequency for each period are shown in Table 2. A total of 22 words were commonly found in the entire period and in each period, such as region, village, characteristic, resource, development, plan, life, landscape, society, urban area, center, service, resident, culture, policy, and environment.
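Identifying the keywords shared by the entire period and every individual period amounts to intersecting the top-50 sets. The following is a minimal sketch under that reading; the function names and the toy tokens are hypothetical, and the toy example uses a cutoff of 2 rather than 50 to keep it short.

```python
from collections import Counter

def top_keywords(tokens, n=50):
    """Return the set of the n most frequent terms in a list of tokens."""
    return {term for term, _ in Counter(tokens).most_common(n)}

def common_keywords(tokens_by_period, n=50):
    """Terms in the top n of the pooled corpus and of every period."""
    pooled = [t for tokens in tokens_by_period.values() for t in tokens]
    common = top_keywords(pooled, n)
    for tokens in tokens_by_period.values():
        common &= top_keywords(tokens, n)
    return common

# Toy data for illustration only.
toy = {
    1: ["village"] * 3 + ["plan"] * 2 + ["church"],
    2: ["village"] * 2 + ["landscape"] * 4 + ["plan"],
}
print(common_keywords(toy, n=2))  # {'village'}
```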
The results of term frequency analysis of related research articles showed that the keywords with the highest frequency in the entire period were rural area (2,814), space (1,951), region (1,838), study (1,249), and village (1,081). This implies that research on rural space has been conducted with a focus on the relationship between planned utilization and urban areas at the regional and village levels. The results of analysis by period are as follows. In the first period, the top-ranking keywords were rural area (376), region (265), space (180), village (171), and study (116). Development (77), plan (75), and life (69) also showed relatively high frequency, which suggests that early studies on rural space focused on development and planning at the regional and village levels.
In the second period, the top keywords were rural area (570), space (364), village (333), region (317), and study (237). Notably, landscape (159), development (130), urban area (90), and environment (82) showed increasing frequencies, implying a growing interest in the landscape and environment of rural areas and their relationship with urban areas.
In the third period, the top keywords were rural area (775), space (595), region (513), study (354), and village (322). The frequencies of housing (208), facility (158), experience (80), and design (72) increased, while new keywords such as return to farming (51), population aging (51), and standard (50) emerged. This reflects a growing interest in improving the physical environment of rural areas as well as in social issues such as return to farming/rural village and population aging.
In the fourth period, rural area (770) showed the highest frequency, followed by space (610), region (597), study (410), utilization (248), and urban area (243). In particular, there were notable keywords such as agriculture (184), healing (69), land (66), regeneration (63), complex (62), and community (60), which demonstrates a shift in rural space research from development-centered studies to exploring connectivity to urban areas, multifunctionality of rural space, and revitalization of communities.
In the fifth period, keywords such as rural area (323), space (202), region (146), study (132), and utilization (105) showed high frequency. There was a new emphasis on regeneration (38), design (37), aggregate (24), and agricultural and fishing villages (22), indicating that there has recently been a growing interest in the regeneration of rural space, design, and resource utilization. Moreover, the ranks of resident (48) and survey (40) increased, demonstrating that the importance of resident participation and empirical research is increasing in recent studies of rural space.
Examining the overall changes in keyword frequency reveals that keywords such as 'rural area', 'space', and 'region' consistently remained at the top across all periods, forming the basic framework for rural space research. By period, the earliest research focused on development at the regional and village levels, after which interest shifted to landscape, environment, and the relationship with urban areas. After that, social issues were highlighted, such as improvement of the residential environment, return to farming/rural village, and population aging. More recently, the focus has been on creating new values such as healing, regeneration, and community. In particular, 'healing' ranked 37th (69 times) in the fourth period, while 'regeneration' moved up from 40th (63 times) in the fourth period to 14th (38 times) in the fifth period, suggesting that rural space is being reinterpreted as a space for healing and regeneration beyond production and residence. Moreover, the emergence of keywords such as 'community' and 'design' implies a growing interest in the social and aesthetic values of rural space.
TF-IDF analysis
The TF-IDF analysis results are shown in Table 3. First, for the entire period, the keywords with high importance were village (1085.1), landscape (863.3), housing (857.8), plan (648.7), urban area (629.5), service (565.5), resource (565.1), region (561.3), design (550.4), and facility (547.7). This suggests that in rural space research, the core themes were housing environment planning centered around villages and landscapes and the relationship with urban areas.
The analysis results by period are as follows. In the first period, high-ranking keywords were resource (163.3), landscape (136.1), village (125.7), population (123.5), and evaluation (117.3). In particular, longevity (104.5), network (77.5), church (68.6), and worship (47.2) appeared among the top 50 only in the first period, indicating that early rural space research focused on resource utilization and physical development such as religious facilities and longevity villages.
In the second period, village (234.2), landscape (218.8), service (167.9), housing (144.5), plan (126.9), and tourism (119.8) showed high importance. New keywords like amenity (105.5), crime (101.1), and remodeling (81.9) emerged, suggesting that there was a growing interest in amenities, crime prevention, and housing environment improvement in rural areas during this period. Moreover, there was also an increasing interest in the service functions of rural areas and their values as tourism resources.
In the third period, the top keywords were housing (320.4), village (312.8), plan (193.6), facility (182.9), landscape (167.3), and type (151.8). New keywords such as design (156.2), blueprint (143.3), standard (143.1), and experience (141.1) emerged, suggesting the growing interest in the planned designs of rural space, development of experiential programs, and the classification of rural space.
In the fourth period, landscape (283.3), agriculture (270.2), village (263.4), healing (235.6), urban area (217.9), and life (717.7) showed high importance. Smart (108.9), food (108.3), farmer’s market (108.3), and data (108.1) newly appeared, which implies that there has been a growing interest in healing, smart transformation of rural areas, and local foods. It also reveals that rural space research shifted the focus from development to landscape, culture, and living environments.
In the fifth period, design (95.6), aggregate (95.3), village (91.3), public (78.7), and plan (72.1) were among the top. New keywords such as logistics (51.6), regeneration (50.6), and vacant house (45.9) emerged, indicating a growing interest in the regeneration of rural space, utilization of idle space, and the enhancement of logistics functions.
These changes in keywords by period reflect a shift in the paradigm of rural space research. Initially, the focus was on resource utilization and improvements to the physical environment. Gradually, however, it shifted towards services, tourism, and planned design. In recent years, research interests have been moving towards creating new values and enhancing functions, such as smart transformation, healing, and regeneration. In particular, 'village' and 'landscape' maintained high importance throughout all periods, confirming that these are core themes of rural space research.
LDA topic modeling
Prior to LDA analysis, perplexity and coherence were examined to determine the optimal number of topics, and the number of topics was set to three. Based on the keywords constituting each of the three topics extracted for the entire period, the topics were named 'village-level development', 'landscape and environment', and 'service development and value creation'. These topics encompass the types of rural space research conducted from the first to the fifth periods.
Fig. 1 shows the proportion of each topic across the entire period and for each period based on the assigned topic names. Early studies primarily focused on structural analysis of village-level development: among the three topics, this topic accounted for over half of the research (about 56%) in the first period and again showed the highest proportion, at 42.0%, in the second period.
Starting from the third period, which began in 2013, interest in measures to utilize rural areas increased, which led to active research on service development and value creation. This topic accounted for 10.3% in the first period but increased to 32.5% in the second period and to 63.7% in the third period, indicating that research during this period mainly focused on new value discovery and planned utilization of rural space.
From the fourth period onward, interest in landscape and environment grew. The proportion of research on this topic, which had been gradually decreasing since the first period, rose to 35.1% in the fourth period and then rose significantly to 52.3% in the fifth period.
These changes in topics reveal that rural space research has evolved from simple village-level analysis to comprehensive, regional-level planning and structural approaches, and that the scope is being expanded and developed into empirical research focused on landscape preservation, rural space environments, and creation of various services.
Conclusion
This study applied keyword analysis and LDA topic modeling to research articles related to rural space and analyzed changes in research trends over time to identify academic trends and issues related to rural space and provide foundational data for the sustainable development of rural space.
The findings of the study can be summarized as follows. First, rural space research showed clear quantitative growth from the first to the fourth period, with more than half of all articles concentrated in the third and fourth periods, indicating that related research has been actively conducted over the past decade. In particular, the focus has been placed on landscape preservation and the ecological or environmental aspects of rural spaces. This emphasis highlights the increasing research interest in preserving the unique characteristics or 'ruralness' of these areas.
Second, early research focused on regional and village-level development, but attention gradually shifted to service, tourism, and planned approaches. Recently, the research trend has been changing toward a more comprehensive approach that considers the relationship with urban areas and regional characteristics. This shows that rural space research is moving beyond functional approaches to rural areas or rural development and toward approaches that draw on multiple perspectives. This is expected to expand the range of research topics and further stimulate research on rural space.
Third, rural space research has shifted beyond simple physical development or planning toward approaches that consider the environment, landscape preservation, and resident participation, increasing the importance of studies based on empirical research. This suggests that more systematic and scientific research is being conducted. In particular, given that the utilization and preservation of ruralness focused on villages and landscapes have become important research topics, it is evident that this trend aligns with the expectations and values for rural space as well as the social demands that value sustainable development. In light of this, future research should not only propose long-term development directions to convey the unique appeal of rural areas but also seek policies for effective implementation of these directions.
Fourth, the development of policies for rural spaces should adopt a tailored approach that takes into account regional characteristics, recognizing these areas as complex entities with scenic, cultural, and ecological values. It is essential to analyze local natural resources and community traits to develop differentiated development strategies and establish resident-participatory planning systems. Recent studies emphasize the importance of a bottom-up approach driven by residents, which requires enhancing resident capabilities, creating participation platforms, and implementing village-level decision-making systems.
This study is significant in that it applied text mining to identify the trends of rural space research over the past 20 years and analyze the research flow and significance. However, this study has the following limitations. First, the article collection platform was limited to KCI, and theses and dissertations on rural space were excluded from the collection. To provide a more comprehensive analysis of research trends, it would be beneficial to include theses and dissertations as well so that research trends can be identified from multiple perspectives.
Second, the fifth period of data collection is 2023 to 2027, which is still ongoing, so not all data from the fifth period is included in this study, thereby failing to reflect the overall trends of research during this period. Therefore, future research needs to strengthen the conclusions based on additional data collection and analysis.
Third, this study used the Mecab-IMC morphological analyzer provided by Textom. Since Mecab is a dictionary-based morphological analyzer, its accuracy may be lower for words that are not registered in the dictionary. Further research should develop a more detailed user dictionary by validating the meanings with reference to multiple expert opinions in order to improve the accuracy of morphological analysis.