Changes in the Cultural Trend of Use by Type of Green Infrastructure Before and After COVID-19 Using Blog Text Mining in Seoul
Abstract
Background and objective
This study examined the changes in the cultural trend of use for green infrastructure in Seoul due to the COVID-19 pandemic.
Methods
The subjects of this study are 8 sites of green infrastructure selected by type: forested green infrastructure, watershed green infrastructure, park green infrastructure, and walkway green infrastructure. The data used for analysis were blog posts covering a total of four years, from August 1, 2016 to July 31, 2020. The analysis consisted of keyword frequency analysis, topic modeling, and related keyword analysis.
Results
The results of this study are as follows. First, the number of posts on green infrastructure has increased since COVID-19, especially forested green infrastructure and watershed green infrastructure with abundant naturalness and high openness. Second, the cultural trend keywords before and after COVID-19 changed from large-scale to small-scale, community-based to individual-based activities, and nondaily to daily culture. Third, after COVID-19, topics and keywords related to coronavirus showed that the cultural trends were reflected on appreciation, activities, and dailiness based on natural resources. In sum, the interest in green infrastructure in Seoul has increased after COVID-19. Also, the change of green infrastructure represents the increased demand for experience that reflects the need and expectation for nature.
Conclusion
The new trend of green infrastructure in the pandemic era should be considered in terms of individual relaxation and activities.
Introduction
COVID-19 has changed the daily lives of urban residents all over the world and even affected the patterns of using urban spaces (Li et al., 2020; Morita et al., 2020). According to a Google Trends survey, the use of parks and green spaces has increased worldwide since COVID-19. As of the end of March 2020, the rate increased by approximately 51% in Seoul, which is high by global standards (Park, 2020). The increased use of green infrastructure since COVID-19 has affected urban residents’ preference for space as well as their behavior (Venter et al., 2020; Derks et al., 2020; Rice et al., 2020). For example, the use of leisure space within 2 miles increased in the U.S. (Rice et al., 2020), the ratio of visits to mountains within 10 km increased in Japan (Yamap, 2020), and the preference for walkways and forest paths in the suburbs increased in Norway (Venter et al., 2020). The purpose of use also changed in addition to preference, with an increasing need to promote health and reach social equity (Derks et al., 2020; Xie et al., 2020; Samuelsson et al., 2020; McCunn, 2020; Slater et al., 2020). The main users also shifted toward younger generations and families (Derks et al., 2020; Chae et al., 2021), which created a new culture of use. The number of people in their 20s–30s visiting mountains increased in Korea, creating a culture of ‘alone hiking’ (Kim, 2020). The culture of use changes according to the sociocultural background. Woo and Suh (2020) claimed that the trend of urban parks has changed from a space to relax and take walks to a space for fun and entertainment. They stated that it is necessary to categorize the behaviors of large-scale groups and meet citizen demands through big data analysis to identify the sociocultural trends. Big data analysis has become even more important as internet browsing and search volume increased in the contact-free era triggered by COVID-19.
Blogs provide data to identify the cultural consumption patterns that represent individual experiences and awareness (Lee and Chung, 2014), and significant results can be derived from text mining analysis. Text mining had mainly been used to identify broad research trends (Park and Bae, 2018; Byeon and Seo, 2020; Park et al., 2021; Choi and Choi, 2021), but recently it has been actively used to identify trends through perception changes and issues based on user experience (Kim, 2015; Kim and Jeon, 2018; Chae et al., 2021; Woo and Suh, 2018; Kong and Wang, 2019; Kim et al., 2019; Park and An, 2019; Park and Oh, 2019; Do et al., 2020; Park and Yeon, 2020; Shin et al., 2020; Woo and Suh, 2020; Woo and Suh, 2021). Therefore, this study examines the changes in the main keywords of each type of green infrastructure before and after COVID-19 through blog text mining, derives key topics related to ‘coronavirus’, and analyzes related keywords, thereby capturing the changes in the cultural trend of use and providing policy implications.
Research Methods
Site and Subjects
Previous studies showed that the types of green infrastructure may change depending on the purpose, and it has thus far been classified by components, space elements, function, and legal system (Ministry of Environment, 2009; Kang et al., 2014; Kim et al., 2018; Chae and Lee, 2020; Ministry of Government Legislation a, 2021; Ministry of Government Legislation b, 2021). However, these classifications had limitations when applied to policy, as they could not reflect various social needs and the culture of use, such as users’ direct experiences in green infrastructure. Therefore, this study categorized the sites in Seoul that are the most popular and frequently visited, considering two main legal characteristics. The types of green infrastructure in terms of user experience and perception include the forested G·I (mountains and forests), the watershed G·I (rivers and streams), the park G·I (neighborhood parks and large parks), and the walkway G·I (linear parks and street trees) (Fig. 1).
For the subjects, we selected sites in Seoul, two for each of the four types of green infrastructure, considering the regional distribution: Namsan and Gwanaksan for the forested green infrastructure, Yangjaecheon and Hangang for the watershed green infrastructure, Olympic Park and Seoul Forest Park for the park green infrastructure, and Gyeongui Line Forest Park and Seoullo 7017 for the walkway green infrastructure (Fig. 2).
Data collection and cleansing
We collected blog posts from Naver, the most commonly used portal site in Korea. Following a previous study that collected three years of data to examine prevailing trends (Park and Oh, 2019), we collected data for a total of 4 years, consisting of data for 3 years before December 15, 2019 (August 1, 2016 – December 31, 2019) and data for the 7 months after that (January 1 – July 31, 2020).
We eliminated posts that did not meet the purpose of analysis from those initially collected and cleansed the texts into a form suitable for analysis. In the process, we used the KOMORAN morphological analyzer in KoNLPy, an open source Python package for Korean information processing, to perform ‘tokenizing’ of text and ‘part-of-speech tagging (POS tagging)’.
Collection and sorting of posts
We collected posts that ‘include park names in the title or main text’ among Naver blog posts created during the data collection period from August 1, 2016 to July 31, 2020. Because the posts were collected automatically by search, posts that do not meet the purpose of analysis (advertisements, repeated posts) were also collected, which is why we excluded some of the posts from the analysis by setting up a sorting standard.
The posts were sorted by ‘① extracting nouns in the titles of posts’, ‘② establishing the sorting standard by frequency of keywords by park’, and ‘③ applying and repeating’. By selecting only nouns from the titles of posts using the morphological analyzer, we converted the title of each post into ‘a sequence of nouns’.
We counted the nouns in all titles for each site and sorted them in descending order of frequency. As a result, we discovered that ‘words related to local advertisements’ such as specific brand, store, or product names and ‘words related to real estate advertisements’ such as lotting out, sales, or jeonse were among the top frequency words, and thus excluded posts that include those words. The same process was then repeated to set additional exclusion standards that were not visible in the first round. After eliminating redundantly collected posts from the initial collection, there were a total of 1,030,152 posts for the 8 green infrastructure sites. The number of posts used in the analysis, summed over all sites, was 174,972, which is 16.98% of the initially collected posts (Table 1).
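The sorting procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the post records, the helper names title_noun_frequencies and filter_posts, and the example exclusion words are hypothetical, and titles are assumed to have already been reduced to noun sequences by the morphological analyzer.

```python
from collections import Counter


def title_noun_frequencies(titles):
    """Count how often each noun appears across all title noun sequences."""
    counts = Counter()
    for nouns in titles:
        counts.update(nouns)
    return counts


def filter_posts(posts, excluded_words):
    """Keep only posts whose title nouns contain none of the excluded words."""
    excluded = set(excluded_words)
    return [p for p in posts if not excluded.intersection(p["title_nouns"])]


# Hypothetical toy posts; real titles would come from the collected blog data.
posts = [
    {"id": 1, "title_nouns": ["walk", "flower"]},
    {"id": 2, "title_nouns": ["sales", "apartment"]},
    {"id": 3, "title_nouns": ["jeonse", "sales"]},
    {"id": 4, "title_nouns": ["picnic", "tent"]},
]

# Step 1-2: inspect the most frequent title nouns to decide what to exclude.
freq = title_noun_frequencies(p["title_nouns"] for p in posts)

# Step 3: apply the exclusion list (chosen by hand here), then repeat as needed.
excluded_words = ["sales", "jeonse"]
kept = filter_posts(posts, excluded_words)
kept_ids = [p["id"] for p in kept]
```

In practice the exclusion list would be chosen by reading the high-frequency nouns for each site, and the count-then-filter cycle would be run again on the remaining posts.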
Extracting nouns in the main text and eliminating stop words
We extracted nouns from the main text of the selected posts using the morphological analyzer. The results of topic modeling are easier to interpret when the text is tokenized into nouns in word units. The order of words was maintained in the process of converting the main text of posts into an ‘arrangement of nouns’, and repeated words were not eliminated. After extracting nouns, we selected ‘stop words’ that would not be used in the analysis and eliminated them. Since the posts were collected by using the site names as search words, site names and other words that indicate those green infrastructure sites appeared in high frequency. In addition, the administrative district in which the site is located and the name of the nearby subway station also appeared frequently. These words interfered with the interpretation and analysis of results because they are general characteristics of each green infrastructure site. Thus, we treated ‘the official name of the site and other words indicating the site’ and ‘the administrative district in which the site is located or the name of the nearby subway station or bus stop’ as stop words and eliminated them.
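As a minimal sketch of the stop word step, the function below removes stop words from a noun sequence while preserving word order and repeated words, as the text describes. The function name remove_stop_words and the example words are hypothetical and only for illustration.

```python
def remove_stop_words(nouns, stop_words):
    """Drop stop words from a noun sequence, keeping order and repeats."""
    stop = set(stop_words)
    return [n for n in nouns if n not in stop]


# Hypothetical noun sequence from one post; the site name and district name
# stand in for the kinds of stop words described in the text.
nouns = ["Namsan", "walk", "flower", "Jung-gu", "walk", "cafe"]
stop_words = ["Namsan", "Jung-gu"]

cleaned = remove_stop_words(nouns, stop_words)
```

Here the repeated word "walk" survives twice, so later frequency counts still reflect how often it was used.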
Analysis method
Lee and Chung (2014) claimed that blog posts are in the form of personal essays or records of places and everyday lives, and these posts have academic significance in studying spatial culture based on personal experience as a carrier of cognitive information about places with travel journals about certain places and regions. Recently, big data analysis is commonly used to analyze the cultural trends of users. Choo (2020) identified the trends in consumption markets with focus on keywords related to ‘solitary consumption’, and Kim and Han (2019) analyzed the leisure trend using a semantic network. Previous studies conducted quantitative and focused analysis through keyword frequency analysis (Park and Bae, 2018; Park and Oh, 2019; Park and Yeon, 2020; Shin et al., 2020; Woo and Suh, 2021), and supplemented qualitative and auxiliary analysis through related keyword analysis. Recently, studies are conducted using quantitative and qualitative analysis through topic modeling (Park and An, 2019; Park et al., 2021). Therefore, to quantitatively and qualitatively identify the cultural trend of use before and after COVID-19, it is necessary to analyze the frequency of main keywords as well as topic modeling and related keywords.
Keyword analysis
We derived and compared the top 20 keywords that appear before and after COVID-19 and identified four types of culture of use. To this end, we selected the 20 top frequency words for each park using the blog texts cleansed into a sequence of nouns and combined them by the four types of parks. Park and Oh (2019) selected the top 30 keywords, and Park and Bae (2018) the top 20 keywords; this study selected the top 20 keywords considering the ratio of change and type.
We derived the top 20 keywords for each type of green infrastructure by combining the top 20 keywords of the sites that belong to each type. Simply summing the frequency of each keyword would overestimate the keywords of sites with many words or posts collected. Therefore, we divided the frequency of each keyword by the sum of the frequency of all keywords for the park and converted it to ‘relative frequency’. When combining the keywords of two sites by type of green infrastructure, we derived the top 20 keywords for each type by adding this relative frequency instead of the raw keyword frequency. We applied this method to the periods before and after the outbreak of COVID-19 in Korea and examined how the culture of use for green infrastructure has changed compared to before COVID-19.
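The relative frequency calculation can be illustrated with a small sketch. The keyword counts below are invented toy values, the helper names are hypothetical, and the denominator is assumed to be the sum of the counts passed in for each site; the study's actual denominator and data may differ.

```python
from collections import Counter


def relative_frequencies(keyword_counts):
    """Divide each keyword count by the site's total keyword count."""
    total = sum(keyword_counts.values())
    return {k: c / total for k, c in keyword_counts.items()}


def combine_sites(site_counts, top_n=20):
    """Sum per-site relative frequencies and return the top_n keywords."""
    combined = Counter()
    for counts in site_counts:
        combined.update(relative_frequencies(counts))
    return combined.most_common(top_n)


# Toy counts: site_b has ten times as many keyword occurrences as site_a.
site_a = {"flower": 50, "walk": 30, "cafe": 20}
site_b = {"flower": 100, "hiking": 700, "walk": 200}

top_relative = combine_sites([site_a, site_b])
relative_order = [k for k, _ in top_relative]

# For comparison: ranking by the simple sum of raw counts.
raw = Counter(site_a)
raw.update(site_b)
raw_order = [k for k, _ in raw.most_common()]
```

With these toy numbers the raw sum ranks "walk" above "flower" because the larger site dominates, while the relative frequency sum ranks "flower" above "walk", which is the overestimation effect the method is meant to avoid.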
LDA topic modeling
We extracted post clusters showing the culture of use through topic modeling of posts in each type of green infrastructure and derived words that have a semantic relation to the word ‘coronavirus’ in the cluster. LDA (latent Dirichlet allocation) is a technique used to find the topics present in a set of documents and the distribution of topics in each post. LDA assumes a fixed number of topics and presumes two distributions: the ‘probability distribution of words that will appear by topic’ and the ‘probability distribution of topics that will appear by post’. LDA stochastically estimates these two presumed distributions from the ‘distribution of words by post’ supplied by the analyst.
Based on the arrangement (distribution) of words in each post created by the collection and cleansing of posts, we conducted LDA topic modeling to find which topics exist in the posts (documents) combined by park type and what keywords form each topic. Park and An (2019) selected 10 key topics, and Park and Bae (2018) selected 4 topics. This study classified the posts grouped by type of green infrastructure into 5 topics for each type before and after COVID-19, using the LDA model included in the open source Python library Gensim, and selected topics that reveal the pattern of use. Then, we conducted word embedding with the Word2Vec model included in the same library on the posts that belong to the relevant topics. A word embedding model expresses words as vectors, so the similarity among words can be obtained from the similarity among vectors, through which semantically related words can be derived. We derived keywords that have high cosine similarity with the word ‘coronavirus’ in the posts of each type of green infrastructure.
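To make the related keyword step concrete, the sketch below computes cosine similarity between word vectors and ranks words by their similarity to ‘coronavirus’. The three-dimensional vectors are invented for illustration and are not output of the Word2Vec model used in the study; the helper names cosine_similarity and most_similar are likewise hypothetical and written in plain Python rather than with Gensim.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, top_n=3):
    """Rank all other words by cosine similarity to the target word."""
    scores = [
        (word, cosine_similarity(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Invented toy embeddings; a trained model would supply much longer vectors.
vectors = {
    "coronavirus": [1.0, 0.0, 1.0],
    "mask": [0.9, 0.1, 0.8],
    "festival": [0.0, 1.0, 0.0],
    "exercise": [0.5, 0.5, 0.5],
}

related = most_similar("coronavirus", vectors)
related_words = [word for word, _ in related]
```

In this toy example "mask" points in nearly the same direction as "coronavirus" and ranks first, while "festival" is orthogonal to it and scores zero.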
Results and Discussion
Basic analysis
As a result of analyzing the changes before and after COVID-19, it was found that the number of posts on green infrastructure decreased by 2.64% from 2017 to 2018 but increased by 9.22% to 2019 and 8.98% to 2020 (Fig. 3). Fig. 3 shows the relative size of the sum of posts in the same period in later years, with the number of posts from January 20 to July 31, 2017 set to 100. The types that contributed to the increase in the total number of posts in 2020 were the forested G·I, watershed G·I, and park G·I, whereas the walkway G·I showed a relative decrease in posts. The three types with increased posts are green infrastructure with abundant naturalness and openness. In particular, forested green infrastructure showed the highest rate of increase year on year because online posts about mountains increased since COVID-19. The increased interest from the young generation as the main online users, the rapid decrease in outbound travel demands, and the increase in inbound travel demands have led to increased demand for everyday leisure rather than tourism (Jeong and Lee, 2020), thereby shifting the hiking culture from long-distance to short-distance hiking.
The daily posts show that the number of posts rapidly increased in the forested G·I and watershed G·I as of January 19, 2020, showing an increase in 2020 after the outbreak of COVID-19 compared to 2019. The number of posts continued to increase even in the summer, when visitors usually decrease. However, the walkway G·I showed relatively insignificant changes after COVID-19, which is attributed to the psychological perception that social distancing is difficult there or that it is not nature-friendly, because it is small, has many artificial elements, and is located in the urban center (Fig. 4).
Keyword analysis for each type of green infrastructure before and after COVID-19
Frequency analysis of all keywords
The top 20 keywords from the frequency analysis of each site are shown in Table 2. A change was found in keywords about natural resources after COVID-19. Keywords related to culture such as ‘festival (8.59%)’, ‘culture (3.41%)’, and ‘performance (2.66%)’ that appeared before COVID-19 disappeared, and along with the emergence of ‘coronavirus (5.85%)’, keywords related to nature such as ‘flower (13.12%)’, ‘tree (3.83%)’, and ‘spring (1.74%)’ appeared as the main keywords or in the upper ranks. Moreover, patterns of cultural consumption such as ‘cafe (5.29%)’, ‘coffee (1.89%)’, ‘tent (2.05%)’, and ‘bike (2.82%)’ showed that the culture of use has changed from nondaily routines of performances or events to daily cultural consumption patterns, and from cultural experience to enjoyment of nature.
Keyword analysis by type
In the forested G·I, ‘festival (5.83%)’, ‘mountain climbing (3.15%)’, and ‘experience (2.71%)’ before COVID-19 disappeared, and ‘coronavirus (4.97%)’, ‘exercise (4.10%)’, and ‘cafe (2.63%)’ appeared after COVID-19, while ‘tree (5.66%)’, ‘hiking (4.59%)’, and ‘friend (4.77%)’ increased. The results show the pattern of interest in natural resources and health programs due to the shutdown of indoor exercise spaces or type of companion. In the watershed G·I, keywords related to massive culture such as ‘festival (12.39%)’, ‘event (3.40%)’, ‘fireworks (2.36%)’, and ‘performance (2.26%)’ disappeared, whereas keywords related to exercise such as ‘walk (6.53%)’ and ‘bike (6.50%)’ along with ‘coronavirus (5.61%)’ and keywords related to small culture such as ‘tent (6.49%)’ and ‘picnic (3.74%)’ increased. This shows that the cultural consumption pattern has changed from large-scale/community-based culture to small-scale/individual-based culture. In the park G·I, ‘photo (18.49%)’ showed high frequency with the emergence of ‘coronavirus (5.21%)’, along with keywords about nature such as ‘flower (12.33%)’ and ‘tree (6.48%)’. In the walkway G·I, the culture changed from ‘festival (4.84%)’, ‘performance (3.61%)’, and ‘culture (3.55%)’ to keywords related to nature such as ‘flower (11.82%)’ and ‘garden (4.24%)’ along with ‘coronavirus (2.76%)’ (Table 3).
Trend of topics related to ‘coronavirus’ through topic modeling after COVID-19
Analysis of the trend in topics related to ‘coronavirus’ by type of green infrastructure using topic modeling
Topic modeling was conducted on each type of green infrastructure after COVID-19, and 5 cultural trend topics, each comprising 20 keywords, were extracted. Using the topics related to coronavirus, we compared the cultural trend within each type and across types. The results showed that Topics 1 and 2 related to coronavirus in the forested G·I consisted of keywords on nature appreciation and activities as well as daily leisure activities. Topic 3 is focused on natural resources, whereas Topic 1 shows appreciation and activities based on nature. In the watershed G·I, Topic 1 was on prevention and control of coronavirus, and Topic 3 was on nature appreciation and activities. In the park G·I, Topic 3 was on nature appreciation and activities. In the walkway G·I, Topics 2 and 3 were on daily leisure and nature appreciation and activities. In sum, the cultural trend of use for green infrastructure after COVID-19 was nature appreciation and activities as well as daily leisure (Table 4).
Analysis of related keywords in posts about ‘coronavirus’ after COVID-19
We analyzed keywords related to ‘coronavirus’ with a focus on the topics derived through topic modeling (Fig. 5). For the types of green infrastructure that were significant in terms of data reliability, ‘mask’ and ‘social distancing’ were the top keywords in the forested G·I, and the cultural trend of physical/mental health activities such as ‘weekend’, ‘exercise’, ‘health’, and ‘happiness’ was derived as related keywords. There were other time-related keywords such as ‘daily life’ and ‘day’. In the watershed G·I, ‘social distancing’, ‘confirmed’, and ‘spread’ were the top keywords, and there were other related keywords in leisure activities such as ‘picnic’, ‘outdoors’, ‘play’, ‘citizen’, ‘spring blossom’, and ‘safety’. There were positive words such as ‘sightseeing’, ‘picnic’, and ‘safety’ and negative words such as ‘worry’.
In sum, despite negative words such as ‘social distancing’, ‘mask’, and ‘shutdown’ after coronavirus, people who visited green infrastructure experienced the culture of health promotion in the forested G·I and the culture of leisure activities in the watershed G·I. Related keywords such as ‘happiness’, ‘composure’, and ‘safety’ showed that cultural activities in green infrastructure contributed to mental well-being as well.
Conclusion
This study was conducted on Seoul using blog text mining analysis to identify the trends in each type of green infrastructure before and after COVID-19.
First, the number of posts on green infrastructure after COVID-19 increased in types with abundant naturalness and high openness such as the forested G·I and the watershed G·I, while the seasonal gap decreased. On the other hand, there was not much change in posts about types that have low openness, are focused on facilities, and are located in the urban center, such as the walkway G·I. Therefore, for the forested G·I and watershed G·I, efforts must be made to prevent deterioration of service quality due to many users, along with efforts in disease control through density management so that the time visitors stay in those areas can be adjusted.
Second, the cultural trend keywords before and after COVID-19 have changed from large-scale to small-scale, community-based to individual-based activities, and nondaily to daily culture. In particular, culture-related keywords changed to nature-related keywords such as flowers, trees, and nature in all types. Therefore, it is necessary to set the policy direction to create a new green infrastructure culture by developing new programs with resources and providing services considering the characteristics of users.
Third, topics and keywords related to coronavirus showed that the cultural trends were reflected on appreciation, activities, and dailiness based on natural resources, not just simple resources from nature. Moreover, considering that topics differentiated by each type of green infrastructure are derived, there must be a service policy on nature appreciation and activities along with unique creation and operation strategies based on the characteristics of each type.
In sum, the interest in green infrastructure in Seoul has increased after COVID-19, and the biggest difference was in the interest and trend in the forested G·I and watershed G·I with abundant natural green areas. The use of green infrastructure after COVID-19 represents the increased demand for experience that reflects the need and expectation for nature. This shows the expectation for biophilia, which is the innate human instinct to connect with nature, and raises the need for a policy to create a space for public leisure by preserving natural resources and to operate differentiated and customized programs.