Lee and Sung: Evaluating the Applicability of Convolutional Neural Networks for Tree Classification in Donggwoldo

ABSTRACT

Background and objective: This study establishes a foundational modeling framework for quantitative research by developing and evaluating a baseline convolutional neural network (CNN) to classify tree representations in Donggwoldo, a 19th-century Korean court painting. Rather than focusing solely on performance optimization, the objective is to construct an initial methodology for identifying tree types based on their pictorial characteristics and converting traditional visual records into structured digital data. This research illustrates the potential of deep learning as a methodological bridge between artistic depictions and quantitative analysis within cultural heritage studies.
Methods: A dataset of 580 high-resolution tree images was extracted from the Dong-A University and Korea University versions of Donggwoldo. These were manually categorized into six types based on art-historical classifications. To address data limitations and imbalance, augmentation techniques—including grayscale conversion, zooming, horizontal flips, and random color adjustments—were applied to enhance diversity while preserving stylistic integrity. A transfer learning model based on ResNet50V2 was implemented, utilizing pretrained layers as feature extractors. Two fully connected layers with ReLU activation and dropout regularization were added to prevent overfitting, with early stopping employed to ensure stable convergence.
Results: The model achieved an overall classification accuracy of approximately 98% on a 150-image test set. Confusion matrix analysis indicated that rare misclassifications were primarily due to low resolution, background interference, and incomplete segmentation. Despite these challenges, the CNN effectively distinguished between diverse depiction styles, confirming the feasibility of applying deep learning to traditional paintings.
Conclusion: This study demonstrates that CNNs can effectively transform traditional pictorial information into structured digital data. By establishing a baseline model, the research provides a methodological foundation for the quantitative analysis of historical landscape imagery. These results highlight deep learning’s potential as a complementary tool for cultural heritage studies, with future research directions including automated segmentation and interdisciplinary integration with landscape architecture.

Introduction

Pictorial records are visual materials that realistically depict past events or spaces, serving as important historical evidence that conveys the landscape and spatial composition of the time, like modern photography (Lee, 2018). Among these, Gyehwa (界畵), a pictorial style that meticulously illustrates palaces, pavilions, and houses using rulers, flourished during the reign of King Jeongjo, with the support of royal patronage and the active participation of Chabidaeryeong painters (差備待令畵員) (The Academy of Korean Studies, 2025).
In the late Joseon period, Gyehwa incorporated parallel oblique compositions and Western linear perspective techniques to represent architectural structures and spatial depth realistically (Ahn, 2014). By emphasizing straight lines and minimizing omission, Gyehwa sought factual precision and accuracy in depiction (Ku and Lee, 1994). Owing to these realistic qualities, Gyehwa holds not only artistic value but also significant utility in architectural history, landscape history, and urban studies, as it provides empirical visual evidence for reconstructing and interpreting historical spaces and landscapes. Moreover, the depictions of trees, architecture, and topographical elements in Gyehwa offer high potential for interdisciplinary application when integrated with contemporary digital technologies. These visual materials can serve as spatial datasets for digital mapping and analytical modeling, bridging traditional art with data-driven spatial research.
Among these works, <Donggwoldo (東闕圖)>, a detailed depiction of the Changgyeonggung and Changdeokgung palace complexes based on actual landscapes, is recognized as a representative masterpiece that encapsulates the artistic achievements of Gyehwa in the 19th century (The Academy of Korean Studies, 2025). The painting’s meticulous rendering of the surrounding mountains and vegetation has drawn particular interest in landscape-architectural studies, leading to various interpretive analyses (Lee, 2024). Recent advancements in digital restoration and computer vision have further enabled quantitative approaches to exploring spatial and landscape information within such documentary paintings.
In particular, the use of Convolutional Neural Networks (CNNs) for analyzing artistic and cultural heritage imagery has gained substantial validity in recent studies. CNN-based models have demonstrated strong performance in classifying traditional Chinese painting styles (Du and Cai, 2024), analyzing ancient mural images for cultural heritage research (Cao et al., 2025), and distinguishing artistic styles across diverse fine-art traditions (Bar et al., 2015). These studies collectively indicate that CNNs are highly effective in capturing fine-grained pictorial features, supporting their applicability to the detailed and stylized visual language found in Gyehwa paintings.
Accordingly, this study aims to establish a foundational modeling stage for quantitative research by constructing and evaluating a baseline CNN model capable of identifying and categorizing tree depictions in <Donggwoldo>. Through this process, traditional pictorial information is converted into structured digital data, thereby laying the groundwork for subsequent research that seeks to analyze historical landscapes using quantitative and computational methods.

Research Methods

Research Scope

This study focuses on <Donggwoldo>, a representative Gyehwa painting from the late Joseon period (Fig. 1). <Donggwoldo> serves as a historical record that realistically depicts the Donggwol area, which corresponds to the present-day Changdeokgung and Changgyeonggung Palaces, during the 19th century. It provides detailed spatial information about palace buildings, various facilities, trees, and the surrounding terrain. Notably, the depiction of trees is based on realistic representations reflecting the morphological characteristics of actual trees, making this artwork a valuable resource for understanding the vegetation landscape of the Donggwol area during that time. Although these depictions do not permit exact species identification, they are useful for understanding the general types and distribution patterns of the vegetation landscape.

Research Framework

This study applied a CNN-based image classification approach to categorize the tree images depicted in <Donggwoldo>. First, a classification system was reconstructed by referencing the tree-type classification criteria from previous studies and adapting them to the pictorial characteristics of <Donggwoldo>. Based on this framework, the tree images were cropped and categorized, and image augmentation techniques were employed to enhance the diversity and balance of the training data.
The constructed dataset was split into training, validation, and test sets for model training and performance evaluation. To build the model, a transfer learning approach based on ResNet50V2 was adopted, allowing efficient training with a relatively small dataset. The pre-trained model’s top classification layer was removed and utilized as a feature extractor, followed by the addition of a global average pooling layer, two fully connected (Dense) layers, and a dropout layer to prevent overfitting. The final output layer consisted of six classes corresponding to tree types, using a Softmax activation function to compute class probabilities.
During training, the variations in training accuracy, validation accuracy, and loss were continuously monitored. Early Stopping and ReduceLROnPlateau callbacks were applied to maintain optimal learning conditions. After training, the model’s performance was evaluated on the test dataset, and its accuracy and validity were assessed using a confusion matrix and a classification report (Fig. 2). All model training and evaluation were conducted on a GPU in a Google Colab environment using TensorFlow and Keras.

Classification Framework and Data Preparation

<Donggwoldo> is divided into two versions, one held by Korea University and the other by Dong-A University. Although both versions were derived from the same underlying sketches, they differ in their level of detail (Fig. 3).
In this study, high-resolution images of the collection from the Dong-A University Museum were utilized, as provided directly by the museum (Dong-A University Museum, 19th century). For the Korea University Museum collection, data were extracted from a PDF file (Cultural Heritage Administration, 1991). Each image was adjusted to include only a single tree type, and images in which the painting technique could not be reliably determined due to degraded quality were excluded from the analysis.
The categorization of tree representations was conducted based on previous studies (Kim and Sim, 2007; Cultural Heritage Administration, 2016). While the 2007 study analyzed tree types and their distribution, the classification criteria were not clearly defined, resulting in interpretative discrepancies among researchers. The 2016 study subsequently proposed 12 tree types for <Donggwoldo>, taking into account the 19th-century Joseon painting style and influences from Chinese painting manuals such as Jieziyuan Huazhuan (芥子園畫傳). Building on these prior classifications, this study refined the categories and constructed a dataset consisting of six representative tree types. During dataset preparation, images were extracted under conditions that ensured each sample contained only a single, clearly identifiable tree type. Images with low resolution or those in which multiple tree types appeared within a single crop were excluded to maintain dataset clarity and prevent noise during model training. As a result, a total of 580 images were included in the final dataset (Table 1).
To address the limitations and imbalance of the dataset, image augmentation techniques were applied. Image augmentation enhances the diversity of the training data by transforming the same images into multiple forms, thereby improving model generalization and preventing overfitting. In this study, augmentation methods were selected to preserve the pictorial characteristics of <Donggwoldo> while introducing morphological and chromatic variations without compromising the artistic integrity of the painting.
Accordingly, five augmentation techniques were applied: Grayscale conversion, Zoom, Horizontal Flip, Grayscale + Horizontal Flip, and Random Color Adjustment (Fig. 4). The rationale for selecting each method is as follows. Grayscale conversion was applied to mitigate excessive reliance on color information, which can hinder the learning of shape-based features (Wang and Lee, 2021). Zoom was used to simulate relative size variations between trees and background elements, improving the model’s generalization performance under different scales (Tarasiuk and Szczepaniak, 2022).
Horizontal Flip and Random Color Adjustment are both simple yet effective augmentation techniques that have been widely validated in image classification tasks (Shorten and Khoshgoftaar, 2019). The former enables consistent feature learning despite directional variations in tree orientation, while the latter compensates for variations in illumination and color expression, thereby enhancing data diversity and reducing overfitting. The Grayscale + Horizontal Flip combination was also included in the augmentation process; however, its main effects are primarily explained by the individual methods mentioned above.
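As an illustration, the sketch below shows how these five variants could be produced with TensorFlow image operations. The zoom range and color-jitter strengths are assumptions, since the study does not report its exact parameter values.

```python
import tensorflow as tf

def grayscale(img):
    # Convert to grayscale, then back to three channels so the input
    # shape expected by the downstream network is preserved.
    return tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(img))

# Random zoom layer; the +-20% range is an illustrative assumption.
zoom = tf.keras.layers.RandomZoom(height_factor=(-0.2, 0.2))

def augment_variants(img):
    """Return the five augmented versions of a single tree crop."""
    return [
        grayscale(img),                                # 1. grayscale
        zoom(img[tf.newaxis, ...], training=True)[0],  # 2. zoom
        tf.image.flip_left_right(img),                 # 3. horizontal flip
        grayscale(tf.image.flip_left_right(img)),      # 4. grayscale + flip
        tf.image.random_saturation(                    # 5. random color
            tf.image.random_brightness(img, 0.1), 0.8, 1.2),
    ]
```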
To evaluate the generalization performance of the model, a test set was constructed by randomly selecting 25 images from each tree type, which were excluded from the augmentation process. The remaining training data were augmented approximately sixfold, with 10% of the augmented data allocated to validation. Consequently, the dataset was divided into training, validation, and test sets in an approximate 8:1:1 ratio.
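A minimal sketch of this split logic, assuming in-memory lists of per-class images and reusing the hypothetical augment_variants helper above:

```python
import random

def split_class(images, test_n=25, val_frac=0.10):
    """Hold out a fixed test set per class, then augment and split the rest."""
    random.shuffle(images)
    test = images[:test_n]           # 25 images per class, never augmented
    rest = images[test_n:]
    # Original image plus five variants gives an approximately sixfold set.
    augmented = [v for img in rest for v in [img] + augment_variants(img)]
    k = int(len(augmented) * val_frac)
    return augmented[k:], augmented[:k], test   # train, validation, test
```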

Establishing a Convolutional Neural Network Model

Convolutional Neural Networks (CNNs) are deep learning models that excel in image classification by extracting hierarchical features from input images. This extraction process involves the use of convolutional filters, max pooling, and sparse connectivity with shared weights. These elements work together to enable efficient learning of both low- and high-level structural patterns, as well as fine-grained image details (Alom et al., 2018). Once the feature maps are generated, they are fed into fully connected layers or global average pooling layers, where the final class probabilities are calculated using a softmax layer (Fig. 5). These characteristics make CNNs particularly well-suited for analyzing complex visual information, such as trees depicted in traditional paintings.
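As a generic illustration of this pipeline (not the model used in this study), a toy Keras CNN stacking convolution and pooling layers before a softmax output might look like:

```python
from tensorflow.keras import layers, models

# Illustrative CNN only: stacked convolution + pooling layers extract
# hierarchical features, global average pooling compresses the feature
# maps, and a softmax layer yields class probabilities.
toy_cnn = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D(),                  # downsample feature maps
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),        # spatial average per channel
    layers.Dense(6, activation="softmax"),  # class probabilities
])
```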
In this study, model training was conducted in a GPU environment using Google Colab. Compared to CPUs, GPUs enable large-scale matrix operations and parallel processing, greatly enhancing the training speed and computational efficiency of deep learning models. However, while Google Colab provides free access to GPU resources, its limited computational power and memory capacity can constrain large-scale training. To overcome these constraints and ensure stable and reliable performance, a transfer learning strategy was employed, enabling efficient and accurate tree classification even with a relatively small dataset (Kartik et al., 2023).
Building an efficient CNN-based tree classification model requires sufficient training data and a computational environment capable of processing it. However, such conditions are not always available in typical research settings. Therefore, this study applied transfer learning using pre-trained models to enhance training efficiency and achieve stable performance under limited data and computational resources.
Transfer learning leverages the weights of models pre-trained on large-scale image datasets, enabling high accuracy to be achieved efficiently even with limited data. Representative pre-trained models include VGGNet, ResNet, SENet, GoogLeNet, and EfficientNet. In this study, seven pre-trained models suitable for GPU-based training were selected and evaluated using the same training dataset. Comparisons of training time and accuracy indicated that ResNet50V2 provided the best balance between efficiency and performance and was thus adopted as the final model (Fig. 6).
Prior to being input into the network, all images were resized and normalized to maintain consistency across the training dataset. The top classification layer of ResNet50V2 was removed, and the network was utilized as a feature extractor. The extracted feature maps were spatially compressed using Global Average Pooling, followed by two fully connected (Dense) layers with 128 and 64 neurons, respectively. ReLU activation functions were applied to each Dense layer, and dropout rates of 0.4 and 0.3 were introduced to prevent overfitting. The final output layer corresponded to six tree species classes, with a Softmax function used to compute class probabilities.
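The described architecture can be sketched in Keras as follows. The 224 × 224 input size is an assumption; the layer widths, dropout rates, and six-class softmax output follow the text.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50V2

# Pretrained backbone used purely as a feature extractor; its top
# classification layer is omitted. Inputs are assumed to be resized
# and normalized beforehand, as described above.
base = ResNet50V2(include_top=False, weights="imagenet",
                  input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # spatially compress feature maps
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(6, activation="softmax"),  # six tree-type classes
])
```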
The model was trained using categorical cross-entropy as the loss function and the Adam optimizer with a learning rate of 0.0002. The batch size was set to 16, and the number of epochs was 100. Early stopping was implemented to halt training if validation performance did not improve. During training, both training and validation accuracy and loss were monitored, and the weights from the epoch with the best performance were saved.
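A sketch of this training configuration follows. The loss, optimizer, learning rate, batch size, and epoch count come from the text; the callback patience values are assumptions, and x_train/y_train and x_val/y_val are hypothetical arrays with one-hot labels.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=2e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Halt training when validation performance stops improving and
    # keep the best-performing weights, as described above.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Lower the learning rate when validation loss plateaus.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
]

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=100, batch_size=16, callbacks=callbacks)
```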
To evaluate the generalization performance of the trained model, an unseen test dataset was used. During the testing phase, a confusion matrix was employed to analyze the classification results for each category. The model’s performance was measured using several evaluation metrics, including accuracy, precision, recall, and F1-score (Table 2). To reduce the impact of class imbalance and provide a comprehensive assessment of classification performance, the micro-average approach was applied (Opitz, 2024).
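This evaluation step can be sketched with scikit-learn, assuming hypothetical x_test/y_test variables for the held-out images and their one-hot labels:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, classification_report,
                             precision_recall_fscore_support)

# Predicted and true integer class labels for the 150 test images.
y_pred = np.argmax(model.predict(x_test), axis=1)
y_true = np.argmax(y_test, axis=1)

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=3))

# Micro-averaging pools TP, FP, and FN across all six classes before
# computing the metrics, which damps the effect of class imbalance.
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                   average="micro")
```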

Results and Discussion

Training Results

As training progressed, the model’s performance improved significantly. The training accuracy rose from an initial 17.5% to about 99.6%, while the validation accuracy increased from 57.9% to nearly 100%. Additionally, the training loss stabilized, dropping from 1.978 to 0.015, and the validation loss decreased from 1.492 to 0.0021, indicating that the model achieved a strong fit to the training data while maintaining excellent generalization on the validation data (Fig. 7).

Performance Evaluation

The performance of the ResNet50V2-based CNN model was evaluated using 150 test images. According to the confusion matrix, 3 of 25 images of type A were misclassified, while 5 of 25 images of type B and 2 of 25 images of type C were misclassified. In contrast, all 25 images of types D, E, and F were correctly classified (Fig. 8). These results demonstrate the effectiveness of the ResNet50V2-based transfer learning approach, which leveraged GPU-based training and optimized hyperparameters to improve feature extraction and generalization. The model’s high accuracy, particularly for tree types with complex visual patterns, indicates that it successfully captured subtle pictorial characteristics of the trees depicted in <Donggwoldo>.
Based on the evaluation metrics derived from the confusion matrix, the model achieved 97.8% accuracy, 93.3% precision, 93.3% recall, and 93.3% F1-score in the GPU environment (Table 3). Compared to the previous study conducted under a CPU environment (Lee and Sung, 2025), which reported an accuracy of 95.5%, precision of 86.6%, recall of 86.6%, and F1-score of 86.6%, all metrics showed marked improvement. These results indicate that the model achieved balanced classification performance across all tree species. The high accuracy and precision suggest that false positives were rare, and the model effectively learned even fine-grained features of the trees.
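These figures can be reproduced directly from the pooled confusion-matrix counts summarized in Table 3. A minimal check, assuming one-vs-rest pooling across the six classes (150 test images × 6 classes = 900 binary decisions):

```python
# Pooled counts from Table 3.
TP, FN, FP, TN = 140, 10, 10, 740

recall    = TP / (TP + FN)                          # 140/150 ≈ 0.933
precision = TP / (TP + FP)                          # 140/150 ≈ 0.933
accuracy  = (TP + TN) / (TP + TN + FP + FN)         # 880/900 ≈ 0.978
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.933
```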
Among the 150 test images, 10 were misclassified. This outcome is interpreted as resulting from a combination of factors, including shape distortion due to reduced image resolution (Dodge and Karam, 2016), visual interference from background structures (Rosenfeld et al., 2018), and incomplete separation of tree regions during the preprocessing stage (Lee et al., 2018) (Fig. 9).

Conclusion

This study applied a CNN to classify tree images depicted in <Donggwoldo> by type, achieving a high classification accuracy of approximately 97.8%. This outcome demonstrates the potential to quantify tree depiction techniques in traditional paintings, thereby enabling vegetation and landscape analyses of pictorial historical sources.
Furthermore, the study introduced a novel methodology for converting the visual information in historical paintings into digital data and using it for AI-based assessment. This approach highlights the applicability of artificial intelligence to the interpretation and digital archiving of cultural heritage and provides foundational data for future AI research in the field.
However, several limitations should be noted. First, the training dataset was limited to the two extant versions of <Donggwoldo> from Korea University and Dong-A University, which limits the method’s generalization to historical paintings from different periods or with various artistic styles. Second, the dataset extracted through manual preprocessing may involve some degree of subjective judgment by the researchers, limiting complete objectivity. Third, while the CNN model effectively quantifies visual patterns, it has limitations in directly interpreting the artistic context or stylistic intent of the paintings.
Future research should aim to minimize data bias by constructing larger-scale datasets that include a wider range of historical paintings and by incorporating automated object segmentation techniques. Additionally, integrating the analytical results of CNN models with art-historical and landscape-historical interpretations could lead to more comprehensive studies that holistically explore the aesthetic characteristics and landscape representations in traditional paintings.

Fig. 1
<Donggwoldo (東闕圖)>.
Source. Collections of the Korea University Museum, 19th century
Fig. 2
Research flowchart.
Fig. 3
Trees in the same location depicted in the <Donggwoldo>.
Source (From left). Dong-A University Museum, 19th century; Korea Heritage Service, 2025; Cultural Heritage Administration, 1991
Fig. 4
Examples of image augmentation.
Fig. 5
CNN model framework.
Source. MATLAB, 2025
Fig. 6
Comparison of accuracy and learning time by model.
Fig. 7
Training and validation accuracy and loss.
Fig. 8
Confusion matrix results.
Fig. 9
Misclassified image data.
Table 1
Tree representation techniques and data count

Label   Type                          Number of samples
A (0)   ‘sohonjeom’ (小混點)           102
B (1)   ‘gaejajeom’ (介字點)            93
C (2)   long oval ‘guyeob’ (勾葉)       91
D (3)   willow tree                     93
E (4)   ‘ang-yeobjeom’ (仰葉點)         98
F (5)   flowering trees                103
Sum                                    580

* The names of the tree representation techniques are romanized according to Korean pronunciation. Example images of each technique and tree type appear in the original table.

Table 2
Definitions of confusion matrix elements and evaluation metrics

                Predicted True        Predicted False       Evaluation metric
Actual True     TP (True Positive)    FN (False Negative)   Recall = TP / (TP + FN)
Actual False    FP (False Positive)   TN (True Negative)    Specificity = TN / (TN + FP)

Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-Score = 2 × (Recall × Precision) / (Recall + Precision)
Table 3
Summary of evaluation metrics

                Predicted Positive   Predicted Negative   Evaluation metric
Actual True     TP = 140             FN = 10              Recall = 93.3%
Actual False    FP = 10              TN = 740             Specificity = 98.7%

Precision = 93.3%    Accuracy = 97.8%    F1-Score = 93.3%

References

Ahn, H. J. 2014. Paintings of old palaces. Daewonsa.

Alom, M. Z., T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, B. C. Van Esesn, A. A. S. Awwal, V. K. Asari. 2018. The history began from AlexNet: A comprehensive survey on deep learning approaches. arXiv. https://arxiv.org/abs/1803.01164

Bar, Y., N. Levy, L. Wolf. 2015. Classification of artistic styles using binarized features derived from a deep neural network. Lecture Notes in Computer Science. 8925:71-84. Springer. https://doi.org/10.1007/978-3-319-16178-5_5
Cao, J., C. Peng, Z. Chen, Z. Yang. 2025. Classification of ancient murals based on improved ResNet deep learning. EMT. 48(1):186.

Cultural Heritage Administration. 1991. Donggwoldo. Cultural Heritage Administration.

Cultural Heritage Administration. 2016. Analysis of major vegetation and restoration of partitions in Donggwol (Changdeokgung and Changgyeonggung). Daejeon, Korea: Author. Retrieved from https://www.khs.go.kr/main.html

Dodge, S., L. Karam. 2016. Understanding how image quality affects deep neural networks. arXiv. https://doi.org/10.48550/arXiv.1604.04004

Du, X., Y. Cai. 2024. Design of Chinese painting style classification model based on multi-layer aggregation CNN. PeerJ Computer Science. 10:e2303. https://doi.org/10.7717/peerj-cs.2303
Kartik, K., T. Ahmed, S. Ghosh, R. Gupta, A. Tripathi. 2023. Transfer learning with CNNs in small ML datasets: Applying pre-trained CNN models and fine-tuning them for limited data scenarios. International Journal of Trend in Scientific Research and Development. 7(5):1087-1099.

Kim, H. J., W. K. Sim. 2007. Analysis of the status of plants and the characteristics of planting on the Donggweol-do. Journal of Korean Institute of Traditional Landscape Architecture. 25(2):141-154.

Korea Heritage Service. 2025, October 27. Treasures: Seonwonjeon Hall of Changdeokgung Palace. Retrieved from https://www.heritage.go.kr/heri/cul/culSelectDetail.do?&ccbaCpno=1121108170000

Ku, T. I., J. K. Lee. 1994. A basic research on the technique of Gye-hwa (border drawing) in oriental picture. Journal of Korean Institute of Traditional Landscape Architecture. 12(1):97-105.

Lee, G. Y., S. Y. Sung. 2025. Classification of trees in <Donggwoldo> using CNN deep learning-focusing on tree representation techniques. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 48:785-790. https://doi.org/10.5194/isprs-archives-XLVIII-M-9-2025-785-2025
Lee, J. Y. 2018. Estimation of the three-dimensional vegetation landscape of the Donhwamun Gate area in Changdeokgung Palace through the rubber sheeting transformation of <Donggwoldo>. Korean Journal of Heritage: History and Science. 51(2):138-153. https://doi.org/10.22755/kjchs.2018.51.2.138
Lee, M. Y. 2024. A study on the painting characteristics of Gyehwa in Donggwoldo. Doctoral dissertation. Kyonggi University, Suwon.

Lee, S. J., K. D. Lee, S. W. Lee, J. G. Ko, W. Y. Yoo. 2018. Technology trends and analysis of deep learning-based object classification and detection. Electronics and Telecommunications Trends. 33(4):33-42. https://doi.org/10.22648/ETRI.2018.J.330404
MathWorks. 2025, October 22. A Practical Guide to Deep Learning: From Data to Deployment. Retrieved from https://www.mathworks.com/

Opitz, J. 2024. A closer look at classification evaluation metrics and a critical reflection of common evaluation practice. Transactions of the Association for Computational Linguistics. 12:820-836. https://doi.org/10.1162/tacl_a_00675
Rashidi, H. H., S. Albahra, S. Robertson, N. K. Tran, B. Hu. 2023. Common statistical concepts in the supervised machine learning arena. Frontiers in Oncology. 13. https://doi.org/10.3389/fonc.2023.1130229
Rosenfeld, A., R. Zemel, J. K. Tsotsos. 2018. The elephant in the room: Examining the importance of context for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 876-885. https://doi.org/10.48550/arXiv.1808.03305
Sathyanarayanan, S., B. R. Tantri. 2024. Confusion matrix-based performance evaluation metrics. African Journal of Biomedical Research. 27(4S):4023-4031. https://doi.org/10.53555/AJBR.v27i4S.4345
Shorten, C., T. M. Khoshgoftaar. 2019. A survey on image data augmentation for deep learning. Journal of Big Data. 6:60. https://doi.org/10.1186/s40537-019-0197-0
Tarasiuk, P., P. S. Szczepaniak. 2022. Novel convolutional neural networks for efficient classification of rotated and scaled images. Neural Computing and Applications. 34:10519-10532. https://doi.org/10.1007/s00521-021-06645-9
The Academy of Korean Studies. 2025, October 24. Gyehwa [Architectural painting]. Encyclopedia of Korean Culture. Retrieved from https://encykorea.aks.ac.kr/Article/E0003279

Unknown artist. 19th century. Donggwoldo (Eastern Palace Painting). Collections of Dong-A University Museum.

Unknown artist. 19th century. Donggwoldo (Eastern Palace Painting). Collections of Korea University Museum.

Wang, J., S. Lee. 2021. Data augmentation methods applying grayscale images for convolutional neural networks in machine vision. Applied Sciences. 11(15):6721. https://doi.org/10.3390/app11156721