Research Stories


Development of the Next-Generation Transfer Learning Technology Solving High-Dimensional Data Analysis Challenges

Innovative Algorithm Design Overcoming Limitations of Existing Transfer Learning

Statistics
Prof. LEE, EUN RYUNG


Professor Eun Ryung Lee (Department of Statistics), as first author, has developed a new statistical methodology that overcomes the limitations of high-dimensional analysis caused by data scarcity. In collaboration with Professor Seyoung Park (Yonsei University) and Professor Hongyu Zhao (Yale University), Prof. Lee developed a 'Transfer Learning Algorithm' that maximizes learning performance by selectively using informative external data, based on the insight that the contrast between the target data and external source data exhibits a 'Low-rank' structure. By effectively integrating external big data, this achievement paves the way for substantially better prediction accuracy in fields such as rare disease research and precision medicine, where small sample sizes have made analysis difficult.


■ Innovative Algorithm Design Overcoming Limitations of Existing Transfer Learning

This study focused on resolving the predictive uncertainty of 'Small Data', which persists even in the big data era, and the side effects of existing transfer learning. In high-dimensional regression problems such as genomic analysis, accurate model estimation is difficult because the number of variables reaches the tens of thousands while the target samples of interest are very few. Transfer learning, which draws on external data, has been used to compensate, but it frequently suffers from 'Negative Transfer', in which prediction performance degrades because data irrelevant to the target are used indiscriminately.
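
The negative transfer effect described above can be illustrated with a small simulation. This is a minimal sketch, not the study's setup: the dimensions, the noise level, the ridge estimator, and the deliberately conflicting source (coefficients with the opposite sign) are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_target, n_source, noise_sd = 50, 20, 500, 0.5  # illustrative sizes (assumed)
beta_target = rng.normal(size=p)                    # true target coefficients

def simulate(beta, n):
    """Draw n observations from the linear model y = X beta + noise."""
    X = rng.normal(size=(n, p))
    return X, X @ beta + noise_sd * rng.normal(size=n)

def ridge(X, y, lam=1.0):
    """Ridge regression estimate (a simple stand-in estimator)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

X_t, y_t = simulate(beta_target, n_target)

# A related source (coefficients close to the target) and a conflicting one.
related = simulate(beta_target + 0.1 * rng.normal(size=p), n_source)
unrelated = simulate(-beta_target, n_source)

def pooled_error(source):
    """Estimation error when target and source rows are simply stacked."""
    X = np.vstack([X_t, source[0]])
    y = np.concatenate([y_t, source[1]])
    return float(np.linalg.norm(ridge(X, y) - beta_target))

err_target_only = float(np.linalg.norm(ridge(X_t, y_t) - beta_target))
err_related = pooled_error(related)
err_unrelated = pooled_error(unrelated)

print("target only:", err_target_only)
print("pooled with related source:", err_related)
print("pooled with conflicting source:", err_unrelated)
```

In this toy setting, pooling with the related source reduces the estimation error relative to using the target alone, while pooling with the conflicting source increases it, which is the behavior the term 'Negative Transfer' refers to.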


To solve these problems, the research team proposed a two-step estimation method that effectively controls the structural difference between the target model and the source model within a 'Low-Rank Regression' framework. In particular, the 'Forward Source Detection (FSD)' technique devised by the team sequentially selects, from among numerous external datasets, only those sources that actually help the target analysis. This amplifies the signals shared across datasets and blocks unnecessary noise, enabling precise estimation without bias even in high-dimensional environments.
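
The idea of a two-step fit combined with forward selection of sources can be sketched as follows. This is a hedged illustration and not the published FSD-Trans-NR procedure: the ridge estimator, the truncated-SVD rank constraint, the hold-out validation criterion, the stopping rule, and all function names (`transfer_fit`, `forward_select`, and so on) are assumptions made for this example.

```python
import numpy as np

def ridge(X, Y, lam):
    """Ridge estimate of B in Y ~ X B (stand-in for a penalized estimator)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def truncate_rank(M, r):
    """Best rank-r approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def transfer_fit(Xt, Yt, sources, lam=1.0, rank=1):
    """Step 1: pool target and chosen sources. Step 2: low-rank correction on target."""
    if not sources:
        return ridge(Xt, Yt, lam)  # target-only baseline
    Xp = np.vstack([Xt] + [X for X, _ in sources])
    Yp = np.vstack([Yt] + [Y for _, Y in sources])
    B_pool = ridge(Xp, Yp, lam)
    # Estimate the target-minus-pooled contrast and force it to be low rank.
    contrast = truncate_rank(ridge(Xt, Yt - Xt @ B_pool, lam), rank)
    return B_pool + contrast

def val_error(B, Xv, Yv):
    """Mean squared prediction error on held-out target data."""
    return float(np.mean((Yv - Xv @ B) ** 2))

def forward_select(Xtr, Ytr, Xv, Yv, sources, lam=1.0, rank=1):
    """Greedily add the source that most lowers validation error; stop when none helps."""
    selected, remaining = [], list(range(len(sources)))
    best = val_error(transfer_fit(Xtr, Ytr, [], lam, rank), Xv, Yv)
    while remaining:
        scores = {}
        for k in remaining:
            chosen = [sources[j] for j in selected + [k]]
            scores[k] = val_error(transfer_fit(Xtr, Ytr, chosen, lam, rank), Xv, Yv)
        k_best = min(scores, key=scores.get)
        if scores[k_best] >= best:
            break
        selected.append(k_best)
        remaining.remove(k_best)
        best = scores[k_best]
    return selected

# Simulated example (all sizes and scales are illustrative assumptions).
rng = np.random.default_rng(0)
p, q = 30, 5
B0 = rng.normal(size=(p, 2)) @ rng.normal(size=(2, q))  # target coefficients

def simulate(B, n):
    X = rng.normal(size=(n, p))
    return X, X @ B + 0.5 * rng.normal(size=(n, q))

def low_rank_shift(scale):
    return scale * np.outer(rng.normal(size=p), rng.normal(size=q))

sources = [
    simulate(B0 + low_rank_shift(0.2), 200),       # source 0: rank-one contrast
    simulate(B0 + low_rank_shift(0.2), 200),       # source 1: rank-one contrast
    simulate(3.0 * rng.normal(size=(p, q)), 200),  # source 2: unrelated model
]
Xt, Yt = simulate(B0, 30)  # fewer target samples than variables after the split
Xtr, Ytr, Xv, Yv = Xt[:20], Yt[:20], Xt[20:], Yt[20:]

selected = forward_select(Xtr, Ytr, Xv, Yv, sources)
B_transfer = transfer_fit(Xtr, Ytr, [sources[k] for k in selected])
err_transfer = float(np.linalg.norm(B_transfer - B0))
err_target_only = float(np.linalg.norm(transfer_fit(Xtr, Ytr, []) - B0))
print("selected sources:", selected)
```

In this sketch the unrelated source is rejected because adding it raises the validation error on held-out target samples, while a source whose model differs from the target only by a low-rank contrast is accepted. Using a hold-out split of the already small target sample is a simplification; the actual detection criterion in the paper may differ.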


■ Proven Superior Prediction Performance and Theoretical Optimality

Theoretical analysis showed that the newly developed transfer learning methodology has a much faster statistical convergence rate than an estimator using target data alone and is optimal from a minimax perspective. Its superiority was also confirmed on real data. Using Cancer Cell Line Encyclopedia (CCLE) data, the research team predicted anticancer drug responses for a specific lung cancer subtype, KRAS-mutant non-small cell lung cancer (NSCLC), for which only 28 samples were available. By effectively selecting and integrating data from other cancer types with genetic characteristics similar to lung cancer, the proposed algorithm achieved significantly higher prediction accuracy than existing pooled analysis methods or simple marginal screening methods.


■ Applicability to Various Fields

The 'Forward Source Detection Transfer Learning (FSD-Trans-NR)' technology of this study is designed to operate stably even in high-dimensional environments where the data dimension is much larger than the sample size, and it can be flexibly applied to complex data settings in which low-rank and sparse structures are combined. Given these characteristics, the method is expected to be widely used for predictive modeling not only in drug response prediction in the biomedical field but also in fields such as financial risk analysis and new material development, where data acquisition is difficult and costly.


This research was supported by the National Research Foundation of Korea (NRF) and the U.S. National Institutes of Health (NIH). The results were published online in October 2025 in the Journal of the American Statistical Association (JASA), one of the most prestigious journals in the field of statistics.


※Title: Transfer Learning Under Large-Scale Low-Rank Regression Models

※Journal: Journal of the American Statistical Association (JASA)

※DOI: https://doi.org/10.1080/01621459.2025.2555057



COPYRIGHT ⓒ 2017 SUNGKYUNKWAN UNIVERSITY ALL RIGHTS RESERVED.