Topic evolution analysis in scientific archives is a critical domain of research that focuses on understanding how scientific topics emerge, evolve, and decline over time.
This thesis applies advanced deep learning techniques to analyze large collections of time-stamped scientific documents. By tracking the temporal progression of semantically similar documents and their citation relationships, we contribute to this line of research by identifying evolving topics, detecting emerging areas of interest, and observing shifts in scientific paradigms.
Our objectives include establishing new baselines for fundamental concepts like "topic" and "evolving topics," developing novel methods for detecting emerging topics and paradigm shifts, and creating improved evaluation metrics for topic models. By addressing challenges such as the difficulty of defining complex notions, the lack of standard categorization, and the absence of reliable evaluation metrics, this research seeks to provide more nuanced and accurate insights into the evolution of scientific knowledge.
The resulting analysis sheds light on the dynamic nature of scientific knowledge, revealing patterns such as emerging topics, the integration of insights from different fields, and the rise or decline of specific research areas.
The experiments conducted throughout this research led to the development of several software tools, each designed to address specific challenges in topic modeling and topic evolution analysis.