
ML Research Radar is a basic tool for understanding topic modeling with a live data source. In this example, it retrieves research papers and gives an overview of how they split into topics.

Using Streamlit’s experimental connection feature, the app interfaces with the ArXiv API and a model selected from the transformers library. It retrieves a specified number of the latest machine learning papers for the chosen selection and summarizes their content using a lightweight BERT model that can run on a CPU.

This basic implementation shows how the experimental connection can be used in an ML-focused project.

It also gives a very high-level idea of how NMF (non-negative matrix factorization) works.

Features

  • Data Fetching: ML Research Radar utilizes the ArXiv API to fetch the latest machine learning papers.

  • Data Processing: The application processes the summaries of fetched papers to determine the most discussed topics in machine learning research. This involves transforming paper summaries into tf-idf vectors, fitting an NMF model to extract topics, and mapping each paper to its topic.

  • Data Visualization: The processed data is then visualized using Plotly, showing the distribution of topics over time. Users can choose between visualizations based on either counts or proportions.

  • Experimental Connections: ML Research Radar takes advantage of Streamlit’s new feature, st.experimental_connection, to connect to a language model that can summarize paper abstracts.

Code Structure

This application is divided into several classes, each responsible for a particular task:

  1. ArxivAPI: This class is responsible for fetching papers from the ArXiv API.

  2. ProcessingData: This class is used to process a DataFrame of papers. It transforms paper summaries into tf-idf vectors, fits an NMF model to extract topics, and assigns each paper to its topic.

  3. Visualize: This class creates a visualization of the distribution of topics over time.

  4. LanguageModelConnection: This class demonstrates the usage of Streamlit’s st.experimental_connection. It is used to connect to a language model, specifically a question-answering model.
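As an illustration of the fetching step, the sketch below queries the public ArXiv API and parses the Atom feed it returns, using only the standard library. It is an assumption about how a class like ArxivAPI might work, not the app's actual code: the function names, the default category `cs.LG`, and the returned fields are all hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace of the Atom feed
API_URL = "http://export.arxiv.org/api/query"


def build_query_url(category="cs.LG", max_results=10):
    """Build a query URL for the newest papers in one ArXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_text):
    """Extract title, summary, and publication date from an Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        papers.append({
            # Collapse the line breaks and repeated spaces ArXiv leaves in text fields.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
            "published": entry.findtext(f"{ATOM}published", default=""),
        })
    return papers


def fetch_latest(category="cs.LG", max_results=10):
    """Download and parse the latest papers (requires network access)."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read().decode("utf-8"))
```

Calling `fetch_latest(max_results=20)` would return a list of dictionaries that can be loaded directly into a DataFrame for the processing step; the `published` field supplies the time axis for the visualization.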

For a more detailed understanding of how models can be integrated with st.experimental_connection, feel free to take a look at the code.