corpus-builder

Here are 11 public repositories matching this topic...

EU AI Act RAG — End-to-end retrieval-augmented generation pipeline: SPARQL corpus builder, Cloudflare Workers AI backend, and Streamlit playground for querying Regulation (EU) 2024/1689

  • Updated Feb 26, 2026
  • Python

A Scrapy-based web scraper for collecting Kurdish text from websites. The tool recursively crawls specified domains, extracts article content with Trafilatura, and filters the results by language using Facebook's fastText language identification model.

  • Updated Mar 29, 2026
  • Python
