corpus-builder

ARAS-Workspace / eu-ai-act-rag

EU AI Act RAG: an end-to-end retrieval-augmented generation pipeline with a SPARQL corpus builder, a Cloudflare Workers AI backend, and a Streamlit playground for querying Regulation (EU) 2024/1689

Updated Feb 26, 2026
Python

c0ntradicti0n / CorpusCookApp

App and scripts that work with the corpus builder CorpusCook to update a corpus with corrections of wrong predictions

amp python3 twisted corpus-linguistics nlp-machine-learning corpus-builder kivy-application

Updated Mar 20, 2020
Python

mrsumitbd / sieve

software-engineering code-generation corpus-builder large-language-models ai-generated-code

Updated Mar 31, 2026
Python

cw-l / eml-contrib-ng

CLI tool to redact and publish spam/phishing emails as a public research corpus.

python3 eml-files corpus-builder email-security cli-tool security-research pii-redaction

Updated Mar 11, 2026
Python

cikay / kurdish_scrapy

A Scrapy-based web scraper for collecting Kurdish text data from websites. The tool recursively crawls specified domains, extracts article content using Trafilatura, and filters results by language using Facebook's FastText language identification model.

Updated Mar 29, 2026
Python

Little-learning-station / ResearchYouTube_Comments2Corpus_Comments2Networks

Builds an English corpus from YouTube comments and visualizes the comment networks in Gephi

research-tool corpus-builder youtube-api-v3 youtubecomment

Updated Mar 31, 2026
Python

Here are 11 public repositories matching this topic...

adbar / trafilatura

google / corpuscrawler

jhlopesalves / CorpusAid

CristinaGHolgado / vikitext

tubone24 / askfm-qa-crawler

ARAS-Workspace / eu-ai-act-rag

c0ntradicti0n / CorpusCookApp

mrsumitbd / sieve

cw-l / eml-contrib-ng

cikay / kurdish_scrapy

Little-learning-station / ResearchYouTube_Comments2Corpus_Comments2Networks
