Data Ingestion

Connect your data sources and import content into ragistry

Overview

ragistry supports multiple data sources to populate your knowledge base. Each source is automatically processed, chunked, and indexed for semantic search.

Supported Data Sources

Website Crawler

Automatically crawl and index your website content

Automatic sitemap detection
Intelligent content extraction
Respects robots.txt and robots meta tags
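
As an illustration of what respecting robots.txt involves, the sketch below filters a list of candidate URLs against a set of robots.txt rules using Python's standard urllib.robotparser module. It is not ragistry's crawler code: the rules, the URLs, the allowed_urls helper, and the "example-crawler" user agent are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_txt, user_agent="example-crawler"):
    """Return only the URLs that the given robots.txt rules permit."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/docs/intro",
    "https://example.com/private/admin",
]

# Only the /docs/ page passes; the /private/ page is disallowed.
print(allowed_urls(urls, ROBOTS_TXT))
```

Pages excluded this way are never fetched, so they never reach the processing pipeline.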

Cloud Storage

Sync documents from cloud storage services

OneDrive, Google Drive, Dropbox
Automatic sync on changes
Supports PDF, DOCX, TXT, and more
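
One common way to decide which files need reprocessing after a change is to compare content hashes. The sketch below shows that idea only; how ragistry detects changes is not described here, and the files_to_reindex helper, the file paths, and the in-memory dictionaries are assumptions made for illustration.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Fingerprint a file's contents with SHA-256."""
    return hashlib.sha256(data).hexdigest()


def files_to_reindex(remote_files: dict[str, bytes], known_hashes: dict[str, str]) -> list[str]:
    """Return paths whose contents are new or differ from the last sync."""
    changed = []
    for path, data in sorted(remote_files.items()):
        if known_hashes.get(path) != content_hash(data):
            changed.append(path)
    return changed


# Hypothetical state recorded at the previous sync.
known = {
    "handbook.pdf": content_hash(b"version 1"),
    "faq.txt": content_hash(b"old answers"),
}

# Hypothetical current contents of the storage folder.
remote = {
    "handbook.pdf": b"version 1",      # unchanged
    "faq.txt": b"new answers",         # modified
    "pricing.docx": b"price list",     # newly added
}

print(files_to_reindex(remote, known))
```

Only the changed and new files would need to be extracted, chunked, and embedded again.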

SQL Databases

Connect to relational databases

PostgreSQL, MySQL, SQL Server
Structured data querying
Real-time data access
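
To make rows searchable alongside documents, structured data is typically rendered as text before it is chunked and embedded. The sketch below shows one simple way to do that. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, MySQL, or SQL Server, and the products table and rows_to_documents helper are hypothetical, not part of ragistry.

```python
import sqlite3


def rows_to_documents(conn, query):
    """Run a query and render each row as a 'column: value' text line."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    return [
        "; ".join(f"{col}: {val}" for col, val in zip(columns, row))
        for row in cursor
    ]


# In-memory database with a made-up table, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Widget", 9.5), (2, "Gadget", 20.0)],
)

docs = rows_to_documents(conn, "SELECT id, name, price FROM products ORDER BY id")
for doc in docs:
    print(doc)
```

Keeping the column names in the text gives the embedding step some context about what each value means.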

Manual Upload

Upload documents directly

Drag-and-drop interface
Batch upload support
Instant processing

Processing Pipeline

Once data is ingested, it goes through the following steps:

Extraction: Text is extracted from documents
Chunking: Content is split into meaningful sections
Embedding: Each chunk is converted to a vector
Indexing: Vectors are stored in the database so they can be retrieved by semantic search
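
The steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not ragistry's implementation: chunking here is a fixed word window with overlap, the embed function is a hashed bag-of-words stand-in for a real embedding model, and VectorIndex is a hypothetical in-memory list rather than a real database.

```python
import hashlib
import math


def chunk_text(text, max_words=50, overlap=10):
    """Split text into word windows that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text, dim=64):
    """Toy embedding: hash each word into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class VectorIndex:
    """Hypothetical in-memory index searched by cosine similarity."""

    def __init__(self):
        self.entries = []  # list of (vector, chunk) pairs

    def add(self, chunk):
        self.entries.append((embed(chunk), chunk))

    def search(self, query, top_k=1):
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), chunk)
            for vec, chunk in self.entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]


# Extraction is assumed to have already produced this plain text.
text = "Refunds are accepted within thirty days. Shipping takes five business days."

index = VectorIndex()
for chunk in chunk_text(text, max_words=6, overlap=0):
    index.add(chunk)

print(index.search("Refunds are accepted within thirty days."))
```

The overlap between windows is a common way to avoid cutting a sentence's context in half at a chunk boundary; a production pipeline would more likely split on headings, paragraphs, or sentences.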