Data Ingestion
Connect your data sources and import content into ragistry
Overview
ragistry supports multiple data sources to populate your knowledge base. Each source is automatically processed, chunked, and indexed for semantic search.
Supported Data Sources
Website Crawler
Automatically crawl and index your website content
- Automatic sitemap detection
- Intelligent content extraction
- Respects robots.txt and meta tags
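The crawler behaviors above rest on standard web techniques. The sketch below uses only the Python standard library to show one way they could work. It reads `<loc>` entries from a sitemap, checks URLs against robots.txt with `urllib.robotparser`, and skips pages marked `noindex` while keeping text outside `script`, `style`, `nav`, and `footer` elements. The function names and the `ragistry-bot` user agent are hypothetical, and the tag-skipping heuristic is much simpler than real content extraction. This is not ragistry's crawler.

```python
# Sketch only: illustrates sitemap, robots.txt, and robots-meta handling.
# Names and the "ragistry-bot" user agent are hypothetical, not ragistry's API.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Optional
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(sitemap_xml: str) -> list:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "ragistry-bot") -> bool:
    """Check a URL against already-downloaded robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _PageParser(HTMLParser):
    """Collects visible text and records a <meta name="robots" content="noindex">."""

    SKIP = {"script", "style", "nav", "footer"}  # simple heuristic for boilerplate

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.noindex |= "noindex" in (attrs.get("content") or "").lower()
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_page(html: str) -> Optional[str]:
    """Return the page's text, or None if the page opts out of indexing."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    return None if parser.noindex else " ".join(parser.parts)
```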
Cloud Storage
Sync documents from cloud storage services
- OneDrive, Google Drive, Dropbox
- Automatic sync on changes
- Supports PDF, DOCX, TXT, and more
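One common way to implement sync on changes is to fingerprint each file's content and compare snapshots between sync runs. The sketch below assumes files have already been downloaded as `{path: bytes}`. `fingerprint`, `plan_sync`, and the extension allow-list are hypothetical and do not describe ragistry's actual connector. Cloud providers also expose their own change-tracking APIs, which a production connector might use instead of hashing full file contents.

```python
# Sketch only: decide which cloud-storage files to (re)ingest after a sync.
# The allow-list and hash-diffing approach are illustrative assumptions.
import hashlib
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical allow-list; the docs only say "PDF, DOCX, TXT, and more".
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


@dataclass
class SyncPlan:
    to_ingest: list  # new or modified files
    to_delete: list  # files that disappeared from the remote folder


def fingerprint(files: dict) -> dict:
    """Map each supported file path to a SHA-256 digest of its content."""
    return {
        path: hashlib.sha256(data).hexdigest()
        for path, data in files.items()
        if PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
    }


def plan_sync(previous: dict, current: dict) -> SyncPlan:
    """Compare two fingerprint snapshots and list the work a sync must do."""
    to_ingest = sorted(p for p, h in current.items() if previous.get(p) != h)
    to_delete = sorted(p for p in previous if p not in current)
    return SyncPlan(to_ingest, to_delete)
```

Unchanged files produce identical digests, so only new or edited documents go back through the processing pipeline.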
SQL Databases
Connect to relational databases
- PostgreSQL, MySQL, SQL Server
- Structured data querying
- Real-time data access
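Relational rows have to become text before they can be chunked and embedded. The sketch below shows one possible approach: run a query and render each row as `column: value` lines, keeping the original row as metadata. It uses Python's built-in `sqlite3` purely as a stand-in so it runs without a database server; it sticks to DB-API calls (`cursor()`, `execute`, `description`, `fetchall`), so a PostgreSQL, MySQL, or SQL Server driver could follow the same pattern, though connection setup differs. The function name, table, and text template are hypothetical, and this snapshot approach says nothing about how ragistry provides real-time access.

```python
# Sketch only: sqlite3 stands in for PostgreSQL/MySQL/SQL Server.
# The table, columns, and "column: value" template are hypothetical.
import sqlite3


def rows_to_documents(conn, query: str, id_column: str) -> list:
    """Run a SELECT through any DB-API 2.0 connection; one document per row."""
    cursor = conn.cursor()
    cursor.execute(query)
    columns = [desc[0] for desc in cursor.description]
    documents = []
    for row in cursor.fetchall():
        record = dict(zip(columns, row))
        # Render every non-id column as a "column: value" line.
        text = "\n".join(f"{col}: {val}" for col, val in record.items() if col != id_column)
        documents.append({"id": str(record[id_column]), "text": text, "metadata": record})
    return documents


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                     [("A-1", "Widget", 9.5), ("B-2", "Gadget", 12.0)])
    for doc in rows_to_documents(conn, "SELECT sku, name, price FROM products ORDER BY sku", "sku"):
        print(doc["id"], repr(doc["text"]))
```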
Manual Upload
Upload documents directly
- Drag-and-drop interface
- Batch upload support
- Processing starts immediately after upload
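Batch uploads usually have to respect per-request limits. The sketch below greedily packs files, in their original order, into batches that stay under a file-count cap and a total-size cap. Both limits, `LocalFile`, and `make_batches` are hypothetical; this page does not document ragistry's actual upload limits.

```python
# Sketch only: client-side batching under hypothetical upload limits.
from dataclasses import dataclass

MAX_FILES_PER_BATCH = 20                # hypothetical limit
MAX_BYTES_PER_BATCH = 50 * 1024 * 1024  # hypothetical limit: 50 MiB


@dataclass(frozen=True)
class LocalFile:
    name: str
    size: int  # bytes


def make_batches(files, max_files=MAX_FILES_PER_BATCH, max_bytes=MAX_BYTES_PER_BATCH):
    """Greedily pack files into batches that respect both limits.

    A single file larger than max_bytes gets a batch of its own rather than
    being dropped; the server may still reject it.
    """
    batches, current, current_bytes = [], [], 0
    for f in files:
        too_many = len(current) >= max_files
        too_big = current_bytes + f.size > max_bytes
        # Close the current batch before adding a file that would break a limit.
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(f)
        current_bytes += f.size
    if current:
        batches.append(current)
    return batches
```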
Processing Pipeline
Once data is ingested, it goes through the following steps:
- Extraction: Text is extracted from documents
- Chunking: Content is split into meaningful sections
- Embedding: Each chunk is converted to a vector
- Indexing: Vectors are stored in the database so chunks can be retrieved by semantic search
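The four steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not ragistry's implementation: extraction only decodes UTF-8 text, chunking uses fixed windows of 50 words with a 10-word overlap, the embedding is a toy hashed bag-of-words vector rather than a learned model, and the database is an in-memory list searched by cosine similarity. All names and parameter values are hypothetical.

```python
# Sketch only: extract -> chunk -> embed -> index, with toy stand-ins for
# the extractor, embedding model, and vector database (all hypothetical).
import hashlib
import math
import re

DIM = 256  # hypothetical embedding size


def extract(raw: bytes) -> str:
    """Extraction stand-in: decode UTF-8 text and collapse whitespace."""
    return " ".join(raw.decode("utf-8", errors="replace").split())


def chunk(text: str, size: int = 50, overlap: int = 10) -> list:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Start a new window only while the previous one has not reached the end.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> list:
    """Toy embedding: hash each lowercase token into one of DIM buckets, then L2-normalize."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Stores (vector, chunk text) pairs and ranks them by cosine similarity."""

    def __init__(self) -> None:
        self.entries = []

    def add(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list:
        q = embed(query)
        # Non-empty vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for vec, text in self.entries]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def ingest(raw: bytes, index: InMemoryIndex) -> int:
    """Run all four steps on one document; return the number of chunks indexed."""
    pieces = chunk(extract(raw))
    for piece in pieces:
        index.add(piece)  # add() embeds the chunk, then stores the vector
    return len(pieces)
```

The overlap reduces the chance that a short passage is split across a chunk boundary. A real deployment would replace `embed` with an embedding model and `InMemoryIndex` with a vector store.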