Overview
Web sources allow you to connect your workflows directly to external websites, giving your AI access to live, publicly available information. Whether it's your company blog, product documentation, support articles, or industry resources, web sources ensure your responses stay current with the latest online content.
Where to Add Web Sources
You can add web sources in multiple locations within Eloquens:
- During Workflow Creation - Add web sources while setting up a new workflow
- Knowledge Manager - Centralized location for organizing all your web sources across workspaces
- Existing Workflows - Add or manage web sources for workflows you've already created
- Workflow-Specific Management - Direct access to web source management from within individual workflows
Setting Up Web Sources for Workflows
Step 1: Access the Web Tab
Navigate to the "Knowledge" section within your workflow and select the "Web" card. This card is where you add websites as knowledge sources.
Step 2: Create a New Web Source
Click the "+ New source URL" button to add a new website.
In the "Add Web Source" dialog:
- Name: Enter a descriptive name for the web source (e.g., "Company Blog", "Product Documentation")
- Root URL: Enter the main URL of the website (e.g., https://blog.example.com or https://docs.example.com). This is the starting point for discovering pages.
- Click "Create"
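Before entering a value in the Root URL field, it can help to confirm that it is a complete http(s) address. The following is a minimal Python sketch and not part of Eloquens: the function name `is_valid_root_url` is hypothetical, and the dialog may apply its own, different validation.

```python
from urllib.parse import urlparse


def is_valid_root_url(url: str) -> bool:
    """Return True if url is an absolute http(s) URL with a hostname."""
    parts = urlparse(url.strip())
    # Require an explicit web scheme and a non-empty host.
    return parts.scheme in ("http", "https") and bool(parts.hostname)


if __name__ == "__main__":
    for candidate in ("https://blog.example.com", "docs.example.com"):
        print(candidate, is_valid_root_url(candidate))
```

A value without a scheme, such as `docs.example.com`, is rejected by this check, so add `https://` before entering it.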
Step 3: Select Pages to Include
- Find Your Source: The new web source will appear under "Source URLs" - click on it
- Browse Discovered Pages: Under "Pages Found", you'll see a list of pages discovered on the website
- Select Relevant Pages:
  - Expand sections using the arrows to see all available pages
  - Check the boxes next to specific pages you want your workflow to use
  - Use "Select All Pages" or "Select All Children" for bulk selection
Step 4: Train and Save
- Start Training: Click the "Train selected URLs" button to process the selected pages
- Verify Selection: Ensure the web source is selected on the left side under "Source URLs"
- Save Changes: Click "Save changes" to apply the web sources to your workflow
Page Processing and Status
Training Status Indicators
The "Trained URLs" section shows the processing status of each page:
- PENDING - The page is queued for processing
- SUCCESSFUL - The page has been processed successfully and is ready for use
- FAILED - There was an error during processing
Only pages with a "SUCCESSFUL" status are actively used by the workflow.
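The status rule can be illustrated with a small Python sketch. The `TrainingStatus` enum and the dictionary of pages are hypothetical stand-ins for what the "Trained URLs" table displays; nothing here implies that Eloquens exposes this data programmatically.

```python
from enum import Enum


class TrainingStatus(Enum):
    PENDING = "PENDING"
    SUCCESSFUL = "SUCCESSFUL"
    FAILED = "FAILED"


def usable_pages(pages: dict[str, TrainingStatus]) -> list[str]:
    """Pages the workflow can draw on: only those trained successfully."""
    return [url for url, status in pages.items()
            if status is TrainingStatus.SUCCESSFUL]


def retry_candidates(pages: dict[str, TrainingStatus]) -> list[str]:
    """Pages worth re-selecting and training again."""
    return [url for url, status in pages.items()
            if status is TrainingStatus.FAILED]


if __name__ == "__main__":
    example = {
        "https://docs.example.com/intro": TrainingStatus.SUCCESSFUL,
        "https://docs.example.com/setup": TrainingStatus.PENDING,
        "https://docs.example.com/old": TrainingStatus.FAILED,
    }
    print(usable_pages(example))
    print(retry_candidates(example))
```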
Monitoring Progress
- Processing typically takes 30 seconds to 2 minutes per page
- You can view all trained URLs in the "Trained URLs" table below the page selection area
- URLs that fail training can be retried by selecting them again and clicking "Train selected URLs"
- You can continue with other workflow setup while pages are processed in the background
Requirements and Limitations
Website Accessibility:
- Websites must be publicly accessible on the internet
- Sites must allow web crawlers to access their content
- Password-protected or restricted sites cannot be used
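One way to pre-check the crawler requirement is to read the site's robots.txt rules. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text you have already downloaded. The user agent that Eloquens crawls with is not stated here, so the wildcard `"*"` is an assumption, and robots.txt is only one of several ways a site can block crawlers.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Check page_url against the rules in a robots.txt document."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(crawler_allowed(rules, "https://docs.example.com/docs/intro"))
    print(crawler_allowed(rules, "https://docs.example.com/private/notes"))
```

A page that this check reports as disallowed is a likely candidate for a FAILED status or missing content after training.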
Content Restrictions:
- Only static content is captured during training
- Dynamic content (live chat, forms, user-specific data) is not included
- JavaScript-generated content may not be fully captured
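To get a rough sense of what a static capture might contain, you can extract the text that is present in a page's raw HTML before any JavaScript runs. This is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and how Eloquens actually processes pages is not documented here.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collect visible text from raw HTML, ignoring <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def static_text(html: str) -> str:
    """Return the text available without executing any JavaScript."""
    extractor = StaticTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


if __name__ == "__main__":
    page = ('<html><body><h1>Pricing</h1><div id="app"></div>'
            '<script>document.getElementById("app").textContent = "Plans";</script>'
            '</body></html>')
    print(static_text(page))
```

In this example the heading is in the markup, but the text written into the `app` element by the script is not, which mirrors the limitation on JavaScript-generated content.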
Best Practices
Choose Strategic Root URLs:
- Use specific section URLs rather than entire website homepages
- Target relevant content areas (e.g., /docs/, /blog/, /support/)
- Avoid overly broad root URLs that discover irrelevant pages
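The effect of a narrower root URL can be sketched as a path-prefix filter over a list of discovered pages. This is an illustrative Python example only: the function name and the prefix list are assumptions, and Eloquens performs its own page discovery.

```python
from urllib.parse import urlparse

# Hypothetical content areas worth training on.
RELEVANT_PREFIXES = ("/docs/", "/blog/", "/support/")


def in_relevant_section(url: str, prefixes: tuple[str, ...] = RELEVANT_PREFIXES) -> bool:
    """Return True if the URL's path starts with one of the chosen prefixes."""
    path = urlparse(url).path
    return any(path.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    discovered = [
        "https://www.example.com/docs/getting-started",
        "https://www.example.com/careers/open-roles",
        "https://www.example.com/support/reset-password",
    ]
    print([url for url in discovered if in_relevant_section(url)])
```

Starting from https://www.example.com/docs/ instead of the homepage keeps pages such as the careers listing out of "Pages Found" in the first place.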
Content Management:
- Regular Updates: Periodically re-train URLs whose content changes frequently
- Quality Control: Review failed training attempts and verify page accessibility
- Selective Training: Only train pages containing valuable information for your workflows
- Organized Naming: Use descriptive names for web sources to easily identify them later
Troubleshooting:
- Failed Training: Check if pages are still accessible or have been moved
- Missing Content: Verify the website allows web crawlers to access content
- Slow Processing: Large or complex pages may take longer to process
- Outdated Information: Re-train URLs when website content is significantly updated
Douglas Ho