Turning feed contents into search results, Url patterns, Trusted ip lists – Google Search Appliance Feeds Protocol Developers Guide User Manual

Google Search Appliance: Feeds Protocol Developer’s Guide

The success message indicates that the feedergate process has received the XML file successfully. It
does not mean that the content will be added to the index, as this is handled asynchronously by a
separate process known as the “feeder”. The data source will appear in the Feeds page in the Admin
Console after the feeder process runs.

The feeder does not provide automatic notification of a feed error. To check for errors, you must log
into the Admin Console and check the status on the Content Sources > Feeds page. This page shows
the last five feeds that have been uploaded for each data source. The timestamp shown is the time at which
the feedergate server successfully received the XML file.

You can automate the process of uploading a feed by running your feed client script with a cron job.
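As an illustration of such a feed client, here is a minimal Python sketch that posts an XML feed to feedergate and returns the response text. The port (19900), the /xmlfeed path, and the form field names (feedtype, datasource, data) are assumptions based on commonly used feed client scripts and are not stated in this section; confirm them for your appliance before relying on this. The function names, the crontab line, and its paths are hypothetical.

```python
import urllib.request
import uuid


def build_multipart(fields, boundary):
    """Encode a dict of text form fields as a multipart/form-data body."""
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")
        lines.append(value)
    lines.append("--" + boundary + "--")
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


def push_feed(appliance_host, datasource, feedtype, feed_xml):
    """POST a feed to feedergate and return the response text.

    The endpoint and field names below are assumptions, not taken from
    this section of the guide.
    """
    boundary = uuid.uuid4().hex
    body = build_multipart(
        {"feedtype": feedtype, "datasource": datasource, "data": feed_xml},
        boundary,
    )
    request = urllib.request.Request(
        "http://%s:19900/xmlfeed" % appliance_host,  # assumed endpoint
        data=body,
        headers={"Content-Type": "multipart/form-data; boundary=" + boundary},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical crontab entry that runs a script built on push_feed nightly:
#   0 2 * * * /usr/bin/python3 /opt/feeds/push_feed.py
```

Because a success response only confirms receipt, a scheduled script like this cannot detect indexing errors; those still have to be checked on the Feeds page.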

Turning Feed Contents Into Search Results

URL Patterns and Trusted IP lists defined in the Admin Console ensure that your index only lists content
from desirable sources. When pushing URLs with a feed, you must verify that the Admin Console will
accept the feed and allow your content through to the index. For a feed to succeed, it must be fed from
a trusted IP address and at least one URL in the feed must pass the rules defined on the Admin Console.

URL Patterns

URLs specified in the feed will only be crawled if they pass through the patterns specified on the
Content Sources > Web Crawl > Start and Block URLs page in the Admin Console.

Patterns affect URLs in your feed as follows:

• Do Not Follow Patterns: If a URL in the feed matches a pattern specified under Do Not Crawl
URLs with the Following Patterns, the URL is removed from the index.

• Follow Patterns: When follow patterns are defined, every URL in the feed must match a pattern
in this list. Any URL that does not match is removed from the index.
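The two rules above can be sketched as a small filter. This is a simplified model rather than the appliance's implementation: it treats every pattern as a plain URL prefix, whereas the Admin Console supports a richer pattern syntax, and it assumes follow patterns are in use. All function names are hypothetical.

```python
def matches_any(url, patterns):
    """Return True if the URL starts with any of the given prefixes.

    Prefix matching is an assumption made to keep the sketch short.
    """
    return any(url.startswith(pattern) for pattern in patterns)


def url_allowed(url, follow_patterns, do_not_follow_patterns):
    """Apply the do-not-follow rule first, then the follow rule."""
    if matches_any(url, do_not_follow_patterns):
        return False
    return matches_any(url, follow_patterns)


def feed_can_succeed(urls, follow_patterns, do_not_follow_patterns):
    """A feed needs at least one URL that passes the pattern rules."""
    return any(
        url_allowed(url, follow_patterns, do_not_follow_patterns)
        for url in urls
    )
```

Running a feed's URLs through a check like this before uploading can catch a feed in which no URL would survive the configured patterns.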

Entries in duplicate hosts also affect your URL patterns. For example, suppose you have a canonical host
of foo.mycompany.com with a duplicate host of bar.mycompany.com. If you exclude
bar.mycompany.com from your crawl using patterns, then URLs on both foo.mycompany.com and
bar.mycompany.com are removed from the index.
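One way to picture the duplicate-host behavior is to expand each excluded host to every host in its duplicate-host group. The sketch below models only the outcome described in the example; the grouping data structure and function names are hypothetical and do not reflect how the appliance stores duplicate hosts.

```python
from urllib.parse import urlsplit


def expand_excluded_hosts(host_groups, blocked_hosts):
    """Return blocked hosts plus every host that shares a group with one."""
    blocked = set(blocked_hosts)
    excluded = set(blocked)
    for group in host_groups:
        members = set(group)
        if blocked & members:
            # Excluding one member of the group excludes the whole group.
            excluded |= members
    return excluded


def url_is_excluded(url, excluded_hosts):
    """Check whether the URL's host is in the expanded exclusion set."""
    return urlsplit(url).hostname in excluded_hosts
```

With foo.mycompany.com and bar.mycompany.com in one group, blocking only bar.mycompany.com yields an exclusion set that contains both hosts.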

Trusted IP Lists

To prevent unauthorized additions to your index, feeds are only accepted from machines that are
included in the List of Trusted IP Addresses. To view the list of trusted IP addresses, log into the Admin
Console and open the Content Sources > Feeds page.

If your search appliance is on a trusted network, you can disable IP address verification by selecting
Trust all IP addresses.
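The trusted IP check can be modeled as a short membership test. This sketch uses Python's standard ipaddress module; whether the appliance accepts CIDR ranges in its list is not stated here, so the range support below is an assumption, and the function and parameter names are hypothetical.

```python
import ipaddress


def is_trusted(client_ip, trusted_entries, trust_all=False):
    """Return True if a feed from client_ip would be accepted.

    trusted_entries holds single addresses or (assumed) CIDR ranges.
    trust_all models the "Trust all IP addresses" option.
    """
    if trust_all:
        return True
    address = ipaddress.ip_address(client_ip)
    for entry in trusted_entries:
        # A bare address parses as a single-host network.
        if address in ipaddress.ip_network(entry, strict=False):
            return True
    return False
```

A feed client can use a check like this against a local copy of the list to detect, before uploading, that it is running on a machine the appliance would reject.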

Adding Feed Content

For web feeds, the feeder passes the URLs to the crawl manager. The crawl manager adds the URLs to
the crawl schedule. The URLs are then crawled on the schedule described in the continuous crawler
documentation.
