
Google Search Appliance: Administrative API Developer’s Guide: Protocol


Crawl URLs

Retrieve and update crawl URLs for a search appliance using the crawlURLs entry of the config feed.

Retrieving Crawl URLs

To get crawl URL information for a search appliance, send an authenticated GET request to the following config feed URL:

http://Search_Appliance:8000/feeds/config/crawlURLs

The search appliance responds with an entry like the following, which contains the current crawl URL values:

<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'
    xmlns:gsa='http://schemas.google.com/gsa/2007'>
  <id>http://gsa:8000/feeds/config/crawlURLs</id>
  <updated>2008-12-12T07:49:32.957Z</updated>
  <link rel='self' type='application/atom+xml'
      href='http://gsa:8000/feeds/config/crawlURLs'/>
  <link rel='edit' type='application/atom+xml'
      href='http://gsa:8000/feeds/config/crawlURLs'/>
  <gsa:content name='entryID'>crawlURLs</gsa:content>
  <gsa:content name='startURLs'>http://www.example.com/</gsa:content>
  <gsa:content name='doNotCrawlURLs'>.xls$</gsa:content>
  <gsa:content name='followURLs'>http://www.example.com/</gsa:content>
</entry>
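As a sketch of the retrieval step, the following Python code (standard library only) sends the GET request and collects the gsa:content elements of the returned entry into a dictionary keyed by their name attributes. The function names, the host argument, and the GoogleLogin-style Authorization header are assumptions for illustration; substitute whatever authentication your appliance requires.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the gsa:content elements in the config feed entry.
GSA_NS = "http://schemas.google.com/gsa/2007"


def parse_crawl_urls(entry_xml) -> dict:
    """Return a {name: value} dict built from the entry's gsa:content children.

    entry_xml may be a str or bytes containing the Atom entry.
    """
    root = ET.fromstring(entry_xml)
    return {
        el.get("name"): (el.text or "")
        for el in root.findall(f"{{{GSA_NS}}}content")
    }


def get_crawl_urls(host: str, auth_token: str) -> dict:
    """Send an authenticated GET for the crawlURLs entry and parse the reply.

    Assumption (hypothetical): the token is sent in a GoogleLogin-style
    Authorization header; replace this with your appliance's auth scheme.
    """
    url = f"http://{host}:8000/feeds/config/crawlURLs"
    request = urllib.request.Request(
        url, headers={"Authorization": f"GoogleLogin auth={auth_token}"}
    )
    with urllib.request.urlopen(request) as response:
        # Pass the raw bytes so the parser honors the XML encoding declaration.
        return parse_crawl_urls(response.read())
```

Applied to the entry shown above, parse_crawl_urls would map doNotCrawlURLs to .xls$ and both startURLs and followURLs to http://www.example.com/.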

Updating Crawl URLs

To update crawl URL information for a search appliance, send an authenticated PUT request to the following config feed URL:

http://Search_Appliance:8000/feeds/config/crawlURLs

The following example entry, sent as the body of the PUT request, overwrites the crawl URL settings with the values it specifies:

<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'
    xmlns:gsa='http://schemas.google.com/gsa/2007'>
  <id>http://gsa:8000/feeds/config/crawlURLs</id>
  <gsa:content name='entryID'>crawlURLs</gsa:content>
  <gsa:content name='startURLs'>http://www.example.com/</gsa:content>
  <gsa:content name='doNotCrawlURLs'>.xls$</gsa:content>
  <gsa:content name='followURLs'>http://www.example.com/</gsa:content>
</entry>
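A minimal Python sketch of the update step follows. It builds an entry with the same structure as the example above and sends it with PUT. The helper names build_crawl_urls_entry and put_crawl_urls are hypothetical, and the GoogleLogin-style Authorization header and the application/atom+xml Content-Type are assumptions rather than requirements stated in this section.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
GSA_NS = "http://schemas.google.com/gsa/2007"

# Serialize Atom as the default namespace and use the gsa: prefix,
# matching the layout of the example entry.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("gsa", GSA_NS)


def build_crawl_urls_entry(host: str, start_urls: str,
                           do_not_crawl_urls: str, follow_urls: str) -> bytes:
    """Return the crawlURLs Atom entry as UTF-8 bytes with an XML declaration."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = (
        f"http://{host}:8000/feeds/config/crawlURLs")
    values = {
        "entryID": "crawlURLs",
        "startURLs": start_urls,
        "doNotCrawlURLs": do_not_crawl_urls,
        "followURLs": follow_urls,
    }
    for name, value in values.items():
        # Each property becomes <gsa:content name='...'>value</gsa:content>.
        ET.SubElement(entry, f"{{{GSA_NS}}}content", name=name).text = value
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)


def put_crawl_urls(host: str, auth_token: str, body: bytes) -> int:
    """Send the entry as an authenticated PUT and return the HTTP status code.

    Assumption (hypothetical): GoogleLogin-style Authorization header.
    """
    request = urllib.request.Request(
        f"http://{host}:8000/feeds/config/crawlURLs",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/atom+xml",
            "Authorization": f"GoogleLogin auth={auth_token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because the PUT overwrites the crawl URL settings, one cautious approach is to GET the current values first, change only the property you need, and send all three properties back.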

The following properties specify the crawl URLs:

Property          Description
doNotCrawlURLs    Do not crawl URLs that match these URL patterns.
followURLs        Follow and crawl only URLs that match these URL patterns.
startURLs         Start crawling from these URLs.
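Taken together, followURLs and doNotCrawlURLs mean that a URL is crawled only if it matches a followURLs pattern and no doNotCrawlURLs pattern. The following Python sketch illustrates that interaction under heavily simplified, hypothetical matching rules: it recognizes only the two pattern forms used in the examples above (a literal prefix such as http://www.example.com/ and a suffix ending in $ such as .xls$), it assumes one pattern per line within a property value, and it does not model link discovery from startURLs. The appliance's real pattern syntax is richer, so treat matches and would_crawl as illustrative names only.

```python
def matches(url: str, pattern: str) -> bool:
    """Approximate a single URL pattern (simplified, hypothetical rules)."""
    if pattern.endswith("$"):
        # Suffix pattern such as ".xls$": the URL must end with the literal text.
        return url.endswith(pattern[:-1])
    # Assumption: any other pattern is treated as a literal URL prefix.
    return url.startswith(pattern)


def would_crawl(url: str, follow_urls: str, do_not_crawl_urls: str) -> bool:
    """True if url matches some followURLs pattern and no doNotCrawlURLs pattern.

    Assumption: each property value holds one pattern per line.
    """
    follow = [p.strip() for p in follow_urls.splitlines() if p.strip()]
    exclude = [p.strip() for p in do_not_crawl_urls.splitlines() if p.strip()]
    return (any(matches(url, p) for p in follow)
            and not any(matches(url, p) for p in exclude))
```

With the example values, would_crawl("http://www.example.com/report.xls", "http://www.example.com/", ".xls$") returns False because the URL matches the .xls$ exclusion, while an HTML page under http://www.example.com/ returns True.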
