Content sources, Xml response formats – Google Search Appliance Administrative API Developers Guide: Protocol User Manual


XML Response Formats

The search appliance Administrative API returns XML responses whose content depends on the API request. Each
response is a Google Data Atom entry: an <entry> element that must contain at least one <gsa:content>
element. All search appliance-related information is carried in <gsa:content> elements. For example, the
following listing shows a GSAEntry response, an XML document that contains information about the crawl URLs.
The client libraries convert this XML response into a GSAEntry object.

<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:gsa='http://schemas.google.com/gsa/2007'>
  <id>http://ent1:8000/feeds/config/crawlURLs</id>
  <updated>2008-12-08T20:11:58.342Z</updated>
  <link rel='self' type='application/atom+xml'
        href='http://gsa:8000/feeds/config/crawlURLs'/>
  <link rel='edit' type='application/atom+xml'
        href='http://gsa:8000/feeds/config/crawlURLs'/>
  <gsa:content name='entryID'>crawlURLs</gsa:content>
  <gsa:content name='crawlURLs'>http://yourdomain.com/</gsa:content>
  <gsa:content name='startURLs'>http://yourdomain.com/</gsa:content>
  <gsa:content name='doNotCrawlURLs'>
    http://yourdomain.com/not_allow
  </gsa:content>
</entry>
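
For readers who are not using the client libraries, the following is a minimal sketch of how the sample response above could be read with Python's standard library. It is not part of the Administrative API or its client libraries: the function name parse_gsa_entry and the choice to return a plain dictionary keyed by the name attribute are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the sample response.
ATOM_NS = "http://www.w3.org/2005/Atom"
GSA_NS = "http://schemas.google.com/gsa/2007"

# The sample response from this section, with plain ASCII quotes.
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:gsa='http://schemas.google.com/gsa/2007'>
  <id>http://ent1:8000/feeds/config/crawlURLs</id>
  <updated>2008-12-08T20:11:58.342Z</updated>
  <link rel='self' type='application/atom+xml'
        href='http://gsa:8000/feeds/config/crawlURLs'/>
  <link rel='edit' type='application/atom+xml'
        href='http://gsa:8000/feeds/config/crawlURLs'/>
  <gsa:content name='entryID'>crawlURLs</gsa:content>
  <gsa:content name='crawlURLs'>http://yourdomain.com/</gsa:content>
  <gsa:content name='startURLs'>http://yourdomain.com/</gsa:content>
  <gsa:content name='doNotCrawlURLs'>
    http://yourdomain.com/not_allow
  </gsa:content>
</entry>"""


def parse_gsa_entry(xml_text):
    """Hypothetical helper: map each <gsa:content> name to its text value."""
    # Parse from bytes so the encoding declaration in the prolog is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "{%s}entry" % ATOM_NS:
        raise ValueError("root element is not an Atom <entry>")

    properties = {}
    for element in root.findall("{%s}content" % GSA_NS):
        # Values may be wrapped in whitespace, as doNotCrawlURLs is above.
        properties[element.get("name")] = (element.text or "").strip()

    # The guide states that an entry must contain at least one <gsa:content>.
    if not properties:
        raise ValueError("entry contains no <gsa:content> elements")
    return properties


for name, value in parse_gsa_entry(SAMPLE).items():
    print(name, "=", value)
```

A dictionary keeps the sketch small; it would silently keep only the last value if a response repeated a name, so a real client may prefer a different structure.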

Content Sources

The sections that follow describe how to configure the Content Sources features of the Admin Console:

“Crawl URLs” on page 12

“Data Source Feed” on page 13

“Feeds Trusted IP Addresses” on page 16

“Crawl Schedule” on page 17

“Crawler Access Rules” on page 18

“Host Load Schedule” on page 21

“Freshness Tuning” on page 22

“Recrawl URL Patterns” on page 23

“Connector Managers” on page 24

“OneBox Settings” on page 27

“OneBox Modules” on page 28

“Crawl Status” on page 29

“Document Status” on page 30
