Configuring crawl of controlled-access content – Google Search Appliance Getting the Most from Your Google Search Appliance User Manual



Google Search Appliance: Getting the Most from Your Google Search Appliance

Crawling and Indexing


The table at the end of this section lists the access-control methods that the search appliance
supports and whether each method is supported for crawl, serve, or both.

Configuring Crawl of Controlled-Access Content

If the content files you want crawled and indexed are in a location that requires a login, create a special
user account on your network for the search appliance. When you configure crawl on the Admin
Console, provide the user name and password for that account. The search appliance presents those
credentials before crawling files in that location.
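As an illustration of what "presenting credentials" means for HTTP Basic authentication, the sketch below builds the Authorization header value that a client sends with each request. It shows the standard HTTP Basic scheme only, not the search appliance's internal implementation. The function name `basic_auth_header` and the account name `gsa-crawler` are hypothetical. NTLM HTTP uses a challenge-response exchange and is not covered here.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header.

    The scheme joins user name and password with a colon and
    Base64-encodes the result. This is an encoding, not encryption.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # "gsa-crawler" is a hypothetical account created for the appliance.
    header = basic_auth_header("gsa-crawler", "example-password")
    print("Authorization:", header)
```

Because Base64 is trivially reversible, a dedicated account with read-only access limits the exposure if the credentials are intercepted.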

To configure a search appliance to crawl controlled-access content, perform the following steps in
the Admin Console:

1. Configure the crawl as described in “Configuring Crawl of Public Content” on page 17, and also
   provide the search appliance with URL patterns that match the controlled content.

2. Specify access credentials for each URL pattern by using the appropriate Admin Console pages.
   How you provide these credentials depends on the kind of authentication:

   • For HTTP Basic and NTLM HTTP, use the Crawl and Index > Crawler Access page.

   • For HTTPS web sites, the search appliance uses a serving certificate as a client certificate
     when crawling. Upload a new certificate by using the Administration > Certificate Authorities
     page.
The following figure shows the Crawl and Index > Crawler Access page.
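The idea of associating credentials with URL patterns can be sketched as a small lookup. This is a conceptual model only: the `CrawlerCredential` type, the `find_credential` function, the plain prefix matching, and the longest-prefix tie-break are all assumptions made for illustration. The search appliance has its own URL pattern syntax and matching rules, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CrawlerCredential:
    """One hypothetical credential entry: a URL prefix plus a login."""
    url_prefix: str  # simplified stand-in for a URL pattern
    username: str
    password: str


def find_credential(
    url: str, credentials: List[CrawlerCredential]
) -> Optional[CrawlerCredential]:
    """Return the entry whose prefix matches the URL, or None.

    When several prefixes match, the longest one wins (an assumption
    made for this sketch, not documented appliance behavior).
    """
    matches = [c for c in credentials if url.startswith(c.url_prefix)]
    return max(matches, key=lambda c: len(c.url_prefix), default=None)


# Hypothetical entries for two protected areas of an intranet.
CREDENTIALS = [
    CrawlerCredential("http://intranet.example.com/", "gsa-crawler", "pw-general"),
    CrawlerCredential("http://intranet.example.com/hr/", "gsa-hr", "pw-hr"),
]

if __name__ == "__main__":
    hit = find_credential("http://intranet.example.com/hr/policy.html", CREDENTIALS)
    print(hit.username if hit else "no credentials configured")
```

A URL that matches no entry returns `None`, which corresponds to crawling that URL without presenting credentials.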

Method                                          Crawl   Serve
HTTP Basic                                      X       X
NTLM HTTP                                       X       X
LDAP (Lightweight Directory Access Protocol)            X
Forms Authentication                            X       X
x.509 Certificates                              X       X
Integrated Windows Authentication/Kerberos      X       X
SAML Service Provider Interfaces (SPIs)                 X
Connectors                                      X       X
