Checking the crawl status – Google Search Appliance Installing the Google Search Appliance User Manual

3. Copy all start URLs from the Start Crawling from the Following URLs field into the
Follow and Crawl Only URLs with the Following Patterns field.
If you enter the URL pattern for a directory, the URL must terminate in a forward slash (/). Use only
the server part of the URL. If a URL refers to a specific page, only that page is crawled. For more
information on URL patterns, click the Help link or see Administering Crawl.
4. In the Do Not Crawl URLs with the Following Patterns field, scroll through the list of patterns
that can be blocked from being crawled.
Many file formats are excluded from the crawl by default, including common graphic formats such
as .jpg. If you want a particular format crawled, remove its pattern from the list or comment the
pattern out by placing the comment symbol (#) in front of it. If you do not want a particular
document type to be crawled, remove the comment symbol from the corresponding pattern. For
example, to exclude all Microsoft Word files (.doc), remove the # sign in front of “.doc$”; no
.doc files will then be crawled. You can also add specific URL patterns to this field to prevent
URLs that match them from being crawled.
5. Click Save URLs to Crawl.
6. In the left-hand menu, click Status and Reports > Crawl Status.
7. Click Resume Crawl.
The search appliance starts to crawl the URLs according to the URL patterns you entered. When the
search appliance software is crawling content, the graphic on the page shows multicolored balls in
motion. You do not have to pause the crawl before making changes on the Crawl URLs page.
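The way the two pattern fields interact can be illustrated with a small Python sketch. It is a simplified model for reasoning about the settings above, not the search appliance's actual matcher. It assumes that a pattern ending in a forward slash matches every URL under that directory, that a pattern ending in $ matches URLs ending with the preceding text, that any other pattern matches one specific page, and that lines starting with # are ignored. The function names and example URLs are hypothetical.

```python
def load_patterns(text):
    """Return the active patterns, skipping blank lines and lines commented out with '#'."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def matches(url, pattern):
    """Simplified matching rules (assumptions, not the appliance's real logic)."""
    if pattern.endswith("$"):
        # Assumed: a trailing '$' anchors the pattern to the end of the URL.
        return url.endswith(pattern[:-1])
    if pattern.endswith("/"):
        # Assumed: a directory pattern covers every URL beneath it.
        return url.startswith(pattern)
    # Assumed: anything else names one specific page.
    return url == pattern


def should_crawl(url, follow_patterns, do_not_crawl_patterns):
    """A URL is crawled if it matches a follow pattern and no do-not-crawl pattern."""
    followed = any(matches(url, p) for p in follow_patterns)
    blocked = any(matches(url, p) for p in do_not_crawl_patterns)
    return followed and not blocked


# Hypothetical field contents for illustration only.
FOLLOW = load_patterns("""
http://intranet.example.com/
http://www.example.com/about.html
""")

DO_NOT_CRAWL = load_patterns("""
#.pdf$
.doc$
.jpg$
""")

if __name__ == "__main__":
    for url in (
        "http://intranet.example.com/docs/guide.html",
        "http://intranet.example.com/docs/report.doc",
        "http://intranet.example.com/docs/manual.pdf",
        "http://www.example.com/contact.html",
    ):
        print(url, should_crawl(url, FOLLOW, DO_NOT_CRAWL))
```

In this model, the .pdf pattern is commented out, so PDF files under the followed directory are crawled, while .doc and .jpg files are blocked. The page contact.html is not crawled because only about.html was listed as a specific page.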
Checking the Crawl Status
You can check the progress of the crawl from the Home page.
To check the crawl status:
1. In the side menu, click Home.
The Home page is displayed, showing the Crawl Status graph. The graph automatically refreshes to
show crawling activity. If the page does not refresh automatically, click any link, and then return to
this page. You can also click the browser’s Refresh button.