Google Search Appliance Protocol Reference User Manual


Google Search Appliance: Search Protocol Reference

Dynamic Result Clustering Service /cluster Protocol


The top-level entries are described in the following table.

The dynamic result clustering service’s default JavaScript client ignores the <document> element and
does not use the <doc> array. The XML response is very basic and is not validated against a DTD or
an XML schema.

The following DTD defines the XML rules; however, the XML output is not validated against these rules:

<?xml version="1.0"?>
<!ELEMENT toplevel (Response, t_fetch, document+)>
<!ELEMENT Response (algorithm, t_cluster, cluster)>
<!ELEMENT cluster (gcluster+)>
<!-- each gcluster element is an alternate query and its location indexes from the top results -->
<!ELEMENT gcluster (label, doc+)>
<!-- each document element is a search result, complete with url, title, and snippet -->
<!ELEMENT document (url, title, snippet)>
<!ELEMENT algorithm EMPTY>
<!ELEMENT t_fetch EMPTY>
<!ELEMENT label EMPTY>
<!ELEMENT doc EMPTY>
<!ELEMENT url EMPTY>
<!ELEMENT title EMPTY>
<!ELEMENT snippet EMPTY>
<!ATTLIST algorithm data (Concepts)>
<!ATTLIST t_cluster int CDATA #REQUIRED>
<!ATTLIST label data CDATA #REQUIRED>
<!ATTLIST doc int CDATA #REQUIRED>
<!ATTLIST url data CDATA #REQUIRED>
<!ATTLIST title data CDATA #REQUIRED>
<!ATTLIST snippet data CDATA #REQUIRED>
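Because the response is not validated, a client has to read it defensively. The following is a minimal sketch, not the appliance's own client: it assumes a response shaped as the DTD above describes and uses Python's standard xml.etree.ElementTree module. The sample body, its values, and the function name parse_clusters are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body shaped after the DTD above; every value is invented.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<toplevel>
  <Response>
    <algorithm data="Concepts"/>
    <t_cluster int="12"/>
    <cluster>
      <gcluster>
        <label data="search appliance admin"/>
        <doc int="0"/>
        <doc int="2"/>
      </gcluster>
      <gcluster>
        <label data="search appliance api"/>
        <doc int="1"/>
      </gcluster>
    </cluster>
  </Response>
  <t_fetch/>
  <document>
    <url data="http://example.com/a"/>
    <title data="Admin guide"/>
    <snippet data="first snippet"/>
  </document>
</toplevel>"""


def parse_clusters(xml_text):
    """Return a list of (label, [doc index, ...]) tuples, one per <gcluster>."""
    root = ET.fromstring(xml_text)  # root is the <toplevel> element
    clusters = []
    for gcluster in root.findall("./Response/cluster/gcluster"):
        label = gcluster.find("label")
        if label is None:
            # The output is not validated, so skip malformed entries.
            continue
        indices = [
            int(doc.get("int"))
            for doc in gcluster.findall("doc")
            if doc.get("int") is not None
        ]
        clusters.append((label.get("data"), indices))
    return clusters


if __name__ == "__main__":
    for label, indices in parse_clusters(SAMPLE_RESPONSE):
        print(label, indices)
```

Each tuple pairs one query suggestion with the result positions that carry it, which is the only part of the response the default client is described as using.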

Entry         Description

<cluster>     The output from different clustering algorithms. There is only
              one supported cluster algorithm, so the value of <algorithm>
              must be Concepts.

              The <cluster> category consists of:

              - A series of <algorithm> and subordinate <gcluster> pairs.
              - The subordinate <gcluster> is a series of <label> statements
                and the array of <doc> elements that have that label.
              - The label is a query suggestion. The <doc> statements are
                indexes into the <document> section that follows.
              - Each <label> provides an alternative query, and each <doc>
                array provides the document location indices.

<document>    A sequence of the URL, title, and snippet for each of up to 100
              top search results from a search query. The search appliance
              creates the <doc> arrays from the <document> list.
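The relationship between the <doc> indexes and the <document> list described in the table above can be sketched as follows. This is an illustration under stated assumptions, not documented appliance behavior: the source does not say whether the indexes start at 0 or 1, so the hypothetical index_base parameter makes that choice explicit. The sample body and the names resolve_clusters and SAMPLE are likewise invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; all values are invented for illustration.
SAMPLE = """<toplevel>
  <Response>
    <algorithm data="Concepts"/>
    <t_cluster int="5"/>
    <cluster>
      <gcluster>
        <label data="alpha"/>
        <doc int="1"/>
        <doc int="7"/>
      </gcluster>
    </cluster>
  </Response>
  <t_fetch/>
  <document>
    <url data="http://example.com/a"/>
    <title data="A"/>
    <snippet data="first"/>
  </document>
  <document>
    <url data="http://example.com/b"/>
    <title data="B"/>
    <snippet data="second"/>
  </document>
</toplevel>"""


def _data(parent, tag):
    """Return the 'data' attribute of a child element, or None if it is absent."""
    node = parent.find(tag)
    return node.get("data") if node is not None else None


def resolve_clusters(xml_text, index_base=0):
    """Map each label to the <document> entries its <doc> indexes point at.

    index_base is an assumption: pass 1 if the indexes turn out to be one-based.
    """
    root = ET.fromstring(xml_text)
    documents = [
        {field: _data(doc, field) for field in ("url", "title", "snippet")}
        for doc in root.findall("document")
    ]
    resolved = {}
    for gcluster in root.findall("./Response/cluster/gcluster"):
        hits = []
        for doc in gcluster.findall("doc"):
            raw = doc.get("int")
            if raw is None:
                continue
            position = int(raw) - index_base
            # Ignore indexes that fall outside the returned <document> list.
            if 0 <= position < len(documents):
                hits.append(documents[position])
        resolved[_data(gcluster, "label")] = hits
    return resolved
```

Out-of-range indexes are dropped rather than raising an error, since the response is not validated and the <document> list is capped at 100 results.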
