Element name | Description | Mandatory/optional | Can be empty? | Multiple entries? |
DOCUMENT | The root element that contains an <IDENTIFIER> element followed by either a <DELETE/> tag or a set of metadata elements. | Mandatory | No | No |
IDENTIFIER | Contains a document ID, a unique alphanumeric string that identifies the document. This tag has to be present for each document and appears only once. It cannot be empty. The <HTTPURL> or <NATIVEURL> elements may be good candidates for the <IDENTIFIER>.
Note: A semicolon is not a valid character for the <IDENTIFIER>. | Mandatory | No | No |
AUTHOR | Contains the name of the person who created the document. If the author is unknown, the tag can be left empty. Each <AUTHOR> element can contain only one name, but multiple occurrences of the element are allowed. For example, if the authors are Paul White and Jake Black, then the <DOCUMENT> element contains:
- <AUTHOR>Paul White</AUTHOR>
- <AUTHOR>Jake Black</AUTHOR>
Authors are used to generate affinities. | Mandatory | Yes | Yes |
UPDATEDBY | Contains the name of the person who last modified or saved the document. This tag follows the same rules as <AUTHOR>. In addition, it cannot be empty when the <AUTHOR> tag is filled. If the people who modified the document are unknown, copy the authors into <UPDATEDBY> elements. | Mandatory | Yes | Yes |
CREATED | Contains the date and time when the document was first created. All dates and times must be in GMT; the XML spider performs no time zone conversion. Use the YYYY-MM-DD-HH.MM.SS format, for example: 1999-03-30-21.19.26
Each <DOCUMENT> has to contain exactly one <CREATED> tag, unless the document is being deleted. This element cannot be empty. If the creation date is unknown, default to the current time. | Mandatory | No | No |
LASTREAD | Contains the date/time when the document was last accessed. This tag follows the same rules as the <CREATED> element. If the time of the last access is unknown, copy the <CREATED> time into this tag. | Mandatory | No | No |
MODIFIED | Contains the date/time when the document was last modified. This tag follows the same rules as the <CREATED> element. If the time of the last modification is unknown, copy the <CREATED> time into this tag. | Mandatory | No | No |
REVISIONS | Contains a set of <REVISION> elements. Only one <REVISIONS> per document is permitted. | Mandatory | No | No |
REVISION | This child element of <REVISIONS> contains a date/time when the document was modified. This tag follows the same rules as the <CREATED> element, but multiple <REVISION> elements are permitted. If revision information is unknown, copy the <CREATED> time into this tag. | Mandatory | No | Yes |
FIELD | Optional element. Reserved for future use. | Optional | Yes | Yes |
TITLE | Contains the title of the document. Same requirements as for the <SUMMARY> element below. If omitted, Discovery Server attempts to generate a title using document subject, file name derived from the <HTTPURL>/<NATIVEURL> elements, or the first line of the document body. If unsuccessful, Discovery Server defaults to "[Untitled]." | Optional | No | No |
SUBJECT | Contains the subject of the document. Same requirements as for the <SUMMARY> element below. If subject is omitted, Discovery Server attempts to generate one. | Optional | No | No |
SUMMARY | Provides a short description of the document. If present, the element should not be empty. Only one <SUMMARY> per document is permitted. If this tag is omitted, Discovery Server tries to create a summary, provided that the document is written in a supported language. See the list of supported languages later in this article. The summary should be 256 characters or shorter. | Optional | No | No |
KEYWORDS | Contains a set of <KEYWORD> elements. This element is optional, but if present, it should contain at least one <KEYWORD>. Only one <KEYWORDS> element per document is allowed. All keywords combined should be MAX_LEN characters or shorter, where MAX_LEN is computed according to the formula: MAX_LEN = 256 - (number of <KEYWORD> elements). | Optional | Yes | No |
KEYWORD | A child element of <KEYWORDS>. Same requirements as for <SUMMARY>, except that multiple <KEYWORD> elements are allowed. | Optional | No | Yes |
BODY | Contains the document body. This tag can be empty. Only one <BODY> element per document is allowed. | Mandatory | Yes | No |
APPLICATION | Provides the name of the application that created the document. If present, this tag should not be empty.
Note: This setting does not affect how the K-Map and the K-Map Editor display the document. Non-Domino files are viewed in the application registered by the operating system for the document's file extension. | Mandatory | No | No |
LANGUAGE | Contains the default language of the document, expressed as a two-letter ISO 639 language abbreviation followed by an optional ISO 3166 country code. In the ISO 639/ISO 3166 convention, language codes are written in lowercase and country codes in uppercase, for example, en-US.
If this tag is empty, Discovery Server guesses the correct language by examining the document body. This information is used to create a document summary and a list of keywords. Only one <LANGUAGE> element per document is allowed.
In addition to the two-letter ISO 639 language abbreviations, use bk for Bokmal and ny for Nynorsk. | Mandatory | Yes | No |
NATIVEURL | Contains the Notes or File URL of the document, for example:
<NATIVEURL> is used by the K-Map and the K-Map Editor to display the document. If the document cannot be retrieved, the <HTTPURL> tag is used instead. For Notes documents, native URL is used when the <USENOTES/> tag is present. | Optional |  | No |
HTTPURL | Contains the HTTP URL of the document, for example:
HTTP://epr.acme.com/Wsj.nsf/0/00E5CFF068B33B1A852569760021DDDF?OpenDocument
URLs should be represented in absolute form and be consistent with Uniform Resource Identifiers (URI): Generic Syntax and Semantics, RFC 2396. | Mandatory |  | No |
ACL | Optional tag that contains a document's access control list. Contains a collection of ALLOW and DENY elements. If empty or missing, it is assumed that everyone with repository access can view the document. Only one access control list per document is permitted.
In addition to the document ACL, Discovery Server takes into account the repository ACL supplied in the Database.xml file. The document is exposed to a user only if the user is included in the repository and document ALLOW lists and is not included in the DENY list.
Ensure that user identities defined in external repositories map to the user identities that Discovery Server uses to grant access to the K-Map (through HTTP authentication and the user identity and password contained in the DS Directory - Person record). | Mandatory | Yes | No |
ALLOW | This child element of <ACL> contains a group or a user who can read the document. This element is optional, but if present, it should not be empty. Each <ALLOW> element can contain only one name or group, but multiple occurrences of the element are allowed.
If <ALLOW> tags are not present, Discovery Server assumes that all users with repository access can view the document, except for those explicitly stated in the DENY list. NT users and groups should be listed in the DOMAIN/USER_NAME format. Currently, access checking of users in external NT domains is not supported. Names are case-insensitive. | Optional | No | Yes |
DENY | This child element of <ACL> contains a group or a user who is denied reader access. Same requirements as for <ALLOW>. | Optional | No | Yes |
LINKS | Contains a collection of <LINK> elements. This element is optional, but if present, it should contain at least one <LINK>. Only one <LINKS> element per document is allowed. | Optional | No | No |
LINK | This child element of <LINKS> describes a link contained in the document. The URL attribute holds the address of the destination anchor. It should be represented in absolute form and be consistent with Uniform Resource Identifiers (URI): Generic Syntax and Semantics, RFC 2396. Multiple occurrences of the element are allowed. | Optional | Yes | Yes |
USENOTES | Specifies a preferred viewer for a Lotus Notes document. If this tag is present, the document opens in a new Lotus Notes window. If the Notes client is not available, the default Internet browser becomes the fallback viewer. If the tag is omitted, the document is displayed in a new browser window. This element is always empty. | Optional | Yes | No |
INCLUDE | Optional tag. Only one <INCLUDE> tag per document is allowed. If this element is present, Discovery Server merges the included file and the container. The FILEPATH attribute should not be empty. The resulting document is registered with the Discovery Server. The merge happens according to the following rules:
- Always use text (document body), keywords, and summary from the included file.
- Use the container's metadata. If the container is missing an author or title, take the omitted information from the included file.
- Use container's ACLs.
Attribute:
- FILEPATH - location of the attached file.
| Optional | Yes | No |
ATTACHMENT | Optional tag. If this element is present, the attributes should not be empty. When a document has multiple attachments, Discovery Server processes each attached file in a manner similar to how attachments are processed for Domino. Attachments are registered with Discovery Server independently from the container. Rules used to process attachment metadata:
- If the attachment is missing an author, title, summary, or keywords, take the omitted metadata from the container.
- Combine the title/subject of the container with the attachment name. For example, if the container's title is Container1 and the attachment is named file1.doc, the new title is Container1(attachment - file1.doc).
- Use container's ACLs.
Attributes:
- FILEPATH - Specifies the location of the attached file
- IDENTIFIER - Contains a unique ID for the attachment
- NATIVEURL - Specifies the Notes or File URL of the attachment
- HTTPURL - Specifies the HTTP URL of the attachment
- USENOTES - Specifies a preferred viewer for the attachment
| Optional | Yes | Yes |
DELETE | Used to remove previously processed documents from the data repository. If this tag is present, the document is deleted. This element is always empty.
Note: If the document is new and has not been registered during a previous run of the spider, a message is logged. |  |  |  |
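
The rules in the table above can be illustrated with a short Python sketch that assembles a minimal <DOCUMENT> element. It is not part of Discovery Server; the function names (spider_timestamp, keywords_fit, build_document), their parameters, and the sample values are hypothetical. The sketch assumes that child elements may be written in the order in which the table lists them, and it applies the documented fallbacks: authors are copied into <UPDATEDBY>, and the <CREATED> time is copied into <LASTREAD>, <MODIFIED>, and <REVISION>.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET


def spider_timestamp(dt):
    """Format an aware datetime as YYYY-MM-DD-HH.MM.SS in GMT."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%d-%H.%M.%S")


def keywords_fit(keywords):
    """Apply MAX_LEN = 256 - (number of <KEYWORD> elements)."""
    max_len = 256 - len(keywords)
    return sum(len(k) for k in keywords) <= max_len


def build_document(doc_id, authors, created, body, http_url,
                   application, language="", keywords=()):
    """Build a <DOCUMENT> with the elements the table marks as mandatory.

    Hypothetical helper; the element order follows the table, which is
    an assumption rather than a documented requirement.
    """
    if not doc_id or ";" in doc_id:
        raise ValueError("IDENTIFIER must be non-empty, without semicolons")
    if keywords and not keywords_fit(keywords):
        raise ValueError("combined keyword length exceeds MAX_LEN")

    stamp = spider_timestamp(created)
    doc = ET.Element("DOCUMENT")
    ET.SubElement(doc, "IDENTIFIER").text = doc_id
    # One name per element; unknown modifiers default to the authors.
    for tag in ("AUTHOR", "UPDATEDBY"):
        for name in authors:
            ET.SubElement(doc, tag).text = name
    # Unknown access/modification times default to the creation time.
    for tag in ("CREATED", "LASTREAD", "MODIFIED"):
        ET.SubElement(doc, tag).text = stamp
    revisions = ET.SubElement(doc, "REVISIONS")
    ET.SubElement(revisions, "REVISION").text = stamp
    if keywords:
        container = ET.SubElement(doc, "KEYWORDS")
        for keyword in keywords:
            ET.SubElement(container, "KEYWORD").text = keyword
    ET.SubElement(doc, "BODY").text = body
    ET.SubElement(doc, "APPLICATION").text = application
    # An empty LANGUAGE lets Discovery Server guess from the body.
    ET.SubElement(doc, "LANGUAGE").text = language
    ET.SubElement(doc, "HTTPURL").text = http_url
    # An empty ACL means everyone with repository access can view it.
    ET.SubElement(doc, "ACL")
    return doc


if __name__ == "__main__":
    # Local time five hours behind GMT; converted before formatting.
    created = datetime(1999, 3, 30, 16, 19, 26,
                       tzinfo=timezone(timedelta(hours=-5)))
    document = build_document(
        "doc-001", ["Paul White", "Jake Black"], created,
        "Sample body text.", "http://example.com/docs/doc-001",
        application="Example Editor", language="en-US",
        keywords=["sample", "spider"])
    print(ET.tostring(document, encoding="unicode"))
```

Converting to GMT inside spider_timestamp keeps the conversion in one place, since the spider itself performs none. Optional elements such as <TITLE>, <SUMMARY>, <NATIVEURL>, and <LINKS> are left out of the sketch for brevity.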