|
-------- A --------
Abstract
A brief paragraph (50 words or fewer) displayed below the
search result link on the
search results page. This text is taken from the spidered web page referenced by the link. If the page has a
META description tag, its contents are displayed. Otherwise, the first 50 words of text -- excluding HTML and scripting language -- are used.
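The selection rule above can be sketched in Python. This is an illustration only, not IndexMySite's actual code: the names AbstractParser and make_abstract are hypothetical, and trimming the META description to the same 50-word limit is an assumption.

```python
from html.parser import HTMLParser


class AbstractParser(HTMLParser):
    """Collects the META description and the visible words of a page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.words = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script> and <style> is not part of the abstract.
        if not self._skip_depth:
            self.words.extend(data.split())


def make_abstract(html, limit=50):
    """Prefer the META description; otherwise use the first `limit` words."""
    parser = AbstractParser()
    parser.feed(html)
    parser.close()
    if parser.description:
        # Assumption: the description is trimmed to the same word limit.
        return " ".join(parser.description.split()[:limit])
    return " ".join(parser.words[:limit])
```

A page without a META description falls through to the second branch, so its abstract is simply the opening words of its visible text.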
Account ID
(see Password)
-------- B --------
Basic Authentication
Some websites require the user to enter a username and password before pages can be viewed. The spider can index these pages if the website owner supplies the username and password prior to indexing.
Background color
A value which defines the background color of the
search results page. When used in conjunction with a
background image, it specifies the background color shown if a user has automatic loading of images turned off. (see
Site Administration)
Background image
Specifies an image that appears behind the text of a page (the image is tiled to fit the page). The URL identifies the image file, which must be either a GIF or a JPEG. Note: the smaller the image file, the quicker it and the page will load. Any image file can be used, but it should be one that can be tiled, and it should be as small as possible. A background image (if specified) takes precedence over the
background color. (see Site Administration)
Boolean
Refers to Boolean logic.
Advanced searchers can construct queries as Boolean expressions. A Boolean expression consists of operands and operators: the search words are the operands, and the operators include terms such as AND, OR, NOT,
XOR, and MOST OF. For example, the following is a Boolean search expression: (apple AND pie) AND NOT corporation OR (steve AND jobs).
Button color
The color of the search button on the
search results page. (see
Site Administration)
Button text color
The color of the text (SEARCH) on the search button. (see
Site Administration)
-------- C --------
Character Set
IndexMySite recognizes the ISO 8859-1 (Latin-1) character set which represents most western European languages. IndexMySite also allows users to include their own stopword list.
(see character code table)
Clustering
Clustering describes the process used to group similar documents together. IndexMySite uses a clustering algorithm to group query results. Similar documents are placed into folders which are then given descriptive names. These names are chosen based upon the contents of the document abstracts.
To see an example of how Clustering can provide a great search experience, click
examples and search
the White House website.
-------- D --------
Default Data Directory
The directory which contains your web site's default (home) page.
Default document
The filename of the default web page for your site. This is typically a filename such as index.html or default.html.
Document
Refers to a document page. HTML files larger than 10K bytes are divided into smaller document pages, split at named anchors (jump tags:
<A Name="tagname">). This enables searches to jump to the correct section of a larger document. PDF documents are divided at page breaks. Searches jump to the specific document page containing the search terms. All other URLs are considered single documents.
Domain name
The unique name that identifies an Internet site. Domain Names always have 2 or more parts, separated by dots. The part on the left is the most specific, and the part on the right is the most general. A given machine may have more than one Domain Name but a given Domain Name points to only one machine. For example, the domain names:
- tippecanoe.com
- www.tippecanoe.com
- mail.tippecanoe.com
can all refer to the same machine, but each domain name can refer to no more than one machine.
Usually, all of the machines on a given Network will have the same thing as the right-hand portion of their Domain Names (tippecanoe.com in the examples above). It is also possible for a Domain Name to exist but not be connected to an actual machine. This is often done so that a group or business can have an Internet e-mail address without having to establish a real Internet site. In these cases, some real Internet machine must handle the mail on behalf of the listed Domain Name.
-------- E --------
Exclusion string (Exclusion mask)
A string of characters used to limit
spidering of your web site. During the spidering process, the
URL of each page at your site is evaluated. If the URL contains an Exclusion string, it will not be indexed. Some care should be exercised in the use of exclusion strings. A string of asp would exclude all .asp files, but also jasper.html.
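The asp example can be sketched as a plain substring test. This is a minimal illustration, not the spider's actual matching code: the function name is_excluded and the www.example.com URLs are hypothetical, and case-sensitive matching is an assumption.

```python
def is_excluded(url, exclusion_strings):
    """Return True if the URL contains any of the exclusion strings."""
    return any(mask in url for mask in exclusion_strings)


urls = [
    "http://www.example.com/catalog/list.asp",
    "http://www.example.com/jasper.html",
    "http://www.example.com/index.html",
]

# The string "asp" matches both list.asp and jasper.html.
too_broad = [u for u in urls if is_excluded(u, ["asp"])]

# The narrower string ".asp" matches only list.asp.
narrower = [u for u in urls if is_excluded(u, [".asp"])]
```

Including the leading dot in the exclusion string is one way to avoid unintended matches such as jasper.html.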
-------- F --------
File size (Maximum)
The maximum file (document) size the indexer will index is 5 Megabytes.
Fonts
The user can specify character FONTS to use on the Search Results Page. However, the actual fonts used to display the page will depend on the fonts installed on the computer which requests the search. The
Site Administrator can therefore choose a set of fonts for the browser to choose from, listed in preferential order.
Formats
The Indexer can read and index a variety of document formats including but not limited to:
- Portable Document Format (PDF)
- HTML
- Microsoft Word
- WordPerfect
- Excel
- PowerPoint
- PostScript
- Rich Text Format (RTF)
- Text
-------- G --------
GIF (Graphics Interchange Format)
An image format used widely on the World Wide Web. The Site Administrator can supply GIF images to be used for a Company
Logo or a Background Image on the search results page.
-------- H --------
HTML
Hypertext Markup Language, the basis of the World Wide Web. HTML is an application of the Standard Generalized Markup Language (SGML), defined by a Document Type Definition (DTD).
Hyperlink
A link from one document to another, to any other resource, or to a location within the same document. The hyperlinked text is highlighted in some fashion. The default is usually blue, underlined text, but your display may vary depending on your browser and IndexMySite settings.
-------- I --------
Inclusion string (Inclusion mask)
A string of characters used to limit
spidering of your web site. During the spidering process, the
URL of each page at your site is evaluated. If the URL does not contain an Inclusion string, it will not be indexed.
Index
A database which contains information about words found within a web site. IndexMySite creates a search index consisting of the site vocabulary and which web site pages contain each word.
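A word-to-pages mapping of this kind is often called an inverted index. The sketch below shows the general idea only, assuming plain whitespace tokenization and lowercase folding. The function name build_index and the page names are hypothetical, and IndexMySite's own index format is not described here.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of pages that contain it.

    `pages` maps a page URL to the page's plain text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


pages = {
    "a.html": "Apple pie recipes",
    "b.html": "apple tart",
}
index = build_index(pages)
# index["apple"] holds both pages; index["pie"] holds only a.html
```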
ISO 8859-1
See Character Set
-------- J --------
JPEG
An image format used widely on the World Wide Web. The Site Administrator can supply JPEG images to be used for a Company
Logo or a Background Image on the search results page.
Jump-tag Precision
The name attribute of the Anchor tag (<A Name="tagname">) is used to create a named anchor. These named anchors are often referred to as jump tags. Using jump tags, you can create links that jump directly to a specified section of a page. Jump tags help the user find information quickly. The IndexMySite spider divides web pages at these jump tags, treating each partial page as a separately indexed document. Subsequently, when searching, the list of results may contain links directly to these partial documents.
-------- K --------
Keywords
Keywords are words used in a
search expression to search the index.
-------- L --------
Language of site
The indexer handles English and non-English sites differently. For English sites, a stopword list is used. This list can be edited by the user. In addition, a companion Soundex index is generated, which can be used for fuzzy (sounds like) searches. The phrase feature is also enabled for English sites. This feature allows the user to request proximity by enclosing the query terms in quotation marks, e.g. "apple & banana".
Linkback message
The Search Results page contains a hyperlink back to the user's website. The text displayed in this link is the Linkback message. This text can be specified by the user and changed at any time. (see
Site Administration).
Linkback URL
The Search Results page contains a hyperlink back to the user's website. The URL in this link is the Linkback URL. This URL can be specified by the user and changed at any time. (see
Site Administration).
Logo image URL
The Search Results page may contain a Company or Product Logo. The user provides the URL of a GIF or JPEG image, which is displayed on the search results page. This URL can be changed at any time. (see
Site Administration).
-------- M --------
Meta description
The HTML syntax allows for
META tags which can provide useful information to the spider. One of these is the META Description tag. If found on a web page, the contents of this tag are used as the search
Abstract.
Meta keywords
The HTML syntax allows for META tags which can provide useful information to the spider. One of these is the META Keyword tag. If found on a web page, the contents of this tag may be used as the search Abstract if no META Description tag is found.
Meta robot tags
The HTML syntax allows for META tags which can provide useful information to the spider. One of these is the META Robot tag. If found, this tag can direct the spider to prevent spidering in specified directories.
Meta tags
The HTML syntax allows for META tags which can provide useful information to the spider. These tags have the following syntax:
<META NAME='name of tag' CONTENT = 'content of meta tag'>
Some examples include:
<META NAME = 'DESCRIPTION' CONTENT = 'This text will be used as a search Abstract' >
<META NAME = 'KEYWORDS' CONTENT = 'apple, banana, pie, pastry' >
<META NAME = 'ROBOT' CONTENT = 'subdir1, subdir2' >
Most of
The MOST OF operator is exclusive to the IndexMySite search engine.
The concept is simple yet extremely useful. The MOST OF operator takes a
list of search words. The IndexMySite search engine will return documents
which contain 'most of' (approximately 60 percent of) the search words.
To see how this compares to other Boolean operators, consider a query with 10
search words. Using the AND operator would return 0 documents if just one
of the search terms is not present. Using the OR operator would
return a document if just one term matches. This could result in thousands
of matches, not necessarily a good result.
The MOST OF operator would return documents which match 6 or more search terms
-- documents which more probably match your intent.
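The 10-word comparison can be sketched as follows. This is a rough illustration, not the engine's implementation: the function name most_of and the sample documents are hypothetical, and rounding the 60 percent threshold up to a whole number of terms is an assumption (the text above says only "approximately 60 percent").

```python
def most_of(search_words, documents, percent=60):
    """Return the names of documents containing at least `percent`
    percent of the search words (threshold rounded up)."""
    terms = {w.lower() for w in search_words}
    needed = (percent * len(terms) + 99) // 100  # integer ceiling
    hits = []
    for name, text in documents.items():
        words = set(text.lower().split())
        if len(terms & words) >= needed:
            hits.append(name)
    return hits


query = "one two three four five six seven eight nine ten".split()
documents = {
    "all.html": "one two three four five six seven eight nine ten",
    "six.html": "one two three four five six",
    "five.html": "one two three four five",
}
matches = most_of(query, documents)
# all.html and six.html qualify; five.html matches only 5 of 10 terms
```

With the same data, an AND query would return only all.html and an OR query would return all three pages.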
-------- N --------
-------- O --------
-------- P --------
Page limit
The maximum page limit for IndexMySite indexes is 25,000 pages per account. This is the maximum number of unique URLs which may be contained in a single
search index.
Page size limit
See File size
Password
An alphanumeric string of no more than 16 characters. A unique user password is required to access the IndexMySite features, including site indexing and re-indexing.
Primary URL
This is the starting point for the
spidering process. Normally, this is the top-most page in the web site hierarchy, from which one could reach every page at the website without leaving the website. The IndexMySite spider will follow any link (URL) within the same
domain. The spider derives the root domain by stripping off page names and parameters from the URL.
For example, if the primary URL entered is http://www.widget.com/mypagedir/page1.asp?param1, the root would be http://www.widget.com/mypagedir/. The spider would only follow links which contained this root.
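The root derivation in this example can be approximated with Python's standard urllib.parse module. This is a sketch under assumptions, not the spider's actual logic: the function names root_of and in_scope are hypothetical, and a path that does not end in a slash is assumed to end in a page name.

```python
from urllib.parse import urlsplit, urlunsplit


def root_of(primary_url):
    """Strip the page name and parameters, keeping scheme, host and directory."""
    parts = urlsplit(primary_url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def in_scope(link, root):
    """A link is followed only if it begins with the root."""
    return link.startswith(root)


root = root_of("http://www.widget.com/mypagedir/page1.asp?param1")
# root is "http://www.widget.com/mypagedir/"
```

Under this rule a link to http://www.widget.com/otherdir/page2.asp would not be followed, which is the situation a Secondary URL addresses.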
(see Re-Direct).
-------- Q --------
Query
A collection of keywords and optional boolean operators submitted to the IndexMySite search engine (see
Search Expression)
-------- R --------
Ranking
An indication of how well a search result matches a query. IndexMySite evaluates each search result based on its ranking criteria and orders the results from the most likely (highest ranking) to the least likely (lowest ranking).
Re-index (Re-spider)
To re-create the search index by re-spidering the web site.
Re-Direct
A request for a specific URL is sometimes re-directed to another page by the web server. For example, a request for http://www.myhomepage-members.com might be resolved as http://www.members.com/~myhomepage/. To check if your pages are being re-directed, enter the URL into your browser's address box. Once the page is displayed, check the address box. Did the URL change? If so, your URL was re-directed, and the URL now shown is the one you should submit for indexing.
Results Title string
A Results link on a Search Results page consists of a hyperlink and a Results Title text string. Candidates for the Results Title string are, in order of precedence: the HTML Title tag, the HTML H1 tag, and the literal text "No title found".
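The fallback order can be sketched with Python's standard html.parser module. This is an illustration only: the names TitleParser and results_title are hypothetical, and using the first Title and first H1 found on the page is an assumption.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Records the text of the first <title> and the first <h1>."""

    def __init__(self):
        super().__init__()
        self.found = {"title": "", "h1": ""}
        self._current = None  # tag whose text is being collected

    def handle_starttag(self, tag, attrs):
        if tag in self.found and not self.found[tag]:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.found[self._current] += data


def results_title(html):
    """Return the Title text, else the H1 text, else a fixed fallback."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    for tag in ("title", "h1"):
        text = " ".join(parser.found[tag].split())
        if text:
            return text
    return "No title found"
```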
Robot text file
A text file found in the web site's
default data directory with the name robots.txt and containing directions for web site spiders. (see
A Standard for Robot Exclusion)
-------- S --------
Script Iteration
Web sites often use CGI scripts to generate web pages on the fly. These scripts are often designed to generate a number of different pages, with the specific page displayed depending on user input. CGI provides a mechanism (environment variables) to pass information to these scripts. The QUERY_STRING environment variable holds the text found at the end of a URL following the question mark (e.g. http://www.mysite.com/myscript.cgi?query_string). The spider can be directed to treat each page a script can generate as a separate web page by evaluating the QUERY_STRING environment variable.
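One way to picture the effect of this option is as the key the spider uses to decide whether two URLs are the same page. The sketch below is an assumption about that behavior, not a description of the actual spider. The function name page_key and its iterate_scripts flag are hypothetical.

```python
from urllib.parse import urlsplit


def page_key(url, iterate_scripts):
    """Return the identity of a page for indexing purposes.

    With script iteration on, the query string is part of the identity,
    so each distinct query string counts as a separate page.
    """
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if iterate_scripts and parts.query:
        return f"{base}?{parts.query}"
    return base


urls = [
    "http://www.mysite.com/myscript.cgi?item=1",
    "http://www.mysite.com/myscript.cgi?item=2",
]
separate_pages = {page_key(u, iterate_scripts=True) for u in urls}
single_page = {page_key(u, iterate_scripts=False) for u in urls}
```

Here separate_pages holds two entries while single_page holds one, because the two URLs differ only in their query strings.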
Search button
The web site visitor clicks on the search button to initiate a search. IndexMySite provides either a standard gray form button or a graphic image button.
Search button code
The HTML coding which produces the search button, the search text entry box, and the necessary instructions to the IndexMySite search engine.
Search engine
The system (hardware and software) residing at Tippecanoe Systems, Inc. which receives the
Query (search expression) and generates a
search result.
Search Expression
The combination of keywords and optional operators which constitute the "question" submitted to the search engine.
Search field
The box displayed on the user's web page into which the visitor types the query. The box is generated using the standard HTML <INPUT> tag.
Search results
A web page generated by the IndexMySite search engine which contains a
relevancy-ranked list of web pages containing words that match the query. Each page is listed as a hyperlink to the page, with an optional page
Abstract.
Search Result Link
Specifically, one of the hyperlinks listed on the Search Results page in response to a submitted Query. A result hyperlink consists of a URL to a specific page relating to the Query, and the Results Title string.
Secondary URL
A URL which would not be found by spidering the Primary URL. For example, URLs in a different Domain, URLs in a higher directory, or URLs which are not linked to from any other page.
Here's an example:
Consider a website with two domains -- www.domain1.net and mysite.members.com. You would enter one of these as the
Primary URL, and the other as the Secondary URL.
Another example:
A website may have some of its pages in www.abcd.net/redpages/ plus other pages in www.abcd.net/bluepages/. Entering www.abcd.net as the Primary URL would get all the pages.
If the user instead enters www.abcd.net/redpages/ as the Primary URL, the spider would recognize www.abcd.net/redpages/ as the root URL and follow only links which contained this root. None of the web pages in /bluepages/ would be indexed.
If the user also specifies www.abcd.net/bluepages/ as a Secondary URL, the spider would find and index all of the pages found in /bluepages/ as well.
Site Administration
Site Administration is a system which allows users to administer their site indexes. Protected by password security, it lets users
re-index their sites at any time, and change various options such as
Search Results page appearance,
Secondary URLs, and
exclusion list items.
Site Map
This is a web page which displays a hierarchical view (tree) of the user's web site based on the spidering results.
Soundex
A technology which allows searching by keywords that sound like (are phonetically similar to) words found on the user's web site.
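For readers unfamiliar with the technique, the classic American Soundex code keeps the first letter of a word and encodes the following consonants as up to three digits. The sketch below implements that classic variant. Which variant IndexMySite uses is not stated here, so treat this as an illustration only. The function name soundex is hypothetical.

```python
CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character classic Soundex code for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]
```

Words that sound alike, such as Robert and Rupert, receive the same code, which is what makes a "sounds like" search possible.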
Spider
An application which reads each page on a website looking for
hyperlinks. Each hyperlink found is followed in turn, and the process is repeated recursively until all pages at the site have been read. The IndexMySite spider feeds the words it finds to an
indexing application which builds the search index.
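The link-following process can be sketched over an in-memory site so that no network access is needed. This is a simplified illustration, not the IndexMySite spider: the function name spider and the sample site are hypothetical, and a queue is used here in place of recursion.

```python
from collections import deque


def spider(site, start):
    """Visit every page reachable from `start` by following links.

    `site` maps a page URL to the list of URLs it links to.
    Returns the pages in the order they were read.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real spider would read and index the page here
        for link in site.get(url, []):
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


site = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
    "/orphan": [],
}
visited = spider(site, "/")
```

The /orphan page is never visited because no other page links to it, which is the kind of page a Secondary URL is meant to cover.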
Stemming
The process of finding word roots by removing English ending variations (ed, ing, tion, s ...). For example, the following words all share the same root (creat): creation, creator, creating, created, creatively. The indexing application saves only the word roots. A subsequent search for any of these variations therefore finds any page which contains any of the other variants. The IndexMySite indexer uses the Porter algorithm.
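The idea of reducing word variants to a shared root can be shown with a deliberately naive suffix stripper. This is not the Porter algorithm, which applies several ordered rule phases with conditions on the remaining stem. The suffix list and the function name naive_stem below are hypothetical and chosen only to reproduce the creat example.

```python
# Longest suffixes first, so "ively" is tried before "s" or "ed".
SUFFIXES = ("ively", "ion", "ing", "or", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


variants = ["creation", "creator", "creating", "created", "creatively"]
roots = {naive_stem(w) for w in variants}
```

All five variants reduce to the single root creat, so a search for any one of them can match pages containing the others.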
Stop list
The process of discarding words which add little to search relevance. Words such as and, or, if, of, but, because, and while are not indexed.
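Stop word removal amounts to a simple filter applied before indexing. The sketch below uses only the example words listed above. The actual IndexMySite stop list is longer and user-editable, and the function name remove_stop_words is hypothetical.

```python
STOP_WORDS = {"and", "or", "if", "of", "but", "because", "while"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercase words of `text` that are not stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


kept = remove_stop_words("Apple pie and a slice of banana bread")
```

Only the words and and of are dropped from this sentence, since the word a is not in the sample list.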
Summary
(see Abstract)
-------- T --------
-------- U --------
URL (Uniform Resource Locator)
The Uniform Resource Locator is a "standard" way of easily expressing the location and data type of a resource. URLs provide the basis for the Hyperlinks found on web pages.
Usage Reports
Users can view Usage Reports on demand. These reports provide information on query activity important for managing a website.
User ID (UID)
Identifies a single web site search index. A user may have multiple user IDs in a single Account. A user ID specifies a single index. An
Account
identifies a single IndexMySite subscriber.
-------- V --------
-------- W --------
Word endings
(see Stemming)
-------- X --------
-------- Y --------
-------- Z --------
|