Why doesn't the IndexMySite spider find all my pages?
There are several reasons why the spider may not find a page.
- The spider will not follow a link if it does not contain the Primary URL or a Secondary URL. Suppose you submit the following URL: http://www.mysite.com/myhome/. The spider will only follow links which contain the string www.mysite.com/myhome/
- If you use an ISP service, the URL you use to access your website may be redirected by your service provider. For example, you may use the URL http://www.mysite.com/myhome to get to your page, but it is redirected to http://members.home.com/myhome. The spider would not find this page because its URL does not contain the string www.mysite.com/myhome
- Some of your pages may be generated by scripts (programs) such as ASP, Java, etc. Use the VIEW SOURCE button on your browser and look for HTML anchor tags (<A ). The spider will only follow these links. To get around this, you can include additional URLs using IndexMySite's Secondary URLs feature.
- If your site uses a doorway page, submit the URL you go to when you click on your "Enter my site" link instead of your doorway URL.
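As a rough illustration of the first two points, the sketch below assumes the Primary URL is http://www.mysite.com/myhome/ and that no Secondary URLs are set. The file names are made up for the example.

```html
<!-- Assumes the Primary URL is http://www.mysite.com/myhome/ (file names are hypothetical) -->

<!-- Followed: the link contains the string www.mysite.com/myhome/ -->
<A HREF="http://www.mysite.com/myhome/products.html">Products</A>

<!-- Not followed: the link does not contain the Primary URL -->
<A HREF="http://members.home.com/myhome/products.html">Products</A>
```

Under these assumptions, adding http://members.home.com/myhome/ as a Secondary URL should let the spider follow the second link as well.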
Can I control the text displayed in the search results abstracts?
Yes. An abstract for each page on your site is developed during indexing. The spider looks for <META ... tags on each page. If it finds one with NAME='Description', it will use the text there as your abstract.
Otherwise, the spider will produce an abstract by taking the first 50 words it finds (ignoring HTML tags and scripts).
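A minimal sketch of such a tag is shown below. The CONTENT text is placeholder wording; put your own abstract there.

```html
<HEAD>
<!-- The CONTENT text is a placeholder; it becomes the abstract for this page -->
<META NAME="Description" CONTENT="Hand-made oak furniture, built to order.">
</HEAD>
```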
Why doesn't the Stylemaker program change my results page?
This is a cache problem. To speed up surfing on the internet, browsers save pages you visit in a directory on your hard drive. When you visit these same pages again, the browser may show one of the saved copies if the internet is too slow, or if it looks like the page hasn't changed.
While it may look like your search results page hasn't changed, it really has. To get around this problem, change your browser's caching setting.
In Internet Explorer, go to Tools->Internet Options->Settings and select "Every visit to the page".
In Netscape, go to Edit->Preferences->Advanced->Cache and select "Every Time".
Can I control the link text displayed on my search results page?
Yes. A TITLE for each page on your site is developed during indexing. The spider looks first for a <TITLE> tag and then for any <H1> heading tags. The text it finds in these tags is used for the search result title text.
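For instance, a page marked up like the sketch below gives the spider both tags to work with. The page text is hypothetical, and how the two texts are combined when both are present is not described here.

```html
<HTML>
<HEAD>
<!-- The spider looks for this tag first -->
<TITLE>Oak Dining Tables</TITLE>
</HEAD>
<BODY>
<!-- The spider looks for H1 headings after the TITLE tag -->
<H1>Our Dining Table Range</H1>
</BODY>
</HTML>
```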
Can I control the link text displayed on my Site Map?
Yes. The Site Map uses the same title text as the Search Results page. See the answer to the previous question.
Can I cause the search results page to display abstracts when first opened?
Yes. Insert the following HTML code into your search button code, between the <FORM> and the </FORM> tags.
<INPUT TYPE=hidden NAME='abs' VALUE='on'>
How can I prevent the spider from indexing pages I don't want it to?
There are several ways to accomplish this.
- You can put pages you don't want spidered in a directory above or at the same level as the directory which holds your Primary URL or any Secondary URLs. For example, if your Primary URL is in www.mysite.com/mypages/, put the pages you don't want spidered in www.mysite.com/myotherpages/
- Add pages you don't want spidered to the Exclude list. For example, if you don't want pages with .cgi extensions, add .cgi to the Exclude list.
- Add the directory you don't want spidered to your Exclude list. For example, if you add the Exclude mask /frogs/ to your Exclude list, the spider will exclude URLs which contain /frogs/.
- The opposite of the Exclude list is the Include list. Add text that MUST appear in the URL to the Include list. For example, if you only want pages with .cgi extensions, add .cgi to the Include list.
- Add a Robots META tag to your pages. For example:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
- Add a robots.txt file to the root directory of your server. This file contains instructions to any visiting spider. For example:
User-agent: *
Disallow: /scripts/ #This is a cgi directory
Disallow: /cgi-bin/ #This is a cgi directory
Disallow: /gifs/ #bitmaps
Can I turn the soundex feature on and off?
Yes. Insert the following HTML code into your search button code, between the <FORM> and the </FORM> tags.
<INPUT TYPE=hidden NAME='fuzzy' VALUE='on'>
Can I change the number of matches listed on the search results page?
Yes. Insert the following HTML code into your search button code, between the <FORM> and the </FORM> tags.
<INPUT TYPE=hidden NAME='hits' VALUE='5'>
This tag sets the display count to 5 hits. For other values, change the VALUE parameter accordingly.
Will the IndexMySite spider follow javascript links on my page?
No. IndexMySite doesn't speak javascript. However, this is easily remedied.
Simply put a comment block in your HTML code. Within the comment block, list in clear text the URLs that you want the spider to follow.
Make sure you use the entire URL.
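A comment block along these lines might look like the sketch below. The openPage function and the page names are hypothetical; substitute the full URLs of your own pages.

```html
<!-- A javascript link the spider cannot follow (openPage is a made-up function) -->
<A HREF="javascript:openPage('specials.html')">Specials</A>

<!--
Full URLs listed in clear text for the spider:
http://www.mysite.com/myhome/specials.html
http://www.mysite.com/myhome/contact.html
-->
```

Browsers ignore the comment block, so visitors will not see the list.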
What HTML code do I need on my own search page?
<FORM METHOD='POST'
ACTION='http://www.indexmysite.com/cgi-bin/indexmysite/ndx_search.pl'>
<INPUT TYPE='hidden' NAME='uid' VALUE='99'>
<INPUT TYPE='submit' NAME='cmd' VALUE='Search'>
<INPUT TYPE='hidden' NAME='cmd' VALUE='Search'>
<INPUT TYPE='text' NAME='qury' SIZE='16' MAX='256'>
</FORM>
Substitute your UID number for this site for the number 99 in the uid line above. The <FORM tag is shown above on two lines in order to fit on your display. When you put the code in your own page, the <FORM tag and all of its parameters should be on a single line.
Optional Code
Abstracts/Summaries
To open the search page with abstracts turned on, add:
<INPUT TYPE='hidden' NAME='abs' VALUE='on'>
Number of Matches displayed
To open the search page with other than 10 matches displayed, add:
<INPUT TYPE='hidden' NAME='hits' VALUE='10'>
substituting the number of matches desired for the value 10.
Search Results page target
When you search, a search results page is generated and displayed.
You can control where the Search Results page opens by inserting a TARGET parameter into your FORM statement.
To open Search Results page in parent window
<FORM TARGET='_parent' ...
To open Search Results page in top-most window
<FORM TARGET='_top' ...
To open Search Results page in its own browser window
<FORM TARGET='_blank' ...
To open Search Results page in frame or iframe
<FORM TARGET='frame_name' ...
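As a sketch of the frame case, the TARGET value must match the NAME of the frame or iframe. The name results and the iframe dimensions below are assumptions chosen for the example; the rest is the basic search code shown earlier.

```html
<!-- TARGET='results' matches the NAME of the iframe below (the name is an example) -->
<FORM METHOD='POST' TARGET='results' ACTION='http://www.indexmysite.com/cgi-bin/indexmysite/ndx_search.pl'>
<INPUT TYPE='hidden' NAME='uid' VALUE='99'>
<INPUT TYPE='submit' NAME='cmd' VALUE='Search'>
<INPUT TYPE='hidden' NAME='cmd' VALUE='Search'>
<INPUT TYPE='text' NAME='qury' SIZE='16' MAX='256'>
</FORM>

<!-- The Search Results page opens inside this iframe -->
<IFRAME NAME='results' WIDTH='600' HEIGHT='400'></IFRAME>
```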
Search Results document target
When you click a document link on the search results page, the document is opened.
You can control where the document opens by inserting the following HTML code between the <FORM ...> and </FORM> tags.
To open the document in the parent window
<input type='hidden' name='targetwindow' value="_parent"/>
To open the document in the top-most window
<input type='hidden' name='targetwindow' value="_top"/>
To open the document in its own browser window
<input type='hidden' name='targetwindow' value="_blank"/>
To open the document in a frame or iframe
<input type='hidden' name='targetwindow' value="frame_name"/>
Search Button Text
You can change the search button text and text box length to fit the design of your page. Valid search button text is any of the following: (Q, Query, Submit, Ask, Search, Go, or Find)
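For example, the two lines below sketch a button labelled Go with a wider text box. The SIZE value of 30 is an arbitrary choice. Whether the hidden cmd field should change to match is not stated here, so that field is not shown.

```html
<!-- Button text changed to "Go", one of the valid values listed above -->
<INPUT TYPE='submit' NAME='cmd' VALUE='Go'>
<!-- Text box widened from 16 to 30 characters (30 is an example value) -->
<INPUT TYPE='text' NAME='qury' SIZE='30' MAX='256'>
```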
Searching multiple indexes
If you have more than one index, you can search them together by combining their uids in the search button code. For example, to search index uid 99, index uid 100 and index uid 101, you simply combine the uids with the underscore character, like this: 99_100_101. Substitute this combination for the number 99 in the uid line of the search code above. Note that the style of the search results page will be governed by your settings for the first uid in the combination (uid 99 in this example).
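Putting several of the options together, a search form might look like the sketch below. The uid values, the hits count and the target choices are examples only; use the values that apply to your own indexes.

```html
<!-- Sketch only: the Search Results page opens in its own browser window -->
<FORM METHOD='POST' TARGET='_blank' ACTION='http://www.indexmysite.com/cgi-bin/indexmysite/ndx_search.pl'>
<!-- Search indexes 99, 100 and 101 together; the results style follows uid 99 -->
<INPUT TYPE='hidden' NAME='uid' VALUE='99_100_101'>
<!-- Show abstracts when the results page first opens -->
<INPUT TYPE='hidden' NAME='abs' VALUE='on'>
<!-- List 5 matches on the results page -->
<INPUT TYPE='hidden' NAME='hits' VALUE='5'>
<!-- Open clicked documents in the top-most window -->
<input type='hidden' name='targetwindow' value="_top"/>
<INPUT TYPE='submit' NAME='cmd' VALUE='Search'>
<INPUT TYPE='hidden' NAME='cmd' VALUE='Search'>
<INPUT TYPE='text' NAME='qury' SIZE='16' MAX='256'>
</FORM>
```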