
Indexing and Searching a farBook

 

Introduction

farVIEW provides tools to index topics by keywords and keyphrases, and a tool to search by them. To simplify this description, I'm going to lump keywords and keyphrases together and call them symbols.

Here are some simple symbol samples:

    family
    friends
    Baltimore MD
    Sarah
    XML Information
    C++ Code
    Sept 12, 2003

When you index a topic using a symbol, you make it possible for farVIEW to identify that topic when you use the symbol in the Search tool. Indexing a topic causes farVIEW to store the symbol in the dictionary part of the farBook. When you search the farBook, farVIEW looks in the dictionary for matches.

Before farVIEW stores a symbol in the dictionary, it squeezes all the spaces out, then truncates the symbol on the trailing end so that its length is no more than the value of sizeKey in the [Books] section of the farview.ini file.
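The squeeze-and-truncate step can be sketched in a few lines of Python. This is only an illustration of the rule described above, not farVIEW's actual code: the function name normalize_symbol and the parameter size_key are my own, and I am assuming that only ordinary space characters are removed.

```python
def normalize_symbol(symbol: str, size_key: int) -> str:
    """Return the dictionary form of a symbol (hypothetical helper).

    Step 1: squeeze out every space character.
    Step 2: truncate the trailing end so the result is at most size_key long.
    """
    squeezed = symbol.replace(" ", "")
    return squeezed[:size_key]


print(normalize_symbol("Baltimore MD", 16))     # BaltimoreMD
print(normalize_symbol("XML Information", 8))   # XMLInfor
```

Note that with a small sizeKey, two long symbols that share the same leading characters would end up with the same stored form.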

You can set sizeKey to any reasonable value. Reasonable would mean at least eight characters and less than 128 characters, but that's just my opinion: the range is not enforced. One thing to keep in mind is that once you create a farBook, changing the value of sizeKey will have no effect on that farBook. The value only applies when a farBook is created.
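As a rough sketch, here is how a script might read sizeKey from the [Books] section using Python's standard configparser module. I am assuming that farview.ini follows ordinary INI syntax; the sample value 32, the fallback value, and the helper names are hypothetical and are not taken from farVIEW.

```python
import configparser

# Hypothetical fragment of farview.ini; the value 32 is only an example.
SAMPLE_INI = """\
[Books]
sizeKey = 32
"""


def read_size_key(ini_text: str, fallback: int = 32) -> int:
    """Return sizeKey from the [Books] section, or fallback if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("Books", "sizeKey", fallback=fallback)


def is_reasonable(size_key: int) -> bool:
    """The author's suggested (unenforced) range: at least 8, less than 128."""
    return 8 <= size_key < 128


size_key = read_size_key(SAMPLE_INI)
print(size_key, is_reasonable(size_key))   # 32 True
```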

farVIEW does not perform full-text searches, though that could be accomplished with an appropriately coded farSlang module, which I haven't gotten around to writing. Such a module would probably include the ability to search using regular expressions, since it could easily use Philip Hazel's Perl-Compatible Regular Expressions library (Copyright (c) 1997-2000 University of Cambridge), which I have incorporated into farVIEW in the file PCRE.DLL.
 

Indexing

farVIEW provides four tools to index topics.

Indexing with the topic title

When you create a new topic, farVIEW automatically adds the title to the farBook dictionary. This means that if you search the farBook using the title, you will obtain the corresponding topic(s).

Indexing using the Add Alias dialog

You can use the Add Alias dialog to associate a word or phrase with the selected topic. Click the right mouse button (RMB) on the topic, then click on the Add Alias menu item. farVIEW will display the dialog.

You can enter a symbol in the edit window, and you can optionally check the Unique Alias checkbox.

Your entry indexes the selected topic only if you do not check the checkbox. A "Unique Alias" instead identifies a topic that provides default properties and events for other topics, which refer to it by its unique alias in their type property. This should be explained in detail somewhere else, but probably isn't yet.

Assuming that you don't check the Unique Alias checkbox, farVIEW adds the symbol you entered in the edit window to the farBook dictionary, which makes it available when you search the farBook.

Indexing using RMB drag and drop

You can also index a topic by dragging it onto another topic using the RMB, and selecting the Index item from the context menu that appears when you drop the dragged topic. farVIEW associates the title of the target topic with the dragged topic in the farBook dictionary. Coupled with the ability to create topics in a farBook, this simple tool allows you to create a "categories" or index-like subtree for your farBook. This roughly corresponds to the index in a book, but is more useful, because you create, organize, and apply it yourself, so it should make sense to you.
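To make the drag-and-drop association concrete, here is a small Python model of a dictionary that maps symbols to topics. It is a sketch under my own assumptions, not a description of the farBook file format: the class and method names are hypothetical, topics are represented by their titles, sizeKey is taken to be 32, and case is folded when the key is built.

```python
from collections import defaultdict


class SymbolDictionary:
    """Toy model of the farBook dictionary: stored symbol -> set of topics."""

    def __init__(self, size_key: int = 32):
        self.size_key = size_key
        self.entries = defaultdict(set)

    def _key(self, symbol: str) -> str:
        # Squeeze spaces, fold case (assumption), truncate to size_key.
        return symbol.replace(" ", "").lower()[: self.size_key]

    def add_topic(self, title: str) -> None:
        # A new topic is indexed automatically by its own title.
        self.entries[self._key(title)].add(title)

    def index(self, topic: str, symbol: str) -> None:
        self.entries[self._key(symbol)].add(topic)

    def lookup(self, symbol: str) -> set:
        return set(self.entries.get(self._key(symbol), set()))


def drop_index(dictionary: SymbolDictionary, dragged: str, target: str) -> None:
    """Model of RMB drag and drop + Index: the target's title indexes the dragged topic."""
    dictionary.index(dragged, target)


book = SymbolDictionary()
for title in ("family", "friends", "Letter to Sarah"):
    book.add_topic(title)
drop_index(book, "Letter to Sarah", "family")
print(sorted(book.lookup("family")))   # ['Letter to Sarah', 'family']
```

In this model the "family" topic acts as a category: searching on its title returns the category topic itself plus every topic that was dropped onto it.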
 

Indexing in a Text Window

If you mark text with the mouse while holding down the RMB, farVIEW displays a marking menu when you release the button. Choose the Add Alias menu item to turn the marked text into a key phrase for searching.
 

Searching

farVIEW provides a full-expression search tool through the File/Search menu item, which opens the search dialog.

If you enter one or more words, with "Match ANY keyword" checked, the farVIEW search engine will look for matches on any word in your list. If you check "Match ALL keywords", the search engine will look for topics aliased by all the words in your list.
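In set terms, "Match ANY keyword" behaves like a union and "Match ALL keywords" like an intersection. The Python sketch below shows that idea only; the dictionary layout (stored symbol mapped to a set of topic titles), the sample topics, and the function names are all hypothetical.

```python
def match_any(dictionary: dict, words: list) -> set:
    """Topics aliased by at least one of the words (union)."""
    result = set()
    for word in words:
        result |= dictionary.get(word.lower(), set())
    return result


def match_all(dictionary: dict, words: list) -> set:
    """Topics aliased by every one of the words (intersection)."""
    found = [dictionary.get(word.lower(), set()) for word in words]
    return set.intersection(*found) if found else set()


# Hypothetical dictionary contents.
sample = {
    "family": {"Reunion", "Holiday card"},
    "friends": {"Holiday card", "Crab feast"},
}
print(sorted(match_any(sample, ["family", "friends"])))
# ['Crab feast', 'Holiday card', 'Reunion']
print(sorted(match_all(sample, ["family", "friends"])))
# ['Holiday card']
```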

If neither Match checkbox is checked, you can use search expressions such as

   family or friends

You can use search arguments (symbols) of more than one word, as in

   friends and "Baltimore MD"

You can also use parentheses, as in

   (family or friends) and "Baltimore MD"

The search tool also supports a butnot operator, as in

   (family or friends) butnot "Baltimore MD"

If you don't feel like typing out the operator names, you can use the special-character representations, according to the following table.
 

    Special characters    Operator
    *, &                  AND
    +, |                  OR
    - (minus)             BUTNOT

Using the special-character representations, the last search expression above appears as

    (family | friends) - "Baltimore MD"

Case is ignored, as are spaces, in the actual search. Note that and, or, and butnot are reserved words. You must separate search arguments and reserved words by spaces, by parentheses, or by the special-character operators listed above. Search arguments comprising more than a single word must be in quotes, but you can use matching pairs of single quotes, double quotes, or back quotes as needed.
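The rules above are enough to sketch a small expression evaluator in Python. This is my own illustration, not farVIEW's search engine. In particular, the text does not say how operators rank against each other, so I assume that all three have equal precedence and are applied left to right unless parentheses say otherwise. The dictionary layout and the sample topics are hypothetical.

```python
import re

# One token: a quoted argument (', " or `), a parenthesis or special-character
# operator, or a bare word.
TOKEN = re.compile(r"""\s*(?:(['"`])(.*?)\1|([()*&+|-])|([^\s()*&+|'"`-]+))""")
OPERATORS = {"and": "AND", "*": "AND", "&": "AND",
             "or": "OR", "+": "OR", "|": "OR",
             "butnot": "BUTNOT", "-": "BUTNOT"}


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.rstrip()
    while pos < len(expression):
        m = TOKEN.match(expression, pos)
        if not m:
            raise ValueError(f"cannot parse at position {pos}")
        pos = m.end()
        quote, quoted, special, word = m.groups()
        if quote:
            tokens.append(("ARG", quoted))
        elif special in ("(", ")"):
            tokens.append((special, special))
        elif special:
            tokens.append((OPERATORS[special], special))
        elif word.lower() in OPERATORS:          # reserved words, any case
            tokens.append((OPERATORS[word.lower()], word))
        else:
            tokens.append(("ARG", word))
    return tokens


def evaluate(expression, dictionary):
    """Evaluate a search expression against {stored symbol: set of topics}."""
    tokens = tokenize(expression)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends unexpectedly")
        kind, text = tokens[pos]
        pos += 1
        if kind == "(":
            result = expr()
            if pos >= len(tokens) or tokens[pos][0] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if kind != "ARG":
            raise ValueError(f"unexpected {text!r}")
        # Case and spaces are ignored in the actual search.
        return set(dictionary.get(text.replace(" ", "").lower(), set()))

    def expr():
        nonlocal pos
        result = operand()
        # Assumption: equal precedence, applied left to right.
        while pos < len(tokens) and tokens[pos][0] in ("AND", "OR", "BUTNOT"):
            op = tokens[pos][0]
            pos += 1
            right = operand()
            if op == "AND":
                result &= right
            elif op == "OR":
                result |= right
            else:
                result -= right
        return result

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos][1]!r}")
    return result


# Hypothetical dictionary contents.
dictionary = {
    "family": {"Reunion", "Holiday card"},
    "friends": {"Holiday card", "Crab feast"},
    "baltimoremd": {"Crab feast", "Reunion"},
}
print(evaluate('(family or friends) butnot "Baltimore MD"', dictionary))
# {'Holiday card'}
```

The same call with the expression (family | friends) - 'Baltimore MD' gives the same result, because the special characters and the alternative quote marks map to the same tokens.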

Partial symbols are not detected. That means that if

   "XML Information"

is a topic title, you can't find the topic with a search argument of

   "XML" or "Information",

but you can find it with

    "XML Information"

or

    xmlinformation
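The reason partial symbols fail is that the lookup is an exact match on the stored form of the symbol. A minimal Python sketch of that behavior follows; the helper names and the sizeKey value of 32 are hypothetical, and case folding at lookup time is my assumption.

```python
def key(symbol: str, size_key: int = 32) -> str:
    """Stored form of a symbol: no spaces, case folded, truncated (assumed)."""
    return symbol.replace(" ", "").lower()[:size_key]


# A dictionary holding the single topic title "XML Information".
stored = {key("XML Information"): {"XML Information"}}


def find(argument: str) -> set:
    # Exact match on the whole stored key; no substring matching.
    return stored.get(key(argument), set())


print(find("XML"))               # set()
print(find("Information"))       # set()
print(find("XML Information"))   # {'XML Information'}
print(find("xmlinformation"))    # {'XML Information'}
```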

One final, if subtle, point: the search is applied to the farBook associated with the selected topic. That means that you can search a remote farBook as easily as you can search the local one. If no topic is selected, then the search is applied to the farBook in which you began your farVIEW session. That means, for example, that if a text window has focus when you begin a search, the session farBook is searched.
 

Full-Text Searching

As I observed in the introduction, the farVIEW search engine does not provide full-text search. Hand-in-hand with this, it doesn't provide any form of automatic indexing beyond topic title indexing. There are several approaches to implementing such "advanced" features, ranging from the relatively simple algorithm of searching the content of each farBook topic to the more complex automatic pre-indexing of the salient keywords and phrases of each farBook topic.

The content-search approach would tend to result in slow searches. The content of each topic is searched for each argument in the search expression, and the search result is composited from the results of each argument. Three search arguments, for example, would require that the content of each topic be searched three times. You might get impatient if the farBook is large, but this may be perfectly acceptable if the farBook is reasonably small. Phrases would not be an issue, though you would have to consider phrases that span more than one line of text, which complicates the search algorithm somewhat and slows it even more. This approach does allow generalized searching, since it can test against multiple-word search arguments, and it could also support regular expressions.
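Here is a minimal Python sketch of the content-search approach, assuming topics are available as a mapping from title to text. It is not a farSlang module; the function name and the sample topics are hypothetical. Each argument triggers one full pass over every topic, and phrases are allowed to span a line break by letting any run of whitespace match between words.

```python
import re


def content_search(topics: dict, arguments: list) -> dict:
    """Return, for each argument, the titles of topics whose content contains it."""
    results = {}
    for argument in arguments:            # one full pass over the content per argument
        # Escape each word, then allow any whitespace (including newlines) between words.
        words = [re.escape(word) for word in argument.split()]
        pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
        results[argument] = {
            title for title, text in topics.items() if pattern.search(text)
        }
    return results


# Hypothetical topic contents.
topics = {
    "Trip notes": "We drove to Baltimore\nMD to see friends.",
    "Recipes": "Crab cakes for the family.",
}
hits = content_search(topics, ["Baltimore MD", "family"])
print(hits["Baltimore MD"])   # {'Trip notes'}
print(hits["family"])         # {'Recipes'}
```

The per-argument result sets can then be combined with the usual and, or, and butnot set operations to composite the final result.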

The second approach, which is at the other extreme, tends to provide fast search times. This approach requires that each topic be indexed automatically before effective searching takes place. Each keyword that indexes a topic in the farBook is added to the dictionary, which increases the size of the farBook. Phrases are problematic and depend on the domain that the farBook content covers. If there is no single domain, automatically indexing phrases would probably not be very cost-effective. Because of the design of farVIEW, regular expressions are not possible when the dictionary is searched rather than the content itself. This approach is probably worthwhile when the farBook is created once, and read many times, as in an eBook application. In that case, both approaches, or some blend of the two, could be provided.

With regard to automatic indexing, if a farBook extends over a single domain of interest, it may be more effective to index the farBook using a list of domain-dependent keywords. If the farBook is not limited to a single domain, then a list of stop words would be preferable. (Stop words are words to ignore while indexing.)
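The two filtering choices can be sketched side by side in Python. This is an assumption-laden illustration of automatic pre-indexing, not an existing farVIEW feature: the function name, the simple word pattern, and the sample data are hypothetical, and only single keywords are indexed, not phrases.

```python
import re
from collections import defaultdict


def auto_index(topics: dict, stop_words=None, keywords=None) -> dict:
    """Build {keyword: set of topic titles} from topic content.

    If keywords is given, only those domain-dependent words are indexed.
    Otherwise every word is indexed except those in stop_words.
    """
    stop = {word.lower() for word in (stop_words or ())}
    keep = {word.lower() for word in keywords} if keywords is not None else None
    index = defaultdict(set)
    for title, text in topics.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if keep is not None:
                if word not in keep:
                    continue
            elif word in stop:
                continue
            index[word].add(title)
    return dict(index)


# Hypothetical topic contents.
topics = {
    "Parsing": "The parser reads the XML file",
    "People": "A file of friends",
}
by_stop_list = auto_index(topics, stop_words=["the", "a", "of"])
by_keywords = auto_index(topics, keywords=["xml", "parser"])
print(sorted(by_stop_list))   # ['file', 'friends', 'parser', 'reads', 'xml']
print(sorted(by_keywords))    # ['parser', 'xml']
```

The stop-word list produces the larger dictionary, which matches the size trade-off described above.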

The first approach adds the least to the size of the farBook, tends to result in slow searches, and is relatively easy to implement. The second approach has a number of options, creates a large farBook, tends to result in fast searches, and is relatively difficult to implement. However, with careful design, either should be reusable across different farBooks and different domains.

And keep in mind that a farBook reader can freely extend the searchability of a "canned" farBook.