Tags

A Scripting Language for Text

Introduction

Tags is a scripting language for processing text. You can write simple Tags scripts to process plain and delimited text. You can also easily extract information from HTML and XML documents obtained from websites, use ODBC to support SQL queries, and use simple commands to manipulate folders and text files.

You can write a valid Tags script in a single line of text. But you could also write a Tags script that spans many files to implement, for example, a complex document and software generation library. I know because I have.

Here is the traditional "Hello World" written as a Tags script:

<hello>
Hello World.
</hello>

A Tags script is always embedded within an XML document as text (a text node, in XML language). A trivial Tags script is simply the text contained in the document element of the script document - in this case, the text within the hello element. Tags' default action is to output the text it finds as it processes the script to the standard output. If you replaced the "Hello World." text with the text of a book, Tags would output the entire text of the book. But a Tags script can do much more.

Here are some simple sample scripting snippets:

* Load the Top Stories RSS document from the CNN website into a Tags variable, and then save it in a file.

<topstories script="/topstories/text()">
  $#in topstories(http://rss.cnn.com/rss/cnn_topstories.rss)
  $#out topstories(topstories.xml)
</topstories>

Tags commands are identified by the leading "$#" character sequence (but there can be leading spaces as in the sample). The Tags language also supports variables, and in this example, topstories is a Tags variable. The $#in command reads the document identified by the URL in the parentheses into the topstories variable. The $#out command writes the contents of the topstories variable to the file named topstories.xml.

* Read text lines from a file, and write them to the standard output.

<copy script="/copy/text()">
$#forEach line(myfile.txt)
$?forEach$$
$#end
</copy>

The line-variant of the $#forEach command (there are several other variants as you will see later) reads each text line from myfile.txt, placing the text line in the forEach variable where you can reference it in the part of the script between the $#forEach and the following $#end commands. In the sample script, the line following the $#forEach command references the forEach variable. Tags variables are referenced by preceding the variable name with the "$?" character sequence, and following the variable name with the "$$" character sequence. (You can change the characters that Tags will expect within the script using the marks attribute in the document element, but it's probably not worth doing.) In this example, the $?forEach$$ reference causes the contents of the forEach variable to replace the variable reference and to be written to the standard Tags output file.

* Select records from a database, and write them to the standard output.

<select script="/select/text()">
$#set dsn(customerDSN)
$#set username(myusername)
$#set password(mypassword)
$#forEach SQL(select * from customer)
$?forEach$$
$#end
</select>

Tags uses the ODBC interface to support the SQL query. To run this script, you need a database table, called customer, and you need to have defined a DSN (Data Source Name), called customerDSN, to provide the interface information to the ODBC driver. You must also preset the dsn variable to the DSN name. You may also need to set the username and password variables if they are needed. The SQL-variant of the $#forEach command issues the SQL select statement to the ODBC driver, and then places each resulting record in the forEach variable, where you can reference it. In this sample, as in the previous sample, the $?forEach$$ reference causes the contents of the forEach variable to be written to the standard Tags output file.

The Tags scripting language supports XPath and regular expressions to allow considerable scripting power. And its simple but comprehensive command set is easy for anyone with scripting or programming experience to learn and use. If you aren't familiar with XPath or regular expressions, an introductory tutorial on each is a good place to start.

How to Execute a Tags Script

You can execute a Tags script from the command line, from within a batch file, from a program, or from a WSH script (JavaScript or VBScript). The Tags command line takes the following parameters:

The name of the Tags script file to execute,
Any parameters needed by the Tags script.

There are also several pre-defined flag-parameters that you can use:

-V

Plays the ok.wav file on success, and the error.wav file on failure, if the files are available.

-X

Displays this manual in the default browser, if both are available.

-Z

Saves variables to files when they are loaded with the $#in command (for debugging).

-n

(n is a number) Adjusts the time Tags sleeps to share CPU cycles between commands. Not usually required for short runs. Mostly useful for running a Tags script in the background.

Example:

> tags hello.xml -v >hello.txt

This command causes Tags to execute using one of the sample files included in this release. On completion, it plays the ok.wav file if successful, or the error.wav file if not successful (assuming that the wav files are present).

Several sample scripts are included with the release.

When you install the Tags files by downloading and unzipping the tags.zip file from http://paul.medlock.com/tags.zip, you should also add an environment variable, called tagsPath, and set it to contain the path to the folder where you installed Tags.

Some Basics

The elements, attributes, and text of a script file are wholly determined by the application. Since you make up the element and attribute names, along with the structure of the script file, to fit your application, there is no DTD or schema that describes a valid Tags script.

The text in the Tags script is free-form and can contain any ordinary text and special characters except for the standard five XML predefined characters:

instead of "<", use &lt;
instead of ">", use &gt;
instead of "&", use &amp;
instead of "'" (apostrophe), use &apos;
instead of """ (quote), use &quot;

If your text contains any of these characters, you may need to convert them to the equivalent XML entity reference.

On the other hand, you can choose to embed your text in CDATA-sections instead. You can use a CDATA-section anywhere you could write text, and you can even mix them together, since Tags treats CDATA-sections as if they were text. A CDATA-section begins with the string "<![CDATA[" and ends with "]]>". Here is an example:

<element><![CDATA[
put your <marked> up text & commands here
]]></element>

The text may also contain white-space: viz., spaces, tabs, and new-lines. Since these characters are preserved in the text, you will find that they will frequently appear in the output of your script unless you control their use.

Here's a useful idea: If you aren't using the CDATA option and you choose to convert the special characters to entity-references when performing a search-and-replace, be sure to replace the ampersands with &amp; first. Otherwise, you will never find the ampersands later to fix them.
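The ordering rule above can be illustrated outside of Tags. The following is a minimal Python sketch, not part of Tags itself; the function names are made up for this illustration. It shows what goes wrong when the ampersand is replaced after the other characters.

```python
def escape_xml_text(text):
    """Escape the five XML predefined characters, ampersand first."""
    text = text.replace("&", "&amp;")   # must come first
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace("'", "&apos;")
    text = text.replace('"', "&quot;")
    return text


def escape_wrong_order(text):
    """Deliberately wrong: the ampersand is replaced last."""
    text = text.replace("<", "&lt;")
    text = text.replace("&", "&amp;")   # also hits the & produced by &lt;
    return text


print(escape_xml_text("if a < b & c"))     # if a &lt; b &amp; c
print(escape_wrong_order("if a < b & c"))  # if a &amp;lt; b &amp; c
```

In the second result the entity produced for "<" has itself been escaped, so an XML parser would hand Tags the literal text "&lt;" rather than "<".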

The examples in this manual may not use the XML entities when they should so that they are easier to read. But don't forget that you will have to deal with that issue before you can use your script in Tags. The characters that Tags uses for markup were chosen so as not to infringe on XML's markup.

Here is another useful idea: You can check an XML document for being well-formed using Internet Explorer 5+, Netscape 6+, Mozilla, SeaMonkey, Firefox, etc. To use IE, for example, just drag the name of the file you want to check onto the IE shortcut on your desktop. IE will recognize the XML file name extension and display the document. If the document contains an error, your browser will report the line and column numbers where the error was detected. Of course, if you use a different file name extension, e.g., myscript.tags, the browser may not recognize the file as XML.
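If a browser is not handy, any XML parser can perform the same well-formedness check. The following is a small Python sketch using the standard library's ElementTree parser; it is not part of Tags, and the function name check_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, (line, column))."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple reported by the parser
        return False, err.position
    return True, None


good = '<hello script="/hello/text()">Hello world.</hello>'
bad = '<hello script="/hello/text()">a < b</hello>'   # unescaped "<"

print(check_well_formed(good))
print(check_well_formed(bad))
```

The second document fails because of the unescaped "<" in the text, which is exactly the kind of error the entity references and CDATA-sections described above are meant to prevent.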

XML is case-sensitive, and, consequently, XPath expressions are case-sensitive. Tags is partly case-sensitive. Command names are not, but variable names are.

A Tags script file must be a well-formed XML document. Usually the bulk of the file is the text that you want in the output. Here is the Hello.xml example again:

<hello script="/hello/text()">
Hello world.
</hello>

and you can run it with the command line

> tags hello.xml >hello.txt

The document element of a Tags script document should contain the script attribute, which identifies to the Tags interpreter where the script is within the document using an XPath expression. In the example, the value of the script attribute is "/hello/text()". This is an absolute XPath expression. It's a good idea to always use an absolute XPath expression to locate the script. The script attribute is optional, but only if the Tags script is the sole occupant of the document element, as in this case. We need the script attribute in more complex script documents, since the script probably will not be in such an obvious place, so you are probably better off by getting in the habit of using it.

By default, Tags writes the text generated by the script to the standard output file, but at least one of the sample scripts we have already discussed demonstrates how to direct Tags output to other files.

About those pesky whitespace characters. If you look carefully at the contents of the output file from the Tags run above, you will notice that there is a blank line, followed by the "Hello world." line. This blank line results from the newline that follows the <hello> start tag - the "Hello world." text is on the next line down. You can remove that extraneous line from the output in two ways. You could rewrite the script as

<hello script="/hello/text()">Hello world.
</hello>

or you could use a join-command:

<hello script="/hello/text()">$\j
Hello world.
</hello>

The join-command ($\j) joins with the next line, and is one of several special text output control commands. Another command is the newline-command, which breaks lines, and is written as $\n. It causes the text of the line that follows the command to be written as the next line. In the following line of text, the newline-command causes the one line to be output as two lines.

this is the first line$\nthis is the second line

There are other output control commands, but I'll explain them later in the manual.
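To make the effect of the join-command and the newline-command concrete, here is a small Python model of the two commands as they are described above. It is only a sketch of the described behavior, not how Tags is implemented, and the function name apply_output_controls is invented for this illustration.

```python
def apply_output_controls(text):
    r"""Model the two output control commands described above.

    $\j joins with the next line (the line break after it is dropped);
    $\n breaks the line at the point where it appears.
    """
    text = text.replace("$\\j\n", "")
    text = text.replace("$\\n", "\n")
    return text


script_text = "$\\j\nHello world.\n"
print(repr(apply_output_controls(script_text)))

one_line = "this is the first line$\\nthis is the second line"
print(apply_output_controls(one_line))
```

With the join-command in place, the modeled output starts directly with "Hello world." instead of a blank line.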

As we saw in the second sample script, you can redirect output to files other than the standard output using the $#out command. Let's modify the hello.xml file by using the $#out command to redirect its output to another file:

<hello script="/hello/text()">$\j
Hello world.
$#out (hello.txt)
</hello>

After you run this example, you will find the output of the script in hello.txt. Note that this version of the $#out command does not identify a variable as the output source as did the RSS document load sample in the first section of this manual. A variable name is not needed because Tags can emit text to a default variable (its name is output, if you want to reference it), and the $#out command in this example is outputting the text from the default variable to the file. (Note: the $#out command flushes the default variable as a side effect.)

In most programming languages, the text information is usually marked off from the other elements of the language with special marks, such as quotation marks, etc., while the language commands are not marked. In Tags, it's the other way around: text is written simply as text. It is the special Tags commands that are marked.

There are two kinds of Tags symbols: commands and referencers. Commands occupy a single line of text, and are identified by a $-sign followed by a #-sign followed by the command name. Spaces are not allowed to separate these three parts, but commands do not have to start in the beginning of the line: there may be leading spaces. Lines that begin with the "$#" identifier that are followed by a space or do not have a recognized command name are considered comments and are ignored.

You use referencers to modify the outputs that your Tags script generates. You can reference the text and attributes of the Tags script document, other XML documents that you load, and variables whose values you set. Referencers begin with a $-sign followed by an exclamation-point ("!"), a question-mark ("?"), or a caret ("^"), followed by an expression of some kind, followed by two $-signs. Referencers can appear pretty much anywhere within your text as you need them, but they must be complete on the same line on which they start. On the other hand, their resolved value may span as many lines as desired. The file copy and the ODBC samples both used the $?forEach$$ variable referencer.

Commands and referencers will be discussed in more detail in subsequent sections, but here are some examples:

Tags commands:

$#out (myfile.txt)
$#text class(myclass)
$#if (true)
$#end
$#debug (on)
$#get objectname(Enter the name of the new object:)
$# this is a comment (because of the space after the $#-prefix)

Tags referencers:

$!/model/help/text()$$
prompt="$!@prompt$$"$\j
&lt;map name="Action" value="$?line{1}$$" info="$?line{2}$$"/&gt;

The effect that these commands and referencers might have on the output of a script depends on the context in which they operate. Different data at the locations specified by the referencer expressions will result in different outputs. And, since there is no difference between data and program in Tags, any referencer could obtain text that contains commands and referencers that Tags would also process in a recursive fashion. That's how Tags provides something akin to the subroutine paradigm that programmers are familiar with, though not exactly, since Tags does not provide a specifically defined facility for passing parameters to "subroutines".

More Samples

Here are a couple of sample scripts of more complex activities you can implement in a few Tags script lines:

* Query a database table, called customer, to obtain customer information, write the information into a text file, and then display the results in notepad. The script assumes that a DSN, called customerDSN, has been created for the database table access. Check the link given earlier for information about ODBC.

<db2text script="/db2text/text()">
$#set dsn(customerDSN) assumes that the DSN customerDSN was previously declared
$#set sqlcolumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
$#forEach SQL(select * from customer) {
$?CustNo$$,$?Name$$,$?Street$$,$?City$$,$?State/Prov$$,$?ZipCode$$,$?Country$$,$?Phone$$
$#end }
$#out (cust.txt)
$#exec (notepad cust.txt)
</db2text>

If you provide the names you want to assign the columns in the result records to the Tags interpreter using the sqlcolumns variable, you can access the columns as variables by their name, which I do in this example. In order to use ODBC, you must first have set up the ODBC link for the specific database, as I described earlier. It is beyond the scope of this manual to explain that, but you can get more help by following this sequence of steps in Windows XP: Windows Start -> Control Panel -> Administrative Tools -> Data Sources (ODBC). Here is a Google search link to a number of tutorials. Ok, now you are on your own.

* Here is another script that builds on the earlier script to read the same RSS document from the CNN news site, extract features from the document to create an HTML document, and then display it using your default internet browser. Note the use of the CDATA-section to escape all the HTML tags.

<rss2html script="/rss2html/text()"><![CDATA[
$#in contextNode(http://rss.cnn.com/rss/cnn_topstories.rss)
<html>
<head>
<h2>$!/rss/channel/title/text()$$</h2>
</head>
<body>
$#forEach node($!/rss/channel/item$$) { list all the items in the feed
  $#set contextNode($?forEach$$)
<h3>$!title/text()$$</h3>
<p>$!description/text()$$</p>
<p><a href="$!link/text()$$">Link</a></p>
$#end }
</body>
</html>
$#out ($?currentPath$$/topstories.htm)
$#open (file://$?currentPath$$/topstories.htm)
]]></rss2html>

This example obtains the Top Stories RSS document from CNN, as in the earlier sample, and creates an HTML document using the <item> objects in the document, writes the result to a file called topstories.htm, and then opens the default browser to display the file. Note that the URL in the $#in command must begin with "http://" so that Tags will know to look for the object on the web.
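For readers who want to see which parts of the feed the XPath referencers in this script select, here is a rough Python equivalent using the standard library's ElementTree. It is only an illustration: the sample feed text, the example.com links, and the function name feed_to_html are all made up, the feed is an inline string rather than a download, and no HTML escaping is applied.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded RSS document.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <item>
      <title>First story</title>
      <description>About the first story.</description>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <description>About the second story.</description>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def feed_to_html(rss_text):
    root = ET.fromstring(rss_text)            # root is the <rss> element
    lines = ["<html>", "<body>"]
    # corresponds to $!/rss/channel/title/text()$$
    lines.append("<h2>%s</h2>" % root.findtext("channel/title"))
    # corresponds to the forEach over /rss/channel/item
    for item in root.findall("channel/item"):
        lines.append("<h3>%s</h3>" % item.findtext("title"))
        lines.append("<p>%s</p>" % item.findtext("description"))
        lines.append('<p><a href="%s">Link</a></p>' % item.findtext("link"))
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


print(feed_to_html(SAMPLE_RSS))
```

Inside the loop, each item plays the role that the contextNode variable plays in the Tags script: the relative paths title, description, and link are evaluated against the current item.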

Tags Referencers

Tags referencers may be coded virtually anywhere within the text of the script, and have the form

$reftype symbol { subscriptor } $$

There are three reference types, distinguished by the single-character reftype:

! (exclamation-mark) indicates an XPath expression. In most circumstances, Tags replaces the referencer with the value obtained by evaluating the XPath expression. A modified form lets you specify the context node for the expression by writing an "@" and a variable name before the "!". The samples below include an example of this.

? (question-mark) indicates a variable reference. Tags replaces the referencer with the value of the variable specified by the symbol. Variables are discussed later.

^ (caret) indicates a stack pop. Tags replaces the referencer with the value of the variable at the top of the stack specified by the symbol. Stack variables are used primarily as a way to pass parameters to sub-routines as well as a way to return results back to sub-routine callers. Multiple parameters can be passed and multiple results can be returned by using the $#push command and the $#pop command or the $^pop$$ referencer.

Referencers may be used anywhere within the script where they make sense. A referencer may also contain referencers, and may result in text that contains other referencers, which are also resolved until only unmarked text is left. As already mentioned, a referencer cannot be split across two or more lines: it must lie wholly within a single text line.

Examples:

$!//config/tag$$	an XPath expression reference that identifies all the <tag> elements in all <config> elements in an XML document.
$?forEach$$	a reference to the Tags variable that contains the local value within a $#forEach statement.
$?3$$	a reference to the third parameter on the command line
receiver->SetSource("$!@source$$");	an XPath referencer to the source attribute embedded in some text in the script document. (Note that the quotes are part of the output, not part of the referencer.)
$?$?index$$$$	a reference to the variable identified by the value of the referenced index variable (a nested reference)
$#set x($!$x+1$$)	a Tags $#set command using an XPath expression reference to increment the variable x by one. (Note that you can reference a Tags variable within an XPath expression using only the $-leadin character as documented in the XPath specification. Writing ($!$?x$$+1$$) would also work, except that the Tags interpreter resolves the reference instead of the XPath interpreter, so you may have to place it in quotes if it resolves to a string constant.)
$@script!/myscript/mysubroutine$$	This special form of the XPath referencer allows you to specify a node to use as a reference point when processing the XPath expression. The example is using the script variable, which is initialized by Tags to the document element of the script document itself. If no context node is specified, Tags uses the contents of the contextNode variable as the reference point.
$#forEach node($!/dep/mod[match(@name, "$?forEach$$.[cC]")]/ref/@name$$)	This example demonstrates an XPath referencer that contains a variable referencer.
$#set str($!lower-case("A STRING")$$)	This example demonstrates how to use an XPath string function. Note that the string parameter must be in quotes.
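The way nested variable referencers such as $?$?index$$$$ are resolved can be modeled in a few lines of Python. This is only a sketch of the behavior described above (innermost referencer first, repeated until only unmarked text is left), not the Tags implementation. It handles variable referencers only, ignores subscriptors, and assumes that an undefined variable resolves to empty text. The names resolve and max_passes are invented for this illustration.

```python
import re

# A variable referencer: "$?" + name + "$$", where the name holds no "$".
# Because the name cannot contain "$", only innermost referencers match.
_VAR_REF = re.compile(r"\$\?([^$\s{}]+)\$\$")


def resolve(text, variables, max_passes=50):
    """Replace variable referencers until none are left."""
    for _ in range(max_passes):
        # Assumption: an undefined variable resolves to empty text.
        new_text = _VAR_REF.sub(lambda m: variables.get(m.group(1), ""), text)
        if new_text == text:
            return text
        text = new_text
    raise RuntimeError("references did not settle; possible cycle")


variables = {"index": "x", "x": "42", "name": "World"}
print(resolve("Hello $?name$$.", variables))   # Hello World.
print(resolve("$?$?index$$$$", variables))     # 42
```

In the nested case the first pass turns $?$?index$$$$ into $?x$$, and the second pass turns that into 42.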

Subscriptors

When the type of a resolved referencer is a string, a list of strings, or a nodeset (an XPath object), you can use an optional trailing subscriptor to obtain a portion of the resolved referencer value. A subscriptor is annotated as an open curly-brace, followed by one number, or two numbers separated by a comma, followed by a close curly-brace, and is appended to the end of the referencer before the trailing dual markers (the $$ tail); e.g., $?var{4,5}$$. The subscriptor, itself, can also incorporate one or more referencers, but they must resolve to one or two numbers (integers). When subscripting a string, the first number (the index) can be prefixed with a "C", "F", or a "W". The letter determines whether the subscript is by character, field, or by word, respectively.

Subscriptors with a single numeric parameter {index}

The resolved value of the subscriptor has effect when it is either positive or negative. If it is zero, the value of the resolved subscripted referencer is left unchanged, i.e., it is not subscripted.

If the resolved value of the subscripted referencer is a string, and the index is preceded by a "c", as in {c5}, Tags treats the string as a series of characters. The index references the beginning of the character(s) to be extracted, beginning with one as the first character in the string.

If the resolved value of the subscripted referencer is a string, and the index is preceded by an "f", or nothing, as in {f5}, Tags treats the string as a series of fields preceded by a delimiting character. Any character can act as a delimiting character, and it is (by definition) identified by virtue of being the first character in the string. If the first character is a comma, the field delimiting character is a comma. If the first character is the letter "A", then the field delimiting character is the letter "A". (Notice in the example below that the string is prefixed with a comma to identify the comma as the delimiting character.) In this manual, strings treated as a set of fields are referred to as a delimited string, or as a string record.

,Lincoln,Abraham,Springfield,Illinois

If the resolved value of the subscripted referencer is a string, and the index is preceded by a "w", as in {w5}, Tags treats the string as a series of words separated by a space. Processing the words in a string follows the same rules as processing the fields in a string. Note that punctuation is not removed, and if connected to a word with no intervening space, it will be counted as part of the word.

If the value of the resolved referencer is a nodeset, Tags obtains the node in the nodeset corresponding to the subscriptor value, ndx, counting the first node as node one. I.e., the first node in a nodeset variable is identified as $?nodeSet{1}$$, where nodeSet is the name of the variable. If the ndx value is negative, the nodes are counted from the last node, which is counted as -1.

If the value of the subscriptor is larger than the number of objects (fields, strings, or nodes), then the value of the subscripted referencer is empty. If the type of the resolved subscripted referencer is not a string, string list, or a nodeset, the subscriptor is ignored and the resolved value is not subscripted.

Subscriptors with two numeric parameters {index,length}

TBD: Note that, in all uses of subscriptors, a positive index is counted left to right, with the first sub-entity indexed as one. When the index is negative, it is counted right to left, with the last entity (character, word, field, node, etc.) indexed as -1. The two-parameter form is used only to provide a substring function for strings, and currently has no implementation for any other Tags type. A negative or zero length is treated as if it were absent, and the effect of the subscriptor reverts to that of a subscriptor with a single parameter.

Examples of using the subscriptor notation:

$#text pres(,Lincoln,Abraham,Springfield,Illinois)

$# accessing the fields of a string as a string record
$#text city($?pres{3}$$) sets city to "Springfield"
$#text first($?pres{f-3}$$) sets first to "Abraham"

$# accessing the characters (substrings) of a string as a collection of characters
$#text last($?pres{c2,7}$$) sets last to "Lincoln"
$#text state($?pres{c-8,8}$$) sets state to "Illinois"
$#text comma($?pres{c1,1}$$) sets comma to the first comma 

$#set record(,$!@xyz$$) note the leading comma
$#set field($?record{5}$$) sets field to the value of the fifth field in the xyz attribute

$#text a(this tests tags substring stuff)
$#text b($?a{f4}$$) yields "ags subs"
$#text c($?a{5}$$) yields "ring s" (defaults to field)
$#text d($?a{w2}$$) yields "tests"
$#text e($?a{c12,4}$$) yields "tags"
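The subscriptor rules can be checked against the examples above with a small Python model. This is a sketch of the rules as this manual states them, not the Tags implementation. The function names field, word, and chars are invented for this illustration, only the two-number form of the character subscriptor is modeled, and the length is assumed to be positive.

```python
def _pick(items, index):
    """1-based lookup; a negative index counts from the end; out of range is empty."""
    pos = index - 1 if index > 0 else len(items) + index
    if 0 <= pos < len(items):
        return items[pos]
    return ""


def field(value, index):
    """Model of {f index}: the first character of the string is the delimiter."""
    if index == 0 or not value:
        return value                    # a zero index leaves the value unchanged
    delimiter = value[0]
    return _pick(value[1:].split(delimiter), index)


def word(value, index):
    """Model of {w index}: words are separated by a space."""
    if index == 0:
        return value
    return _pick(value.split(" "), index)


def chars(value, index, length):
    """Model of {c index,length}: a substring starting at a 1-based index."""
    if length <= 0:
        raise ValueError("this sketch only models a positive length")
    if index == 0:
        return value
    start = index - 1 if index > 0 else len(value) + index
    if start < 0 or start >= len(value):
        return ""
    return value[start:start + length]


pres = ",Lincoln,Abraham,Springfield,Illinois"
a = "this tests tags substring stuff"

print(field(pres, 3))      # Springfield
print(field(pres, -3))     # Abraham
print(chars(pres, 2, 7))   # Lincoln
print(chars(pres, -8, 8))  # Illinois
print(field(a, 4))         # ags subs
print(word(a, 2))          # tests
print(chars(a, 12, 4))     # tags
```

The field(a, 4) case shows the delimiter rule at work: the string begins with the letter "t", so every "t" in it is treated as a field separator.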

Tags Variables

Tags supports variables that can be referenced and assigned values. Each variable has a name and a value. Unless it violates some other Tags rule, any alphanumeric string can be a variable name. Values may be of any Tags type (as described in the next section), or they may be empty. Variable names are case-sensitive. You set the value of a variable using one of several Tags commands, and you obtain the value by using the $?varname$$ referencer form.

Tags provides several variables that contain information about the processing environment of the script. For example, the command-line parameters are available as variables whose names are the numbers corresponding to the positions of the parameters that they contain: the first parameter is available in the variable referenced as "$?1$$", the second parameter is available in the "$?2$$" variable, and so on. In the example command lines given earlier, $?0$$ contains "Tags", and $?1$$ contains "hello.xml".

Tags also allows you to access the command-line flag-parameters (annotated in the command-line using the form -letter{letter}). Examples of command-line flag-parameters are -D, -C, -a, etc. Flag-parameters are preserved as Tags variables having the letter as both their name and their value. The names are always capitalized, regardless of whether the flag-parameter is or not. Variables named "$?a$$" and "$?A$$" are different variables, and only the second could represent a flag-parameter. The flag-parameter variables make it easy for the user to communicate special conditions to the script. By the way, notice that there is no provision for referencing numeric flag parameters as Tags variables.
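The following Python sketch models how a command line might be turned into parameter and flag variables according to the rules in this section. It is an illustration of the described behavior only, not the Tags implementation. The function name command_line_variables is invented, and it assumes that a flag group may start with either a minus sign or a slash and that non-letter characters in a flag group (such as the numeric sleep flag) produce no variable.

```python
def command_line_variables(argv):
    """Return a dict of variable name -> value for a command line."""
    variables = {"0": argv[0]}          # the program name
    position = 0
    for arg in argv[1:]:
        if arg[:1] in ("-", "/"):
            # A flag group: each letter becomes an upper-case variable
            # whose value is its own name. Digits create no variable.
            for ch in arg[1:]:
                if ch.isalpha():
                    variables[ch.upper()] = ch.upper()
        else:
            # Flags are not counted when numbering the parameters.
            position += 1
            variables[str(position)] = arg
    return variables


print(command_line_variables(["tags", "hello.xml", "-v"]))
print(command_line_variables(["tags", "-AbC", "in.txt", "-5", "out.txt"]))
```

In the second call, in.txt and out.txt become parameters 1 and 2 even though flag groups appear before and between them.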

You can also reference an environment variable by appending its name to "env.". If you reference an environment variable, such as PATH, as a variable (e.g., as in $?env.path$$), the value of the environment variable is returned. Tags does not currently change the values of environment variables; it only allows you to access their values in your script. This might change.

Tags pre-defines a number of variables to provide a means of communicating between the Tags interpreter and your Tags script. Some of these variables are associated with specific Tags commands. But there are several which have meaningful values for the duration of the execution of a script. Following is a list of Tags variables that have special meaning in the Tags language:

columns, getColumns, lineColumns, SQLColumns, regXColumns

Used by various variants of the forEach command to parse the forEach input into fields.

command line flags

Command line flags are referenced by their letter value using the notation $?x$$, where x is the actual upper-case letter value of the flag. Tags interprets any command-line parameter that is immediately preceded by either a minus sign or a slash as a command line flag group. Each letter in the group is a flag. Only letters can be used as flags in Tags. The value of a flag variable is the name of the variable. For example, if you code -AbC on the command line, Tags will create three variables called $?A$$, $?B$$, and $?C$$, with respective values of "A", "B", and "C".

command line parameters

Command line parameters are referenced by their position using the notation $?n$$, where n is the index of the parameter in question. The first parameter is indexed as one. Parameters are always strings. Command line flags as described above are not counted and are handled in their own way.

contextNode

Used by XPath references to identify the default root of an XPath search (string). Set by Tags during initialization to reference the root element of your Tags script. You set it according to need. Tags provides an enhanced form for an XPath referencer expression that allows you to use any variable as the context node for the expression. The form is $@var!xpath$$. Note that $@contextNode!expression$$ is the same as $!expression$$. The variable should contain an XML node.

currentPath

Contains the absolute path to the current directory. Tags sets this to the directory from which you are running your Tags script.

date and time

Tags provides date and time information to your Tags script through several variables, which are updated before the interpreter processes each script command. $?time$$ (string - format is hh:mm:ss), $?day$$ (number - day of the month), $?dayOfWeek$$ (string - name of the week day), $?dayOfYear$$ (number - Julian day), $?month$$ (number - month of the year), $?monthName$$ (string - name of the month), and $?year$$ (number - all four digits).

dsn

ODBC data source name used by SQL interface (string). You must set this before using the SQL-variant of the $#forEach command.

empty

Convenience variable set by Tags to contain absolutely nothing. Use it to clear other variables to empty as in $#set var($?empty$$).

environment variables

A variable whose name starts with "env." is interpreted as an environment variable, and Tags will attempt to return the value of the corresponding environment variable, if defined. Otherwise, the value of the referencer is empty. You cannot change the value of an environment variable in a Tags script. Also, unlike other Tags variables, environment variable names are not case sensitive. For example, reference the Path environment variable as $?env.path$$.

error

Set by Tags as the result of the $#exec and $#open commands. It contains the value returned by the executed program.

file variables

$?fileDrive$$ (drive:), $?fileName$$, $?filePath$$ (path\), $?fileInfo$$. These variables are set by the file-variant of the $#forEach command, which is described later in this document.

HTTP variables

TBD: $?HTTPHeaders$$ and $?HTTPResponseHeaders$$.

grep

Set this with a regular expression before using a $#forEach command to provide a filter in selecting objects to present in the $?forEach$$ variable. It is not required for proper forEach operation, but it can improve the performance of your script in many cases. Even when you set the variable outside the $#forEach loop, it appears empty inside the loop. But, once set, it retains its value outside the loop. This means that, unless you change its value yourself, it will have the same value for two consecutive $#forEach loops, which might not be what you want. So you should set it or clear it as needed before each loop. $#forEach variants that apply the grep variable are the Field, Line, Lineb, Str, Strb, and the default variants. Regular expressions in Tags are compatible with the rules of Perl 5, and are implemented using the PCRE software.

last

Set by the $#forEach command to the index of the last object in the object set being processed by the command (number). The value is not known in some variants of the $#forEach command, and is set to zero in those cases.

output

Container in which Tags collects output text that is otherwise undirected, and is automatically dumped to the standard output if not otherwise used. You can clear the output using the $#set output() command.

password

ODBC password used by the SQL interface (string). Not all database accesses require this, but when they do, you must set the value before using the SQL-variant of the $#forEach command.

position

Set by the $#forEach command to the index of the current forEach value (number). The first object is indexed as one.

regex variables

After performing a $#match or the regex version of an $#if or $#ifn, a set of variables contains the matching substrings. The $?regXCount$$ variable specifies the number of matched substrings, and the $?regXi$$ variables contain the matched substrings; e.g., the third matched substring is in the variable named $?regX3$$ while the original matched string is in the variable named $?regX0$$. TBD: $?regXColumns$$.

script

Set by Tags during initialization to the root of the Tags script (XPath node). Use this to implement subroutines by writing XPath expressions referencing other elements within the same Tags script, as in the following example: $@script!/myscript/mysubroutine/text()$$.

sqlcolumns

Set this before using the SQL-version of the $#forEach command to define the fields of the record set you expect to obtain via your select statement. Alternately, set it to empty before using the $#forEach command to obtain the column names from the database as part of the SELECT request. See also the ODBC example given earlier.

tab

Convenience variable set by Tags to contain the tab character 0x09 (\t) for general use.

tagsPath

Set by Tags to the tagsPath environment variable if present. Otherwise set to the path of the Tags executable (string).

userName

ODBC user name used by the SQL interface (string). Not all database accesses require this, but when they do, you must set the value before using the SQL-variant of the $#forEach command.

You should remember that, except for environment variables, your script can set the value of any variable, and you can lose valuable information by overwriting the values of certain variables. For example, you will lose the value of the variable $?script$$ by setting it to some other value. On the other hand, you may well overwrite the value of $?contextNode$$ frequently when you are using XPath expressions.

Properties of Variables

When a Tags variable is defined, it has a value, and it also has three additional properties that can be ascertained using the $?#, $?%, and $?? prefixes. Note that these are the regular $? prefix with an additional #, % and ? appended, respectively.

Number of Fields

Use the $?# prefix to obtain the number of fields in the contents of the specified variable. Note that this value only makes sense when the variable contains a string.

Length

Use the $?% prefix to obtain the length of the contents of a specified variable. For a string, it returns the number of characters. For a string list such as output, for example, it returns the number of lines (strings), and for a node list, it returns the number of nodes.

Type

Use the $?? prefix to obtain the type of the contents of the specified variable. There are a number of types that Tags values might have. Here is a list of the types along with the meaning of the length property for that type in parentheses.

string (number of characters)
number (number of digits)
string_list (number of strings)
node_list (number of nodes)
element_node (1)
attribute_node (1)
text_node (1)
cdata_section_node (1)
entity_reference_node (1)
entity_node (1)
processing_instruction_node (1)
comment_node (1)
document_node (1)
document_fragment_node (1)
notation_node (1)

Value types must be compatible with the context. An XPath node or XPath nodeset value resulting from the resolution of a Tags referencer discovered in ordinary text is converted to text, or may be an error. String type values are acceptable everywhere. When XPath expressions obtain boolean or numeric values, Tags converts them to strings.

$#set x( this is a string)
$#set t($??x$$) t is set to "string"
$#set f($?#x$$) f is set to 4
$#set i($?%x$$) i is set to 17

After the four $#set commands are processed, x contains " this is a string", t contains "string", f contains the number of fields in x, which is four, and i contains the length of x, which is 17. If x was set to a node list, its length is taken as the number of nodes in the list and the number of fields is set to zero. If x was set to a string list, its length is defined as the number of strings in the list. And so on.
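The three properties can be modelled outside of Tags. The following Python sketch is only an illustration of the description above, not how Tags is implemented. The function name describe is hypothetical, and the field-splitting rule (a leading non-alphanumeric character is taken as the field delimiter, otherwise fields are split on whitespace) is an assumption that happens to agree with the example.

```python
def describe(value: str) -> dict:
    """Model the type, field-count, and length properties of a string value."""
    if value and not value[0].isalnum():
        # Assumption: a leading non-alphanumeric character names the delimiter.
        fields = value[1:].split(value[0])
    else:
        # Assumption: otherwise the fields are separated by whitespace.
        fields = value.split()
    return {"type": "string", "fields": len(fields), "length": len(value)}


props = describe(" this is a string")
print(props["type"], props["fields"], props["length"])
```

For the value " this is a string", the sketch reports the same type, field count, and length as the four $#set commands.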

Actions

Two additional marks can be used to tidy up the resolved value of a string-type variable reference. The first is the trim action, the minus sign (-), which removes leading and trailing spaces, if any. The second is the trim-and-capitalize action, the plus sign (+), which trims the resolved value and capitalizes its first letter.

$#set x(    this is a string    )

$#set trimmed($?-x$$) trimmed is set to "this is a string"
$#set capped($?+x$$) capped is set to "This is a string"

Commands

Here are some general comments about Tags commands.

A Tags command may be coded virtually anywhere within the text of the script, but must be the sole occupant of the text line. Tags commands have the following form:

$#commandName argument1 (argument2 ) commentable area to the end of the line

Unlike variable names, the commandName is not case-sensitive. While all commands have a commandName, not all commands have argument1 and argument2, and no command has argument1 without having argument2. In all commands that have argument2, the parentheses are required.

In most cases where it is used, argument1 is processed differently than argument2. Argument1 is usually resolved to a string, while argument2 is resolved only as far as needed. Argument2 can, for example, resolve to a nodelist, or to a SQL result set in the forEach command. This should be fairly intuitive in each case. (Yeah right - I'll try to clarify this more as I work more on the manual.)

When Tags parses a command, it must be able to isolate the two arguments. This can occasionally conflict with the characters that the two arguments must use. Specifically, Tags uses the following characters to parse a command:

quote ("""),
apostrophe ("'"),
open parenthesis ("("), and
close parenthesis (")")

If these characters are paired within the command arguments, then Tags should have no trouble. But if they are not paired, Tags will fail to understand the command. You can help Tags out by "hiding" an unmatched character: immediately precede it with the backward-apostrophe (`), the character found on the tilde (~) key. (By the way, it is harmless, though unnecessary, to hide any character in a command argument in this way.)

Here is an example:

$#match $?s$$(.*() will fail to parse, but
$#match $?s$$(.*`() will work fine
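To check an argument before handing it to Tags, the pairing rule can be sketched in Python. This is a simplified model under stated assumptions, not the actual Tags parser: the function name is_parseable is hypothetical, quotes and apostrophes are only counted for evenness, and a character following a backward-apostrophe is simply skipped.

```python
def is_parseable(argument: str) -> bool:
    """Return True if parentheses and quotes in the argument are paired."""
    depth = 0
    quotes = {'"': 0, "'": 0}
    i = 0
    while i < len(argument):
        ch = argument[i]
        if ch == "`":
            i += 2  # the hidden character takes no part in pairing
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a close parenthesis with no open one
        elif ch in quotes:
            quotes[ch] += 1
        i += 1
    return depth == 0 and all(n % 2 == 0 for n in quotes.values())


print(is_parseable(".*("), is_parseable(".*`("))
```

The two calls correspond to the regular expressions in the two $#match lines above: the first has an unpaired parenthesis, the second hides it.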

There are three basic categories of commands:

Conditional commands
The forEach command
Additional commands

Conditional commands perform the same function they do in any scripting or programming language: they let the script make decisions and vary its behaviour according to the conditions it encounters.

The forEach command provides the ability to repeat specified functionality over a set of objects, such as nodes in a nodeset, text lines in a file, inputs from a user, fields in a text record, etc.

A number of commands that I don't categorize further fall into the additional commands group. These include several debugging commands, an output director command, several variable setters and a variable loader, an include command, and a number of others. A bit of a hodge-podge.

Conditional Commands

Tags provides a set of commands that conditionally control the inclusion or exclusion of text and/or other commands.

$#if (expression)

Is false if the expression evaluates to false, and is true otherwise.

$#ifn (expression)

Is true if the expression evaluates to either empty, to the value zero (0), or to the string "false" (case ignored), and is false otherwise.

$#elif (expression)

Is false if the expression evaluates to empty, to the value zero (0), or to the string "false" (case ignored), or if a previous conditional command was true, and is true otherwise.

$#elifn (expression)

Is false if the expression evaluates to a value that is non-empty, is not zero (0), and is not the string "false" (case ignored), or if a previous conditional command was true. Is true otherwise.

$#else

Is false if a previous conditional command was true, and is true otherwise.

$#end

Required to terminate a conditional command sequence. Also required to terminate a forEach command, discussed below.

Expressions must resolve to strings to be properly evaluated. Tags automatically converts XPath boolean and numeric results into strings, so boolean true and false are converted to their string equivalents. XPath and variable expression results that are nodes or nodesets are converted into strings before they are evaluated according to these rules.

These expression values are recognized as false:

the value is empty
the value is zero (0)
the value is "false"
the value is "off"
the value is "no".

All other values are taken as true.
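The list of false values can be modelled with a small Python sketch. It is an illustration of the rule as listed, not the Tags implementation. The name is_true is hypothetical, and two details are assumptions: the comparison ignores case for every word (the manual states this only for "false"), and zero means the literal string "0".

```python
FALSE_VALUES = {"", "0", "false", "off", "no"}


def is_true(value: str) -> bool:
    """Model how a resolved expression string is judged true or false."""
    # Assumption: case is ignored for all of the false words.
    return value.lower() not in FALSE_VALUES


for sample in ["", "0", "False", "off", "no", "yes", "42"]:
    print(repr(sample), is_true(sample))
```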

Examples:

$#if ($!$?position$$ = $?last$$$$)
  "$!text()$$",
$#else
  "$!text()$$"
$#end

This example shows an $#if-command, which might be coded within a $#forEach loop, and is a test to determine if the last object is being processed to decide whether to terminate the line with a comma. The $#forEach command is explained in some detail below.

$#if ($?A$$)
   do something big deal here...
$#end

The second example tests to determine if the command-line flag A is present by testing if the variable, named "A", contains a value other than empty.

Additional commands that depend on boolean values also evaluate expressions according to the same rules as the conditional commands.

Regular Expressions

Scripting in Tags sometimes calls for regular expressions. Four of the conditional commands have additional forms that support the use of regular expressions in decision making.

$#if string(regular-expression)

Is true if the string matches the regular-expression, and is false otherwise. If true, subsequent $#elif and $#elifn statements are ignored.

$#ifn string(regular-expression)

Is true if the string does not match the regular-expression, and is false otherwise. If true, subsequent $#elif and $#elifn statements are ignored.

$#elif string(regular-expression)

If evaluated, is true if the string matches the regular-expression, and is false otherwise. If evaluated and true, subsequent $#elif and $#elifn statements are ignored.

$#elifn string(regular-expression)

If evaluated, is true if the string does not match the regular-expression, and is false otherwise. If evaluated and true, subsequent $#elif and $#elifn statements are ignored.

Each conditional command matches the regular-expression with the string. (Note that it MUST be a string. Anything else will fail.) If the regular-expression matches the string, and it contains sub-match expressions (i.e., expressions coded within parentheses in the regular expression), Tags sets variables to the matched portions of the string. These variables have names that correspond to the positions of the sub-match expressions within the regular-expression. The sub-match variable names have the form $?regXi$$, where i is the index of the sub-match expression that corresponds to the variable.

Here is an example:

$#if $?date$$(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
  $#set date($?regX2$$/$?regX3$$/$?regX1$$ $?regX4$$:$?regX5$$:$?regX6$$)
$#end

This fragment reformats a date from y-m-d h:m:s to m/d/y h:m:s. (Just a reminder: Note that the parentheses are all paired in this example, so that Tags can find the beginning of the expression by matching the pairs. If the parentheses do not match, you must use the back-quote character (`) to escape the unmatched parentheses.) If the value of the date variable is "2005-12-09 14:21:15", then the match generates the following six sub-match variables:

$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15

One additional variable, called $?regXCount$$, contains the number of sub-matched expressions. In the example, its value is six.

If a conditional command evaluates to false, the $?regXCount$$ variable is set to zero. If a conditional command results in fewer sub-match variables than the last match, only the variables for the sub-matches of the latest match survive. Sub-match variables are not managed in any other way.
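For readers who want to try the date pattern outside of Tags, here is the same reformatting sketched with Python's re module. Tags uses PCRE, so this is an equivalent illustration rather than Tags code; the function name reformat is hypothetical, and returning the input unchanged when there is no match is an assumption that mirrors the $#if without an $#else.

```python
import re

# The same pattern as in the $#if example above.
DATE = re.compile(r"^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*)")


def reformat(date: str) -> str:
    """Rewrite 'y-m-d h:m:s' as 'm/d/y h:m:s'; leave other text unchanged."""
    m = DATE.match(date)
    if m is None:
        return date
    year, month, day, hour, minute, second = m.groups()  # like regX1..regX6
    return f"{month}/{day}/{year} {hour}:{minute}:{second}"


print(reformat("2005-12-09 14:21:15"))
```

Here m.groups() plays the role of the $?regX1$$ through $?regX6$$ variables, and len(m.groups()) corresponds to $?regXCount$$.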

This is not an explanation for regular expressions. You can find out more by following this link.

See also the $#match command, which is described later.

The $#forEach and the $#forElse Commands

$#forEach type(argument)

Processes the commands and text that fall between the $#forEach command and its matching $#end command once for each object identified in the $#forEach argument. For each object, the variable, $?forEach$$, is set to contain the object to allow the text within the loop to reference the object, while the $?position$$ variable is set to its index. Note that Tags handles $#forEach nesting so that the $?forEach$$ variable is maintained according to its context.

You can use the $#forEach command in one of two ways as shown below:

$#forEach type(argument)
...commands...
$#end

or

$#forEach type(argument)
...commands...
$#forElse
...commands...
$#end

The $#forElse command lets you perform commands if the $#forEach argument comes up empty-handed.

While processing the $#forEach loop, the variable $?last$$ is set to the index of the last object to be processed in the loop.

The type can be either empty or it can be one of the types listed below. If empty, the $#forEach argument should resolve into zero or more text lines (each line is taken as a $#forEach object). If it is not a list of strings, it is converted to a list of strings. If, for example, it is a list of nodes, each node is converted into a string according to the rules of XPath.

Char

If the type field specifies "char", then the $#forEach argument should resolve to a string. The $#forEach logic performs the loop once for each character in the argument, setting the $?forEach$$ variable to each character in turn.

Count

If the type field specifies "count", then the $#forEach argument must resolve into a number. The $#forEach logic performs the loop once for each value from one to the argument value, incrementing by one for each pass.

Field

If the type field specifies "field", then the $#forEach argument should resolve to a string record, with its first character identifying the field separator character. The $#forEach logic loops for each field in the argument, setting the $?forEach$$ variable to each field in turn.

File

If the type field specifies "file", then the $#forEach argument should resolve to a string record having the form |directory|mask|type, where directory is the path to the directory of interest, mask is a filename expression, and type may be any "sum" of dir, tree, data, or any, combined using the plus-sign (+). The mask expression can use the plus-sign (+) and the minus-sign (-) to include or exclude ambiguous or absolute file names. For example, *.cpp+*.h-s* includes cpp files and header files except those that start with the letter s. On each iteration, the $?forEach$$ variable contains the full pathname of the next file that the forEach command finds.

Get

If the type field specifies "get", then the $#forEach argument should resolve to a prompt string that is displayed in the console window. User input is accepted, and when the user presses the Enter-key, the $#forEach loop is performed. During the pass, the user response is available in the $?forEach$$ variable. The $#forEach loop is terminated when the user presses the Esc-key.

Line, Lineb

If the type field specifies "line" or "lineb", then the $#forEach argument should resolve to a file name. The $?forEach$$ variable contains the text of each consecutive text line in the specified file. Because of the relative complexity of this $#forEach option, it is discussed in more detail under its own heading below.

Node

If the type field specifies "Node", then the $#forEach argument must resolve to a node list, and the $#forEach loop is performed once for each node in the node list. The $?forEach$$ variable will contain each node in turn.

SQL

If the type field specifies "SQL", then the $#forEach argument must resolve to a SQL query, which is performed against the DSN named in the $?dsn$$ variable. The $#forEach loop is performed once for each row in the result set of the query, with the $?forEach$$ variable containing each row in turn. Because of the relative complexity of this $#forEach option, it is discussed in more detail under its own heading below.

Str, Strb

If the type field specifies "str" or "strb", then the $#forEach argument should resolve to the name of a list of strings. The $#forEach logic performs the loop once for each string in the argument, setting the $?forEach$$ variable to each string in turn. If the type field specifies "strb", empty strings are ignored. The str specification differs from the line specification by expecting a list argument instead of a file argument. Except for that difference, the discussion below about lines applies equally to the string specification.

Word

If the type field specifies "word", then the $#forEach argument should resolve to a string. The $#forEach logic performs the loop once for each word in the argument, setting the $?forEach$$ variable to each word in turn. Note that special characters that might trail a word are trimmed.

XML

If the type field specifies "XML", then the $#forEach argument must resolve to a file containing a list of one or more well-formed XML documents. The $#forEach loop is performed once for each XML document in the file, with the type of the $?forEach$$ variable being "document_node". The $?forEach$$ variable can be accessed using XPath expressions.

Variables Associated with the forEach Command

These variables have a special relationship with the $#forEach command. As the command initializes, it saves the value of the variables, and restores their values at the end of the loop. Note that some variables are inputs to the $#forEach command while others are output by the $#forEach command. And at least one (SQLColumns) can have a value going in and/or have a value coming out.

columns, getColumns, lineColumns, SQLColumns

Set these variables to cause the $#forEach command to parse the value of the forEach variable into a set of variables containing its fields. If the columns variable is not empty, the parse is applied whenever the forEach variable is a string. This can happen for the input-type, the SQL-type, and for the default-type of the $#forEach command. For all types, the format of the columns variable can have one of the following two forms:

1. ,name1,name2,...,nameN
2. ,name1{size1},name2{size2},...,nameN{sizeN}

Use the first form when the forEach value is a string record, and use the second form if the forEach value is a record comprised of a set of fixed-length fields. If a name is omitted, the field is skipped and no variable is created for that field. While the forms shown above use the comma as the field delimiter, any special character is acceptable.

In the first form, if the forEach value is not a proper string record, i.e., does not start with a non-alphanumeric character, the field delimiter of the columns variable is assumed to be appropriate for the forEach value as well.

columns is used by the Str/Strb and the anonymous types of forEach.
getColumns is used by the Get type.
lineColumns is used by the Line/Lineb type.
SQLColumns is used by the SQL type. If you do not preset it (i.e., you leave/set it empty) before issuing a $#forEach SQL command, it will be filled with the names of the columns selected by the forEach command.
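The two forms of the columns variable can be illustrated with a Python sketch. This models the description above and is not the Tags parser. The names parse_columns and split_record are hypothetical, and the sketch assumes that a columns value is treated as fixed-length as soon as any name carries a {size}.

```python
import re


def parse_columns(spec: str):
    """Split a columns value such as ',a,b' or ',a{3},b{5}' into (name, size) pairs."""
    delimiter = spec[0]
    columns = []
    for item in spec[1:].split(delimiter):
        name, size = re.fullmatch(r"(.*?)(?:\{(\d+)\})?", item).groups()
        columns.append((name, int(size) if size else None))
    return delimiter, columns


def split_record(record: str, spec: str) -> dict:
    """Return the variables that the columns value would create for one record."""
    delimiter, columns = parse_columns(spec)
    if any(size is not None for _, size in columns):
        # Second form: fixed-length fields, cut by position.
        values, pos = [], 0
        for _, size in columns:
            values.append(record[pos:pos + (size or 0)])
            pos += size or 0
    else:
        # First form: a proper string record names its own delimiter;
        # otherwise the delimiter of the columns value is used.
        if record and not record[0].isalnum():
            delimiter, record = record[0], record[1:]
        values = record.split(delimiter)
    # An omitted name means the field is skipped.
    return {name: value for (name, _), value in zip(columns, values) if name}


print(split_record(",1,CREATE TABLE", ",value,name"))
print(split_record("001Alice", ",id{3},name{5}"))
```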

forEach

Variable set by the $#forEach command to contain each object, in turn, that is contained in the forEach argument. For example, if the forEach argument is a nodeset, then the $?forEach$$ variable will contain a node. When Tags begins, it initializes $?forEach$$ to reference the script text.

position

Variable set to the index of the current object processed by the $#forEach command. (The position of the first object is one, the second object is two, etc.) When Tags begins, it initializes $?position$$ to zero.

last

Variable set to the index of the last object processed by the $#forEach command. This variable is not valid during an input-type or SQL-type $#forEach loop. When Tags begins, it initializes $?last$$ to zero.

contextNode

Unless you use the $@var!xpathExpression$$ form, you must set this variable before using any XPath expression to search an XML document. When Tags begins, it initializes $?contextNode$$ to reference the script document node.

Example:

Here is an example using some of the variables provided by the $#forEach command:

$#forEach ($!//event$$)
  $#set contextNode($?forEach$$)
  $#if ($!$?position$$ =$?last$$$$)
&quot;$!text()$$&quot;,
  $#else
&quot;$!@name$$&quot;
  $#end
$#end
$# At this point, after the above forEach command is
$# processed, the value of both the forEach and
$# the contextNode variables revert to the values held before
$# the forEach command was encountered.

Here, the XPath expressions "text()" and "@name" are applied to each of the <event> elements in the script document. The script writer sets the $?contextNode$$ variable to the current <event> element object so that Tags knows where to look for the text() and the name attribute. Note that the $?contextNode$$ variable is not set automatically.

The values of the $?forEach$$, $?position$$, $?last$$, and $?contextNode$$ variables are saved before processing a $#forEach loop and, at the completion of the $#forEach loop, are reset to their saved values. Note that while you generally would not $#set the $?forEach$$, $?position$$, and $?last$$ variables, you should $#set the $?contextNode$$ variable to control the context of your XPath search expressions within the $#forEach context.

Another example:

Assuming that the following Tags script is stored in a file, called letter.xml, it can be processed with the following command line:

> tags letter.xml >letter.txt

Tags script in the file, letter.xml:

<letter script="/letter/body/text()">
<body>
$!/letter/data/salute/text()$$$\j
$!/letter/data/firstname/text()$$$\j
$!/letter/data/lastname/text()$$
$!/letter/data/street/text()$$
$!/letter/data/city/text()$$,$\j
$!/letter/data/state/text()$$$\j
Dear $!/letter/data/salute/text()$$:

I am looking for fresh wood for my sawmill. I am especially
looking for Eastern hardwoods. Do you have any on hand? I will
be happy to remove it and pay you a fair price for the opportunity.

Sincerely,
Paul B.
</body>
<data>
<salute>Mr</salute>
<firstname>George</firstname>
<lastname>Washington</lastname>
<street>123 Cherry Lane</street>
<city>Mt Vernon</city>
<state>Virginia</state>
</data>
</letter>

There are several variables associated with the SQL Query interface, which are discussed in the next section.

Using the forEach File Interface

The form of the forEach argument is

|directoryName|fileMask|searchType

The directory name can be any ambiguous or non-ambiguous path given the value of the $?currentPath$$ variable. The file mask can be a logical expression comprised of ambiguous and non-ambiguous file names concatenated with either the plus sign (implements union) or the minus sign (implements difference). The valid searchTypes can be one from the set { root | tree } and one from the set { data | dir | any } where the defaults are root and data.

The variables that the $#forEach command sets are

$?fileInfo$$ is a string record having the form

|fileName|createDate|createTime|createSecs|modificationDate|modificationTime|modificationSecs|size|"dir" or "data"

$?fileDrive$$ is the drive letter followed by a colon,

$?filePath$$ is the path followed by a forward-slash, and

$?fileName$$ is the file name and extension, if any.

Note that file paths can use the forward-slash or backward-slash.
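The union and difference behaviour of the file mask can be tried out with a short Python sketch built on the standard fnmatch module. It is a model under assumptions, not the Tags implementation: the name mask_matches is hypothetical, matching is case-sensitive here, and every minus term is applied as an exclusion after all plus terms, whatever its position in the mask.

```python
import re
from fnmatch import fnmatchcase


def mask_matches(mask: str, filename: str) -> bool:
    """Return True if filename is selected by a mask such as '*.cpp+*.h-s*'."""
    # Each term is an optional sign followed by a wildcard pattern.
    terms = re.findall(r"([+-]?)([^+-]+)", mask)
    included = any(fnmatchcase(filename, p) for sign, p in terms if sign != "-")
    excluded = any(fnmatchcase(filename, p) for sign, p in terms if sign == "-")
    return included and not excluded


for name in ["main.cpp", "util.h", "stack.h", "notes.txt"]:
    print(name, mask_matches("*.cpp+*.h-s*", name))
```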

Using the forEach Line/Lineb Interface

The Tags interpreter opens the specified file, and then performs the loop once for each text object (line) it finds in the file. The $?position$$ variable is incremented to reflect which object is being processed. Since the number of objects within the file is not known during the loop, the $?last$$ variable is not valid.

You can use the Lineb variant to ignore blank text lines.

There are two kinds of text objects that Tags recognizes: XML elements, and simple lines of text terminated by either a newline or a return, or both, in any combination.

If the first non-whitespace character in a line is a "<", then the object is assumed to be a valid XML element. The Tags interpreter locates the end-tag for the element, and then loads the element into a DOM and stores its reference in $?forEach$$. If Tags is unable to load the document into the DOM, then Tags quits with prejudice.

If the first non-whitespace character is not a "<", then the text line is read and loaded into $?forEach$$. Tags can handle text lines as long as 4095 characters. Any longer than that, and Tags terminates. This variant of the $#forEach command provides the ability to convert each text line into a set of variables through the use of the columns variable, which is discussed in the next section.

When all objects in the file have been processed, the file is closed.

Example:

The file input.txt contains a list of numbers followed by names that are associated with the numbers. Here are a few lines from that file:

0,UNKNOWN
1,CREATE TABLE
2,INSERT
3,SELECT

Suppose the problem is to reformat each line so that the output looks like this:

<map value="0" name="UNKNOWN"/>
<map value="1" name="CREATE TABLE"/>
<map value="2" name="INSERT"/>
<map value="3" name="SELECT"/>

The reform script in the file input.xml uses the input type of the $#forEach command to accomplish this:

<reform><![CDATA[$\j
$#forEach line(input.txt)
  $#text line(,$?forEach$$)
<map value="$?line{1}$$" name="$?line{2}$$"/>
$#end
]]></reform>

This example uses subscripting to obtain the individual fields in each line. Notice the comma in the $#text command. The comma is combined with the text line to form a value of, for example, ",1,CREATE TABLE", which is stored in the line variable. The leading comma informs the Tags parser that the fields are separated by commas.
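As a cross-check of what the reform script is expected to produce, here is the same transformation sketched in Python. It is not part of Tags or of the release; the function name reform is hypothetical, and it assumes every input line has at least two comma-separated fields.

```python
def reform(lines):
    """Turn 'number,name' lines into <map .../> elements, one per line."""
    output = []
    for line in lines:
        # Prefix the delimiter, as $#text line(,$?forEach$$) does.
        record = "," + line.rstrip("\r\n")
        fields = record[1:].split(record[0])
        output.append(f'<map value="{fields[0]}" name="{fields[1]}"/>')
    return output


sample = ["0,UNKNOWN\n", "1,CREATE TABLE\n", "2,INSERT\n", "3,SELECT\n"]
print("\n".join(reform(sample)))
```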

These files are included in this release. Use the following command line to run this example:

> Tags input.xml >map.xml

Using the forEach SQL Query Interface

Tags provides a SQL query interface through the SQL variant of the $#forEach command. For example, assuming that there is an accessible table, called name-and-address, on your computer, the following $#forEach command implements a simple query to that table:

$#forEach SQL(select name, street, city, state, zipcode from name-and-address)
..etc
$#end

Note that the SQL argument is simply a SQL Select statement.

Generally, the result of a SQL query is what is called a result-set: a set of rows (records) that satisfy the query. Tags repeats the forEach loop once for each row in the result-set, setting the $?forEach$$ variable to each row in the result-set, in turn.

By itself, the $#forEach command given above does not provide enough information to perform the query. The ODBC system requires additional information, such as the name of the database in which the name-and-address table resides, the name of the server computer, and the name of the ODBC interface driver needed to interface to the specific database server.

To communicate this information, the ODBC interface provides an encapsulation object, called a DSN, or Data Source Name, which is maintained by the system as a Registry key and its associated entries at HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\dsnkey, where dsnkey is the name of the DSN. (Use your Registry Editor to examine some DSNs, but be careful not to make any changes to the Registry unless you know what you are doing. Standard warning.) These entries usually identify the database name, the server name, and the ODBC driver name. Depending on the type of database, other information may be stored there as well.

While there are several ways to create a DSN, the easiest is by using the ODBC Data Source Administrator tool at Start/Settings/Control Panel/Administrative Tools/Data Sources (ODBC). This tool is available in all 32-bit and 64-bit Windows operating systems, as far as I know.

Many database servers require that a query is accompanied by a username and a password, which the database administrator sets up beforehand, though not all database interfaces require a username and a password.

Be that as it may, the Tags SQL Query implementation needs this additional information to pass on to the ODBC interface. You provide the information to Tags before the $#forEach SQL command through specific Tags variables. These variables are named as follows:

$?dsn$$
$?username$$
$?password$$

The $?dsn$$ variable is always required, but, depending on the specific ODBC interface, the $?username$$ and $?password$$ may not be required. For example, generally they are required if you are querying a Microsoft SQL Server, MySQL or Oracle database, but are not likely to be required if you are querying a FoxPro table.

As I mentioned earlier, the result of a successful query is a result-set, and Tags presents each row in the result-set as a delimited string in the $?forEach$$ variable. To access the specific columns (fields) in the row (record) contained in the $?forEach$$ variable, you can use the subscripting feature as in the following example:

<Tags script="/Tags/text()">
 $#set dsn(Tags-customer-dsn)
 $#forEach SQL(select * from cust)
$?forEach{1}$$, $?forEach{2}$$, $?forEach{3}$$, $?forEach{4}$$, (and so on)
$#end
</Tags>

In this example, the code assumes that a DSN, called Tags-customer-dsn, exists in the Registry.

Tags provides another way of identifying the columns of the row that does not use the subscripting method. You can provide the column names as a string record in a Tags variable, called SQLColumns. Tags not only places the column values in the forEach variable, it also places the values into variables named in the SQLColumns variable. Here is an example where the programmer has set the SQLColumns variable:

<Tags script="/Tags/text()">
 $#set dsn(Tags-customer-dsn)
 $#set SQLColumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
 $#forEach SQL(select * from cust)
$?CustNo$$, $?Name$$, $?Street$$, $?City$$, $?State/Prov$$, $?ZipCode$$, $?Country$$, $?Phone$$
 $#end
</Tags>

And, if you pre-clear the SQLColumns variable, Tags will obtain the column names from the SQL interface, and store them as a string record in the SQLColumns variable. If the SQL interface has no name for the column, Tags substitutes a fill-in name of "SQLCol1" for the first missing name, "SQLCol2" for the second, and so on. You can always set the SQLColumns variable to empty ($#set SQLColumns()), and then access it within the $#forEach loop to find out the names, yourself. This behaviour is somewhat different than that described for the SQLColumns variable in the earlier section describing the variables associated with the $#forEach command.
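The fill-in naming rule can be illustrated with a few lines of Python. This sketch only models the rule as described; it does not query a database, the name sql_columns is hypothetical, and the comma delimiter of the resulting string record is an assumption.

```python
def sql_columns(names, delimiter=","):
    """Build an SQLColumns-style string record, filling in missing names."""
    filled, missing = [], 0
    for name in names:
        if not name:
            # The first missing name becomes SQLCol1, the second SQLCol2, ...
            missing += 1
            name = f"SQLCol{missing}"
        filled.append(name)
    return delimiter + delimiter.join(filled)


print(sql_columns(["CustNo", "", "City", None]))
```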

The while Commands

while (expression), while string(regularexpression)

Repeats the Tags script between the $#while command and the terminating $#end command while the expression (or regular expression) is true.

whilen

Repeats the Tags script between the $#whilen command and the terminating $#end command while the expression (or regular expression) is not true.

last (expression), last string(regularexpression)

Terminates a $#while or $#forEach loop if the expression (or regular expression) is true.

lastn (expression), lastn string(regularexpression)

Terminates a $#while or $#forEach loop if the expression (or regular expression) is false.

next (expression), next string(regularexpression)

Skips the remainder of the Tags script before the terminating $#end command if the expression (or regular expression) is true.

nextn (expression), nextn string(regularexpression)

Skips the remainder of the Tags script before the terminating $#end command if the expression (or regular expression) is false.
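If you know a language with break and continue, the loop-control commands map onto them fairly directly. Here is a rough Python analogy (not Tags code; the function and its parameters are made up for the illustration): $#last behaves like a conditional break, and $#next like a conditional continue. The -n forms simply invert the test.

```python
def process(lines, stop_at, skip_prefix):
    """Copy lines, skipping some and stopping early."""
    output = []
    for line in lines:                    # plays the role of $#forEach ... $#end
        if line == stop_at:               # like $#last: leave the loop when true
            break
        if line.startswith(skip_prefix):  # like $#next: skip the rest of this pass
            continue
        output.append(line)               # the "remainder" of the loop body
    return output


if __name__ == "__main__":
    print(process(["a", "#comment", "b", "END", "z"], "END", "#"))
```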

Additional Commands

$#add name(value), $#adds name(value), $#addt name(value), $#addu name(value)

Inserts or appends a string or string list to the named string list. If the specified variable is not a string list, it is converted to one before the value of the expression is appended to it. The suffixes -s, -t, and -u insert the value in its sorted position (s), at the top of the list (t), or uniquely in its sorted position (u). In the last case, the value is added only if it is not already in the list. If a string list is being inserted as sorted or unique, each item in the string list is inserted according to the add suffix. Otherwise, the entire list is inserted as a group. If the resolved value is empty, no add takes place.
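The four variants can be modelled with a short Python sketch. It is an approximation of the rules above rather than Tags' implementation: the add function and its mode argument are hypothetical, and the sorted variants assume the list is already in sorted order.

```python
import bisect


def add(items, value, mode=""):
    """Return a new list with value added; mode is "", "s", "t" or "u"."""
    # A variable that is not yet a list is converted to one first.
    if isinstance(items, list):
        result = list(items)
    else:
        result = [items] if items != "" else []
    if value == "" or value == []:
        return result                    # an empty value adds nothing
    values = value if isinstance(value, list) else [value]
    if mode == "":                       # $#add: append; a list goes in as a group
        result.extend(values)
    elif mode == "t":                    # $#addt: insert at the top, as a group
        result[0:0] = values
    elif mode in ("s", "u"):             # $#adds / $#addu: item by item, sorted
        for item in values:
            if mode == "u" and item in result:
                continue                 # $#addu skips values already present
            bisect.insort(result, item)
    else:
        raise ValueError(f"unknown suffix: {mode!r}")
    return result


if __name__ == "__main__":
    print(add(["b", "d"], ["e", "a"], "s"))
```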

$#console (value)

Opens the console window if the value is true, closes the console window if the value is false. This may not be available in some versions of Tags.

$#debug (value)

Activates or deactivates the debug facilities of Tags according to the specified value. If the value is false, debugging is turned off. Otherwise, debugging is turned on. Use this script command instead of the -Z command-line flag for debugging a short section of script.

When debugging is on, the contents of a variable being loaded by the $#in command are written to a file. The name of the file is the name of the variable followed by ".dbg".

Also, the XML element or text line read by each iteration of the input type of $#forEach command is written to a text file called "nextelement.dbg" or "nextline.dbg", respectively. Use the $#pause command to examine these files between iterations.

$#defer name(value)

Sets the variable specified by name to the unresolved form of the specified value. E.g., if the value is a string containing a referencer, the string is stored in the variable without resolving the referencer. If the variable is subsequently referenced, its contents will be resolved at that time.
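The difference between $#set and $#defer is easiest to see side by side. Below is a small Python model of it, offered as a sketch only: the Variables class is hypothetical, it handles only simple $?name$$ referencers (no nesting), and it assumes an undefined variable resolves to an empty string.

```python
import re

# Matches a simple referencer of the form $?name$$.
REFERENCER = re.compile(r"\$\?(\w+)\$\$")


class Variables:
    def __init__(self):
        self._values = {}
        self._deferred = set()

    def resolve(self, text):
        """Replace each $?name$$ referencer with that variable's current value."""
        return REFERENCER.sub(lambda m: self.get(m.group(1)), text)

    def set(self, name, value):
        """Like $#set: resolve the value now, then store the result."""
        self._values[name] = self.resolve(value)
        self._deferred.discard(name)

    def defer(self, name, value):
        """Like $#defer: store the value in its unresolved form."""
        self._values[name] = value
        self._deferred.add(name)

    def get(self, name):
        """Return the value, resolving deferred contents at reference time."""
        value = self._values.get(name, "")
        return self.resolve(value) if name in self._deferred else value


if __name__ == "__main__":
    v = Variables()
    v.set("city", "Boston")
    v.set("eager", "City: $?city$$")    # resolved immediately
    v.defer("lazy", "City: $?city$$")   # resolved each time it is referenced
    v.set("city", "Denver")
    print(v.get("eager"))
    print(v.get("lazy"))
```

The eager variable keeps the value that city had when it was set, while the deferred one picks up the later change.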

$#drop (name)

Removes the variable specified by name from the system, releasing any resources the variable may own, such as its contents. You don't usually need to use this command, since the system manages its resources automatically, but it can be used to improve memory usage when variables containing very large files are no longer needed. This might only be an issue in advanced circumstances.

$#exec wait(command-line)

Sends the specified command-line to the operating system for execution. If the word "wait" is present, Tags waits for its completion; otherwise it does not.

$#get name(prompt)

Asks the user to enter a value to assign to the variable having the specified name. If the console window was opened neither from the command line (using the -W flag-parameter) nor by the $#console command, the console window is opened. The prompt is then displayed, and the user may enter a response, followed by the Enter key. The console window is left open.

$#in name(file-name), $#inb name(file-name), $#inp name(file-name)

If the file-name is a URL (starts with "http://"), Tags loads the document from the internet. Otherwise, it loads the local document identified by the file-name into the variable having the specified name. If the file-name does not specify a path, the TagsPath environment variable is used to determine the directories to search for the file, if the variable is present. Otherwise, the path of the Tags command you specified in the command line is used. If you did not specify a path, then Tags will look for the file in the current directory.

If the file extension is "htm" or "html", then the file is assumed to be an HTML file. Otherwise, if the first non-blank character in the file is a "<", then the file is assumed to be a well-formed XML document. If the document is determined to be an HTML document, then it is "tidied up" to make it well-formed in the XML sense before it is loaded into a DOM, since the DOM can only handle well-formed documents. An XML document is loaded into a DOM straight away.

At the completion of the $#in command, the named variable will contain the document node representing the parsed document. If Tags is unable to load the document into a DOM, then Tags quits with prejudice.

If the first non-blank character is not a "<", then the file is assumed to be a simple text (or binary) file and it is loaded into the named variable as simple text. If the file is binary, you cannot manipulate it with Tags.
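The decision rules for $#in can be summarized in a few lines of Python. This is a sketch of the rules as stated above, not of Tags' internals: classify_source is a hypothetical name, and matching the extension without regard to case is my assumption.

```python
import os


def classify_source(file_name, content=""):
    """Return (origin, kind) for a $#in argument, following the rules above."""
    origin = "internet" if file_name.startswith("http://") else "local"
    extension = os.path.splitext(file_name)[1].lower().lstrip(".")
    if extension in ("htm", "html"):
        kind = "html"   # tidied into well-formed XML before the DOM load
    elif content.lstrip().startswith("<"):
        kind = "xml"    # loaded into a DOM straight away
    else:
        kind = "text"   # stored in the variable as simple text
    return origin, kind


if __name__ == "__main__":
    print(classify_source("http://example.com/feed.xml", "<rss/>"))
    print(classify_source("notes.txt", "  plain text"))
```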

Note: if the XML file is local, and if it contains $#include commands, these are resolved as explained in the $#include description above before the document is parsed.

The $#inp form uses the contents of the $?HTTPHeaders$$ variable to send an HTTP POST request to the specified URL. You must set up the contents of the variable before using this command. (Hmmm... Seems to me I overlooked the data part of the Post. Need to check that out.)

$#include (file-name)

Tags processes the $#include command as a script or XML data object is loaded. Immediately after loading the script document specified as the first command-line parameter, Tags processes all $#include commands embedded in the document. Because of this, the file-name can only reference variables that contain command-line parameters.

Be sure that included files do not affect the well-formedness of the XML document when they are inserted into the script document at the include points. After all includes have been performed, the document must still be well-formed XML.

Included files can also contain $#include commands. Be careful about circular references: if some file, say file-A, includes another file, say file-B, and file-B includes file-A, you have an infinite loop. Tags does not detect circular references; they cause the program to run until it fills up memory and crashes. If the full include file path is not specified, Tags looks in the directory where the script file was found.

$#include commands are also detected and processed by the $#in command, but are not handled by the $#forEach command.

$#match string(regular-expression)

Matches the regular-expression with the string. If the regular-expression matches the string, and it contains sub-match expressions (i.e., expressions within parentheses), Tags sets variables to the matched portions of the string. These variables have names that correspond to the positions of the sub-match expressions within the regular-expression. The sub-match variable names have the form $?regXi$$, where i is the index of the sub-match expression that corresponds to the variable.

Here is an example:

 $#set string(2005-12-09 14:21:15)
 $#set expression(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
 $#match $?string$$($?expression$$)

This match generates six sub-match variables:

$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15

One additional variable, called $?regXCount$$, contains the number of sub-matched expressions. In the example, its value is six.

If a $#match command fails, the $?regXCount$$ variable is set to zero. If a $#match command results in fewer sub-match variables than the last previous $#match command, only the variables for the sub-matches of the latest $#match command are valid. Sub-match variables are not managed in any other way.
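For comparison, here is roughly the same bookkeeping done with Python's re module. Treat it as a sketch: the match function and the variables dictionary are hypothetical stand-ins, Python's regular expression dialect may differ from the one Tags uses, and using an unanchored search is my assumption.

```python
import re


def match(string, expression, variables):
    """Mimic $#match: store sub-matches as regX1..regXn plus regXCount."""
    found = re.search(expression, string)
    if found is None:
        variables["regXCount"] = 0   # a failed match only resets the count
        return False
    for index, text in enumerate(found.groups(), start=1):
        variables[f"regX{index}"] = text
    variables["regXCount"] = len(found.groups())
    return True


if __name__ == "__main__":
    variables = {}
    expression = r"^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*)"
    match("2005-12-09 14:21:15", expression, variables)
    print(variables)
```

As in Tags, nothing here clears regX variables left over from an earlier match, so only the first regXCount of them are meaningful after each call.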

This is not an explanation for regular expressions. You can find out more by following this link.

$#message (message), $#msg (message)

TBD

$#open wait(file-name)

Obtains the name of the program that is set by the operating system to open the type of the specified file-name, and then executes that program, passing the file-name as the sole parameter. If the word "wait" is present, Tags waits for its completion; otherwise it does not.

$#out name(file-name), $#outa name(file-name)

Writes or appends the contents of the named variable to the file with the specified file-name. Line management commands are processed at this time (line management commands are described in a later section of this manual). If a Tags script contains no $#out command, the accumulated results of the Tags script are written to the file Tags.txt at the end of the run. If the variable is not identified (no name specified), then the accumulated results of the processing of the Tags script are written or appended to the specified file.

$#pause (message)

After displaying the pause message in the console window, if it is open (the -W command line flag, or the $#console command), Tags waits for you to press a key before continuing. The pause command is ignored if the console window is not open.

$#play (wave-file-name)

Plays the wave-file specified by the wave-file-name. If the wave-file-name does not specify the path, the TagsPath environment variable is used to determine the directories to search for the file if available. Otherwise, the path of the Tags command you specified in the command line is used. If you did not specify a path, then Tags will look for the wave-file-name in the current directory.

$#push, $#pop

Pushes and pops variables on and off a stack variable. Any variable can be used as a stack without affecting its contents. If the stack variable does not already exist it is created on first use. This is also true for the pop variable.

$#push mystack(myvar) pushes myvar onto mystack.
$#pop avar(mystack) pops the top of mystack into avar.

Alternatively, $#set avar($^mystack$$) also pops the top of mystack into avar. Any properties specified on mystack are applied to the variable at the top of the stack.

The pop command also has an equivalent referencer: "$^". You can request the properties of the variable at the top of the stack by requesting them of the stack variable itself. For example, $^?mystack$$ returns the type of the variable at the top of the stack.
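One way to picture "any variable can be used as a stack without affecting its contents" is a variable that carries both a value and a separate stack. The Python sketch below is only a mental model: the Variable class and helper functions are hypothetical, pushing is modelled as copying the pushed variable's value, and popping an empty stack (whose behaviour in Tags is not described here) simply raises an error.

```python
class Variable:
    """A variable with its own contents plus an independent stack."""

    def __init__(self, value=""):
        self.value = value
        self.stack = []


variables = {}


def lookup(name):
    """Fetch a variable, creating it on first use."""
    return variables.setdefault(name, Variable())


def push(stack_name, var_name):
    """Like $#push stack(var): push var's value onto the stack variable."""
    lookup(stack_name).stack.append(lookup(var_name).value)


def pop(target_name, stack_name):
    """Like $#pop target(stack): pop the top of the stack into target."""
    lookup(target_name).value = lookup(stack_name).stack.pop()


if __name__ == "__main__":
    variables["mystack"] = Variable("my own contents")
    variables["myvar"] = Variable("hello")
    push("mystack", "myvar")
    pop("avar", "mystack")
    print(variables["avar"].value, "/", variables["mystack"].value)
```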

$#sleep (time)

Relinquishes control of the CPU for the specified time, which is in milliseconds. Useful for scripts you want to run in the "background." An example might be a script that watches for change in a page at some URL.

$#set name(value), $#text name(value), $#xml name(value)

Sets the variable having the specified name to the resolved value. The value is resolved to its "natural" type, such as nodeset, node, or string.

$#stop (message)

Aborts Tags processing, placing the specified resolved message as the last line of the output file. If the console window is visible, Tags displays the stop message there, and waits for the user to press a key before terminating the run.

$#trace (value)

Activates or deactivates the trace facilities of Tags according to the resolved value. If the value is false, tracing is turned off. Otherwise, tracing is turned on. Tracing causes Tags commands to be written to the output as Tags executes them. Each traced Tags command line is appended with the state and depth of the condition stack. Use this script command instead of the -Y command-line flag to limit the trace to a portion of your script. If the console window is open, Tags displays the trace information there as well.

$#translate (arg-char-set,fun-char-set)

Translates output characters such that characters matching characters in the arg-char-set (the characters before the comma) are translated to corresponding characters in the fun-char-set (the characters after the comma). This command works like the XPath translate command.

All characters following the $#translate command are translated until either there is no more output or until a $#translate command is encountered that has no argument. It is an error if the number of characters in the argcharset is greater than the number of characters in the funcharset. Any characters in the funcharset beyond the number of characters in the argcharset are, however, ignored.

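For example, if the argcharset contains the three characters '{', '}', and '|', and if the funcharset contains the three characters '<', '>', and '"', the command is written as follows (the funcharset is given as XML entities because the script is embedded in an XML document):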

$#translate ({}|,&lt;&gt;&quot;)      -- no embedded spaces!

All '{' characters in the output following the $#translate command are translated to '<', all '}' characters in the output are translated to '>', and all '|' characters in the output are translated to '"' (quotes). Thus,

{element attribute=|value|/}

becomes, after translation,

<element attribute="value"/>.

Restrictions apply: you can use neither a comma nor a close-parenthesis in either the argcharset or the funcharset. Otherwise, the program could not parse the command. Watch out: spaces within the parentheses are subject to the translating rules.
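The character mapping itself is easy to reproduce in Python with str.maketrans, which may help when you want to check a pair of character sets before putting them in a script. This is a sketch of the rules described above (the make_translator name is hypothetical), including the error for a funcharset that is too short and the ignoring of surplus funcharset characters.

```python
def make_translator(arg_chars, fun_chars):
    """Return a function mapping each arg character to its fun counterpart."""
    if len(arg_chars) > len(fun_chars):
        raise ValueError("arg-char-set is longer than fun-char-set")
    # Surplus characters at the end of fun_chars are ignored.
    table = str.maketrans(arg_chars, fun_chars[:len(arg_chars)])
    return lambda text: text.translate(table)


if __name__ == "__main__":
    to_xml = make_translator("{}|", '<>"')
    print(to_xml("{element attribute=|value|/}"))
```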

$#vars (message)

Lists the variables sorted by name at the point in Tags processing when the command is encountered. Tags displays the resolved message before the variable list. The variable list is also displayed in the console window if it is open. You could use the following command sequence to display the variables in the console window to help you debug:

$#console (on)
$#vars (Here is a list of the variables)
$#pause (Press any key)
$#console (off)

Except for the $#defer command, all commands resolve their arguments before applying them. Tags expects that you will use Tags referencers prolifically, both in text and in expressions. Using the $#defer command, you can store commands in variables for later reference.

Any command not in the above list is taken by Tags as a comment. Comments are not written to the output file, but they are displayed in the console window if it is active (use the -W command-line flag, or the $#console command).

A Tags command occupies a single text line and can be indented: leading whitespace is ignored, so a command is recognized as long as only whitespace precedes it. Tags also ignores any text on the same line following the command.

Here's a helpful idea: if your editor has the ability to match braces, you can put an open-brace after each $#if, $#ifn, and $#forEach command, and a close-brace after each matching $#end command, and then you can use your editor's match-braces commands to match up the begin and end parts of Tags command sequences.

Examples:

$#set $?converter$$($?converter$$)

$#ifn ($?$?converter$$$$)

$#trace (on)
...stuff to debug
$#trace (off)

The first example sets a variable whose name is the value of the variable converter to the value of the variable converter. I.e., the name and the value of the variable are the same.

The second example evaluates the value of the variable named by the converter variable. If the variable doesn't exist, or its name and value is false, then the text within the $#ifn is processed.

The third example traces a section of script, then turns tracing off.

XPath functions available in Tags

Tags does not support arithmetic and string expressions, but XPath does. XPath also provides a number of useful functions that are available to the Tags script writer using the $!xpath-expression$$ XPath expression syntax.

The Line Output Management Commands

When you direct Tags to output your text to a file or a pipe using some form of the $#out command, you can manage how it processes your raw output using these line management commands. Simply embed them in your output streams.

The Tab Command ($\t{i})

This output command allows you to format the output text by aligning on specific offsets. Tags supports a variable, called $?tabs$$, which you can set to contain a table of line offsets as a delimited string; the offsets are indexed by the tab command subscript. If you do not provide the tab table, then Tags uses the tab command subscript itself as the line offset. Here is an example using a tab table:

<Tags>
Customer Report Using The Tags ODBC Facility (December 10, 2004)
$#set dsn(Tags-customer-dsn)
$#set tabs(+9+39+62+78+95+104+124)
$#set columns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
CustNo$\t{1}Name$\t{2}Street$\t{3}City$\t{4}State/Prov$\t{5}ZipCode$\t{6}Country$\t{7}Phone
------$\t{1}----$\t{2}------$\t{3}----$\t{4}----------$\t{5}-------$\t{6}-------$\t{7}-----

$#forEach SQL(select * from cust)
$?CustNo$$$\t{1}$?Name$$$\t{2}$?Street$$$\t{3}$?City$$$\t{4}$?State/Prov$$$\t{5}$?ZipCode$$$\t{6}$?Country$$$\t{7}$?Phone$$
$#end
</Tags>

In this example, the offset associated with $\t{1} is 9, and the offset associated with $\t{4} is 78.
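To make the offset arithmetic concrete, here is a Python sketch of how a tab table could be applied to one output line. It rests on assumptions I should spell out: the function names are hypothetical, the first character of the tabs string is taken to be its delimiter, offsets are counted in characters from the start of the line, and text that already runs past an offset is left untruncated.

```python
def parse_tabs(tabs):
    """Turn a delimited string such as "+9+39+62" into {1: 9, 2: 39, 3: 62}."""
    delimiter = tabs[0]
    offsets = [int(part) for part in tabs[1:].split(delimiter)]
    return dict(enumerate(offsets, start=1))


def render(fields, tab_table=None):
    """Join fields so that the field after tab stop i starts at offset i."""
    line = fields[0]
    for index, field in enumerate(fields[1:], start=1):
        # Without a tab table, the subscript itself is used as the offset.
        offset = tab_table[index] if tab_table else index
        line = line.ljust(offset) + field   # pad with spaces up to the offset
    return line


if __name__ == "__main__":
    table = parse_tabs("+9+39+62+78+95+104+124")
    print(render(["CustNo", "Name", "Street", "City"], table))
```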

The Join-Line Command ($\j)

Sometimes you need to control the output of parts of a single output line. Use the special symbol $\j at the point on a line where you want to concatenate the next piece of the output line. Here is an example:

<$!@name$$ $\j
$#if ($!@prompt$$)
 prompt="$!@prompt$$" $\j  (notice the space before "prompt")
$#end
$#if ($!@default$$)
 default="$!@default$$" $\j (notice the space before "default")
$#end
$#if ($!@value$$)
> $\j
$!@value$$ $\j
</$!@name$$>
$#else
/>
$#end

Notice that the prompt="" and the default="" attributes and the value may not be required. Supposing that only the prompt="" attribute is present, the output would appear as below:

<name prompt="Please say hello"/>

The space on the line before "prompt" puts the space between "name" and "prompt." Note that, as in the example, text beyond the $\j concatenator operator is discarded.
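The joining rule can be modelled in Python as follows. This is a sketch under stated assumptions, not the actual output processor: join_lines is a hypothetical name, and I assume that whitespace immediately before the $\j marker is trimmed, which is what the sample output above suggests.

```python
JOIN = r"$\j"


def join_lines(lines):
    """Merge each line carrying the join marker with the line that follows it."""
    output = []
    pending = ""
    for line in lines:
        head, marker, _discarded = line.partition(JOIN)
        if marker:
            # Keep the text before the marker; text after it is discarded.
            pending += head.rstrip()
        else:
            output.append(pending + line)
            pending = ""
    if pending:
        output.append(pending)
    return output


if __name__ == "__main__":
    raw = [
        r"<name $\j",
        r' prompt="Please say hello" $\j  (notice the space)',
        "/>",
    ]
    print(join_lines(raw))
```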

The New-Line Command ($\n)

You can split a line into two output lines using the $\n command. The text before the $\n is written to the output, then a newline is written, and then the text after the $\n command is written to the output.

The CData Commands ($\c and $\d)

The $\c command generates a <![CDATA[ begin-tag in the output stream, while the $\d command generates a closing ]]> end-tag in the output stream.

The Blank-Line Command ($\b{i})

The $\b{i} command removes all subsequent groups of blank lines from the raw output, replacing each with the number of blank lines specified by i. E.g., if your raw output contains groups of blank lines, and you specify $\b{3}, then each subsequent group of blank lines is replaced by three blank lines in the "cooked" output.
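The effect on the output is the same as this small Python function, given here as a sketch: collapse_blank_lines is a hypothetical name, and treating whitespace-only lines as blank is my assumption.

```python
def collapse_blank_lines(lines, count):
    """Replace every run of blank lines with exactly `count` blank lines."""
    output = []
    in_blank_run = False
    for line in lines:
        if line.strip() == "":
            if not in_blank_run:          # first blank line of a group
                output.extend([""] * count)
                in_blank_run = True
        else:
            output.append(line)
            in_blank_run = False
    return output


if __name__ == "__main__":
    print(collapse_blank_lines(["a", "", "", "", "b", "", "c"], 2))
```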

Errors

All errors detected by Tags result in immediate termination of the resolution process. An error message is generated and appended to the output file. While native Tags errors are explained with a short phrase or sentence, XPath errors are given as a number. You can look up the number in these tables:

XPath parser errors

These errors are the result of a badly-formed XPath expression.

2850	XPE_UNKNOWNENTITY
2851	XPE_BADENTITY
2852	XPE_DOUBLECOLONEXPECTED
2853	XPE_QNAMEEXPECTED
2854	XPE_LPARENEXPECTED
2855	XPE_RPARENEXPECTED
2856	XPE_RPARENNOTEXPECTED
2857	XPE_RBRACKETEXPECTED
2858	XPE_VARNAMEEXPECTED
2859	XPE_LITERALEXPECTED
2860	XPE_UNEXPECTEDEND
2861	XPE_EQUALSIGNEXPECTED
2862	XPE_UNKNOWNOPERATOR
2863	XPE_TOOMANYCOLONS

XPath evaluator errors

These errors are the result of context errors. The expression parsed successfully.

2800	XPE_UNDERRUN
2801	XPE_NODEEXPECTED
2802	XPE_NODESETEXPECTED
2803	XPE_STRINGEXPECTED
2804	XPE_NUMBEREXPECTED
2805	XPE_BOOLEANEXPECTED
2806	XPE_OPNOTEXPECTED
2807	XPE_AXISNAMEUNKNOWN
2808	XPE_WRONGNRARGUMENTS
2809	XPE_PROCINSTEXPECTED
2810	XPE_STACKEMPTY
2811	XPE_STACKNOTEMPTY
2812	XPE_FUNCTIONUNKNOWN
2813	XPE_BADOPERANDTYPE
2814	XPE_EMPTYRESULT
2815	XPE_CONTEXTEXPECTED
2816	XPE_PATHEXPECTED
2817	XPE_DIVIDEBYZERO
2818	XPE_NOVARS
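If you scan output files for errors with a script of your own, the two tables fit in a small lookup. The Python sketch below just re-encodes the tables above; the describe function is hypothetical and is not part of Tags.

```python
# Names are listed in code order, without the common XPE_ prefix.
PARSER_ERRORS = [      # codes 2850-2863
    "UNKNOWNENTITY", "BADENTITY", "DOUBLECOLONEXPECTED", "QNAMEEXPECTED",
    "LPARENEXPECTED", "RPARENEXPECTED", "RPARENNOTEXPECTED",
    "RBRACKETEXPECTED", "VARNAMEEXPECTED", "LITERALEXPECTED",
    "UNEXPECTEDEND", "EQUALSIGNEXPECTED", "UNKNOWNOPERATOR", "TOOMANYCOLONS",
]
EVALUATOR_ERRORS = [   # codes 2800-2818
    "UNDERRUN", "NODEEXPECTED", "NODESETEXPECTED", "STRINGEXPECTED",
    "NUMBEREXPECTED", "BOOLEANEXPECTED", "OPNOTEXPECTED", "AXISNAMEUNKNOWN",
    "WRONGNRARGUMENTS", "PROCINSTEXPECTED", "STACKEMPTY", "STACKNOTEMPTY",
    "FUNCTIONUNKNOWN", "BADOPERANDTYPE", "EMPTYRESULT", "CONTEXTEXPECTED",
    "PATHEXPECTED", "DIVIDEBYZERO", "NOVARS",
]

XPATH_ERRORS = {}
XPATH_ERRORS.update({2850 + i: "XPE_" + n for i, n in enumerate(PARSER_ERRORS)})
XPATH_ERRORS.update({2800 + i: "XPE_" + n for i, n in enumerate(EVALUATOR_ERRORS)})


def describe(code):
    """Return a readable description of an XPath error number."""
    name = XPATH_ERRORS.get(code)
    if name is None:
        return f"{code}: not listed in the XPath error tables"
    kind = "parser" if code >= 2850 else "evaluator"
    return f"{code}: {name} ({kind} error)"


if __name__ == "__main__":
    print(describe(2817))
```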

Tags Download

Here is my Electronic License Agreement cribbed from others that I have seen:

This is a legal Agreement between you and Paul J Medlock, Jr. (hereinafter referred to as "I" or "me"). The terms of this Agreement govern your use of the software in the Tags package and any other materials on this website. By downloading and installing the software in the Tags package, or other materials on this website, you are agreeing to be bound by this Agreement. If you do not agree to the terms of this Agreement, please do not download and install the software onto your computer. You are free to use the Tags software on your machine and/or other machines on a LAN in your home and/or at your office at no cost. You are not free to give copies to others. If others are interested in it, direct them to this site instead. You may not sell the software in any form, no matter how well you hide it. Nor can you claim that you wrote it. I did. All materials that are copyrightable are copyrighted by me.

I make no warranty for your use of this software. Nor do I promise that it does what I claim it does. If the documentation makes an outlandish claim of functionality, test the software before assuming that it actually does what's claimed. If you have any problem, or if Tags causes you any loss: personal, financial, hardware, emotional, or otherwise, I am in no way responsible, and I am not liable for any damages whatsoever. If you violate patents, trademarks, or copyrights with the use of this software, I am not a party to that violation, and I won't help you in court. Don't forget, it's free.

In other words, use it at your own risk.

Here is a zip-file containing the latest release of the Windows version of Tags and its supporting and example files. If you download the software, it means you are willing to abide by the terms and conditions of the License Agreement above.

If you use Tags, please be kind enough to give me some feedback: bugs, ideas for features, comments, etc. If you are interested in a version for Linux or Unix, let me know.