Tags

A Scripting Language for Text
  

Introduction

Tags is a scripting language for processing text. You can write simple Tags scripts to process plain and delimited text. You can also easily extract information from HTML and XML documents obtained from websites, use ODBC to support SQL queries, and use simple commands to manipulate folders and text files.

You can write a valid Tags script in a single line of text. But you could also write a Tags script that spans many files to implement, for example, a complex document and software generation library. I know because I have.

Here is the traditional "Hello World" written as a Tags script:
<hello>
Hello World.
</hello>
A Tags script is always embedded within an XML document as text (a text node, in XML language). A trivial Tags script is simply the text contained in the document element of the script document - in this case, the text within the hello element. Tags' default action is to output the text it finds as it processes the script to the standard output. If you replaced the "Hello World." text with the text of a book, Tags would output the entire text of the book. But a Tags script can do much more.

Here are some simple sample scripting snippets:

* Load the Top Stories RSS document from the CNN website into a Tags variable, and then save it in a file.
<topstories script="/topstories/text()">
$#in topstories(http://rss.cnn.com/rss/cnn_topstories.rss)
$#out topstories(topstories.xml)
</topstories>
Tags commands are identified by the leading "$#" character sequence (but there can be leading spaces as in the sample). The Tags language also supports variables, and in this example, topstories is a Tags variable. The $#in command reads the document identified by the URL in the parentheses into the topstories variable. The $#out command writes the contents of the topstories variable to the file named topstories.xml.

* Read text lines from a file, and write them to the standard output.
<copy script="/copy/text()">
$#forEach line(myfile.txt)
$?forEach$$
$#end
</copy>
The line-variant of the $#forEach command (there are several other variants as you will see later) reads each text line from myfile.txt, placing the text line in the forEach variable where you can reference it in the part of the script between the $#forEach and the following $#end commands. In the sample script, the line following the $#forEach command references the forEach variable. Tags variables are referenced by preceding the variable name with the "$?" character sequence, and following the variable name with the "$$" character sequence. (You can change the characters that Tags will expect within the script using the marks attribute in the document element, but it's probably not worth doing.) In this example, the $?forEach$$ reference causes the contents of the forEach variable to replace the variable reference and to be written to the standard Tags output file.

* Select records from a database, and write them to the standard output.
<select script="/select/text()">
$#set dsn(customerDSN)
$#set username(myusername)
$#set password(mypassword)
$#forEach SQL(select * from customer)
$?forEach$$
$#end
</select>
Tags uses the ODBC interface to support the SQL query. To run this script, you need a database table, called customer, and you need to have defined a DSN (Data Source Name), called customerDSN, to provide the interface information to the ODBC driver. You must also preset the dsn variable to the DSN name. You may also need to set the username and password variables if they are needed. The SQL-variant of the $#forEach command issues the SQL select statement to the ODBC driver, and then places each resulting record in the forEach variable, where you can reference it. In this sample, as in the previous sample, the $?forEach$$ reference causes the contents of the forEach variable to be written to the standard Tags output file.

The Tags scripting language supports XPath and regular expressions to allow considerable scripting power. And its simple but comprehensive command set is easy for anyone with scripting or programming experience to learn and use. If you aren't familiar with XPath, here is a place to start. And if you don't know regular expressions, you could start here.


How to Execute a Tags Script

You can execute a Tags script from the command line, from within a batch file, from a program, or from a WSH script (JavaScript or VBScript). The Tags command line takes the following parameters:
  1. The name of the Tags script file to execute,
  2. Any parameters needed by the Tags script.
There are also several pre-defined flag-parameters that you can use:

-V

Plays the ok.wav file on success, and the error.wav file on failure, if the files are available.

-X

Displays this manual in the default browser, if both are available.

-Z

Saves variables to files when they are loaded with the $#in command (for debugging).

-n

(n is a number) Adjusts the time Tags sleeps to share CPU cycles between commands. Not usually required for short runs. Mostly useful for running a Tags script in the "background".

Example:
> tags hello.xml -v >hello.txt
This command causes Tags to execute using one of the sample files included in this release. On completion, it plays the ok.wav file if successful, or the error.wav file if not successful (assuming that the wav-files are present).

Several sample scripts are included with the release.

When you install the Tags files by downloading and unzipping the tags.zip file from http://paul.medlock.com/tags.zip, you should also add an environment variable, called tagsPath, and set it to contain the path to the folder where you installed Tags.


Some Basics

The elements, attributes, and text of a script file are wholly determined by the application. Since you make up the element and attribute names, along with the structure of the script file, to fit your application, there is no DTD or schema that describes a valid Tags script.

The text in the Tags script is free-form and can contain any ordinary text and special characters except for the five XML predefined characters:
  <  (&lt;)
  >  (&gt;)
  &  (&amp;)
  '  (&apos;)
  "  (&quot;)
If your text contains any of these characters, you may need to convert them to the equivalent XML entity reference.

On the other hand, you can choose to embed your text in CDATA-sections instead. You can use a CDATA-section anywhere you could write text, and you can even mix them together, since Tags treats CDATA-sections as if they were text. A CDATA-section begins with the string "<![CDATA[" and ends with "]]>". Here is an example:
<element><![CDATA[
put your <marked> up text & commands here
]]></element>
The text may also contain white-space: viz., spaces, tabs, and new-lines. Since these characters are preserved in the text, you will find that they will frequently appear in the output of your script unless you control their use.

Here's a useful idea: If you aren't using the CDATA option and you choose to convert the special characters to entity-references when performing a search-and-replace, be sure to replace the ampersands with &amp; first. Otherwise, you will never find the ampersands later to fix them.
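
The ordering advice above can be illustrated outside of Tags. The following Python sketch is not part of Tags; the function names are made up for illustration. It shows what happens to the ampersands that the other replacements introduce when the ampersand is handled last.

```python
def escape_xml_text(text: str) -> str:
    """Replace the five XML predefined characters with entity references."""
    # The ampersand is handled first, so the '&' characters introduced by
    # the later replacements are never escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    text = text.replace("'", "&apos;")
    return text


def escape_wrong_order(text: str) -> str:
    """The same idea with the ampersand handled last (the mistake)."""
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    # By now the text contains '&' characters that belong to entity
    # references, and this step cannot tell them from the original ones.
    return text.replace("&", "&amp;")


sample = "put your <marked> up text & commands here"
print(escape_xml_text(sample))
print(escape_wrong_order(sample))
```

The first call yields text that is safe to paste into a script document; the second turns each &lt; into &amp;lt;, which no longer decodes back to the original character.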

The examples in this manual may not use the XML entities when they should so that they are easier to read. But don't forget that you will have to deal with that issue before you can use your script in Tags. The characters that Tags uses for markup were chosen so as not to infringe on XML's markup.

Here is another useful idea: You can check an XML document for being well-formed using Internet Explorer 5+, Netscape 6+, Mozilla, SeaMonkey, Firefox, etc. To use IE, for example, just drag the name of the file you want to check onto the IE shortcut on your desktop. IE will recognize the XML file name extension and display the document. If the document contains an error, your browser will report the line and column numbers where the error was detected. Of course, if you use a different file name extension, e.g., myscript.tags, the browser may not recognize the file as XML.
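
If a browser is not handy, any XML parser can perform the same check. Here is a minimal sketch using the parser in Python's standard library; it is an alternative to the browser method, not something Tags provides, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> str:
    """Report whether xml_text is a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple locating the error.
        line, column = err.position
        return f"not well-formed: line {line}, column {column}"
    return "well-formed"


print(check_well_formed('<hello script="/hello/text()">Hello World.</hello>'))
print(check_well_formed("<hello>text & commands</hello>"))
```

The second call fails because of the bare ampersand, which is exactly the kind of error the entity references and CDATA-sections described above are meant to prevent.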

XML is case-sensitive, and, consequently, XPath expressions are case-sensitive. Tags is partly case-sensitive. Command names are not, but variable names are.

A Tags script file must be a well-formed XML document. Usually the bulk of the file is the text that you want in the output. Here is the Hello.xml example again:
<hello script="/hello/text()">
Hello world.
</hello>
and you can run it with the command line
> tags hello.xml >hello.txt
The document element of a Tags script document should contain the script attribute, which identifies to the Tags interpreter where the script is within the document using an XPath expression. In the example, the value of the script attribute is "/hello/text()". This is an absolute XPath expression. It's a good idea to always use an absolute XPath expression to locate the script. The script attribute is optional, but only if the Tags script is the sole occupant of the document element, as in this case. We need the script attribute in more complex script documents, since the script probably will not be in such an obvious place, so you are probably better off getting into the habit of using it.

By default, Tags writes the text generated by the script to the standard output file, but at least one of the sample scripts we have already discussed demonstrates how to direct Tags output to other files.

About those pesky whitespace characters: if you look carefully at the contents of the output file from the Tags run above, you will notice that there is a blank line, followed by the "Hello world." line. This blank line results from the newline that follows the <hello> start tag - the "Hello world." text is on the next line down. You can remove that extraneous line from the output in two ways. You could rewrite the script as
<hello script="/hello/text()">Hello world.
</hello>
or you could use a join-command:
<hello script="/hello/text()">$\j
Hello world.
</hello>
The join-command ($\j) joins with the next line, and is one of several special text output control commands. Another command is the newline-command, which breaks lines, and is written as $\n. It causes the text of the line that follows the command to be written as the next line. In the following line of text, the newline-command causes the one line to be output as two lines.
this is the first line$\nthis is the second line
There are other output control commands, but I'll explain them later in the manual.

As we saw in the second sample script, you can redirect output to files other than the standard output using the $#out command. Let's modify the hello.xml file by using the $#out command to redirect its output to another file:
<hello script="/hello/text()">$\j
Hello world.
$#out (hello.txt)
</hello>
After you run this example, you will find the output of the script in hello.txt. Note that this version of the $#out command does not identify a variable as the output source as did the RSS document load sample in the first section of this manual. A variable name is not needed because Tags can emit text to a default variable (its name is output, if you want to reference it), and the $#out command in this example is outputting the text from the default variable to the file. (Note: the $#out command flushes the default variable as a side effect.)

In most programming languages, the text information is usually marked off from the other elements of the language with special marks, such as quotation marks, etc., while the language commands are not marked. In Tags, it's the other way around: text is written simply as text. It is the special Tags commands that are marked.

There are two kinds of Tags symbols: commands and referencers. Commands occupy a single line of text, and are identified by a $-sign followed by a #-sign followed by the command name. Spaces are not allowed to separate these three parts, but commands do not have to start at the beginning of the line: there may be leading spaces. Lines that begin with the "$#" identifier that are followed by a space or do not have a recognized command name are considered comments and are ignored.

You use referencers to modify the outputs that your Tags script generates. You can reference the text and attributes of the Tags script document, other XML documents that you load, and variables whose values you set. Referencers begin with a $-sign followed by either an exclamation-point ("!") or a question-mark ("?") followed by an expression of some kind, followed by two $-signs. Referencers can appear pretty much anywhere within your text as you need them, but they must be complete on the same line on which they start. On the other hand, their resolved value may span as many lines as desired. The file copy and the ODBC samples both used the $?forEach$$ variable referencer.

Commands and referencers will be discussed in more detail in subsequent sections, but here are some examples:

Tags commands:
$#out (myfile.txt)
$#text class(myclass)
$#if (true)
$#end
$#debug (on)
$#get objectname(Enter the name of the new object:)
$# this is a comment (because of the space after the $#-prefix)
Tags referencers:
$!/model/help/text()$$
prompt="$!@prompt$$"$\j
&lt;map name="Action" value="$?line{1}$$" info="$?line{2}$$"/&gt;
The effect that these commands and referencers might have on the output of a script depends on the context in which they operate. Different data at the locations specified by the referencer expressions will result in different outputs. And, since there is no difference between data and program in Tags, any referencer could obtain text that contains commands and referencers that Tags would also process in a recursive fashion. That's how Tags provides something akin to the subroutine paradigm that programmers are familiar with, though not exactly, since Tags does not provide a facility for passing parameters to "subroutines".


More Samples

Here are a couple of sample scripts of more complex activities you can implement in a few Tags script lines:

  * Query a database table, called customer, to obtain customer information, write the information into a text file, and then display the results in notepad. The script assumes that a DSN, called customerDSN, has been created for the database table access. Check the link given earlier for information about ODBC.
<db2text script="/db2text/text()">
$#set dsn(customerDSN) assumes that the DSN customerDSN was previously declared
$#set sqlcolumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
$#forEach SQL(select * from customer) {
$?CustNo$$,$?Name$$,$?Street$$,$?City$$,$?State/Prov$$,$?ZipCode$$,$?Country$$,$?Phone$$
$#end }
$#out (cust.txt)
$#exec (notepad cust.txt)
</db2text>
If you provide the names you want to assign the columns in the result records to the Tags interpreter using the sqlcolumns variable, you can access the columns as variables by their name, which I do in this example. In order to use ODBC, you must first have set up the ODBC link for the specific database, as I described earlier. It is beyond the scope of this manual to explain that, but you can get more help by following this sequence of steps in Windows XP: Windows Start -> Control Panel -> Administrative Tools -> Data Sources (ODBC). Ok, now you are on your own.

* Here is another script that builds on the earlier script to read the same RSS document from the CNN news site, extract features from the document to create an HTML document, and then display it using your default internet browser. Note the use of the CDATA-section to escape all the HTML tags.
<rss2html script="/rss2html/text()"><![CDATA[
$#in contextNode(http://rss.cnn.com/rss/cnn_topstories.rss)
<html>
<head>
<h2>$!/rss/channel/title/text()$$</h2>
</head>
<body>
$#forEach node($!/rss/channel/item$$) { list all the items in the feed
  $#set contextNode($?forEach$$)
<h3>$!title/text()$$</h3>
<p>$!description/text()$$</p>
<p><a href="$!link/text()$$">Link</a></p>
$#end }
</body>
</html>
$#out ($?currentPath$$/topstories.htm)
$#open (file://$?currentPath$$/topstories.htm)
]]></rss2html>
This example obtains the Top Stories RSS document from CNN, as in the earlier sample, and creates an HTML document using the <item> objects in the document, writes the result to a file called topstories.htm, and then opens the default browser to display the file. Note that the URL in the $#in command must begin with "http://" so that Tags will know to look for the object on the web.


Tags Referencers

Tags referencers may be coded virtually anywhere within the text of the script, and have the form
$reftype symbol { subscriptor } $$
There are two reference types, distinguished by the single-character reftype:

! (exclamation-mark) indicates an XPath expression. In most circumstances, Tags replaces the referencer with the value obtained by evaluating the XPath expression.

? (question-mark) indicates a variable reference. Tags replaces the referencer with the value of the variable specified by the symbol. Variables are discussed later.

Referencers may be used anywhere within the script where they make sense. A referencer may also contain referencers, and may result in text that contains other referencers, which are also resolved until only unmarked text is left. As already mentioned, a referencer cannot be split across two or more lines: it must lie wholly within a single text line.

Examples:

$!//config/tag$$ an XPath expression reference that identifies all the <tag> elements in all <config> elements in an XML document.
$?forEach$$ a reference to the Tags variable that contains the local value within a $#forEach statement.
$?3$$ a reference to the third parameter on the command line
receiver->SetSource("$!@source$$"); an XPath referencer to the source attribute embedded in some text in the script document. (Note that the quotes are part of the output, not part of the referencer.)
$?$?index$$$$
a reference to the variable identified by the value of the referenced index variable (a nested reference)
$#set x($!$x+1$$)
a Tags $#set command using an XPath expression reference to increment the variable x by one. (Note that you can reference a Tags variable within an XPath expression using only the $-leadin character as documented in the XPath specification. Writing ($!$?x$$+1$$) would also work, except that the Tags interpreter resolves the reference instead of the XPath interpreter, so you may have to place it in quotes if it resolves to a string constant.)
$@script!/myscript/mysubroutine$$
This special form of the XPath referencer allows you to specify a node to use as a reference point when processing the XPath expression. The example is using the script variable, which is initialized by Tags to the document element of the script document itself. If no context node is specified, Tags uses the contents of the contextNode variable as the reference point.
$#forEach node($!/dep/mod[match(@name, "$?forEach$$.[cC]")]/ref/@name$$)
This example demonstrates an XPath referencer that contains a variable referencer.

Subscriptors

When the type of a resolved referencer is a string, a list of strings, or a nodeset (an XPath object), you can use an optional trailing subscriptor to obtain a portion of the resolved referencer value. A subscriptor is annotated as an open curly-brace, followed by a number, followed by a close curly-brace, and is appended to the end of the referencer before the trailing dual markers (the $$ tail). The subscriptor, itself, can also incorporate one or more referencers, but it must resolve to a positive number (integer). If the resolved value of the subscriptor is zero, or less than zero, the value of the resolved subscripted referencer is left unchanged (i.e., not subscripted; subscripts start at one).

If the resolved value of the subscripted referencer is a string, Tags assumes that the string is a series of fields preceded by a delimiting character. Any character can act as a delimiting character, and it is (by definition) identified as the first character of the string. If the first character is a comma, the delimiting character is a comma. If the first character is the letter "A", the delimiting character is the letter "A". (Notice in the example below that the string is prefixed with a comma to identify the comma as the delimiting character.) In this manual, a string delimited in this way is referred to as a delimited string, or as a string record.
,Lincoln,Abraham,Springfield,Illinois
If the value of the resolved referencer is a nodeset, Tags obtains the node in the nodeset corresponding to the subscriptor value, counting the first node as node one. I.e., the first node in a nodeset variable is identified as $?nodeSet{1}$$, where nodeSet is the name of the variable.

If the value of the subscriptor is larger than the number of objects (fields, strings, or nodes), then the value of the subscripted referencer is empty. If the type of the resolved subscripted referencer is not a string, string list, or a nodeset, the subscriptor is ignored.

Example of using the subscriptor notation:

$#text pres(,Lincoln,Abraham,Springfield,Illinois)
$#text city($?pres{3}$$) sets city to "Springfield"

$#set record(,$!@xyz$$) note the leading comma
$#set field($?record{5}$$) sets field to the value of the fifth field in the xyz attribute
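
The subscriptor rules for delimited strings can be summarized in a few lines of ordinary code. The following Python sketch models only the behavior described in this section; it is not how Tags itself is implemented, and the function name is made up.

```python
def subscript(record: str, n: int) -> str:
    """Model of $?var{n}$$ applied to a delimited string."""
    if n <= 0:
        # Zero or negative subscripts leave the value unchanged.
        return record
    if not record:
        return ""
    delimiter = record[0]                  # first character is the delimiter
    fields = record[1:].split(delimiter)
    if n > len(fields):
        # A subscript past the last field yields an empty value.
        return ""
    return fields[n - 1]                   # subscripts start at one


pres = ",Lincoln,Abraham,Springfield,Illinois"
print(subscript(pres, 3))   # Springfield
print(subscript(pres, 9))   # (empty)
```

The same function works for any delimiter, since the delimiter is simply whatever character the string begins with.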


Tags Variables

Tags supports variables that can be referenced and assigned values. Each variable has a name and a value. Unless it violates some other Tags rule, any alphanumeric string can be a variable name. Values may be of any Tags type (as described in the next section), or they may be empty. Variable names are case-sensitive. You set the value of a variable using one of several Tags commands, and you obtain the value by using the $?varname$$ referencer form.

Tags provides several variables that contain information about the processing environment of the script. For example, the command-line parameters are available as variables whose names are the numbers corresponding to the positions of the parameters that they contain: the first parameter is available in the variable referenced as "$?1$$", the second parameter is available in the "$?2$$" variable, and so on. In the example command line given earlier (tags hello.xml -v), $?0$$ contains "Tags", and $?1$$ contains "hello.xml".

Tags also allows you to access the command-line flag-parameters (annotated in the command-line using the form -letter{letter}). Examples of command-line flag-parameters are -D, -C, -a, etc. Flag-parameters are preserved as Tags variables having the letter as both their name and their value. The names are always capitalized, regardless of whether the flag-parameter is capitalized on the command line. Variables named "$?a$$" and "$?A$$" are different variables, and only the second could represent a flag-parameter. The flag-parameter variables make it easy for the user to communicate special conditions to the script. By the way, notice that there is no provision for referencing numeric flag parameters as Tags variables.

You can also reference an environment variable by appending its name to "env.". If you reference an environment variable, such as PATH, as a variable (e.g., as in $?env.path$$), the value of the environment variable is returned. Tags does not currently change the values of environment variables; it only allows you to access their values in your script. This might change.

Tags pre-defines a number of variables to provide a means of communicating between the Tags interpreter and your Tags script. Some of these variables are associated with specific Tags commands. But there are several which have meaningful values for the duration of the execution of a script. Following is a list of Tags variables that have special meaning in the Tags language:

columns, getColumns, inputColumns, SQLColumns, regXColumns

Used by various variants of the forEach command to parse the forEach input into fields.

command line flags

Command line flags are referenced by their letter value using the notation $?x$$, where x is the actual upper-case letter value of the flag. Tags interprets any command-line parameter that is immediately preceded by either a minus sign or a slash as a command line flag group. Each letter in the group is a flag. Only letters can be used as flags in Tags. The value of a flag variable is the name of the variable. For example, if you code -AbC on the command line, Tags will create three variables called $?A$$, $?B$$, and $?C$$, with respective values of "A", "B", and "C".

command line parameters

Command line parameters are referenced by their position using the notation $?n$$, where n is the index of the parameter in question. The first parameter is indexed as one. Parameters are always strings. Command line flags as described above are not counted and are handled in their own way.
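
To make the split between flags and positional parameters concrete, here is a Python sketch of the rules just described. It is a model, not the Tags implementation: the function name is made up, and it assumes that the program name occupies position zero and that non-letter characters in a flag group (such as the digits of the -n flag) simply produce no variable.

```python
def split_command_line(args):
    """Model how Tags turns a command line into flag and parameter variables."""
    flags = {}
    params = {}
    position = 0
    for arg in args:
        if arg[:1] in ("-", "/"):
            # A flag group: every letter becomes a variable whose name and
            # value are both the upper-case letter.
            for ch in arg[1:]:
                if ch.isalpha():
                    flags[ch.upper()] = ch.upper()
        else:
            # Flags are not counted, so positions stay consecutive.
            params[str(position)] = arg
            position += 1
    return flags, params


flags, params = split_command_line(["tags", "hello.xml", "-AbC", "extra"])
print(flags)    # {'A': 'A', 'B': 'B', 'C': 'C'}
print(params)   # {'0': 'tags', '1': 'hello.xml', '2': 'extra'}
```

Note how the parameter that follows the flag group is still numbered two, because the flag group does not take up a position.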

contextNode

Used by XPath references to identify the default root of an XPath search (string). Set by Tags during initialization to reference the root element of your Tags script. You set it according to need. Tags provides an enhanced form for an XPath referencer expression that allows you to use any variable as the context node for the expression. The form is $@var!xpath$$. Note that $@contextNode!expression$$ is the same as $!expression$$. The variable should contain an XML node.

currentPath

Contains the absolute path to the current directory. Tags sets this to the directory from which you are running your Tags script.

date and time

Tags provides date and time information to your Tags script through several variables, which are updated before the interpreter processes each script command. $?time$$ (string - format is hh:mm:ss), $?day$$ (number - day of the month), $?dayOfWeek$$ (string - name of the week day), $?dayOfYear$$ (number - Julian day), $?month$$ (number - month of the year), $?monthName$$ (string - name of the month), and $?year$$ (number - all four digits).

dsn

ODBC data source name used by SQL interface (string). You must set this before using the SQL-variant of the $#forEach command.

empty

Convenience variable set by Tags to contain absolutely nothing. Use it to clear other variables to empty as in $#set var($?empty$$).

environment variables

A variable whose name starts with "env." is interpreted as an environment variable, and Tags will attempt to return the value of the corresponding environment variable, if defined. Otherwise, the value of the referencer is empty. Note that you cannot change the value of an environment variable in a Tags script, and that, unlike other Tags variables, environment variable names are not case sensitive. For example, reference the Path environment variable as $?env.path$$.

error

Set by Tags as the result of the $#exec and $#open commands. It contains the value returned by the executed program.

file variables

$?fileDrive$$ (drive:), $?fileName$$, $?filePath$$ (path\), $?fileInfo$$. These variables are set by the file-variant of the $#forEach command, which is described later in this document.

HTTP variables

TBD: $?HTTPHeaders$$ and $?HTTPResponseHeaders$$.

grep

Set this with a regular expression before using a $#forEach command to provide a filter in selecting objects to present in the $?forEach$$ variable. It is not required for proper forEach operation, but it can improve the performance of your script in many cases. Even when you set the variable outside the $#forEach loop, it appears empty inside the loop. But, once set, it retains its value outside the loop. This means that, unless you change its value yourself, it will have the same value for two consecutive $#forEach loops, which might not be what you want. So you should set it or clear it as needed before each loop. $#forEach variants that apply the grep variable are the Field, Line, Lineb, Str, Strb, and the default variants. Regular expressions in Tags are compatible with the rules of Perl 5, and are implemented using the PCRE software.

last

Set by the $#forEach command to the index of the last object in the object set being processed by the command (number). The value is not known in some variants of the $#forEach command, and is set to zero in those cases.

output

Container in which Tags collects output text that is otherwise undirected, and is automatically dumped to the standard output if not otherwise used. If it is copied or appended to another variable, it is flushed.

password

ODBC password used by the SQL interface (string). Not all database accesses require this, but when they do, you must set the value before using the SQL-variant of the $#forEach command.

position

Set by the $#forEach command to the index of the current forEach value (number). The first object is indexed as one.

regex variables

After performing a $#match or the regex version of an $#if or $#ifn, a set of variables contains the matching substrings. The $?regXCount$$ variable specifies the number of matched substrings, and the $?regXi$$ variables contain the matched substrings; e.g., the third matched substring is in the variable named $?regX3$$ while the original matched string is in the variable named $?regX0$$. TBD: $?regXColumns$$.
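
For readers who know regular expressions from other languages, the numbering of the regX variables follows the usual capture-group convention. The Python sketch below shows the correspondence using Python's own re module rather than PCRE. The function name is made up, and it assumes that regXCount counts the capture groups and excludes the whole match, which this manual does not spell out.

```python
import re


def regx_variables(pattern: str, text: str) -> dict:
    """Model the regX variables left behind by a successful match."""
    match = re.search(pattern, text)
    if match is None:
        return {"regXCount": 0}
    # regX0 holds the whole matched string.
    variables = {"regX0": match.group(0)}
    # regX1, regX2, ... hold the parenthesized substrings, in order.
    for i, sub in enumerate(match.groups(), start=1):
        variables[f"regX{i}"] = sub if sub is not None else ""
    variables["regXCount"] = len(match.groups())  # assumption: groups only
    return variables


result = regx_variables(r"(\w+),(\w+),(\w+)", "Lincoln,Abraham,Springfield")
print(result["regX0"])   # Lincoln,Abraham,Springfield
print(result["regX3"])   # Springfield
```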

script

Set by Tags during initialization to the root of the Tags script (XPath node). Use this to implement subroutines by writing XPath expressions referencing other elements within the same Tags script, as in the following example: $@script!/myscript/mysubroutine/text()$$.

sqlcolumns

Set this before using the SQL-version of the $#forEach command to define the fields of the record set you expect to obtain via your select statement. See also the ODBC example given earlier.

tagsPath

Set by Tags to the tagsPath environment variable if present. Otherwise set to the path of the Tags executable (string).

userName

ODBC user name used by the SQL interface (string). Not all database accesses require this, but when they do, you must set the value before using the SQL-variant of the $#forEach command.

You should remember that, except for environment variables, your script can set the value of any variable, and you can lose valuable information by overwriting the values of certain variables. For example, you will lose the value of the variable $?script$$ by setting it to some other value. On the other hand, you may well overwrite the value of $?contextNode$$ frequently when you are using XPath expressions.


Properties of Variables

When a Tags variable is defined, it has a value, and it also has three additional properties that can be ascertained using the $?#, $?%, and $?? prefixes. Note that these are the regular $? prefix with an additional #, % and ? appended, respectively.

Number of Fields

Use the $?# prefix to obtain the number of fields in the contents of the specified variable. Note that this value only makes sense when the variable contains a string.

Length

Use the $?% prefix to obtain the length of the contents of a specified variable. 

Type

Use the $?? prefix to obtain the type of the contents of the specified variable. There are a number of types that Tags values might have. Here is a list of the types along with the meaning of the length property for that type in parentheses. Value types must be compatible with the context. An XPath node or XPath nodeset value resulting from the resolution of a Tags referencer discovered in ordinary text is converted to text, or may be an error. String type values are acceptable everywhere. When XPath expressions obtain boolean or numeric values, Tags converts them to strings.
$#set x( this is a string)
$#set t($??x$$) t is set to "string"
$#set f($?#x$$) f is set to 4
$#set i($?%x$$) i is set to 17
After the four $#set commands are processed, x contains " this is a string", t contains "string", f contains the number of fields in x, which is four, and i contains the length of x, which is 17. If x was set to a node list, its length is taken as the number of nodes in the list and the number of fields is set to zero. If x was set to a string list, its length is defined as the number of strings in the list. And so on.
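
The field count and length in this example can be checked by hand or with a short script. The Python sketch below models the two properties for string values only, using the delimited-string convention described under Subscriptors; the function names are hypothetical and nothing here is Tags code.

```python
def field_count(value: str) -> int:
    """Model of the $?# prefix for a string value."""
    if not value:
        return 0
    delimiter = value[0]          # the first character is the delimiter
    return len(value[1:].split(delimiter))


def length(value: str) -> int:
    """Model of the $?% prefix for a string value."""
    return len(value)


x = " this is a string"           # note the leading space
print(field_count(x))             # 4
print(length(x))                  # 17
```

The leading space matters: it is the delimiter, so the four words are the four fields, and it is also counted in the length of 17.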


Commands

Here are some general comments about Tags commands.

A Tags command may be coded virtually anywhere within the text of the script, but must be the sole occupant of the text line. Tags commands have the following form:
$#commandName argument1 (argument2 ) commentable area to the end of the line
Unlike variable names, the commandName is not case-sensitive. While all commands have a commandName, not all commands have argument1 and argument2, and no command has argument1 without having argument2. In all commands that have argument2, the parentheses are required.

In most cases where it is used, argument1 is processed differently than argument2. Argument1 is usually resolved to a string, while argument2 is resolved only as far as needed. For example, in the forEach command argument2 can resolve to a node list or a SQL result set. This should be fairly intuitive in each case. (yeah right - I'll try to clarify this more as I work more on the manual.)

When Tags parses a command, it must be able to isolate the two arguments. This can occasionally conflict with the characters that the two arguments must use. Specifically, Tags uses the following characters to parse a command:
If these characters are paired within the command arguments, then Tags should have no trouble. But if they are not paired, Tags will fail to understand the command. You can help Tags out by "hiding" an unmatched character: immediately precede it with the back-quote character (`), the key up by the tilde (~). (By the way, it is harmless, though unnecessary, to hide any character in a command argument in this way.)

Here is an example:
$#match $?s$$(.*() will fail to parse, but
$#match $?s$$(.*`() will work fine
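To see why the first command fails and the second works, here is a rough Python sketch of pairing parentheses while skipping hidden characters. It is an illustration only: the function name find_argument2 is hypothetical, and the real Tags parser may well work differently.

```python
def find_argument2(command: str) -> str:
    """Return the text inside the first balanced pair of parentheses.

    A character that follows a back-quote is treated as hidden and is
    not counted when pairing parentheses.
    """
    depth = 0
    start = -1
    i = 0
    while i < len(command):
        ch = command[i]
        if ch == "`":
            i += 2  # skip the back-quote and the character it hides
            continue
        if ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unmatched ')'")
            depth -= 1
            if depth == 0:
                return command[start:i]
        i += 1
    raise ValueError("unmatched '('")


print(find_argument2("$#match $?s$$(.*`()"))
```

In this sketch the unhidden form, $#match $?s$$(.*(), raises ValueError because the inner parenthesis is never closed. The returned text still contains the back-quote; removing it is left out of the sketch.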
There are three basic categories of commands:
  1. Conditional commands
  2. The forEach command
  3. Additional commands
Conditional commands perform the same function they do in any scripting or programming language: they let the script make decisions and vary its behaviour according to the conditions it encounters.

The forEach command provides the ability to repeat specified functionality over a set of objects, such as nodes in a nodeset, text lines in a file, inputs from a user, fields in a text record, etc.

A number of commands that I don't categorize further fall into the additional commands group. These include several debugging commands, an output director command, several variable setters and a variable loader, an include command, and a number of others. A bit of a hodge-podge.


Conditional Commands

Tags provides a set of commands that conditionally control the inclusion or exclusion of text and/or other commands.

$#if (expression)

Is false if the expression evaluates to empty, to the value zero (0), or to the string "false" (case ignored), and is true otherwise.

$#ifn (expression)

Is true if the expression evaluates to empty, to the value zero (0), or to the string "false" (case ignored), and is false otherwise.

$#elif (expression)

Is false if the expression evaluates to empty, to the value zero (0), or to the string "false" (case ignored), or if a previous conditional command was true, and is true otherwise.

$#elifn (expression)

Is true if the expression evaluates to empty, to the value zero (0), or to the string "false" (case ignored), and no previous conditional command was true. Is false otherwise.

$#else

Is false if a previous conditional command was true, and is true otherwise.

$#end

Required to terminate a conditional command sequence. Also required to terminate a forEach command, discussed below.

Expressions must resolve to strings to be properly evaluated. Tags automatically converts XPath boolean and numeric results into strings, so boolean true and false are converted to their string equivalents. XPath and variable expression results that are nodes or nodesets are converted into strings before they are evaluated according to these rules.

These expression values are recognized as false:
* empty (no text at all)
* the value zero (0)
* the string "false" (case ignored)
All other values are taken as true.
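The rules for the conditional commands can be modeled in a few lines of Python. This is a sketch of the behaviour described in this section, not the Tags implementation. The names is_true and select_branch are hypothetical, and the sketch assumes that "the value zero" means exactly the string "0".

```python
def is_true(value: str) -> bool:
    """Apply the rule: empty, "0", and "false" (any case) are false."""
    return value != "" and value != "0" and value.lower() != "false"


def select_branch(chain):
    """Pick the branch of a conditional sequence that gets processed.

    chain is a list of (command, value) pairs, for example
    [("if", "0"), ("elif", "yes"), ("else", "")].
    Returns the index of the chosen branch, or None if no branch is chosen.
    """
    for index, (command, value) in enumerate(chain):
        if command == "else":
            return index
        truth = is_true(value)
        if command in ("ifn", "elifn"):
            truth = not truth
        if truth:
            return index  # later elif/elifn/else commands are ignored
    return None


print(select_branch([("if", "0"), ("elif", "False"), ("else", "")]))
```

Because the loop returns at the first successful test, it also models the rule that an $#elif, $#elifn, or $#else is false whenever a previous conditional command was true.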

Examples:

$#if ($!$?position$$ = $?last$$$$)
"$!text()$$"
$#else
"$!text()$$",
$#end

This example shows an $#if-command, which might be coded within a $#forEach loop, and is a test to determine if the last object is being processed to decide whether to terminate the line with a comma. The $#forEach command is explained in some detail below.

$#if ($?A$$)
   do something big deal here...
$#end

The second example tests to determine if the command-line flag A is present by testing if the variable, named "A", contains a value other than empty.

Additional commands that depend on boolean values also evaluate expressions according to the same rules as the conditional commands.


Regular Expressions

Scripting in Tags sometimes calls for regular expressions. Four of the conditional commands have additional forms that support the use of regular expressions in decision making.

$#if string(regular-expression)

Is true if the string matches the regular-expression, and is false otherwise. If true, subsequent $#elif and $#elifn statements are ignored.

$#ifn string(regular-expression)

Is true if the string does not match the regular-expression, and is false otherwise. If true, subsequent $#elif and $#elifn statements are ignored.

$#elif string(regular-expression)

If evaluated, is true if the string matches the regular-expression, and is false otherwise. If evaluated and true, subsequent $#elif and $#elifn statements are ignored.

$#elifn string(regular-expression)

If evaluated, is true if the string does not match the regular-expression, and is false otherwise. If evaluated and true, subsequent $#elif and $#elifn statements are ignored.

Each conditional command matches the regular-expression with the string. (Note that it MUST be a string. Anything else will fail.) If the regular-expression matches the string, and it contains sub-match expressions (i.e., expressions coded within parentheses in the regular expression), Tags sets variables to the matched portions of the string. These variables have names that correspond to the positions of the sub-match expressions within the regular-expression. The sub-match variable names have the form $?regXi$$, where i is the index of the sub-match expression that corresponds to the variable.

Here is an example:

$#if $?date$$(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
$#set date($?regX2$$/$?regX3$$/$?regX1$$ $?regX4$$:$?regX5$$:$?regX6$$)
$#end
This fragment reformats a date from y-m-d h:m:s to m/d/y h:m:s. (Just a reminder: Note that the parentheses are all paired in this example, so that Tags can find the beginning of the expression by matching the pairs. If the parentheses do not match, you must use the back-quote character (`) to escape the unmatched parentheses.) If the value of the date variable is "2005-12-09 14:21:15", then the match generates the following six sub-match variables:
$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15
One additional variable, called $?regXCount$$, contains the number of sub-matched expressions. In the example, its value is six.

If a conditional command evaluates to false, the $?regXCount$$ variable is set to zero. If a conditional command results in fewer sub-match variables than the last match, only the variables for the sub-matches of the latest match survive. Sub-match variables are not managed in any other way.
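The bookkeeping for the sub-match variables can be sketched with Python's re module. This models the description above and is not how Tags itself is written. The function name apply_match and the variables dictionary are hypothetical, and leaving old regX variables alone after a failed match is just my reading of "not managed in any other way".

```python
import re


def apply_match(variables: dict, string: str, pattern: str) -> bool:
    """Match string against pattern and record sub-matches as regX1, regX2, ..."""
    found = re.search(pattern, string)
    if found is None:
        variables["regXCount"] = "0"
        return False
    # Only the sub-matches of the latest match survive.
    for name in [n for n in variables if re.fullmatch(r"regX\d+", n)]:
        del variables[name]
    groups = found.groups()
    for index, text in enumerate(groups, start=1):
        variables[f"regX{index}"] = text or ""
    variables["regXCount"] = str(len(groups))
    return True


variables = {"date": "2005-12-09 14:21:15"}
pattern = r"^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*)"
if apply_match(variables, variables["date"], pattern):
    v = variables
    v["date"] = f"{v['regX2']}/{v['regX3']}/{v['regX1']} {v['regX4']}:{v['regX5']}:{v['regX6']}"
print(variables["date"])
```

With the date from the example, the reformatted value is 12/09/2005 14:21:15 and regXCount is 6.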

This section is not an explanation of regular expressions themselves. You can find out more from any regular expression reference.

See also the $#match command, which is described later.


The $#forEach Command

$#forEach type(argument)

Processes the commands and text that fall between the $#forEach command and its matching $#end command once for each object identified in the $#forEach argument. For each object, the variable, $?forEach$$, is set to contain the object to allow the text within the loop to reference the object, while the $?position$$ variable is set to its index. Note that Tags handles $#forEach nesting so that the $?forEach$$ variable is maintained according to its context.

While processing the $#forEach loop, the variable $?last$$ is set to the index of the last object to be processed in the loop.

The type can be either empty or it can be one of the types described below: "count", "field", "file", "get", "line", "lineb", "node", "SQL", or "XML". If empty, the $#forEach argument should resolve into either a nodeSet containing zero or more nodes, a single node, or zero or more text lines (each line is taken as a $#forEach object).

Count

If the type field specifies "count", then the $#forEach argument must resolve into a number. The $#forEach logic performs the loop once for each value from one to the argument value, incrementing by one for each pass.

Field

If the type field specifies "field", then the $#forEach argument should resolve to a string record, with its first character identifying the field separator character. The $#forEach  logic loops for each field in the argument, setting the $?forEach$$ variable to each field in turn.
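As a rough model of the field loop, here is a Python generator that yields what the $?position$$, $?last$$, and $?forEach$$ variables would hold on each pass. The name for_each_field is hypothetical, and this is a sketch of the described behaviour, not Tags code.

```python
def for_each_field(record: str):
    """Yield (position, last, field) for each field in a string record.

    The first character of the record identifies the field separator.
    """
    if not record:
        return
    separator, rest = record[0], record[1:]
    fields = rest.split(separator)
    last = len(fields)
    for position, field in enumerate(fields, start=1):
        yield position, last, field


for position, last, field in for_each_field("|red|green|blue"):
    # No comma after the final field, in the spirit of the $#if example earlier.
    print(field + ("" if position == last else ","))
```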

File

If the type field specifies "file", then the $#forEach argument should resolve to a string record having the form, |directory|mask|type, where directory is the path to the directory of interest, mask is a filename expression, and type may be any "sum" of dir, tree, data, or any. Combine them using the plus-sign (+). The mask expression can use the plus-sign (+) and the minus-sign (-) to include or exclude ambiguous or absolute file names. For example, *.cpp+*.h-s* includes cpp files and header files except those that start with the letter s. On each iteration, the $?forEach$$ variable contains the full pathname of the file that the forEach command found.
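One way to picture how a mask such as *.cpp+*.h-s* selects files is the following Python sketch, which uses the standard fnmatch module. It is an approximation under stated assumptions: the name mask_matches is hypothetical, terms are applied left to right, and matching ignores case. Tags may evaluate masks differently.

```python
import fnmatch
import re


def mask_matches(filename: str, mask: str) -> bool:
    """Apply the terms of a mask left to right: '+' includes, '-' excludes."""
    selected = False
    for operator, pattern in re.findall(r"([+-]?)([^+-]+)", mask):
        hit = fnmatch.fnmatchcase(filename.lower(), pattern.lower())
        if hit:
            # A leading term with no operator is treated as an include.
            selected = operator != "-"
    return selected


for name in ["main.cpp", "util.h", "stdio.h", "notes.txt"]:
    print(name, mask_matches(name, "*.cpp+*.h-s*"))
```

Note that this simple splitter cannot express file names that themselves contain a plus or minus sign.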

Get

If the type field specifies "get", then the $#forEach argument should resolve to a prompt string that is displayed in the console window. User input is accepted, and when the user presses the Enter-key, the $#forEach loop is performed. During the pass, the user response is available in the $?forEach$$ variable. The $#forEach loop is terminated when the user presses the Esc-key.

Line, Lineb

If the type field specifies "line" or "lineb", then the $#forEach argument must resolve to a file name. The $?forEach$$ variable contains the text of each consecutive text line in the specified file.

The Tags interpreter opens the file, and then performs the loop once for each text object it finds in the file. The $?position$$ variable is incremented to reflect which object is being processed. Since the number of objects within the file is not known during the loop, the $?last$$ variable is not valid.

You can use the Lineb variant to ignore blank text lines.

There are two kinds of text objects that Tags recognizes: XML elements, and simple lines of text terminated by either a newline or a return, or both, in any combination.

If the first non-whitespace character in a line is a "<", then the object is assumed to be a valid XML element. The Tags interpreter locates the end-tag for the element, and then loads the element into a DOM and stores its reference in $?forEach$$. If Tags is unable to load the document into the DOM, then Tags quits with prejudice.

If the first non-whitespace character is not a "<", then the text line is read and loaded into $?forEach$$. Tags can handle text lines as long as 4095 characters. Any longer than that, and Tags terminates. This variant of the $#forEach command provides the ability to convert each text line into a set of variables through the use of the columns variable, which is discussed in the next section.
 
When all objects in the file have been processed, the file is closed.

Example:
The file input.txt contains a list of numbers followed by names that are associated with the numbers. Here are a few lines from that file:
0,UNKNOWN
1,CREATE TABLE
2,INSERT
3,SELECT
Suppose the problem is to reformat each line so that the output looks like this:
<map value="0" name="UNKNOWN"/>
<map value="1" name="CREATE TABLE"/>
<map value="2" name="INSERT"/>
<map value="3" name="SELECT"/>
The reform script in the file input.xml uses the line type of the $#forEach command to accomplish this:
<reform><![CDATA[$\j
$#forEach line(input.txt)
  $#text line(,$?forEach$$)
<map value="$?line{1}$$" name="$?line{2}$$"/>
$#end
]]></reform>
This example uses subscripting to obtain the individual fields in each line. Notice the comma in the $#text command. The comma is combined with the text line to form a value of, for example, ",1,CREATE TABLE", which is stored in the line variable. The leading comma informs the Tags parser that the fields are separated by commas.

These files are included in this release. Use the following command line to run this example:
> Tags input.xml >map.xml
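For readers who think more easily in a general-purpose language, the same transformation can be sketched in Python. This only mirrors what the reform script does and is not part of Tags; the function name reform is hypothetical, and like the script it does no XML escaping of the values.

```python
def reform(lines):
    """Turn 'number,name' lines into <map .../> elements."""
    output = []
    for line in lines:
        # Prefix the comma so the first character names the separator,
        # just as the $#text command does in the script.
        record = "," + line.rstrip("\r\n")
        fields = record[1:].split(record[0])
        output.append(f'<map value="{fields[0]}" name="{fields[1]}"/>')
    return output


sample = ["0,UNKNOWN\n", "1,CREATE TABLE\n", "2,INSERT\n", "3,SELECT\n"]
for element in reform(sample):
    print(element)
```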

Node

If the type field specifies "Node", then the $#forEach argument must resolve to a node list, and the $#forEach loop is performed once for each node in the node list. The $?forEach$$ variable will contain each node in turn.

SQL

If the type field specifies "SQL", then the $#forEach argument must resolve to a SQL query, which is performed against the DSN named in the $?dsn$$ variable. The $#forEach loop is performed once for each row in the result set of the query, with the $?forEach$$ variable containing each row in turn. Because of the relative complexity of this $#forEach option, it is discussed in more detail under its own heading below.

XML

If the type field specifies "XML", then the $#forEach argument must resolve to a file containing a list of one or more XML documents. The $#forEach loop is performed once for each XML document in the file, with the type of the $?forEach$$ variable being "document_node". The $?forEach$$ variable can be accessed using XPath expressions.


Variables Associated with the forEach Command

These variables have a special relationship with the $#forEach command. As the command initializes, it saves the value of the variables, and restores their values at the end of the loop. Note that some variables are inputs to the $#forEach command while others are output by the $#forEach command.

columns, getColumns, inputColumns, SQLColumns

Set these variables to cause the $#forEach command to parse the value of the forEach variable into a set of variables containing its fields. If the columns variable is not empty, the parse is applied whenever the forEach variable is a string. This can happen for the input-type, the SQL-type, and for the default-type of the $#forEach command. For all types except the SQL-type, the format of the columns variable can have one of two forms:

1. ,name1,name2,...,nameN
2. ,name1{size1},name2{size2},...,nameN{sizeN}

Use the first form when the forEach value is a string record, and use the second form if the forEach value is a record comprised of a set of fixed-length fields. If a name is omitted, the field is skipped and no variable is created for that field. While the forms shown above use the comma as the field delimiter, any special character is acceptable.

In the first form, if the forEach value is not a proper string record, i.e., does not start with a non-alphanumeric character, the field delimiter of the columns variable is assumed to be appropriate for the forEach value as well.

columns is used by the Str/Strb and the anonymous types of forEach.
getColumns is used by the Get type.
inputColumns is used by the Line/Lineb type.
SQLColumns is used by the SQL type.
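The two forms of the columns variable can be modeled with the following Python sketch. It is an illustration of the rules described above, not the Tags parser. The function name parse_columns and the sample names are hypothetical, and the sketch assumes a columns value uses one form throughout.

```python
import re


def parse_columns(columns: str, value: str) -> dict:
    """Split value into named fields according to a columns specification."""
    delimiter = columns[0]
    specs = columns[1:].split(delimiter)
    sized = [re.fullmatch(r"(.*)\{(\d+)\}", spec) for spec in specs]
    result = {}
    if all(sized):
        # Second form: fixed-length fields, name{size}.
        offset = 0
        for match in sized:
            name, size = match.group(1), int(match.group(2))
            if name:  # an omitted name skips the field
                result[name] = value[offset:offset + size]
            offset += size
        return result
    # First form: delimited fields.
    if value[:1].isalnum():
        # Not a proper string record: borrow the delimiter of columns.
        fields = value.split(delimiter)
    else:
        fields = value[1:].split(value[0])
    for name, field in zip(specs, fields):
        if name:
            result[name] = field
    return result


print(parse_columns(",CustNo,Name,,City", "|17|Ann|1 Main St|Salem"))
print(parse_columns(",id{3},{2},name{5}", "007xxBond "))
```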

forEach

Variable set by the $#forEach command to contain each object, in turn, that is contained in the forEach argument. For example, if the forEach argument is a nodeset, then the $?forEach$$ variable will contain a node. When Tags begins, it initializes $?forEach$$ to reference the script text.

position

Variable set to the index of the current object processed by the $#forEach command. (The position of the first object is one, the second object is two, etc.) When Tags begins, it initializes $?position$$ to zero.

last

Variable set to the index of the last object processed by the $#forEach command. This variable is not valid during an input-type or SQL-type $#forEach loop. When Tags begins, it initializes $?last$$ to zero.

contextNode

Unless you use the $@var!xpathExpression$$ form, you must set this variable before using any XPath expression to search any subtree of an XML document. When Tags begins, it initializes $?contextNode$$ to reference the script document.

Example:

Here is an example using the variables provided by the $#forEach command:
$#forEach ($!//event$$)
$#set contextNode($?forEach$$)
$#if ($!$?position$$ =$?last$$$$)
&quot;$!text()$$&quot;,
$#else
&quot;$!@name$$&quot;
$#end
$#end
$# At this point, after the above forEach command is
$# processed, the value of both the forEach and
$# the contextNode variables revert to the values held before
$# the forEach command was encountered.
Here, the XPath expressions text() and @name are applied to each of the <event> elements in the script document. The script writer lets Tags know where to look for the text and the name attribute by setting the $?contextNode$$ variable to contain the current <event> element object. Note that the $?contextNode$$ variable is not set automatically.

The values of the $?forEach$$, $?position$$, $?last$$, and $?contextNode$$ variables are saved before processing a $#forEach loop, and, at the completion of the $#forEach loop, are reset to their saved values. Note that while you generally would not $#set the $?forEach$$, $?position$$, and $?last$$ variables, you should $#set the $?contextNode$$ variable to control the context of your XPath search expressions within the $#forEach context.
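The save-and-restore behaviour can be pictured with a Python context manager. This is a model of what the manual describes, with hypothetical names (for_each_scope, run_for_each) and a plain dictionary standing in for the Tags variable store.

```python
from contextlib import contextmanager

LOOP_VARIABLES = ("forEach", "position", "last", "contextNode")


@contextmanager
def for_each_scope(variables: dict):
    """Save the loop variables on entry and restore them on exit."""
    saved = {name: variables[name] for name in LOOP_VARIABLES}
    try:
        yield
    finally:
        variables.update(saved)


def run_for_each(variables: dict, items: list, body) -> None:
    with for_each_scope(variables):
        variables["last"] = len(items)
        for position, item in enumerate(items, start=1):
            variables["forEach"] = item
            variables["position"] = position
            body(variables)


variables = {"forEach": "script text", "position": 0, "last": 0,
             "contextNode": "script document"}
seen = []


def body(v: dict) -> None:
    v["contextNode"] = v["forEach"]  # set by the script writer, not automatically
    seen.append((v["position"], v["last"], v["contextNode"]))


run_for_each(variables, ["event-1", "event-2"], body)
print(seen)
print(variables)
```

Because each nested loop would save and restore its own copy, the same idea also covers nested $#forEach commands.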

Another example:

Assuming that the following Tags script is stored in a file, called letter.xml, it can be processed with the following command line:
> tags letter.xml >letter.txt
Tags script in the file, letter.xml:
<letter script="/letter/body/text()">
<body>
$!/letter/data/salute/text()$$$\j
$!/letter/data/firstname/text()$$$\j
$!/letter/data/lastname/text()$$
$!/letter/data/street/text()$$
$!/letter/data/city/text()$$,$\j
$!/letter/data/state/text()$$$\j
Dear $!/letter/data/salute/text()$$:

I am looking for fresh wood for my sawmill. I am especially
looking for Eastern hardwoods. Do you have any on hand? I will
be happy to remove it and pay you a fair price for the opportunity.

Sincerely,
Paul B.
</body>
<data>
<salute>Mr</salute>
<firstname>George</firstname>
<lastname>Washington</lastname>
<street>123 Cherry Lane</street>
<city>Mt Vernon</city>
<state>Virginia</state>
</data>
</letter>
There are several variables associated with the SQL Query interface, which are discussed below under "Using the forEach SQL Query Interface".



Using the forEach File Interface

The form of the forEach argument is

    |directoryName|fileMask|searchType

The directory name can be any ambiguous or non-ambiguous path given the value of the $?currentPath$$ variable. The file mask can be a logical expression comprised of ambiguous and non-ambiguous file names concatenated with either the plus sign (implements union) or the minus sign (implements difference). The valid searchTypes can be one from the set { root | tree } and one from the set { data | dir | any } where the defaults are root and data.

The variables that the $#forEach command sets are

$?fileInfo$$ is a string record having the form

    |fileName|createDate|createTime|createSecs|modificationDate|modificationTime|modificationSecs|size|"dir" or "data"

$?fileDrive$$ is the drive letter followed by a colon,

$?filePath$$ is the path followed by a forward-slash, and

$?fileName$$ is the file name and extension, if any.

Note that file paths can use the forward-slash or backward-slash.
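As an illustration of how a full pathname breaks down into the three pieces described above, here is a Python sketch using the standard ntpath module. The function name split_file_path is hypothetical, and normalizing backward-slashes to forward-slashes in the path portion is my assumption, not something the manual states.

```python
import ntpath


def split_file_path(full_path: str):
    """Return (drive, path, name) in the style of fileDrive, filePath, fileName."""
    drive, rest = ntpath.splitdrive(full_path)
    directory, name = ntpath.split(rest)
    directory = directory.replace("\\", "/")
    if not directory.endswith("/"):
        directory += "/"  # the path is followed by a forward-slash
    return drive, directory, name


print(split_file_path("C:/work/src/main.cpp"))
print(split_file_path("C:\\work\\main.cpp"))
```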


Using the forEach SQL Query Interface

Tags provides a SQL query interface through the SQL variant of the $#forEach command. For example, assuming that there is an accessible table, called name-and-address, on your computer, the following $#forEach command implements a simple query to that table:

$#forEach SQL(select name, street, city, state, zipcode from name-and-address)
  ..etc
$#end

Generally, the result of a SQL query is what is called a result-set: a set of rows (records) that satisfy the query. Tags repeats the forEach loop once for each row in the result-set, setting the $?forEach$$ variable to each row in the result-set, in turn.

By itself, the $#forEach command given above does not provide enough information to perform the query.  The ODBC system requires additional information, such as the name of the database in which the name-and-address table resides, the name of the server computer, and the name of the ODBC interface driver needed to interface to the specific database server.

To communicate this information, the ODBC interface provides an encapsulation object, called a DSN, or Data Source Name, which is maintained by the system as a Registry key, and its associated entries in the Registry at HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\dsnkey, where dsnkey is the name of the DSN. (Use your Registry Editor to examine some DSNs, but be careful not to make any changes to the Registry unless you know what you are doing - standard warning.) These entries usually identify the database name, the server name, and the ODBC driver name. Depending on the type of database, other information may be stored there as well.

While there are several ways to create a DSN, the easiest is by using the ODBC Data Source Administrator tool at Start/Settings/Control Panel/Administrative Tools/Data Sources (ODBC). This tool is available in all 32-bit Windows operating systems, as far as I know.

Many database servers require that a query is accompanied by a username and a password, which the database administrator sets up beforehand, though not all database interfaces require a username and a password.

Be that as it may, the Tags SQL Query implementation needs this additional information to pass on to the ODBC interface. You provide the information to Tags before the $#forEach SQL command through specific Tags variables. These variables are named as follows:
* dsn
* username
* password
The $?dsn$$ variable is always required, but, depending on the specific ODBC interface, the $?username$$ and $?password$$ may not be required. For example, generally they are required if you are querying a Microsoft SQL Server or Oracle database, but are not likely to be required if you are querying a FoxPro table.

As I mentioned earlier, the result of a successful query is a result-set, and Tags provides each row in the result-set as a delimited string in the $?forEach$$ variable. To access the specific columns (fields) in the row (record) contained in the $?forEach$$ variable, you can use the subscripting feature as in the following example:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#forEach SQL(select * from cust)
$?forEach{1}$$, $?forEach{2}$$, $?forEach{3}$$, $?forEach{4}$$, (and so on)
$#end
</Tags>
In this example, the code assumes that a DSN, called Tags-customer-dsn, exists in the Registry.
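The idea of receiving each row as a delimited string and then subscripting its fields can be sketched in Python. To keep the sketch self-contained it uses the standard sqlite3 module with an in-memory table as a stand-in for an ODBC DSN; the table contents, the function name rows_as_records, and the choice of "|" as the delimiter are all assumptions made for illustration.

```python
import sqlite3


def rows_as_records(connection, query: str, delimiter: str = "|"):
    """Yield each result row as a string record led by its delimiter."""
    for row in connection.execute(query):
        yield delimiter + delimiter.join(str(column) for column in row)


# Stand-in for the database behind the DSN (hypothetical sample data).
connection = sqlite3.connect(":memory:")
connection.execute("create table cust (custno integer, name text, city text)")
connection.executemany("insert into cust values (?, ?, ?)",
                       [(1, "Ann", "Salem"), (2, "Bob", "Dover")])

for record in rows_as_records(connection, "select * from cust order by custno"):
    fields = record[1:].split(record[0])  # fields[0] plays the role of $?forEach{1}$$
    print(", ".join(fields))
```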

Tags provides another way of identifying the columns of the row that does not use the subscripting method. You can provide the column names as a string record in a Tags variable, called columns. Tags not only places the column values in the forEach variable, it also places the values into variables named in the columns variable. Here is an example where the programmer has set the columns variable:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#set columns(,CustNo,Name,Street,City,Stat