Tags
A Scripting Language for Text
- Introduction
- How to Execute a Tags Script
- Some Basics
- Examples
- Tags Referencers
- Tags Variables
- Properties of Variables
- Commands
- Conditional Commands
- Regular Expressions
- The $#forEach Command
- Variables Associated with the forEach Command
- Using the forEach File Interface
- Using the forEach SQL Query Interface
- The while Commands
- Additional Commands
- $#add name(value), $#adds name(value), $#addt
name(value), $#addu name(value)
- $#console (value)
- $#debug (value)
- $#defer name(value)
- $#drop (name)
- $#exec wait(command-line)
- $#get varName(prompt)
- $#in name(file-name), $#inb name(file-name), $#inp
name(file-name)
- $#include (file-name)
- $#match string(regular-expression)
- $#message (message), $#msg (message)
- $#open wait(file-name)
- $#out name(file-name), $#outa
name(file-name)
- $#pause (message)
- $#play (wave-file-name)
- $#pop, $#push
- $#sleep (time)
- $#set name(value), $#text name(value), $#xml
name(value)
- $#stop (message)
- $#trace (value)
- $#translate (arg-char-set,fun-char-set)
- $#vars (message)
- Examples:
- The Line Output Management Commands
- Errors
- Tags Download
Introduction
Tags is a scripting language for processing text. You can write simple Tags
scripts to process plain and delimited text. You can also easily extract
information from HTML and XML documents obtained from websites, use ODBC to
support SQL queries, and use simple commands to manipulate folders and text
files.
You can write a valid Tags script in a single line of text. But you could
also write a Tags script that spans many files to implement, for example, a
complex document and software generation library. I know because I have.
Here is the traditional "Hello World" written as a Tags script:
<hello>
Hello World.
</hello>
A Tags script is always embedded within an XML document as text (a text node,
in XML language). A trivial Tags script is simply the text contained in the
document element of the script document - in this case, the text within the
hello element. Tags' default action is to output the text it finds as it
processes the script to the standard output. If you replaced the "Hello
World." text with the text of a book, Tags would output the entire text of
the book. But a Tags script can do much more.
Here are some simple sample scripting snippets:
* Load the Top Stories RSS document from the CNN website into a Tags
variable, and then save it in a file.
<topstories script="/topstories/text()">
$#in topstories(http://rss.cnn.com/rss/cnn_topstories.rss)
$#out topstories(topstories.xml)
</topstories>
Tags commands are identified by the leading "$#" character sequence (but
there can be leading spaces as in the sample). The Tags language also
supports variables, and in this example, topstories is a Tags variable. The $#in command reads the document identified
by the URL in the parentheses into the topstories variable. The $#out command writes the contents of the
topstories variable to the file
named topstories.xml.
* Read text lines from a file, and write them to the standard output.
<copy script="/copy/text()">
$#forEach line(myfile.txt)
$?forEach$$
$#end
</copy>
The line-variant of the $#forEach command (there are several other
variants as you will see later) reads each text line from myfile.txt,
placing the text line in the forEach
variable where you can reference it in the part of the script between
the $#forEach and the following $#end commands. In the sample script, the
line following the $#forEach command
references the forEach variable.
Tags variables are referenced by preceding the variable name with the "$?"
character sequence, and following the variable name with the "$$" character
sequence. (You can change the characters that Tags will expect within the
script using the marks attribute in
the document element, but it's probably not worth doing.) In this example,
the $?forEach$$ reference causes the
contents of the forEach variable to
replace the variable reference and to be written to the standard Tags output
file.
* Select records from a database, and write them to the standard output.
<select script="/select/text()">
$#set dsn(customerDSN)
$#set username(myusername)
$#set password(mypassword)
$#forEach SQL(select * from customer)
$?forEach$$
$#end
</select>
Tags uses the ODBC
interface to support the SQL
query. To run this script, you need a database table, called customer,
and you need to have defined a DSN
(Data Source Name), called customerDSN, to provide the interface
information to the ODBC driver. You must also preset the dsn variable to the DSN name. You may also
need to set the username and password variables if they are needed. The
SQL-variant of the $#forEach command issues the SQL select
statement to the ODBC driver, and then places each resulting record in the
forEach variable, where you can reference it. In this sample, as in
the previous sample, the $?forEach$$
reference causes the contents of the forEach variable to be written to the
standard Tags output file.
The Tags scripting language supports XPath and regular expressions to allow
considerable scripting power. And its simple but comprehensive command set is
easy for anyone with scripting or programming experience to learn and use. If
you aren't familiar with XPath, the W3C XPath specification is a place to
start. And if you don't know regular expressions, an introductory
regular-expression tutorial is a reasonable place to start.
How to Execute a Tags Script
You can execute a Tags script from the command line, from within a batch
file, from a program, or from a WSH script (JavaScript or VBScript). The Tags
command line takes the following parameters:
- The name of the Tags script file to execute,
- Any parameters needed by the Tags script.
There are also several pre-defined flag-parameters that you can use:
-V
Plays the ok.wav file on success,
and the error.wav file on failure,
if the files are available.
-X
Displays this manual in the default browser, if both are available.
-Z
Saves variables to files when they are loaded with the $#in command (for debugging).
-n
(n is a number) Adjusts the time Tags sleeps to share CPU cycles between
commands. Not usually required for short runs. Mostly useful for running a
Tags script in the "background".
Example:
> tags hello.xml -v >hello.txt
This command causes Tags to execute hello.xml, one of the sample files
included in this release. On completion, it plays the ok.wav file if
successful, or the error.wav file if not (assuming that the wav files are
present).
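Running Tags "from a program" can be sketched in Python as follows. This is only an illustration under assumptions that are mine, not the manual's: it assumes a tags executable is on the PATH, and the helper names tags_argv and run_tags are made up for this sketch. Placing the flag group after the script name simply mirrors the example command line above.

```python
import subprocess


def tags_argv(script, *params, flags="", executable="tags"):
    """Build the argument list: script file, script parameters, then flags.

    'executable' is an assumption: it must name a tags binary on the PATH.
    """
    argv = [executable, script, *params]
    if flags:
        argv.append("-" + flags)  # e.g. flags="v" becomes "-v"
    return argv


def run_tags(script, *params, flags="", executable="tags"):
    """Run a Tags script and capture what it writes to the standard output."""
    return subprocess.run(
        tags_argv(script, *params, flags=flags, executable=executable),
        capture_output=True,
        text=True,
        check=False,
    )


# Hypothetical usage (requires Tags to be installed):
#   result = run_tags("hello.xml", flags="v")
#   print(result.stdout)
```

Capturing the standard output in the calling program plays the same role as the ">hello.txt" redirection on the command line.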
Several sample scripts are included with the release.
When you install the Tags files by downloading and unzipping the tags.zip file from http://paul.medlock.com/tags.zip,
you should also add an environment variable, called tagsPath, and set it to contain the path
to the folder where you installed Tags.
Some Basics
The elements, attributes, and text of a script file are wholly determined by
the application. Since you make up the element and attribute names, along
with the structure of the script file, to fit your application, there is no
DTD or schema that describes a valid Tags script.
The text in the Tags script is free-form and can contain any ordinary text
and special characters except for the standard five XML predefined
characters:
- instead of "<", use &lt;
- instead of ">", use &gt;
- instead of "&", use &amp;
- instead of "'" (apostrophe), use &apos;
- instead of """ (quote), use &quot;
If your text contains any of these characters, you may need to convert them
to the equivalent XML entity reference. On the other hand, you can choose to
embed your text in CDATA-sections instead. You can use a CDATA-section
anywhere you could write text, and you can even mix them together, since Tags
treats CDATA-sections as if they were text. A CDATA-section begins with the
string "<![CDATA[" and ends with "]]>". Here is an example:
<element>
put your <![CDATA[<marked> up text &]]> commands here
</element>
The text may also contain white-space: viz., spaces, tabs, and new-lines.
Since these characters are preserved in the text, you will find that they
will frequently appear in your output unless you control their use.
Here's a useful idea: If you aren't using the CDATA option and you choose to
convert the special characters to entity references by performing a
search-and-replace, be sure to replace the ampersands with &amp; first.
Otherwise, you will not be able to tell your original ampersands apart from
the ones in the entity references you just inserted.
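The ordering issue in this tip can be shown with a short Python sketch. The function names are mine and purely illustrative; the last helper uses the standard library's xml.sax.saxutils.escape, which already replaces the ampersand before the other characters.

```python
from xml.sax.saxutils import escape


def to_entities(text):
    """Replace the five XML special characters, ampersand first."""
    text = text.replace("&", "&amp;")  # must run before the others
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace("'", "&apos;")
    text = text.replace('"', "&quot;")
    return text


def to_entities_wrong_order(text):
    """Deliberately wrong: the '&' in the new '&lt;' gets escaped again."""
    text = text.replace("<", "&lt;")
    text = text.replace("&", "&amp;")
    return text


def to_entities_stdlib(text):
    """Same result as to_entities, using the standard library."""
    return escape(text, {"'": "&apos;", '"': "&quot;"})
```

With the wrong order, "a < b" becomes "a &amp;lt; b", which an XML parser reads back as the literal text "a &lt; b" rather than "a < b".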
The examples in this manual may not use the XML entities when they should so
that they are easier to read. But don't forget that you will have to deal
with that issue before you can use your script in Tags. The characters that
Tags uses for markup were chosen so as not to infringe on XML's markup.
Here is another useful idea: You can check that an XML document is
well-formed using Internet Explorer 5+, Netscape 6+, Mozilla, SeaMonkey,
Firefox, etc. To use IE, for example, just drag the name of the file you want
to check onto the IE shortcut on your desktop. IE will recognize the XML file
name extension and display the document. If the document contains an error,
your browser will report the line and column numbers where the error was
detected. Of course, if you use a different file name extension, the browser
may not recognize the file as XML.
XML is case-sensitive, and, consequently, XPath expressions are
case-sensitive. Tags is partly case-sensitive. Command names are not, but
variable names are.
A Tags script file is a well-formed XML document. Usually the bulk of the
file is the text that you want in the output. Here is the Hello.xml example
again:
<hello script="/hello/text()">
Hello world.
</hello>
and you can run it with the command line
> tags hello.xml >hello.txt
The document element of a Tags script document should contain the script
attribute, which uses an XPath expression to tell the Tags interpreter where
the script is within the document. In the example, the value of the script
attribute is "/hello/text()". This
is an absolute XPath expression. It's a good idea to always use an absolute
XPath expression to locate the script. The script attribute is optional, but only if
the Tags script is the sole occupant of the document element, as in this
case. We need the script attribute
in more complex script documents, since the script probably will not be in
such an obvious place, so you are probably better off by getting in the habit
of using it.
By default, Tags writes the text generated by the script to the standard
output file, but at least one of the sample scripts we have already discussed
demonstrates how to direct Tags output to other files.
About those pesky whitespace characters. If you look carefully at the
contents of the output file from the Tags run above, you will notice that
there is a blank line, followed by the "Hello world." line. This blank line
results from the newline that follows the <hello> start tag - the "Hello world."
line is on the next line down. You can remove that extraneous line from the
output in two ways. You could rewrite the script as
<hello script="/hello/text()">Hello world.
</hello>
or you could use a join-command:
<hello script="/hello/text()">$\j
Hello world.
</hello>
The join-command ($\j) is one of several
special text output control commands. Another command is the newline-command,
which breaks lines, and is written as $\n. It causes the text
of the line that follows the command to be written as the next line. In the
following line of text, the newline-command causes the one line to be output
as two lines.
this is the first line$\nthis is the second line
There are other output control commands, but I'll explain them later in the
manual.
As we saw in the RSS sample in the introduction, you can redirect output to
files other than the standard output using the $#out command. Let's modify
the hello.xml file by using the $#out command to redirect its output to
another file:
<hello script="/hello/text()">$\j
Hello world.
$#out (hello.txt)
</hello>
After you run this example, you will find the output of the script in hello.txt. Note that this version of the
$#out command does not identify a
variable as the output source as did the RSS document load sample in the
first section of this manual.
In most programming languages, the text information is usually marked off
from the other elements of the language with special marks, such as quotation
marks, etc., while the language commands are not marked. In Tags, it's the
other way around: text is written simply as text. It is the special Tags
commands that are marked.
There are two kinds of Tags symbols: commands and referencers. Commands occupy a single line
of text, and are identified by a $-sign followed by a #-sign followed by the
command name. Spaces are not allowed to separate these three parts, but
commands do not have to start in the beginning of the line: there may be
leading spaces. Lines that begin with the "$#" identifier that are followed
by a space or do not have a recognized command name are considered comments
and are ignored.
You use referencers to modify the
outputs that your Tags script generates. You can reference the text and
attributes of the Tags script document, other XML documents that you load,
and variables whose values you set. Referencers begin with a $-sign followed
by either an exclamation-point ("!") or a question-mark ("?") followed by an
expression of some kind, followed by two $-signs. Referencers can appear
pretty much anywhere within your text as you need them, but they must be
complete on the same line on which they start. The file copy and ODBC
samples both used the $?forEach$$
variable referencer.
Commands and referencers will be discussed in more detail in subsequent
sections, but here are some examples:
Tags commands:
$#out (myfile.txt)
$#text class(myclass)
$#if (true)
$#end
$#debug (on)
$#get objectname(Enter the name of the new object:)
$# this is a comment
Tags referencers:
$!/model/help/text()$$
prompt="$!@prompt$$"$\j
<map name="Action" value="$?line{1}$$" info="$?line{2}$$"/>
The effect that these commands and referencers might have on the output of a
script depends on the context in which they operate. Different data at the
locations specified by the referencer expressions will result in different
outputs. And, since there is no difference between data and program in Tags,
any referencer could obtain text that contains commands and referencers that
Tags would also process. That's how Tags provides something akin to the
subroutine paradigm that programmers are familiar with, though not exactly,
since Tags does not provide a facility for passing parameters to
"subroutines".
Examples
Here are a couple of examples of more complex activities you can implement in
a few script lines:
* Query a database table, called customer, to obtain customer information,
and write the information into a text file, and then display the results in
notepad. The script assumes that a DSN, called customerDSN, has been
created for the database table access.
<db2text script="/db2text/text()">
$#set dsn(customerDSN) assumes that the DSN customerDSN was previously declared
$#set sqlcolumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
$#forEach SQL(select * from customer) {
$?CustNo$$,$?Name$$,$?Street$$,$?City$$,$?State/Prov$$,$?ZipCode$$,$?Country$$,$?Phone$$
$#end }
$#out (cust.txt)
$#exec (notepad cust.txt)
</db2text>
If you provide the Tags interpreter the names you want to assign the columns
in the result records using the sqlcolumns variable, you can access
the columns as variables by their name. In order to use ODBC, you must first
have set up the ODBC link for the specific database, as I described earlier.
It is beyond the scope of this manual to explain that, but you can get more
help by following this sequence of steps: Windows Start -> Control Panel
-> Administrative Tools -> Data Sources (ODBC). Ok, now you are on your
own.
* Here is another script that reads an RSS document from the CNN news site,
extracts features from the document to create an HTML document, and displays
it using your default internet browser. Note the use of the CDATA-section to
escape all the HTML tags.
<rss2html script="/rss2html/text()"><![CDATA[
$#in contextNode(http://rss.cnn.com/rss/cnn_topstories.rss)
<html>
<head>
<h2>$!/rss/channel/title/text()$$</h2>
</head>
<body>
$#forEach node($!/rss/channel/item$$) { list all the items in the feed
$#set contextNode($?forEach$$)
<h3>$!title/text()$$</h3>
<p>$!description/text()$$</p>
<p><a href="$!link/text()$$">Link</a></p>
$#end }
</body>
</html>
$#out ($?currentPath$$/topstories.htm)
$#open (file://$?currentPath$$/topstories.htm)
]]></rss2html>
This example obtains the Top Stories RSS document from CNN, as in the
earlier sample, and creates an HTML document using the <item> objects
in the document, writes the result to a file called topstories.htm,
and then opens the default browser to display the file. Note that the URL
must begin with "http://".
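For readers who think more easily in a general-purpose language, here is a rough Python analogue of what the rss2html script does. It is not how Tags is implemented; it only illustrates the idea of making each <item> the context node and then evaluating relative paths (title, description, link) against it. The function name rss_to_html is made up for this sketch, the feed is passed in as text so the sketch needs no network access, and, like the script, it copies the text into the HTML without escaping it.

```python
import xml.etree.ElementTree as ET


def rss_to_html(rss_text):
    """Build a small HTML page from the items of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)  # the <rss> document element
    channel = root.find("channel")
    lines = [
        "<html>",
        "<body>",
        "<h2>" + channel.findtext("title", default="") + "</h2>",
    ]
    for item in channel.findall("item"):
        # 'item' plays the role of contextNode: the paths below are relative.
        lines.append("<h3>" + item.findtext("title", default="") + "</h3>")
        lines.append("<p>" + item.findtext("description", default="") + "</p>")
        link = item.findtext("link", default="")
        lines.append('<p><a href="' + link + '">Link</a></p>')
    lines += ["</body>", "</html>"]
    return "\n".join(lines)
```

A caller would fetch the feed, pass its text to rss_to_html, and write the result to a file, which corresponds to the $#in and $#out commands in the script.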
Tags Referencers
Tags referencers may be coded virtually anywhere within the text of the
script, and have the form
$reftype symbol { subscriptor } $$
There are two reference types, distinguished by the single-character reftype:
! (exclamation-mark) indicates an
XPath expression. In most circumstances, Tags replaces the referencer with
the value obtained by evaluating the XPath expression.
? (question-mark) indicates a
variable reference. Tags replaces the referencer with the value of the
variable specified by the symbol. Variables are discussed later.
Referencers may be used anywhere within the script where they make sense.
Referencers may contain referencers, or result in text that contains other
referencers, which are also resolved until only unmarked text is left. As
already mentioned, a referencer cannot be split across two or more lines: it
must lie wholly within a single text line.
Examples:
$!//config/tag$$
an XPath expression reference that identifies all the <tag> elements in all
<config> elements in an XML document.
$?forEach$$
a reference to the Tags variable that contains the local value within a
$#forEach statement.
$?3$$
a reference to the third parameter on the command line.
receiver->SetSource("$!@source$$");
an XPath referencer to the source attribute, embedded in some text in the
script document. (Note that the quotes are part of the output, not part of
the referencer.)
$?$?index$$$$
a reference to the variable identified by the value of the referenced index
variable (a nested reference).
$#set x($!$x+1$$)
a Tags $#set command using an XPath expression reference to increment the
variable x by one. (Note that you can reference a Tags variable within an
XPath expression using only the $-leadin character as documented in the XPath
specification. Writing ($!$?x$$+1$$) would also work, except that the Tags
interpreter resolves the reference instead of the XPath interpreter, so you
may have to place it in quotes if it resolves to a string constant.)
$@script!/myscript/mysubroutine$$
This special form of the XPath referencer allows you to specify a node to use
as a reference point when processing the XPath expression. The example uses
the script variable, which is initialized by Tags to the document element of
the script document itself. If no context node is specified, Tags uses the
contents of the contextNode variable as the reference point.
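The nested variable reference ($?$?index$$$$) is easier to follow with a small model. The Python sketch below is not the Tags resolver: it handles only plain $?name$$ variable references (no XPath referencers, subscriptors, or property prefixes), and it assumes that an undefined variable resolves to empty text. It resolves the innermost references first and repeats until nothing is left to resolve.

```python
import re

# Matches an innermost variable reference such as $?index$$.
_VAR_REF = re.compile(r"\$\?([A-Za-z0-9_.]+)\$\$")


def resolve(text, variables, max_passes=50):
    """Repeatedly replace $?name$$ references until the text stops changing."""
    for _ in range(max_passes):
        resolved = _VAR_REF.sub(
            lambda match: variables.get(match.group(1), ""), text
        )
        if resolved == text:
            return text
        text = resolved
    raise RuntimeError("references did not settle; possible circular reference")


variables = {"index": "color", "color": "red"}
print(resolve("$?$?index$$$$", variables))  # prints: red
```

The first pass turns $?$?index$$$$ into $?color$$, and the second pass turns that into red.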
Subscriptors
When the type of a resolved referencer is a string, a list of strings, or a
nodeset (an XPath object), you can use an optional trailing subscriptor to
obtain a portion of the referencer value. A subscriptor is annotated as an
open curly-brace, followed by a number, followed by a close curly-brace, and
is appended to the end of the referencer before the trailing dual markers
(the $$ tail). The subscriptor, itself, can also incorporate one or more
referencers, but it must resolve to a positive number. If the resolved value
of the subscriptor is zero, or less than zero, the value of the resolved
subscripted referencer is left unchanged (i.e., not subscripted).
If the resolved value of the subscripted referencer is a string, Tags assumes
that the string is a series of fields, each preceded by a delimiting
character. Any character can act as a delimiting character, and it is (by
definition) identified as the first character of the string. If the first
character is a comma, the delimiting character is a comma. If the first
character is the letter "A", the delimiting character is the letter "A".
(Notice in the example below that the string is prefixed with a comma to
identify the comma as the delimiting character.) In this manual, strings
delimited in this way are referred to as delimited strings, or as string
records.
,Lincoln,Abraham,Springfield,Illinois
If the value of the resolved referencer is a nodeset, Tags obtains the node
in the nodeset corresponding to the subscriptor value, counting the first
node as node one. I.e., the first node in a nodeset variable is identified as
$?nodeSet{1}$$, where nodeSet is the name of the variable.
If the value of the subscriptor is larger than the number of objects (fields,
strings, or nodes), then the value of the subscripted referencer is empty. If
the type of the resolved subscripted referencer is not a string, string list,
or a nodeset, the subscriptor is ignored.
Example of using the subscriptor notation:
$#set record(,$!@xyz$$) note the leading comma
$#set field($?record{5}$$)
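The subscriptor rules for strings and string lists can be summarized in a few lines of Python. This is a sketch of the rules as stated in this section, not Tags code; the function name subscript is mine, and nodesets are left out.

```python
def subscript(value, n):
    """Return item n (1-based) of a delimited string or a list of strings.

    - n <= 0: the value is returned unchanged (not subscripted).
    - n larger than the number of items: the result is empty.
    - For a string, the first character is the delimiter.
    """
    if n <= 0:
        return value
    if isinstance(value, str):
        if not value:
            return ""
        delimiter = value[0]
        items = value[1:].split(delimiter)
    else:
        items = list(value)
    return items[n - 1] if n <= len(items) else ""


record = ",Lincoln,Abraham,Springfield,Illinois"
print(subscript(record, 3))  # prints: Springfield
```

Because the delimiter is taken from the first character, the same function handles "AoneAtwo" (delimiter "A") without any extra configuration.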
Notes on using XPath expressions in Tags
XPath expressions in Tags are evaluated within a context, which XPath calls
the context node. Tags has a variable (see the next section) called
contextNode, referenced as $?contextNode$$, which provides the default node
from which the expression is evaluated. See the W3C XPath specification for
an explanation of the context node.
Tags Variables
Tags supports variables that can be referenced and assigned values. Each
variable has a name and a value. Unless it violates some other Tags rule, any
alphanumeric string can be a variable name. Values may be of any Tags type
(as described in the next section), or they may be empty. Variable names are
case-sensitive. You set the value of a variable using one of several Tags
commands, and you obtain the value by using the $?varname$$ referencer form.
Tags provides several variables that contain information about the processing
environment of the script. For example, the command-line parameters are
available as variables whose names are the numbers corresponding to the
positions of the parameters that they contain. For example, the first
parameter is available in the variable named "$?1$$", the second
parameter is available in the "$?2$$" variable, and so
on. In the example command line given earlier (tags hello.xml), $?0$$
contains "Tags", and $?1$$ contains "hello.xml".
Tags also allows you to access the command-line flag-parameters using the
form -letter{letter}. Examples of command-line
flag-parameters are -D, -C, -a, etc. Flag-parameters are preserved as Tags
variables having the letter as both their name and their value. The names are
always capitalized, regardless of whether the flag-parameter is or not.
Variables named "$?a$$" and "$?A$$" are different
variables, and only the second could represent a flag-parameter. The
flag-parameter variables make it easy for the user to communicate special
conditions to the script. By the way, notice that there is no provision for
referencing numeric flag parameters as Tags variables.
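To make the parameter and flag rules concrete, here is a Python sketch of how a command line could be turned into variables under the rules described in this manual. It is a model, not the Tags implementation; the function name is mine, and skipping non-letter flags such as the numeric -n flag is my reading of the note that numeric flags are not available as variables.

```python
def command_line_variables(argv):
    """Map a command line to Tags-style variables.

    argv[0] is the program name and becomes variable "0". Arguments that
    start with "-" or "/" followed only by letters are flag groups: every
    letter becomes an upper-case variable whose value is its own name.
    Other "-" or "/" arguments are skipped. Everything else is a positional
    parameter, numbered from one; flags are not counted.
    """
    variables = {"0": argv[0]}
    position = 0
    for arg in argv[1:]:
        if len(arg) > 1 and arg[0] in "-/":
            if arg[1:].isalpha():
                for letter in arg[1:].upper():
                    variables[letter] = letter
            continue
        position += 1
        variables[str(position)] = arg
    return variables
```

For the command line "tags hello.xml -AbC out.txt", this yields variables 0, 1, and 2 for the program name and the two parameters, plus A, B, and C for the flag group.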
You can also reference environment variables by appending their name to
"env.". If you reference an environment variable, such as PATH, as a variable
(e.g., as in $?env.path$$),
and if you have not declared a variable by that name, the value of the
environment variable is returned. Tags never changes the values of
environment variables, it only allows you to access their values in your
script.
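The lookup order for "env." names can be modeled as follows. This Python sketch reflects only what the manual states: a script variable of the same name wins, environment names are matched without regard to case, and an undefined name yields an empty value. The function name and the optional environ parameter (handy for testing) are mine.

```python
import os


def lookup(name, script_variables, environ=None):
    """Resolve a variable name, falling back to the environment for env.*."""
    if environ is None:
        environ = os.environ
    if name in script_variables:
        return script_variables[name]  # a declared variable takes precedence
    if name.startswith("env."):
        wanted = name[len("env."):].lower()
        for key, value in environ.items():
            if key.lower() == wanted:  # environment names ignore case
                return value
    return ""  # undefined references are empty
```

Note that the function only reads the environment mapping; like Tags, it never changes an environment variable.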
Tags pre-defines a number of variables to provide a means of communicating
between the Tags interpreter and the Tags script. Some of these variables are
associated with specific Tags commands. But there are several which have
meaningful values for the duration of the execution of a script. Following is
a list of Tags variables that have special meaning in the Tags language:
columns, getColumns, inputColumns, SQLColumns,
regXColumns
Used by various variants of the forEach command to parse the forEach input
into fields.
command line flags
Command line flags are referenced by their letter value using the notation
$?x$$, where
x is the actual
letter value of the flag. Tags interprets any command-line parameter that is
immediately preceded by either a minus sign or a slash as a command line
flag group. Each letter in the group is a flag. Only letters can be used as
flags in Tags. The value of a flag variable is the name of the variable. For
example, if you code -AbC on the command line, Tags will create three
variables called $?A$$, $?B$$, and $?C$$, with respective values of "A", "B",
and "C".
command line parameters
Command line parameters are referenced by their position using the notation
$?n$$, where
n is the index
of the parameter in question. The first parameter is indexed as one.
Parameters are always strings. Command line flags as described above are not
counted and are handled in their own way.
contextNode
Used by XPath references to identify the default root of an XPath search
(string). Set by Tags during initialization to reference the root element of
your Tags script. You set it according to need. Tags provides a special form
of XPath reference expression that allows you to use any variable as the
context node for the expression. The form is $@var!xpath$$. Note that
$@contextNode!expression$$ is the same as $!expression$$.
currentPath
Contains the absolute path to the current directory. Tags sets this to the
directory from which you are running your Tags script.
date and time
Tags provides date and time information to the Tags script through several
variables, which are updated before the interpreter processes each script
command. $?time$$ (string - format
is hh:mm:ss), $?day$$ (number - day
of the month), $?dayOfWeek$$ (string
- name of the week day), $?dayOfYear$$ (number - Julian day), $?month$$ (number - month of the year),
$?monthName$$ (string - name of the
month), and $?year$$ (number - all
four digits).
dsn
ODBC data source name used by SQL interface (string). You must set this
before using the SQL-variant of the $#forEach command.
empty
Convenience variable set by Tags to contain absolutely nothing. Use it to
clear other variables to empty.
environment variables
A variable whose name starts with "env." is interpreted as an environment
variable, and Tags will attempt to return the value of the corresponding
environment variable, if defined. Otherwise, the value of the reference is
empty. Note that you cannot change the value of an environment variable in a
Tags script. Note that environment variable names are not case sensitive. For
example, reference the Path environment variable as $?env.path$$.
error
Set by Tags as the result of the $#exec and $#open commands. It contains the
value returned by the executed program.
file variables
$?fileDrive$$ (drive:), $?fileName$$, $?filePath$$ (path\), $?fileInfo$$. These variables are set by
the file-variant of the $#forEach
command, which is described later in this document.
HTTP variables
TBD: $?HTTPHeaders$$ and $?HTTPResponseHeaders$$.
grep
Set this with a regular expression before using a $#forEach command to provide a filter in
selecting objects to present in the $?forEach$$ variable. It is not required
for proper forEach operation, but it can improve the performance of your
script in many cases. Even when you set the variable outside the $#forEach loop, it is empty inside the
loop. But, once set, it retains its value outside the loop. This means that,
unless you change its value yourself, it will have the same value for two
consecutive $#forEach loops, which
might not be what you want. So you should set it or clear it as needed before
each loop. forEach variants that apply the grep variable are the Field, Line,
Lineb, Str, Strb, and the default variants.
last
Set by the $#forEach command to the
index of the last object in the object set being processed by the command
(number). The value is not known in some variants of the $#forEach command, and is set to zero in
those cases.
output
Container in which Tags collects output text that is otherwise undirected,
and is automatically dumped to the standard output if not otherwise used. If
it is copied or appended to another variable, it is flushed.
password
ODBC password used by the SQL interface (string). Not all database accesses
require this, but when they do, you must set the value before using the
SQL-variant of the $#forEach
command.
position
Set by the $#forEach command to the
index of the current forEach value (number). The first object is indexed as
one.
random
Provides a source for random numbers when referenced. If the script sets this
variable, Tags re-randomizes the random number source according to the value
in the random variable.
regex variables
After performing a $#match or regex
version of an $#if or $#ifn, a set of variables contain the
matching substrings. The $?regXCount$$ variable specifies the
number of matched substrings, and the $?regXi$$ variables contain the matched
substrings; e.g., the third matched substring is in the variable named $?regX3$$ while the original matched
string is in the variable named $?regX0$$. TBD: $?regXColumns$$.
script
Set by Tags during initialization to the root of the Tags script (XPath
node). Use this to implement subroutines by writing XPath expressions
referencing other elements within the same Tags script, as in the following
example. $@script!/myscript/mysubroutine$$.
sqlcolumns
Set this before using the SQL-version of the $#forEach command to define the
fields of the record set you expect to obtain via your select statement. See
also the example given earlier.
tagsPath
Set by Tags to the tagsPath
environment variable if present. Otherwise set to the path of the Tags
executable (string).
uniqueID
A read-only variable that provides a source of universally unique
identifiers (UUIDs).
userName
ODBC user name used by the SQL interface (string). Not all database accesses
require this, but when they do, you must set the value before using the SQL-variant of the $#forEach command.
You should remember that, except for environment variables, your script can
set the value of any variable, and you can lose valuable information by
overwriting the values of certain variables. For example, you will lose the
value of the variable $?script$$ by setting
it to some other value. On the other hand, you may well overwrite the value
of $?contextNode$$
frequently when you are using XPath expressions.
Properties of Variables
When a Tags variable is defined, it has a value, and it also has three
additional properties that can be ascertained using the $?#, $?%, and $??
prefixes. Note that these are the regular $? prefix with an additional #, %
and ? appended, respectively.
Number of Fields
Use the $?# prefix to obtain the number of fields in the contents of the
specified variable. Note that this value only makes sense when the variable
contains a string.
Length
Use the $?% prefix to obtain the length of the contents of a specified
variable.
Type
Use the $?? prefix to obtain the type of the contents of the specified
variable. There are a number of types that Tags values might have. Here is a
list of the types along with the meaning of the length property for that type
in parentheses.
- string (number of characters)
- string_list (number of strings)
- node_list (number of nodes)
- element_node (1)
- attribute_node (1)
- text_node (1)
- cdata_section_node (1)
- entity_reference_node (1)
- entity_node (1)
- processing_instruction_node (1)
- comment_node (1)
- document_node (1)
- document_fragment_node (1)
- notation_node (1)
Value types must be compatible with the context. An XPath node or XPath
nodeset value resulting from the resolution of a Tags referencer discovered
in ordinary text is converted to text, or may cause an error. String type
values are acceptable everywhere. When XPath expressions obtain boolean or
numeric values, Tags converts them to strings.
$#set x( this is a string)
$#set t($??x$$) t is set to "string"
$#set f($?#x$$) f is set to 4
$#set i($?%x$$) i is set to 17
After the four $#set commands are processed, x contains " this is a string",
t contains "string", f contains the number of fields in x, which is four, and
i contains the length of x, which is 17. If x were set to a node list, its
length would be the number of nodes in the list and its number of fields
would be zero. If x were set to a string list, its length would be the number
of strings in the list. And so on.
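The field and length properties in the example above can be modelled in a few lines of Python. This is only a sketch of the documented rule, not the Tags implementation: field_count is a hypothetical helper, it assumes the first character of the value is the field delimiter (as in a string record), and treating an empty value as having zero fields is an assumption.

```python
def field_count(value):
    """Count fields in a string record whose first character is the delimiter.

    Assumption: an empty value is treated as having zero fields.
    """
    if not value:
        return 0
    delimiter = value[0]
    return len(value[1:].split(delimiter))


x = " this is a string"   # leading space acts as the field delimiter
print(field_count(x))     # models $?#x$$
print(len(x))             # models $?%x$$
```

The leading space is what makes the value a four-field record; it also counts toward the length.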
Commands
Here are some general comments about Tags commands.
A Tags command may be coded virtually anywhere within the text of the script,
but must be the sole occupant of the text line. Tags commands have the
following form:
$#commandName argument1(argument2)
Unlike variable names, the commandName is not case-sensitive. While all
commands have a commandName, not all commands have argument1 and argument2,
and no command has argument1 without having argument2. In all commands that
have argument2, the parentheses are required.
In most cases where it is used, argument1 is processed differently than
argument2. Argument1 is usually resolved to a string, while argument2 is
resolved only as far as needed. In the forEach command, for example,
argument2 can resolve to a nodelist or a SQL result set. This should be
fairly intuitive in each case. (Yeah, right. I'll try to clarify this more as
I work more on the manual.)
When Tags parses a command, it must be able to isolate the two arguments.
This can conflict with the characters that the two arguments must use.
Specifically, Tags uses the following characters to parse a command:
- quote ("""),
- apostrophe ("'"),
- open parenthesis ("("), and
- close parenthesis (")")
If these characters are paired within a command's arguments, then Tags should
have no trouble. But if they are not paired, Tags will fail to understand the
command. You can help Tags out by "hiding" unmatched characters by
immediately preceding each one with the backward-apostrophe (`). (By the way,
it is harmless, though unnecessary, to hide any character in a command
argument in this way.)
Here is an example:
$#match $?s$$(.*() will fail to parse, but
$#match $?s$$(.*`() will work fine
There are three basic categories of commands:
- Conditional commands
- The forEach command
- Additional commands
Conditional commands perform the same function they do in any scripting or
programming language: they let the script make decisions and vary its
behaviour according to the conditions it encounters.
The forEach command provides the
ability to repeat specified functionality over a set of objects, such as
nodes in a nodeset, text lines in a file, inputs from a user, fields in a
text record, etc.
A number of commands that I don't feel like further categorizing fall into
the additional commands group. These
include several debugging commands, an output director command, a variable
setter and a variable loader, an include command, and a number of others. A
bit of a hodge-podge.
Conditional Commands
Tags provides a set of commands that conditionally control the inclusion or
exclusion of text and/or other commands.
$#if (expression)
Is false if the expression evaluates to false, and is true otherwise.
$#ifn (expression)
Is true if the expression evaluates to either empty, to the value zero (0),
or to the string "false" (case ignored), and is false otherwise.
$#elif (expression)
Is false if the expression evaluates to empty, to the value zero (0), or to
the string "false" (case ignored), or if a previous conditional command was
true, and is true otherwise.
$#elifn (expression)
Is false if the expression evaluates to a value that is non-empty, is not
zero (0), and is not the string "false" (case ignored), or if a previous
conditional command was true. Is true otherwise.
$#else
Is false if a previous conditional command was true, and is true otherwise.
$#end
Required to terminate a conditional command sequence. Also required to
terminate a forEach command, discussed below.
Expressions must resolve to strings to be properly evaluated. Tags
automatically converts XPath boolean and numeric results into strings, so
boolean true and false are converted to their string
equivalents. XPath and variable expression results that are nodes or nodesets
are converted into strings before they are evaluated according to these
rules.
These expression values are recognized as false:
- the value is empty
- the value is zero (0)
- the value is "false"
- the value is "off"
- the value is "no".
All other values are taken as true.
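The list of false values above can be summarised as a small predicate. The Python below is a sketch of that rule only. The name is_true is hypothetical. Two points are assumptions rather than documented behaviour: that "off" and "no" ignore case the same way "false" is said to, and that zero means exactly the string "0".

```python
# Words the manual lists as false; case-insensitive matching of
# "off" and "no" is an assumption.
FALSE_WORDS = {"false", "off", "no"}


def is_true(value: str) -> bool:
    """Model of how a resolved expression string is judged true or false."""
    if value == "" or value == "0":
        return False
    return value.lower() not in FALSE_WORDS


for sample in ["", "0", "False", "off", "no", "yes", "A"]:
    print(repr(sample), is_true(sample))
```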
Examples:
$#if ($!$?position$$ = $?last$$$$)
"$!text()$$"
$#else
"$!text()$$",
$#end
This example shows an $#if command, which might be coded within a $#forEach
loop. It tests whether the last object is being processed, to decide whether
to terminate the line with a comma. The $#forEach command is explained in
some detail below.
$#if ($?A$$)
do something big deal here...
$#end
The second example tests to determine if the command-line flag A is present
by testing if the variable, named "A", contains a value other than empty.
Additional commands that depend on boolean values also evaluate expressions
according to the same rules as the conditional commands.
Regular Expressions
Scripting in Tags sometimes calls for regular expressions. Four
of the conditional commands have additional forms that support the use of
regular expressions in decision making.
$#if string(regular-expression)
Is true if the string matches the regular-expression, and is
false otherwise. If true, subsequent $#elif and $#elifn
statements are ignored.
$#ifn string(regular-expression)
Is true if the string does not match the regular-expression,
and is false otherwise. If true, subsequent $#elif and $#elifn
statements are ignored.
$#elif string(regular-expression)
If evaluated, is true if the string matches the
regular-expression, and is false otherwise. If evaluated and true,
subsequent $#elif and $#elifn statements are ignored.
$#elifn string(regular-expression)
If evaluated, is true if the string does not match the
regular-expression, and is false otherwise. If evaluated and true,
subsequent $#elif and $#elifn statements are ignored.
Each conditional command matches the regular-expression with the string.
(Note that it MUST be a string. Anything else will fail.) If the
regular-expression matches the string, and it contains sub-match expressions
(i.e., expressions within parentheses), Tags sets variables to the matched
portions of the string. These variables have names that correspond to the
positions of the sub-match expressions within the regular-expression. The
sub-match variable names have the form $?regXi$$, where i is the index of the
sub-match expression that corresponds to the variable.
Here is an example:
$#if $?date$$(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
$#set date($?regX2$$/$?regX3$$/$?regX1$$ $?regX4$$:$?regX5$$:$?regX6$$)
$#end
This fragment reformats a date from y-m-d h:m:s to m/d/y h:m:s.
(Just a reminder: Note that the parentheses are all paired in this example,
so that Tags can find the beginning of the expression by matching the pairs.
If the parentheses did not match, you would have to hide the unmatched
parentheses with the backward-apostrophe (`), as described under Commands.)
If the value of the date variable is "2005-12-09 14:21:15", then the match
generates the following six sub-match variables:
$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15
One additional variable, called $?regXCount$$, contains the number of
sub-matched expressions. In the example, its value is six.
If a conditional command evaluates to false, the $?regXCount$$ variable is
set to zero. If a conditional command results in fewer sub-match variables
than the last match, only the variables for the sub-matches of the latest
match survive. Sub-match variables are not managed in any other way.
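For readers who want to check the date pattern outside Tags, the same reformatting can be reproduced with Python's re module. This is a sketch under the assumption that the Tags regular-expression dialect treats this particular pattern the way Python does. The name reformat_date is hypothetical.

```python
import re

# Same pattern as the $#if example: six groups of digits.
DATE_PATTERN = r"^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*)"


def reformat_date(date):
    """Rewrite 'y-m-d h:m:s' as 'm/d/y h:m:s'; leave non-matching input alone."""
    match = re.match(DATE_PATTERN, date)
    if match is None:
        return date
    # groups() plays the role of regX1 .. regX6
    year, month, day, hour, minute, second = match.groups()
    return f"{month}/{day}/{year} {hour}:{minute}:{second}"


print(reformat_date("2005-12-09 14:21:15"))
```

As in the Tags fragment, a value that does not match is left unchanged, because the assignment happens only when the match succeeds.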
This is not an explanation for regular expressions. You can find out more by
following this link.
See also the $#match command, which is described later.
The $#forEach Command
$#forEach type(argument)
Processes the commands and text that fall between the $#forEach command and
its matching $#end command once for each object identified in the $#forEach
argument. For each object, the variable $?forEach$$ is set to contain the
object, to allow the text within the loop to reference the object, while the
$?position$$ variable is set to its index. Note that Tags handles $#forEach
nesting so that the $?forEach$$ variable is maintained according to its
context.
While processing the $#forEach loop, the variable $?last$$ is set to the
index of the last object to be processed in the loop.
The type can be either empty or one of the types described below: "count",
"field", "file", "get", "line", "lineb", "node", "SQL", or "XML". If empty,
the $#forEach argument should resolve into either a nodeSet containing zero
or more nodes, a single node, or zero or more text lines (each line is taken
as a $#forEach object).
Count
If the type field specifies "count", then the $#forEach argument must resolve into a
number. The $#forEach logic performs
the loop once for each value from one to the argument value, incrementing by
one for each pass.
Field
If the type field specifies "field",
then the $#forEach argument should
resolve to a string record, with its first character identifying the field
separator character. The $#forEach logic loops for each field in the
argument, setting the $?forEach$$
variable to each field in turn.
File
If the type field specifies "file", then the $#forEach argument should
resolve to a string record having the form |directory|mask|type, where
directory is the path to the directory of interest, mask is a filename
expression, and type may be any "sum" of dir, tree, data, or any. Combine
them using the plus-sign (+).
The mask expression can use the plus-sign (+) and the minus-sign (-) to
include or exclude ambiguous or absolute file names. E.g., *.cpp+*.h-s*
includes cpp files and header files except those that start with the letter
s. On each iteration, the $?forEach$$ variable contains the full pathname of
the next file that the forEach command finds.
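The include/exclude behaviour of a mask such as *.cpp+*.h-s* can be approximated in Python with the standard fnmatch module. This is a sketch, not the Tags matcher. The helpers parse_mask and mask_matches are hypothetical. The sketch assumes that a file is selected when it matches any included pattern and no excluded pattern, and that file names themselves contain no plus or minus signs.

```python
import re
from fnmatch import fnmatchcase


def parse_mask(mask):
    """Split a mask into (include patterns, exclude patterns)."""
    parts = re.split(r"([+-])", mask)       # keeps the signs as separate items
    includes, excludes = [parts[0]], []
    for sign, pattern in zip(parts[1::2], parts[2::2]):
        (includes if sign == "+" else excludes).append(pattern)
    return includes, excludes


def mask_matches(mask, filename):
    """True if filename matches an include pattern and no exclude pattern."""
    includes, excludes = parse_mask(mask)
    included = any(fnmatchcase(filename, p) for p in includes)
    excluded = any(fnmatchcase(filename, p) for p in excludes)
    return included and not excluded


for name in ["main.cpp", "tags.h", "stack.h", "notes.txt"]:
    print(name, mask_matches("*.cpp+*.h-s*", name))
```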
Get
If the type field specifies "get", then the $#forEach argument should resolve
to a prompt string that is displayed in the console window. User input is
accepted, and when the user presses the Enter-key, the $#forEach loop is
performed. During the pass, the user response is available in the
$?forEach$$ variable. When the user presses the Esc-key, the forEach loop is
terminated.
Line, Lineb
If the type field specifies "line" or "lineb", then the $#forEach argument
should resolve to a file name. The $?forEach$$ variable contains the text of
each consecutive text line in the specified file.
The Tags interpreter opens the file, and then performs the loop once for each
text object it finds in the file. The $?position$$ variable is incremented to
reflect which object is being processed. Since the number of objects within
the file is not known during the loop, the $?last$$ variable is not valid.
You can use the Lineb variant to ignore blank text lines.
There are two kinds of text objects that Tags recognizes: XML elements, and
simple lines of text terminated by either a newline or a return, or both, in
any combination.
If the first non-whitespace character in a line is a "<", then the object
is assumed to be a valid XML element. The Tags interpreter locates the
end-tag for the element, and then loads the element into a DOM and stores its
reference in $?forEach$$. If Tags is unable to load the document into
the DOM, then Tags quits with prejudice.
If the first non-whitespace character is not a "<", then the text line is
read and loaded into $?forEach$$. Tags can handle text lines as long
as 4095 characters. Any longer than that, and Tags terminates. This variant
of the $#forEach command provides the ability to convert each text
line into a set of variables through the use of the columns variable,
which is discussed in the next section.
When all objects in the file have been processed, the file is closed.
Example:
The file input.txt contains a list of numbers followed by names that are
associated with the numbers. Here are a few lines from that file:
0,UNKNOWN
1,CREATE TABLE
2,INSERT
3,SELECT
The problem is to reformat each line so that the output looks like this:
<map value="0" name="UNKNOWN"/>
<map value="1" name="CREATE TABLE"/>
<map value="2" name="INSERT"/>
<map value="3" name="SELECT"/>
The reform script in the file input.xml uses the line type of the $#forEach
command to accomplish this:
<reform><![CDATA[$\j
$#forEach line(input.txt)
$#set line(,$?forEach$$)
<map value="$?line{1}$$" name="$?line{2}$$"/>
$#end
]]></reform>
This example uses subscripting to obtain the individual fields in each line.
Notice the comma in the $#set command. The comma is combined with the text
line to form a value such as ",1,CREATE TABLE", which is stored in the line
variable. The leading comma informs the Tags parser that the fields are
separated by commas.
These files are included in this release. Use the following command line to
run this example:
> Tags input.xml "//text()" >map.xml
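To cross-check the expected output without running Tags, the same transformation can be written in Python. This is a sketch of what the script does, not of how Tags does it. The name to_map_element is hypothetical. Note that Tags subscripts start at one while Python indexes start at zero.

```python
def to_map_element(line):
    """Turn 'number,name' into a <map> element, mirroring the reform script."""
    record = "," + line                   # same as $#set line(,$?forEach$$)
    delimiter = record[0]                 # leading character names the delimiter
    fields = record[1:].split(delimiter)
    # fields[0] corresponds to $?line{1}$$, fields[1] to $?line{2}$$
    return f'<map value="{fields[0]}" name="{fields[1]}"/>'


lines = ["0,UNKNOWN", "1,CREATE TABLE", "2,INSERT", "3,SELECT"]
for line in lines:
    print(to_map_element(line))
```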
Node
If the type field specifies "Node", then the $#forEach argument must resolve
to a node list, and the forEach loop is performed once for each node in the
node list. The $?forEach$$ variable will contain each node in turn.
SQL
If the type field specifies "SQL",
then the $#forEach argument must
resolve to a SQL query, which is performed against the DSN named in the $?dsn$$ variable. The forEach loop is
performed once for each row in the result set of the query, with the $?forEach$$ variable containing a row.
Because of the relative complexity of this forEach option, it is discussed
further under its own heading below.
XML
If the type field specifies "XML", then the $#forEach argument must resolve
to a file containing a list of one or more XML documents. The forEach loop is
performed once for each XML document in the file, with the type of the
$?forEach$$ variable being "document_node". The $?forEach$$ variable can be
accessed using XPath expressions.
Variables Associated with the forEach Command
These variables have a special relationship with the $#forEach
command. As the command initializes, it saves the value of the variables, and
restores their values at the end of the loop. Note that some variables are
inputs to the $#forEach command while others are output by the $#forEach
command.
columns, getColumns, inputColumns, SQLColumns
Set these variables to cause the $#forEach command to parse the value
of the forEach variable into a set of variables containing its fields.
If the columns variable is not empty, the parse is applied whenever
the forEach variable is a string. This can happen for the
line-type, the SQL-type, and for the default-type of the
$#forEach command. For all types except the SQL-type, the
columns variable can have one of two forms:
1. ,name1,name2,...,nameN
2. ,name1{size1},name2{size2},...,nameN{sizeN}
Use the first form when the forEach value is a string record, and use
the second form if the forEach value is a record comprised of a set of
fixed-length fields. If a name is omitted, the field is skipped and no
variable is created for that field. While the forms shown above use the comma
as the field delimiter, any special character is acceptable.
In the first form, if the forEach value is not a proper string record,
i.e., does not start with a non-alphanumeric character, the field delimiter
of the columns variable is assumed to be appropriate for the
forEach value as well.
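The two forms of the columns variable can be illustrated with a short Python model. It is a sketch of the rules as described above, not the Tags parser. The name apply_columns is hypothetical. Whether Tags trims padding from fixed-length fields is not stated, so this sketch leaves each field exactly as sliced.

```python
import re


def apply_columns(columns, value):
    """Return {variable name: field} for one forEach value."""
    delimiter = columns[0]
    specs = columns[1:].split(delimiter)
    sized = [re.fullmatch(r"(.*)\{(\d+)\}", spec) for spec in specs]
    variables = {}
    if all(sized):
        # Second form: name{size} entries describe fixed-length fields.
        position = 0
        for match in sized:
            name, size = match.group(1), int(match.group(2))
            if name:                       # an omitted name skips the field
                variables[name] = value[position:position + size]
            position += size
        return variables
    # First form: delimited fields. A proper string record starts with its
    # own delimiter; otherwise the delimiter of columns is used.
    if value and not value[0].isalnum():
        fields = value[1:].split(value[0])
    else:
        fields = value.split(delimiter)
    for name, field in zip(specs, fields):
        if name:
            variables[name] = field
    return variables


print(apply_columns(",id,,name", ",1,x,Bob"))
print(apply_columns(",code{3},{2},name{5}", "ABC--Alice"))
```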
- columns is used by the Str/Strb and the anonymous types of forEach.
- getColumns is used by the Get type.
- inputColumns is used by the Line/Lineb type.
- SQLColumns is used by the SQL type.
forEach
Variable set by the $#forEach
command to contain each object, in turn, that is contained in the forEach
argument. For example, if the forEach argument is a nodeset, then the
$?forEach$$ variable will contain a node. When Tags begins, it initializes
$?forEach$$ to reference the script
text.
position
Variable set to the index of the current object processed by the $#forEach
command. (The position of the first object is one, the second object is two,
etc.) When Tags begins, it
initializes
last
Variable set to the index of the last object processed by the $#forEach
command. This variable is not valid during a line-type or SQL-type $#forEach
loop. When Tags begins, it initializes $?last$$ to zero.
contextNode
Unless you use the $@var!xpathExpression$$ form, you must set this variable
before using any XPath expression to search any subtree of an XML document.
When Tags begins, it initializes $?contextNode$$ to reference the script
document.
Example:
Here is an example using the variables provided by the $#forEach command:
$#forEach ($!//event$$)
$#set contextNode($?forEach$$)
$#if ($!$?position$$ = $?last$$$$)
"$!text()$$",
$#else
"$!@name$$"
$#end
$#end
$# At this point, after the above forEach command is
$# processed, the value of both the forEach and
$# the contextNode variables revert to the values held before
$# the forEach command was encountered.
Here, the XPath expression "@name" is to be applied to each of the <event>
elements in the script document. In this example, the script writer has set
the $?contextNode$$ variable to contain the current <event> element object,
to let Tags know where to look for the text() and the name attribute. Note
that the $?contextNode$$ variable is not set automatically.
The values of the $?forEach$$, $?position$$, $?last$$, and $?contextNode$$
variables are saved before processing a $#forEach loop and, at the completion
of the $#forEach loop, are reset to their saved values. Note that while you
generally would not $#set the $?forEach$$, $?position$$, and $?last$$
variables, you should $#set the $?contextNode$$ variable to control the
context of your XPath search expressions within the $#forEach context.
Another example:
Assuming that the following Tags script is stored in a file, called letter.xml, it can be processed with the
following command line:
> tags letter.xml >letter.txt
Tags script in the file, letter.xml:
<letter script="/letter/body/text()">
<body>
$!/letter/data/salute/text()$$$\j
$!/letter/data/firstname/text()$$$\j
$!/letter/data/lastname/text()$$
$!/letter/data/street/text()$$
$!/letter/data/city/text()$$,$\j
$!/letter/data/state/text()$$$\j
Dear $!/letter/data/salute/text()$$:
I am looking for fresh wood for my sawmill. I am especially
looking for Eastern hardwoods. Do you have any on hand? I will
be happy to remove it and pay you a fair price for the opportunity.
Sincerely,
Paul B.
</body>
<data>
<salute>Mr</salute>
<firstname>George</firstname>
<lastname>Washington</lastname>
<street>123 Cherry Lane</street>
<city>Mt Vernon</city>
<state>Virginia</state>
</data>
</letter>
There are several variables associated with the SQL Query interface, which
are discussed under Using the forEach SQL Query Interface, below.
Using the forEach File Interface
The form of the forEach argument is
directoryName|fileMask|searchType
The directory name can be any ambiguous or non-ambiguous path given the value
of the $?currentPath$$ variable. The
file mask can be a logical expression comprised of ambiguous and
non-ambiguous file names concatenated with either the plus sign or the minus
sign. The valid searchTypes can be one from the set { root | tree } and one from the set { data | dir | any } where the defaults
are root and data.
The variables that the forEach command sets are:
- $?fileInfo$$ is a string record having the form
|fileName|createDate|createTime|createSecs|modificationDate|modificationTime|modificationSecs|size|"dir" or "data",
- $?fileDrive$$ is the drive letter followed by a colon,
- $?filePath$$ is the path followed by a slash, and
- $?fileName$$ is the file name and extension, if any.
Using the forEach SQL Query Interface
Tags provides a SQL query interface through the SQL variant of the $#forEach
command. For example, assuming that there is an accessible table, called
name-and-address, on your computer, the following $#forEach command
implements a simple query to that table:
$#forEach SQL(select name, street, city, state, zipcode from name-and-address)
..etc
$#end
Generally, the result of a SQL query is what is called a result-set: a set of rows (records) that
satisfy the query. Tags repeats the forEach loop once for each row in the
result-set, setting the $?forEach$$ variable to each row in the result-set,
in turn.
By itself, the $#forEach command given above does not provide enough
information to perform the query. The ODBC system requires additional
information, such as the name of the database in which the name-and-address
table resides, the name of the server computer, and the name of the ODBC
interface driver needed to interface to the specific database server.
To communicate this information, the ODBC interface provides an encapsulation
object, called a DSN, or Data Source Name, which is maintained by the system
as a Registry key, and its associated entries, in the Registry at
HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\dsnkey, where dsnkey is the name of
the DSN. (Use your Registry Editor to examine some DSNs, but be careful not
to make any changes to the Registry unless you know what you are doing.
Standard warning.) These entries usually identify the database name, the
server name, and the ODBC driver name. Depending on the type of database,
other information may be stored there as well.
While there are several ways to create a DSN, the easiest is by using the
ODBC Data Source Administrator tool at Start/Settings/Control
Panel/Administrative Tools/Data Sources (ODBC). This tool is available in all
32-bit Windows operating systems, as far as I know.
Many database servers require that a query is accompanied by a username and a
password, which the database administrator sets up beforehand, though not all
database interfaces require a username and a password.
Be that as it may, the Tags SQL Query implementation needs this additional
information to pass on to the ODBC interface. You provide the information to
Tags before the $#forEach SQL
command through specific Tags variables. These variables are named as
follows:
- $?dsn$$
- $?userName$$
- $?password$$
The $?dsn$$ variable is always required, but, depending on the specific ODBC
interface, the $?userName$$ and $?password$$ variables may not be required.
For example, generally they are required if you are querying a Microsoft SQL
Server or Oracle database, but are not likely to be required if you are
querying a FoxPro table.
As I mentioned earlier, the result of a successful query is a result-set, and
Tags provides each row in the result-set as a delimited string in the $?forEach$$ variable. To access the
specific columns (fields) in the row (record) contained in the $?forEach$$ variable, you can use the
subscripting feature as in the following example:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#forEach SQL(select * from cust)
$?forEach{1}$$, $?forEach{2}$$, $?forEach{3}$$, $?forEach{4}$$, (and so on)
$#end
</Tags>
In this example, the code assumes that a DSN, called Tags-customer-dsn, exists in the
Registry.
Tags provides another way of identifying the columns of the row that does not
use the subscripting method. You can provide the column names as a string
record in a Tags variable, called columns. Tags not only places the column
values in the forEach variable, it also places the values into variables
named in the columns variable.
Here is an example where the programmer has set the columns variable:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#set columns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
$#forEach SQL(select * from cust)
$?CustNo$$, $?Name$$, $?Street$$, $?City$$, $?State/Prov$$, $?ZipCode$$, $?Country$$, $?Phone$$
$#end
</Tags>
And, even if you don't provide the column names in the columns
variable, Tags will obtain the column names from the SQL interface, and store
them in the columns variable. If the SQL interface has no name for the
column, Tags substitutes a fill-in name of "SQLCol1" for the first missing
name, "SQLCol2" for the second, and so on. You can always leave the
columns variable empty, and then access it within the $#forEach
loop to find out the names, yourself. This behaviour is somewhat different
than that described for the columns variable in the earlier section
describing the variables associated with the $#forEach command.
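The fill-in naming rule can be stated compactly in Python. This is a sketch of the rule as worded above, where the counter follows the order of missing names rather than the column position. The name fill_column_names is hypothetical, and returning the result as a comma-delimited string record is an assumption about how the columns variable would look afterwards.

```python
def fill_column_names(names):
    """Replace empty column names with SQLCol1, SQLCol2, ... in order."""
    filled, missing = [], 0
    for name in names:
        if name:
            filled.append(name)
        else:
            missing += 1                   # numbered by missing name, not position
            filled.append(f"SQLCol{missing}")
    return "," + ",".join(filled)          # comma-delimited string record


print(fill_column_names(["CustNo", "", "City", ""]))
```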
The while Commands
$#while (expression), $#while string(regular-expression)
Repeats the Tags script between the $#while command and the terminating $#end
command while the expression (or regular expression) is true.
$#whilen (expression), $#whilen string(regular-expression)
Repeats the Tags script between the $#whilen command and the terminating
$#end command while the expression (or regular expression) is not true.
$#last (expression), $#last string(regular-expression)
Terminates a $#while or $#forEach loop if the expression (or regular
expression) is true.
$#lastn (expression), $#lastn string(regular-expression)
Terminates a $#while or $#forEach loop if the expression (or regular
expression) is false.
$#next (expression), $#next string(regular-expression)
Skips the remainder of the Tags script before the terminating $#end command
if the expression (or regular expression) is true.
$#nextn (expression), $#nextn string(regular-expression)
Skips the remainder of the Tags script before the terminating $#end command
if the expression (or regular expression) is false.
Additional Commands
$#add name(value), $#adds name(value), $#addt name(value),
$#addu name(value)
Inserts/Appends a string or string list to the named string list. If the
specified variable is not a string list, it is converted to one before the
value of the expression is appended to it. The suffixes -s, -t, and -u insert
the value in its sorted position (s), at the top of the list (t), or uniquely
in its sorted position (u). In the latter case, the value is added only if
that value is not already in the list. If a string list is being inserted as
sorted or unique, each item in the string list is inserted sorted or unique.
Otherwise, the entire list is inserted as a group.
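The four insertion modes can be compared side by side with a Python list. This is a sketch of the described behaviour for a single string value only, not the Tags implementation. The function add and its mode parameter are hypothetical, and the sort order used here (Python's default string ordering) is an assumption, since the manual does not say how Tags compares strings.

```python
import bisect


def add(items, value, mode=""):
    """Model $#add (mode ""), $#adds ("s"), $#addt ("t") and $#addu ("u")."""
    if mode == "s":
        bisect.insort(items, value)        # sorted position
    elif mode == "t":
        items.insert(0, value)             # top of the list
    elif mode == "u":
        index = bisect.bisect_left(items, value)
        if index == len(items) or items[index] != value:
            items.insert(index, value)     # sorted position, only if absent
    else:
        items.append(value)                # plain append
    return items


names = []
add(names, "pear")
add(names, "apple", "s")
add(names, "fig", "u")
add(names, "fig", "u")                     # already present, so ignored
print(names)
```

The sorted and unique modes only behave sensibly on a list that is already in sorted order, which is why the sketch applies them before any top insertion.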
$#console (value)
Opens the console window if the value is true, closes the
console window if the value is false. This may not be available in some
versions of Tags.
$#debug (value)
Activates or deactivates the debug facilities of Tags according to the
specified value. If the value is false, debugging is turned off. Otherwise,
debugging is turned on. Use this script command instead of the -Z
command-line flag for debugging a short section of script.
When debugging is on, the contents of a variable being loaded by the $#in
command are written to a file. The name of the file is set to the name of the
variable followed by ".dbg". Also, the XML element or text line read by each
iteration of the line type of the $#forEach command is written to a text
file, called "nextelement.dbg" or "nextline.dbg", respectively. Use the
$#pause command to examine these files between iterations.
$#defer name(value)
Sets the variable specified by name
to the unresolved form of the specified value. E.g., if the value is a string containing a
referencer, the string is stored in the variable without resolving the
referencer. If the variable is subsequently referenced, its contents will be
resolved at that time.
$#drop (name)
Removes the variable specified by name from the system, releasing any
resources the variable may own, such as its contents. You don't usually need
to use this command, since the system manages its resources automatically,
but it can be used to improve memory usage when variables containing very
large files are no longer needed. This might only be an issue in advanced
circumstances.
$#exec wait(command-line)
Sends the specified command-line to the operating system for execution. If
the word "wait" is present, Tags waits for its completion. Otherwise, it does
not.
$#get varName(prompt)
Asks the user to enter a value to assign to the variable having the specified
varName. If the console window was not specified from the command-line (using
the -W flag-parameter), nor by the $#console command, the console window is
opened, the prompt is displayed, and the user may enter a response, followed
by the Enter key. The console window is left open.
$#in name(file-name), $#inb name(file-name), $#inp
name(file-name)
If the file-name is a URL (starts with "http://"), Tags loads the document
from the internet. Otherwise, it loads the local document identified by the
file-name into the variable having the specified name. If the file-name does
not specify a path, the TagsPath environment variable is used to determine
the directories to search for the file, if the variable is present.
Otherwise, the path of the Tags command you specified in the command line is
used. If you did not specify a path, then Tags will look for the file in the
current directory.
If the file extension is "htm" or "html", then the file is assumed to be an
HTML file. Otherwise, if the first non-blank character in the file is a
"<", then the file is assumed to be a well-formed XML document. If the
document is determined to be an HTML document, then it is "tidied up" to make
it well-formed in the XML sense before it is loaded into a DOM, since the DOM
can only handle well-formed documents. An XML document is loaded into a DOM
straight away.
At the completion of the $#in
command, the named variable will contain the document node representing the
parsed document. If Tags is unable to load the document into a DOM, then Tags
quits with prejudice.
If the first non-blank character is not a "<", then the file is assumed to
be a simple text (or binary) file and it is loaded into the named variable as
simple text. If the file is binary, you cannot manipulate it with Tags.
Note: if the XML file is local, and if it contains $#include commands, these
are resolved as explained in the $#include description below before the
document is parsed.
The $#inp form uses the contents of the $?HTTPHeaders$$ variable to send an
HTTP POST request to the specified URL. You must set up the contents of the
variable before using this command. (Hmmm... Seems to me I overlooked the
data part of the Post. Need to check that out.)
$#include (file-name)
Tags processes the $#include command as a script or XML data object is
loaded. Immediately after loading the script document specified as the first
command-line parameter, Tags processes all $#include commands embedded in the
document. Because of this, the file-name can only reference variables
that contain command-line parameters.
Be sure that included files do not affect the well-formedness of the XML
document when they are inserted into the script document at the include
points. After all includes have been performed, the document must still be
well-formed XML.
Included files can also contain $#include commands. Be careful about
circular references: If some file, say file-A, includes another file, say
file-B, and file-B includes file-A, you have an infinite loop. Such loops are
not detected by Tags, and will cause the program to run until it fills up
memory and crashes. If the full include file path is not specified, Tags
looks in the directory where the script file was found.
$#include commands are also detected and processed by the $#in command, but
are not handled by the $#forEach command.
$#match string(regular-expression)
Matches the regular-expression with the string. If the regular-expression
matches the string, and it contains sub-match expressions (i.e., expressions
within parentheses), Tags sets variables to the matched portions of the
string. These variables have names that correspond to the positions of the
sub-match expressions within the regular-expression. The sub-match variable
names have the form $?regXi$$, where i is the index of the sub-match
expression that corresponds to the variable.
Here is an example:
$#set string(2005-12-09 14:21:15)
$#set expression(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
$#match $?string$$($?expression$$)
This match generates six sub-match variables:
$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15
One additional variable, called $?regXCount$$, contains the number of
sub-matched expressions. In the example, its value is six.
If a $#match command fails, the $?regXCount$$ variable is set to zero. If a
$#match command results in fewer sub-match variables than the previous
$#match command, only the variables for the sub-matches of the latest $#match
command are valid. Sub-match variables are not managed in any other way.
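The bookkeeping described above can be modeled in a few lines of Python. This is a sketch under assumptions, not how Tags is implemented: the function tags_match and the plain dictionary of variables are made up for illustration, Python's re dialect may differ from the one Tags uses, and reading "not managed in any other way" to mean that older, higher-numbered variables keep their previous values is my interpretation.

```python
import re


def tags_match(variables, string, expression):
    """Mimic the described $#match bookkeeping on a plain dict."""
    m = re.search(expression, string)
    if m is None:
        variables["regXCount"] = 0       # a failed match sets the count to zero
        return False
    groups = m.groups()
    for i, value in enumerate(groups, start=1):
        variables[f"regX{i}"] = value or ""
    variables["regXCount"] = len(groups)
    # Assumption: variables left over from an earlier match are not removed.
    return True


variables = {}
tags_match(variables, "2005-12-09 14:21:15",
           r"^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*)")
print(variables["regX1"], variables["regXCount"])  # prints: 2005 6
```

In this model, a later match with only two sub-match expressions would set regXCount to 2 and leave regX3 through regX6 holding stale values, which is why you should trust only the first regXCount variables.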
This is not an explanation for regular expressions. You can find out more by
following this
link.
$#message (message), $#msg (message)
TBD
$#open wait(file-name)
Obtains the name of the program that is set by the operating system to open
the type of the specified file-name,
and then executes that program, passing the file-name as the sole parameter.
If the word "wait" is present, Tags waits for the program to complete;
otherwise it does not.
$#out name(file-name), $#outa name(file-name)
Normally, the name is omitted.
Outputs the accumulated results of the processing of the Tags script to the
file identified by the file-name. Line management commands are processed at
this time (line management commands are described in a later section of this
manual). If a Tags script contains no $#out command, the accumulated results are
written to the file Tags.txt at the
end of the run.
An alternate use of the $#out
command is to save the contents of the variable specified by the name to the file specified by file-name.
$#pause (message)
After displaying the pause message
in the console window, if it is open (the -W command line flag, or the
$#console command), Tags waits for
you to press a key before continuing. The pause command is ignored if the
console window is not open.
$#play (wave-file-name)
Plays the wave file specified by the wave-file-name. If the wave-file-name
does not include a path, Tags searches the directories named in the TagsPath
environment variable, if that variable is available. Otherwise, Tags uses the
path of the Tags command as you specified it on the command line. If you did
not specify a path there, Tags looks for the wave file in the current
directory.
$#pop, $#push
TBD
$#sleep (time)
Relinquishes control of the CPU for the specified time, which is in
milliseconds. Useful for scripts you want to run in the "background." An
example might be a script that watches for change in a page at some URL.
$#set name(value), $#text name(value), $#xml name(value)
Sets the variable having the specified name
to the resolved value. The
value is resolved to its "natural" type, such as nodeset, node, or string.
$#stop (message)
Aborts Tags processing, placing the specified resolved message as the last line of the output
file. If the console window is visible, Tags displays the stop message there, and waits for the user to
press a key before terminating the run.
$#trace (value)
Activates or deactivates the trace facilities of Tags according to the
resolved value. If the value is false, tracing is turned off.
Otherwise, tracing is turned on. Tracing causes Tags commands to be written
to the output as Tags executes them. Each traced Tags command line is
appended with the state and depth of the condition stack. Use this script
command instead of the -Y command-line flag to limit the trace to a portion
of your script. If the console window is open, Tags displays the trace
information there as well.
$#translate (arg-char-set,fun-char-set)
Translates output characters such that characters matching characters in the
arg-char-set (the characters before the comma) are translated to the
corresponding characters in the fun-char-set (the characters after the
comma). This command works like the XPath translate function.
All output following the $#translate command is translated until either there
is no more output or a $#translate command that has no argument is
encountered. It is an error if the number of characters in the arg-char-set
is greater than the number of characters in the fun-char-set. Any characters
in the fun-char-set beyond the number of characters in the arg-char-set are,
however, ignored.
For example, if the arg-char-set contains the three characters {}| and the
fun-char-set contains the three characters <>", the command is written as
$#translate ({}|,<>") -- no embedded spaces!
All '{' characters in the output following the $#translate command are
translated to '<', all '}' characters in the output are translated to '>',
and all '|' characters in the output are translated to '"' (quotes). Thus,
{element attribute=|value|/}
becomes, after translation,
<element attribute="value"/>.
Restrictions apply: You cannot use a comma or a close-parenthesis in either
the arg-char-set or the fun-char-set; otherwise, the program could not parse
the command. Watch out: spaces within the parentheses are subject to the
translation rules.
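If you want to predict what a given pair of character sets will do to your output, the same mapping can be tried out in Python. This is a sketch of the rules as stated here, not Tags's implementation; the function name make_translator is made up, and raising ValueError stands in for whatever error Tags actually reports.

```python
def make_translator(arg_char_set, fun_char_set):
    """Build a function that maps each character of arg_char_set to the
    character at the same position in fun_char_set."""
    if len(arg_char_set) > len(fun_char_set):
        # The text calls this case an error.
        raise ValueError("arg-char-set is longer than fun-char-set")
    # Extra characters in fun_char_set are ignored, as the text describes.
    table = str.maketrans(arg_char_set, fun_char_set[:len(arg_char_set)])
    return lambda text: text.translate(table)


to_xml = make_translator("{}|", '<>"')
print(to_xml("{element attribute=|value|/}"))  # prints: <element attribute="value"/>
```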
$#vars (message)
Lists the variables, sorted by name, at the point in Tags processing when the
command is encountered. Tags displays the resolved message before the
variable list. The variable list is also displayed in the console window if
it is open. You could use the following command sequence to display the
variables in the console window to help you debug:
$#console (on)
$#vars (Here is a list of the variables)
$#pause (Press any key)
$#console (off)
Except for the $#defer command, all
commands resolve their arguments before applying them. Tags expects that you
will use Tags referencers prolifically, both in text and in expressions.
Using the $#defer command, you can
store commands in variables for later reference.
Any command not in the above list is taken by Tags as a comment. Comments are
not written to the output file, but they are displayed in the console window
if it is active (use the -W command-line flag, or the $#console command).
A Tags command occupies a single text line. It can be indented, as long as
only whitespace precedes the command, since leading whitespace is ignored.
Tags also ignores any text on the same line following the command.
Here's a helpful idea: if your editor has the ability to match braces, you
can put an open-brace after each $#if, $#ifn, and $#forEach command, and a
close-brace after each matching $#end command, and then you can use your
editor's match-braces commands to match up the begin and end parts of Tags
command sequences.
Examples:
$#set $?converter$$($?converter$$)
$#ifn ($?$?converter$$$$)
$#trace (on)
...stuff to debug
$#trace (off)
The first example sets a variable whose name is the value of the variable
converter to the value of the variable converter. I.e., the name and the
value of the variable are the same.
The second example evaluates the value of the variable named by the converter
variable. If that variable doesn't exist, or its value is false, then the
text within the $#ifn is processed.
The third example traces a section of script, then turns tracing off.
The Line Output Management Commands
When you direct Tags to output your text to a file or a pipe using some form
of the $#out command, you can manage how it processes your raw output using
these line management commands. Simply embed them in your output streams.
The Tab Command ($\t{i})
This output command allows you to format the output text by aligning it on
specific offsets. Tags supports a variable, called $?tabs$$, which you can
set to contain a table of line offsets as a delimited string; the offsets are
indexed by the tab command subscript. If you do not provide the tab table,
then Tags uses the tab command subscript itself as the line offset. Here is
an example using a tab table:
<Tags>
Customer Report Using The Tags ODBC Facility (December 10, 2004)
$#set dsn(Tags-customer-dsn)
$#set tabs(+9+39+62+78+95+104+124)
$#set columns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
CustNo$\t{1}Name$\t{2}Street$\t{3}City$\t{4}State/Prov$\t{5}ZipCode$\t{6}Country$\t{7}Phone
------$\t{1}----$\t{2}------$\t{3}----$\t{4}----------$\t{5}-------$\t{6}-------$\t{7}-----
$#forEach SQL(select * from cust)
$?CustNo$$$\t{1}$?Name$$$\t{2}$?Street$$$\t{3}$?City$$$\t{4}$?State/Prov$$$\t{5}$?ZipCode$$$\t{6}$?Country$$$\t{7}$?Phone$$
$#end
</Tags>
In this example, the offset associated with $\t{1} is 9, and the offset associated
with $\t{4} is 78.
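To see how a tab table lines up columns, here is a small Python model of the tab command. It is a sketch under assumptions rather than Tags's own logic: the function names are made up, the first character of the tab table is taken to be the delimiter (as the examples suggest), offsets are treated as zero-based character positions, and text that already extends past an offset is simply left unpadded.

```python
import re


def parse_tab_table(tabs):
    """Split a delimited string such as "+9+39+62" into a list of offsets.

    Assumption: the first character of the string is the delimiter.
    """
    return [int(field) for field in tabs[1:].split(tabs[0])]


def expand_tabs(line, offsets=None):
    """Pad with spaces so the text after each tab command starts at its offset."""
    out = ""
    # re.split with one capture group alternates: text, subscript, text, ...
    parts = re.split(r"\$\\t\{(\d+)\}", line)
    for k, part in enumerate(parts):
        if k % 2 == 0:
            out += part
        else:
            subscript = int(part)
            # Without a tab table, the subscript itself is the offset.
            target = offsets[subscript - 1] if offsets else subscript
            out = out.ljust(target)
    return out


offsets = parse_tab_table("+9+39+62+78+95+104+124")
print(offsets[0], offsets[3])  # prints: 9 78
header = expand_tabs(r"CustNo$\t{1}Name$\t{2}Street", offsets)
print(header.index("Name"), header.index("Street"))  # prints: 9 39
```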
The Join-Line Command ($\j )
Sometimes you need to control the output of parts of a single output line.
Use the special symbol $\j at the point on a
line where you want to concatenate the next piece of the output line. Here is
an example:
<$!@name$$ $\j
$#if ($!@prompt$$)
prompt="$!@prompt$$" $\j (notice the space before "prompt")
$#end
$#if ($!@default$$)
default="$!@default$$" $\j (notice the space before "default")
$#end
$#if ($!@value$$)
> $\j
$!@value$$ $\j
</$!@name$$>;
$#else
/>;
$#end
Notice that the prompt="" and default="" attributes and the value may not all
be present. Supposing that only the prompt="" attribute is present, the
output would appear as below:
<name prompt="Please say hello"/>
The space on the line before "prompt" puts the space between "name" and
"prompt." Note that, as in the example, text beyond the $\j concatenator
operator is discarded.
The New-Line Command ($\n)
You can split a line into two output lines using the $\n command. The text
before the $\n is written to the output, then a newline is written, and then
the text after the $\n command is written to the output.
The CData Commands ($\c and $\d)
The $\c command generates a <![CDATA[ begin-tag in the output stream, while
the $\d command generates a closing ]]> end-tag in the output stream.
The Blank-Line Command ($\b{i})
The $\b{i} command removes all subsequent groups of blank lines from the raw
output, replacing each group with the number of blank lines specified by i.
E.g., if your raw output contains groups of blank lines, and you specify
$\b{3}, then each subsequent group of blank lines is replaced by three blank
lines in the "cooked" output.
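The effect of the blank-line command on raw output can be modeled in Python as follows. This is only a sketch of the behavior described above; the function name squeeze_blank_lines is made up, and counting whitespace-only lines as blank is an assumption.

```python
def squeeze_blank_lines(lines, count):
    """Replace each run of blank lines with exactly `count` blank lines."""
    out = []
    in_blank_run = False
    for line in lines:
        if line.strip() == "":           # assumption: whitespace-only is blank
            if not in_blank_run:
                out.extend([""] * count)
                in_blank_run = True
        else:
            out.append(line)
            in_blank_run = False
    return out


raw = ["a", "", "", "", "", "b", "", "c"]
print(squeeze_blank_lines(raw, 3))
# prints: ['a', '', '', '', 'b', '', '', '', 'c']
```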
Errors
All errors detected by Tags result in immediate termination of the resolution
process. An error message is generated and appended to the output file. While
native Tags errors are explained with a short phrase or sentence, XPath
errors are given as a number. You can translate (?) the number using these
tables:
XPath parser errors
These errors are the result of a badly-formed XPath expression.
2850 | XPE_UNKNOWNENTITY
2851 | XPE_BADENTITY
2852 | XPE_DOUBLECOLONEXPECTED
2853 | XPE_QNAMEEXPECTED
2854 | XPE_LPARENEXPECTED
2855 | XPE_RPARENEXPECTED
2856 | XPE_RPARENNOTEXPECTED
2857 | XPE_RBRACKETEXPECTED
2858 | XPE_VARNAMEEXPECTED
2859 | XPE_LITERALEXPECTED
2860 | XPE_UNEXPECTEDEND
2861 | XPE_EQUALSIGNEXPECTED
2862 | XPE_UNKNOWNOPERATOR
2863 | XPE_TOOMANYCOLONS
XPath evaluator errors
These errors are the result of context errors. The expression parsed
successfully.
2800 | XPE_UNDERRUN
2801 | XPE_NODEEXPECTED
2802 | XPE_NODESETEXPECTED
2803 | XPE_STRINGEXPECTED
2804 | XPE_NUMBEREXPECTED
2805 | XPE_BOOLEANEXPECTED
2806 | XPE_OPNOTEXPECTED
2807 | XPE_AXISNAMEUNKNOWN
2808 | XPE_WRONGNRARGUMENTS
2809 | XPE_PROCINSTEXPECTED
2810 | XPE_STACKEMPTY
2811 | XPE_STACKNOTEMPTY
2812 | XPE_FUNCTIONUNKNOWN
2813 | XPE_BADOPERANDTYPE
2814 | XPE_EMPTYRESULT
2815 | XPE_CONTEXTEXPECTED
2816 | XPE_PATHEXPECTED
2817 | XPE_DIVIDEBYZERO
2818 | XPE_NOVARS
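If you post-process Tags output files with another tool, you may find it handy to look the numbers up automatically. Here is a small Python lookup built from the two tables. The helper name xpath_error_name is made up, and Tags itself provides nothing like it as far as this manual says.

```python
# Names listed in table order; the first parser code is 2850 and the first
# evaluator code is 2800, with the codes running consecutively from there.
XPATH_PARSER_ERRORS = [
    "XPE_UNKNOWNENTITY", "XPE_BADENTITY", "XPE_DOUBLECOLONEXPECTED",
    "XPE_QNAMEEXPECTED", "XPE_LPARENEXPECTED", "XPE_RPARENEXPECTED",
    "XPE_RPARENNOTEXPECTED", "XPE_RBRACKETEXPECTED", "XPE_VARNAMEEXPECTED",
    "XPE_LITERALEXPECTED", "XPE_UNEXPECTEDEND", "XPE_EQUALSIGNEXPECTED",
    "XPE_UNKNOWNOPERATOR", "XPE_TOOMANYCOLONS",
]
XPATH_EVALUATOR_ERRORS = [
    "XPE_UNDERRUN", "XPE_NODEEXPECTED", "XPE_NODESETEXPECTED",
    "XPE_STRINGEXPECTED", "XPE_NUMBEREXPECTED", "XPE_BOOLEANEXPECTED",
    "XPE_OPNOTEXPECTED", "XPE_AXISNAMEUNKNOWN", "XPE_WRONGNRARGUMENTS",
    "XPE_PROCINSTEXPECTED", "XPE_STACKEMPTY", "XPE_STACKNOTEMPTY",
    "XPE_FUNCTIONUNKNOWN", "XPE_BADOPERANDTYPE", "XPE_EMPTYRESULT",
    "XPE_CONTEXTEXPECTED", "XPE_PATHEXPECTED", "XPE_DIVIDEBYZERO",
    "XPE_NOVARS",
]

XPATH_ERRORS = {2850 + i: name for i, name in enumerate(XPATH_PARSER_ERRORS)}
XPATH_ERRORS.update(
    {2800 + i: name for i, name in enumerate(XPATH_EVALUATOR_ERRORS)}
)


def xpath_error_name(code):
    """Return the symbolic name for an XPath error number."""
    return XPATH_ERRORS.get(code, "unknown XPath error")


print(xpath_error_name(2817))  # prints: XPE_DIVIDEBYZERO
```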
Tags Download
Here is my Electronic License Agreement cribbed from others that I have seen:
This is a legal Agreement between you and Paul J Medlock, Jr. (hereinafter
referred to as "I" or "me"). The terms of this Agreement govern your use of
the software in the Tags package and any other materials on this website. By
downloading and installing the software in the Tags package, or other
materials on this website, you are agreeing to be bound by this Agreement. If
you do not agree to the terms of this Agreement, please do not download and
install the software onto your computer. You are free to use the Tags
software on your machine and/or other machines on a LAN in your home and/or
at your office at no cost. You are not free to give copies to others. If
others are interested in it, direct them to this site instead. You may not
sell the software in any form, no matter how well you hide it. Nor can you
claim that you wrote it. I did. All materials that are copyrightable are
copyrighted by me.
I make no warranty for your use of this software. Nor do I promise that it
does what I claim it does. If the documentation makes an outlandish claim of
functionality, test the software before assuming that it actually does what's
claimed. If you have any problem, or if Tags causes you any loss: personal,
financial, hardware, emotional, or otherwise, I am in no way responsible, and
I am not liable for any damages whatsoever. If you violate patents,
trademarks, or copyrights with the use of this software, I am not a party to
that violation, and I won't help you in court. Don't forget, it's free.
In other words, use it at your own risk.
Here is a zip-file containing the latest release of the Windows version
of Tags and its supporting and
example files. If you download the software, it means you are willing to
abide by the terms and conditions of the License Agreement above.
If you use Tags, please be kind enough to give me some feedback: bugs,
ideas for features, comments, etc. If you are interested in a version for
Linux or Unix, let me know.