Data Splitter Tutorial

The general setup sequence for Data Splitter solutions is as follows :

This tutorial contains examples of the following :

See the Data Splitter dialogs topic for more information.

The samples referred to in this tutorial are distributed with the program.


Example 1:  Determine the line count of a group of text files (sample: File-Count-Lines.dss)

screen shot: file line counter

In other words, count the "newline" characters in a group of files.   Newline characters may be carriage returns, line feeds, or carriage returns and line feeds.   With Data Splitter you can define "newline" any way you want.
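For readers who want to see the counting idea outside of Data Splitter, here is a minimal Python sketch. It is not how Data Splitter is implemented; the names `NEWLINE`, `count_newlines` and `count_lines_in_files` are hypothetical, and the definition of "newline" used here (CRLF, lone CR, or lone LF) is just one possible choice.

```python
import re

# Assumed definition of "newline": CRLF, lone CR, or lone LF.
# Listing \r\n first makes a CRLF pair count as one newline, not two.
NEWLINE = re.compile(r"\r\n|\r|\n")


def count_newlines(text, pattern=NEWLINE):
    """Return how many newline sequences occur in text."""
    return sum(1 for _ in pattern.finditer(text))


def count_lines_in_files(paths):
    """Return a per-file newline count and a grand total."""
    counts = {}
    total = 0  # grand total, zeroed once before the first file
    for path in paths:
        # newline="" stops Python from translating line endings on read;
        # latin-1 maps every byte to a character, so any file decodes.
        with open(path, "r", newline="", encoding="latin-1") as handle:
            per_file = count_newlines(handle.read())  # starts fresh for each file
        counts[path] = per_file
        total += per_file
    return counts, total
```

Passing a different compiled pattern as `pattern` changes what counts as a newline, which mirrors the point that "newline" can be defined any way you want.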

In this solution we're interested only in newlines.   So, here's what to do :

At this point this Data Splitter solution opens files and counts the "NewLines" in them, but no output is produced yet.

Also, the counters must be zeroed out and incremented at the appropriate times :

Continuing the solution :

Sample File-Count-Lines.dss extends this example by placing the results in an HTML table.


Example 2:  Search for and replace text in ASCII files (sample: File-Search-Replace.dss)

screen shot: file search and replace

In this solution we're interested in two types of data :

Two types of data, two nodes.

For example :

screen shot: new text string set

In this example "2002-2009" is replaced with "2002-2010", etc.
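The same search-and-replace idea can be sketched in Python. This is only an illustration, not the mechanism Data Splitter uses: `REPLACEMENTS` and `replace_all` are hypothetical names, and only the first pair in the table comes from this tutorial; the second pair is made up to show that several strings can be handled in one pass.

```python
import re

# Hypothetical string set: each search text maps to its replacement.
REPLACEMENTS = {
    "2002-2009": "2002-2010",   # pair taken from the tutorial
    "Version 1": "Version 2",   # illustrative pair, not from the sample
}


def replace_all(text, replacements=REPLACEMENTS):
    """Replace every search text in one pass; all other text is copied unchanged."""
    if not replacements:
        return text
    # Longest keys first, so a longer search text wins over a shorter prefix of it.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)
```

For example, `replace_all("Copyright 2002-2009")` returns `"Copyright 2002-2010"`, and text containing none of the search strings comes back unchanged.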

Use a file comparison utility such as "fc" or "windiff" to compare the output to the input. "OutFile" in this case has been defined with a wildcard file specification, so selecting View | OutFile will probably produce an error message such as "filename syntax is incorrect".


Example 3:  Search for text in files (sample: File-Search.dss)

screen shot: file search solution

In this solution we're interested in the search text and the line in which it is contained :

The hit should be output when the "Line tail" node is recognized.   At that point the beginning of the line, the hit, and the end of the line have been recognized.   In other words, there's a line with a "hit" in it.   So, output the name of the file containing the line and the line itself :

When one of the search text items is encountered in a line, that line is sent to the "Results" file.   The lines from each file are preceded by the file name: setting "Execute once per input stream" when executing "FileNameToResults" outputs the file name just once per file.

The trick here is to get a line at a time by recognizing an "anything but newline" pattern.   This effectively breaks the input into lines by restarting at the start node whenever something not in that set, i.e. a newline character (carriage return or line feed, decimal 13 or 10), is encountered.
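The "anything but newline" trick has a close analogue in regular expressions. The sketch below is an illustration in Python rather than Data Splitter's own notation, and `NOT_NEWLINE` and `split_lines` are hypothetical names.

```python
import re

# "Anything but newline": one or more characters that are neither
# carriage return (decimal 13) nor line feed (decimal 10).
NOT_NEWLINE = re.compile(r"[^\r\n]+")


def split_lines(text):
    """Return the runs of non-newline characters found in text."""
    return NOT_NEWLINE.findall(text)
```

Note that this simple form skips empty lines, because a match needs at least one non-newline character.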

In order for the "Line tail" node to be recognized, the "Line head" and "The Hit" nodes must also have been recognized.   In other words:  because of the link order, Data Splitter can only get to "Line tail" via "Line head" and "The Hit".   When "Line tail" is recognized, the line containing the hit is simply the concatenation of the "Line head", "The Hit" and "Line tail" nodes.
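The head, hit, tail sequence can be approximated with a single regular expression that has three named groups. This is a rough Python sketch under stated assumptions, not a description of Data Splitter internals: `SEARCH_ITEMS`, `build_line_pattern` and `search_text` are hypothetical names, the search items are invented for illustration, and the result is returned as a list instead of being written to a "Results" file.

```python
import re

# Hypothetical search text items (assumed non-empty).
SEARCH_ITEMS = ["error", "warning"]


def build_line_pattern(items):
    """Build head + hit + tail, where head and tail never cross a newline."""
    hit = "|".join(re.escape(item) for item in items)
    return re.compile(
        rf"(?P<head>[^\r\n]*?)(?P<hit>{hit})(?P<tail>[^\r\n]*)"
    )


def search_text(name, text, items=SEARCH_ITEMS):
    """Return the file name once, followed by every line that contains a hit."""
    pattern = build_line_pattern(items)
    results = []
    name_written = False  # plays the role of "once per input stream"
    for match in pattern.finditer(text):
        if not name_written:
            results.append(name)
            name_written = True
        # The full line is the concatenation of the three parts.
        results.append(match.group("head") + match.group("hit") + match.group("tail"))
    return results
```

A line with no hit never produces a match, so it contributes nothing to the results, and a file with no hits does not get its name listed.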

Sample File-Search.dss extends this example by producing both text and HTML result files, plus a list of files with hits and a "stats" summary.


