Guide to the sample solutions accompanying
the Data Splitter installation



EMail-Search.dss

Finds messages containing any of the text strings listed in the "SearchText" string set.   User must specify input folders and "SearchText" items.   Produces HTML output.

Makes use of post-processor File-HTML-Generate-Line-Breaks.dss, which restores the line-by-line appearance of the original emails that is lost in the initial conversion to HTML format.

Note:  Prior to running the sample email parsers it may be advisable to set the message profile in Options | Message.


EMail-Search-Word-Pairs.dss

Finds messages with proximate text strings, i.e. two strings near each other.   User must specify input folders and the "Word1/2" lists.   Produces HTML output.

Makes use of 3 node groups :

HTML-Strip:  converts HTML tags to text so they will display as-is in the HTML output
EMail-SearchWordPairsInner:  does the actual formatting of an email that has been determined to have word pairs near each other
HTML-Add-Line-Breaks:  restores line breaks in HTML, similar to the File-HTML-Generate-Line-Breaks post-processor

This example locates emails with the words "e-mail" OR "email" and variations on the word "parse" somewhere near each other in the message body, "near" being defined as within 200 characters.   See definition of Pattern "(near)".
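The proximity test can be sketched outside Data Splitter as a regular expression.   This is only an illustration in Python, not the sample's own Pattern definition:  the word lists below are assumed stand-ins for the "Word1/2" lists, and "near" is taken to mean at most 200 characters between the end of one word and the start of the other.

```python
import re

# Assumed word lists; the sample's real "Word1/2" lists are not reproduced here.
WORD1 = ["e-mail", "email"]
WORD2 = ["parse", "parser", "parsing"]
NEAR = 200  # maximum gap in characters between the two words (assumed meaning of "near")


def has_near_pair(text, word1=WORD1, word2=WORD2, near=NEAR):
    """Return True if any word1 item lies within `near` characters of any word2 item."""
    a = "|".join(re.escape(w) for w in word1)
    b = "|".join(re.escape(w) for w in word2)
    gap = r"[\s\S]{0,%d}?" % near  # any characters, newlines included
    # Accept the pair in either order: word1 ... word2, or word2 ... word1.
    pattern = re.compile(
        r"(?:%s)%s(?:%s)|(?:%s)%s(?:%s)" % (a, gap, b, b, gap, a),
        re.IGNORECASE,
    )
    return pattern.search(text) is not None
```

For example, has_near_pair("This email is easy to parse") is true, while a message with more than 200 characters between the two words is rejected.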


EMail-To-Database.dss

Sample email parser that transmits extracted fields to a database.   This example parses eBay end-of-auction notification messages into database table "eauction".   Can be customized for other generated email formats by modifying the string sets :

SubjectFilter:  text that begins the subject field in the email header
TextFields:  a list of text field labels and their respective database destinations
NumericFields:  a list of numeric field labels and their respective database destinations
CurrencyFields:  a list of currency field labels and their respective database destinations

Makes use of 3 node groups :

Name-Address:  extracts the name and email address (RName + RMail) from the from/to fields in the email header
Decimal-Number:  gets a decimal number: digits + decimal point + digits
Text:  gets text from the input, strips leading / trailing blanks, stops at end of line
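As a rough analogy, the behaviour of the three node groups can be imitated in Python.   The function names below are hypothetical and the code is a sketch of the descriptions above, not the node groups' actual definitions.

```python
import re
from email.utils import parseaddr

DECIMAL = re.compile(r"\d+\.\d+")  # digits + decimal point + digits


def name_address(header_value):
    """Split a from/to header value into (name, email address)."""
    return parseaddr(header_value)


def decimal_number(text):
    """Return the first decimal number in the text, or None if there is none."""
    match = DECIMAL.search(text)
    return match.group(0) if match else None


def text_field(text):
    """Return the text up to the end of the first line, without leading / trailing blanks."""
    lines = text.splitlines()
    return lines[0].strip(" ") if lines else ""
```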

A single action group, "NewEMail", transmits the parsed fields to the database.  

Also requires a database / ODBC connection.   File SQL.txt accompanies the installation:  it contains an SQL statement for creating the "eauction" table used by EMail-To-Database.dss.


File-Count-Keywords.dss

Counts words and keywords in a group of HTML files (keywords relating to "email" and "parsing" in this example, see the "Keywords" string set definition).   User must specify input files and keywords of interest.   Produces HTML and text output.

Note:  The link from the Start ("*") node to the "Word" node must always have the largest number (i.e. be the last link in the sequence).   Links are tried in sequence, so the general-case "Word" node matches every word, keywords included, before any later link is tried.   Keywords in nodes with link numbers greater than that of the "Word" node will therefore never be found.
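The effect of link order can be imitated with a small first-match-wins loop.   This Python sketch assumes, as the note implies, that links are tried in ascending order;   the patterns and names are illustrative only.

```python
import re


def classify(word, links):
    """Try each (name, pattern) link in order; the first one that matches wins."""
    for name, pattern in links:
        if re.fullmatch(pattern, word, re.IGNORECASE):
            return name
    return None


# Illustrative patterns, not the sample's "Keywords" string set.
KEYWORD = ("Keyword", r"e-?mail|pars(?:e|er|ing)")
WORD = ("Word", r"[\w-]+")  # general case: matches any word

good_order = [KEYWORD, WORD]  # general case last: keywords are counted
bad_order = [WORD, KEYWORD]   # general case first: keywords are shadowed
```

With good_order, classify("email", good_order) returns "Keyword";   with bad_order the same word is classified as a plain "Word" and the keyword link is never reached.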

Uses node group HTML-element.dsss to skip over HTML tags.


File-Count-Lines.dss

Determines the number of lines (new line characters) in a group of text files.   User must specify input files.   Produces HTML output.


File-CRLF-LF.dss

Replaces carriage return / line feed (CRLF) sequences with single line feed characters (LF).   User must specify input files and an existing output directory.
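For comparison, the same conversion is a few lines of Python.   This is a sketch, not part of the installation;   convert_file is a hypothetical helper that, like the sample, expects the output directory to exist already.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; lone LF characters are untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src, out_dir):
    """Write a converted copy of src into out_dir, which must already exist."""
    out_dir = Path(out_dir)
    if not out_dir.is_dir():
        raise NotADirectoryError(f"output directory does not exist: {out_dir}")
    dest = out_dir / Path(src).name
    dest.write_bytes(crlf_to_lf(Path(src).read_bytes()))
    return dest
```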


File-Filter-Unprintables.dss

Extracts printable characters (ASCII 32-126) from the input, discards everything else.   User must specify input files.   Produces "cleaned up" output with newlines where the unprintable characters were.
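A minimal Python sketch of the same filter follows.   Two assumptions are made here:  the range bounds are parameters defaulting to the conventional printable ASCII range (32-126), and a run of consecutive unprintable bytes is collapsed into a single newline, which the sample may or may not do.

```python
def filter_unprintables(data: bytes, low: int = 32, high: int = 126) -> bytes:
    """Keep bytes in [low, high]; replace each run of other bytes with one newline."""
    out = bytearray()
    in_gap = False  # True while inside a run of unprintable bytes
    for byte in data:
        if low <= byte <= high:
            out.append(byte)
            in_gap = False
        elif not in_gap:
            out.append(0x0A)  # assumption: one newline per run
            in_gap = True
    return bytes(out)
```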


File-Generate-Site-Index.dss

Generates a website index from a group of HTML files.   Extracts the content of the <TITLE> tag and the "description" META tag, and generates a single HTML file, siteindex.htm.   User must specify the input files and may have to modify the hard-wired META tag search string :

		<META name="description" content=

... depending on how those tags are coded in the input files.
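The extraction step can also be sketched with Python's standard html.parser module.   Note that this is a different technique from the sample's hard-wired search string:  a parser tolerates variations in case, quoting and attribute order.   The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleDescriptionParser(HTMLParser):
    """Collect the <title> text and the content of the "description" meta tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def index_entry(html_text):
    """Return (title, description) for one page of the site index."""
    parser = TitleDescriptionParser()
    parser.feed(html_text)
    return parser.title.strip(), parser.description.strip()
```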


File-HTML-Generate-Line-Breaks.dss

Transforms line breaks (carriage return / line feed pairs) to HTML <BR> tags.   Used to post-process the output of sample EMail-Search.dss (above).

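There are two nodes in this solution to handle the possibility that the input contains a mixture of CRLF and LF newlines.   It looks for CRLF first and LF second, and converts both to HTML line breaks.   Checking for CRLF first ensures that a CR / LF pair becomes a single <BR> and no stray carriage return is left behind.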
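The "CRLF first, LF second" order maps directly onto an ordered alternation in a regular expression.   A minimal Python sketch, with a hypothetical function name:

```python
import re

# CRLF is listed before LF so that a pair is consumed as one line break.
LINE_BREAK = re.compile(r"\r\n|\n")


def add_line_breaks(text):
    """Convert CRLF and LF line breaks to HTML <BR> tags."""
    return LINE_BREAK.sub("<BR>", text)
```

Replacing LF alone would turn "a\r\nb" into "a\r<BR>b", leaving the carriage return in the output.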


File-LF-CRLF.dss

Replaces line feed characters with carriage return / line feed sequences.   User must specify input files and an existing output directory.


File-Search.dss

Searches for one or more text strings in the input files.   User specifies :

The "search items" are defined as a string set.   Specify the search text in the "Text" column of "search items".

Running File-Search.dss produces :


File-Search-Replace.dss

Searches for and replaces one or more text strings in the input files.   Specify the search text in the "Text" column of the "new text" string set;   specify the replacement text in the "Other text" column.
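The two-column "new text" string set behaves like a lookup table from search text to replacement text.   The Python sketch below models it as a dictionary (an assumption made for illustration) and replaces all items in a single pass, so replacement text is never searched again.

```python
import re


def replace_all(text, new_text):
    """Replace every key of new_text ("Text") with its value ("Other text")."""
    if not new_text:
        return text
    # Longest search strings first, so "email parser" wins over "email".
    keys = sorted(new_text, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: new_text[m.group(0)], text)
```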


Web-Extract-Number.dss

This is an HTML parser that extracts two items from each web page in the input URL list :

Specify the label by pressing the Define Label button.   Press Run and wait for scanning to complete, then press View Results.

The output is an HTML file with a table containing, for each input URL :

The HTML header for the output file is contained in text file Web-Extract-Number-Header.txt (file variable HTMLHeader).   You can modify this text file to change the output's appearance (font, color, etc.).

This example extracts the content of an <h1> tag and a decimal number.   It can be modified to extract from another identifying tag (<title>, for example), and to extract other data formats (numbers without decimal points, text, etc.).
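The extraction described above can be approximated in Python with two regular expressions.   This is a sketch under stated assumptions:  the label ("Price:" in the usage note) is hypothetical, the number is taken to be the first decimal number following the label, and the function name is invented for illustration.

```python
import re

H1 = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def extract_items(html_text, label):
    """Return (h1 text, first decimal number after label); None where not found."""
    h1 = H1.search(html_text)
    heading = TAG.sub("", h1.group(1)).strip() if h1 else None
    # Decimal number: digits + decimal point + digits, after any non-digit filler.
    number = re.search(re.escape(label) + r"\D*?(\d+\.\d+)", html_text)
    return heading, (number.group(1) if number else None)
```

For a page containing <h1>Widget</h1> and the text "Price: $12.50", extract_items(page, "Price:") returns the heading "Widget" and the number "12.50".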


Web-Extract-Title-Header.dss

This HTML parser extracts two items from each web page in the input URL list :

The output is a web page (HTML file) containing a brief listing for each input URL :

The HTML header for the output file is contained in text file Web-Extract-Title-Header.txt (file variable HTMLHeader).   You can modify this text file to change the output's appearance (font, color, etc.).


Web-Watch-Words.dss

This sample watches a list of websites for keyword occurrences.   The user specifies :

The output is a single HTML file (web page) displaying the time of the scan, the URLs of the pages containing the search words, and the text containing the search words.


