Example: web page number scraper

Here's a simple web scraping example: Web-Extract-Number.dss, which extracts numbers from the specified web pages.

Many generated web pages contain labeled numbers, i.e. a label followed by a number:

The "other stuff" is invisible HTML code that can simply be ignored for this scraping application.

We also want to extract some type of unique identifier from the page. This particular scraper looks for the contents of the first HTML <h1> tag:

Screenshot: web page number extraction

The parser locates the first <h1> tag on the page and extracts the text between it and the closing </h1> tag. You can adapt this parser to use the page's <title> tag, or any other identifying HTML element, by changing the start node's string:
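Outside the parser's own language, the same identifier step can be sketched in Python. This is only an illustration of the idea, not the contents of Web-Extract-Number.dss: the function name extract_identifier and its tag parameter are hypothetical, and a regular expression stands in for the parser's start node.

```python
import re


def extract_identifier(html, tag="h1"):
    """Return the text inside the first <tag>...</tag> pair, or None if absent.

    Hypothetical helper: `tag` plays the role of the start node's string,
    so passing "title" switches the identifier to the page title.
    """
    name = re.escape(tag)
    # Match the opening tag (with optional attributes), then lazily capture
    # everything up to the first matching closing tag.
    pattern = re.compile(
        rf"<{name}\b[^>]*>(.*?)</{name}\s*>",
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        return None
    # Drop any nested markup and collapse runs of whitespace.
    text = re.sub(r"<[^>]+>", "", match.group(1))
    return " ".join(text.split())
```

Calling extract_identifier(page, "title") instead of the default uses the page's <title> element as the identifier, which mirrors the adaptation described above.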

After the identifier, the parser searches the page for the specified label, then looks for a DecimalNumber pattern, i.e. some digits, a decimal point, and two more digits. When it finds a match, it executes FoundNumber, an action group that transmits the extracted data and other information to the target (a file or a database, depending on how the target is defined).
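The label-then-number search can be sketched the same way. The regular expression below is an assumed Python equivalent of the DecimalNumber pattern as described (digits, a decimal point, two digits); the names DECIMAL_NUMBER and extract_labeled_number are hypothetical and do not come from the .dss file.

```python
import re

# Assumed equivalent of the DecimalNumber pattern described above:
# one or more digits, a decimal point, then exactly two digits.
DECIMAL_NUMBER = re.compile(r"\d+\.\d{2}")


def extract_labeled_number(html, label):
    """Return the first decimal number that follows `label`, or None.

    Any HTML between the label and the number is simply skipped, since
    the search only looks for the numeric pattern.
    """
    position = html.find(label)
    if position == -1:
        return None  # label not present on this page
    # Start searching just past the label so earlier numbers are ignored.
    match = DECIMAL_NUMBER.search(html, position + len(label))
    return match.group(0) if match else None
```

Because the pattern is not anchored at its end, a value with more than two decimal places would be captured only up to its second decimal digit; a stricter pattern would be needed if that matters for the pages being scraped.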

This simple example extracts two pieces of information from each web page:

The output is a table that contains, for each web page, both extracted values plus:

This example also uses a file variable: a small text file (Web-Extract-Number-Header.txt) that contains the HTML header inserted at the top of the output. You can modify this text file to change the output's appearance (font, color, etc.).
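As a rough sketch of how a header file and the extracted values might combine into an output table, the Python below prepends the header text to one table row per page. It is an illustration under assumptions: build_output is a hypothetical name, the header text is passed in as a string (for example, read from Web-Extract-Number-Header.txt), and only the two extracted values are written, because the additional columns of the real output are not reproduced here.

```python
from html import escape


def build_output(header_html, rows):
    """Return an HTML document fragment: header text, then one row per page.

    `rows` is an iterable of (identifier, number) string pairs.
    The layout is an assumption, not the scraper's actual output format.
    """
    lines = [header_html.rstrip("\n"), "<table>"]
    for identifier, number in rows:
        # Escape the values so characters such as & or < cannot break the table.
        lines.append(
            f"<tr><td>{escape(identifier)}</td><td>{escape(number)}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines) + "\n"
```

Keeping the header in a separate file means the output's styling can change without touching the extraction logic, which is the same separation the file variable provides.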

This web scraping / HTML parsing example can be modified to extract other numeric formats, or other string patterns, from the web page list.