Web extraction

The extraction of data from web pages is also known as "web scraping".
Visit the web scraping topic for more information.

These samples are ready to use once they're installed, although some may need small changes for your own sites.

The Web-* samples install with Data Splitter. They visit the websites in the Input URLs list and produce HTML summaries of the extracted data.

The Census-* samples install separately. They contain Microsoft Access databases and automatically configure the ODBC DSNs for those databases during installation. The Census-* samples require MS Access 2000 or later.


Web-watch-words

This sample scans websites periodically for keyword occurrences. The user specifies:

This sample outputs a web page (HTML file). To accumulate results across runs, check the "append" box for the output file; new results are then added to the bottom of the file.

Web-watch-words can be modified to produce other file formats or perform database updates.
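As an illustration of the idea only (this is not how Data Splitter itself is configured), a keyword scan with an appending HTML report might be sketched in Python as follows. The names TextExtractor, count_keywords and append_report are hypothetical, and the sketch assumes the page HTML has already been downloaded.

```python
import html
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def count_keywords(page_html, keywords):
    """Return {keyword: count} of case-insensitive whole-word matches."""
    parser = TextExtractor()
    parser.feed(page_html)
    parser.close()
    text = " ".join(parser.parts)
    counts = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def append_report(path, url, counts):
    """Append one HTML fragment per scan to the bottom of the report file."""
    rows = "".join(
        f"<li>{html.escape(kw)}: {n}</li>" for kw, n in counts.items()
    )
    # Mode "a" mirrors the "append" box: earlier results are kept.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"<h3>{html.escape(url)}</h3>\n<ul>{rows}</ul>\n")
```

Opening the report in append mode is what keeps earlier scans in place; opening it in write mode instead would replace the report on every run.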

Installs with Data Splitter.


Web-extract-number

This sample is configured to scrape stock quotes, but can easily be modified to scrape labeled numbers from other web pages by changing the Label and the input URL list.
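To show what "scraping a labeled number" can mean in practice, here is a minimal Python sketch. It is an assumption about the general technique, not Data Splitter's own rule syntax: the function name extract_labeled_number is hypothetical, and the input is assumed to be page text with the HTML tags already stripped.

```python
import re
from typing import Optional


def extract_labeled_number(text: str, label: str) -> Optional[float]:
    """Return the first number that follows `label`, or None if absent."""
    pattern = (
        re.escape(label)
        + r"[^0-9+\-]{0,40}?"           # short gap: colon, spaces, currency sign
        + r"([+-]?\d[\d,]*(?:\.\d+)?)"  # digits with optional commas and decimals
    )
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))
```

Changing the label argument plays the same role as changing the Label in the sample: the number pattern stays the same and only the anchor text differs.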

Installs with Data Splitter.


Web-extract-title-header

Extracts two items from each web page in the input URL list:

The "rules" in the HeaderTag list determine which headers are extracted. These rules work well for the news headline sites specified in the sample input URL list, but will probably require modification for other sites.
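The following Python sketch shows one way tag-based header rules could work. Judging by the sample's name, the two items are presumably the page title and the header text; that reading is an assumption, as are the class name TitleHeaderExtractor and the header_tags parameter, which only stands in for the HeaderTag list.

```python
from html.parser import HTMLParser


class TitleHeaderExtractor(HTMLParser):
    """Collects the <title> text and the text of selected header tags."""

    def __init__(self, header_tags=("h1", "h2")):
        super().__init__()
        self.header_tags = set(header_tags)
        self.title = ""
        self.headers = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        wanted = tag == "title" or tag in self.header_tags
        if self._current is None and wanted:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        # Collapse runs of whitespace in the captured text.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif text:
            self.headers.append((tag, text))
        self._current = None


def extract_title_and_headers(page_html, header_tags=("h1", "h2")):
    """Return (title, [(tag, text), ...]) for the given page."""
    parser = TitleHeaderExtractor(header_tags)
    parser.feed(page_html)
    parser.close()
    return parser.title, parser.headers
```

Adapting the sketch to another site would mean passing a different header_tags tuple, which is the same kind of change the HeaderTag rules are said to need.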

Installs with Data Splitter.


Census-01

U.S. Census Bureau table scraper: state populations and areas

Extracts selected fields from a single table on the U.S. Census Bureau website and puts the extracted data in a database.
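The general flow (parse one HTML table, keep selected columns, store them in a database) can be sketched in Python as below. This is an illustration under stated assumptions: SQLite is swapped in for the Access/ODBC database the sample actually uses, the column names "State", "Population" and "Area" are hypothetical, and only the first table on the page is read.

```python
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the rows of the first <table> as lists of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None
        self._done = False  # True once the first table has closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._done = True


def load_table(page_html, conn, fields=("State", "Population", "Area")):
    """Store the selected columns in table `states`; return the row count."""
    parser = TableParser()
    parser.feed(page_html)
    parser.close()
    if not parser.rows:
        return 0
    header, *body = parser.rows
    # Select fields by header name rather than by position.
    name_i, pop_i, area_i = (header.index(f) for f in fields)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS states "
        "(name TEXT PRIMARY KEY, population INTEGER, area REAL)"
    )
    records = [
        (
            row[name_i],
            int(row[pop_i].replace(",", "")),
            float(row[area_i].replace(",", "")),
        )
        for row in body
        if len(row) == len(header)  # skip malformed or spanning rows
    ]
    conn.executemany("INSERT OR REPLACE INTO states VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)
```

Looking columns up by header name keeps the sketch working if the source table gains or reorders columns, at the cost of failing with a ValueError when a named column is missing.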

To run this sample:

See the HTML table parser example for more information.

Installs separately. Requires MS Access 2000 or later.


Census-02

U.S. Census Bureau table scraper: zip code data

Extracts selected fields from multiple zip code tables on the U.S. Census Bureau website. Specify the zip codes you're interested in and press the Grab button.

To run this sample:

This sample demonstrates the use of SQL to convert the user-entered zip codes into a source URL table, which is then scanned to produce the desired output.
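A rough Python sketch of that SQL step follows, using SQLite in place of the sample's Access database. Everything named here is an assumption made for illustration: the table names zip_input and source_urls, the function build_source_urls, and above all URL_PREFIX, which is a placeholder and not the real Census Bureau address.

```python
import sqlite3

# Hypothetical URL pattern; the real Census Bureau address is not given here.
URL_PREFIX = "https://example.com/zip/"


def build_source_urls(conn, zip_codes):
    """Rebuild the source URL table from user-entered zip codes."""
    conn.execute("DROP TABLE IF EXISTS zip_input")
    conn.execute("DROP TABLE IF EXISTS source_urls")
    conn.execute("CREATE TABLE zip_input (zip TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE source_urls (zip TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    # Duplicates are dropped by the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO zip_input (zip) VALUES (?)",
        [(z.strip(),) for z in zip_codes],
    )
    # The conversion itself is a single SQL statement: keep entries that
    # are exactly five digits and prepend the URL prefix to each one.
    conn.execute(
        "INSERT INTO source_urls (zip, url) "
        "SELECT zip, ? || zip FROM zip_input "
        "WHERE length(zip) = 5 AND zip NOT GLOB '*[^0-9]*'",
        (URL_PREFIX,),
    )
    conn.commit()
    rows = conn.execute("SELECT url FROM source_urls ORDER BY zip")
    return [url for (url,) in rows]
```

Keeping the conversion in SQL means the list of pages to scan is just another table, so the scanning step can read it the same way it would read any other input URL list.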

Installs separately. Requires MS Access 2000 or later.
