What is Data Splitter?
Data Splitter is a Windows desktop application that searches for text and user-defined patterns in a variety of input sources :
- files, including Microsoft Word .DOC files
- emails (from MAPI message stores - Outlook, in other words)
- web pages
It can be configured to perform actions with the found items, for example :
- outputting the found items, + other text, to a file
- storing the found items in a database field
Data Splitter is highly configurable and can create simple search reports, transform data (search + replace, for example) or generate databases. See the features and help topics for more information.
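To make the search-then-act idea concrete, here is a minimal Python sketch. It is not Data Splitter's own code or configuration format. The pattern (`ORD-` followed by digits), the function names, and the `orders` table are all hypothetical, chosen only to illustrate finding items and then performing the two actions listed above.

```python
import io
import re
import sqlite3

# Hypothetical user-defined pattern: an order number such as "ORD-100".
ORDER_PATTERN = re.compile(r"ORD-\d+")


def find_items(text):
    """Return every match of the pattern, in order of appearance."""
    return ORDER_PATTERN.findall(text)


def write_report(items, stream):
    """Action 1: output each found item, plus other text, to a file-like stream."""
    for item in items:
        stream.write(f"Found order: {item}\n")


def store_items(items, connection):
    """Action 2: store each found item in a database field."""
    connection.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT)")
    connection.executemany(
        "INSERT INTO orders (order_id) VALUES (?)",
        [(item,) for item in items],
    )
    connection.commit()


# Small demonstration using in-memory stand-ins for the file and the database.
text = "Shipped ORD-100 on Monday and ORD-205 on Tuesday."
items = find_items(text)

report = io.StringIO()  # stands in for an output file
write_report(items, report)

connection = sqlite3.connect(":memory:")
store_items(items, connection)
```

In practice the stream would be a real file opened for writing, and the connection would point at whatever database the solution targets.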
What makes it special?
Generality
There are numerous software tools developed specifically to extract data from emails ("email parsers"), web pages ("web scrapers"), and files (assorted file search and conversion utilities).
Data Splitter was designed from the outset to be a general-purpose data transformer. Its patented design isn't geared to any one input type, so it can handle just about anything.
Uniqueness in the marketplace
For now, Data Splitter appears to be unique in its ability to transform input from files, the Web, and MAPI emails. There are plenty of ad hoc tools to transform files, scrape the web, parse emails, etc., but Data Splitter stands out from the crowd in its ability to do all of the above.
To demonstrate this point, if you enter these search terms in Google :
email parser web scraper
email parser file converter
file converter web scraper
- you’ll get hits for Set Machine, WWWGrab, and DTUtilities (the previous Data Splitter product & company names) near the top of the organic search results. No one’s likely to enter those search terms - of course - but it does say something about the product’s unique ability to transform a variety of data sources.
Information gathering opportunity for the vendor
This product provides the user with a single tool, a single "go-to" resource that can perform a wide variety of tasks. Such a product also provides the vendor with a unique opportunity to gather information about the wide variety of data transformation needs out there. For example, the "request pages" at datasplitter.com for :
Graphical-tabular representation of solutions
Data transformation is challenging no matter what method is used. Data Splitter is no exception, though its unique and proprietary method of configuration via tables and graphs allows creation of data transformations that :
- end users can easily modify, via the tables
- developers can view and modify graphically
Simplified, re-usable ("3-D") solutions
A quick look at any Data Splitter solution will reveal a graph: a 2-dimensional representation of the patterns expected in the input, and the relationships between those patterns. Less obvious is the fact that individual nodes can refer to other graphs, or "node groups". So, in a way, there can be multiple "levels" in a solution, or a third "dimension", if needed. The program can split the input into pieces and make subsequent passes over the pieces, without limit. This isn't just pretty; it's a powerful feature that allows solutions to be broken into smaller, easier-to-understand, re-usable components. See the Solutions within solutions topic for more information.
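The idea of nodes that refer to other node groups can be sketched in a few lines of Python. This is only an illustration of the concept under simplifying assumptions, not Data Splitter's actual solution format: the `Node` class, the `apply_group` function, and the sample patterns are hypothetical, and the relationships between patterns are reduced to a flat list.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One pattern in a solution (hypothetical model).

    If `group` is non-empty, each piece matched by this node is handed to
    that group for a further pass, which gives the extra "level".
    """
    name: str
    pattern: str
    group: List["Node"] = field(default_factory=list)


def apply_group(nodes, text):
    """Run each node over the text, recursing into any node group it refers to."""
    results = []
    for node in nodes:
        for match in re.finditer(node.pattern, text):
            piece = match.group(0)
            if node.group:
                # Subsequent pass over just this piece of the input.
                results.append((node.name, apply_group(node.group, piece)))
            else:
                results.append((node.name, piece))
    return results


# A re-usable group that knows how to pick apart one record.
record_group = [
    Node("name", r"(?<=Name: )\w+"),
    Node("price", r"\d+\.\d{2}"),
]

# The top level only splits the input into records, then defers to the group.
solution = [Node("record", r"Name: \w+, Price: \d+\.\d{2}", record_group)]

sample = "Name: Widget, Price: 9.99\nName: Gadget, Price: 24.50\n"
parsed = apply_group(solution, sample)
```

Because `record_group` is defined separately, the same group could be referenced from other top-level solutions, which is the re-use benefit described above.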
Current state of Data Splitter
"Data Splitter" is the working title of a Windows desktop application that combines two products that have been in use since 2002 / 2003: "Set Machine" and "WWWGrab". It is mature, stable software that has been used by me, and others, to perform a variety of data transformation tasks :
- File conversions - extraction of data from HTML, PDF, and other proprietary formats into databases
- Web page scraping - extraction of data from specified websites into databases
- Outlook email parsing - extraction of data from generated emails (bounce messages, Amazon, PayPal + eBay notifications, + others) into databases
Some customers have downloaded the product and figured out how to set it up themselves (so it is possible), but most buy a product license and pay me to set it up. So, at this point it's a software product and a service.
What's next?
The "next generation" Data Splitter will automate much of the customer interaction. The configurations, or "solutions", are relatively simple text files that in many cases can be generated automatically. A new front end / user interface will gather information from the user, analyze data samples and generate the solutions that Data Splitter will use to perform the data transformations.
The next-gen front end will be a complex piece of software, but I've got a good idea of how to design and build it after years of customer interaction, sample data analysis, and delivery of successful data transformations.
Near-term enhancements
Planned enhancements for Data Splitter in the near future :
- Ability to scan PDF (Portable Document Format) files
- Multiple concurrent HTTP "get" threads for faster scraping of multiple websites
- On-board MySQL database (removing 3rd-party DBMS requirement for some of the samples)
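As a rough illustration of the concurrent HTTP "get" item in the list above, the following Python sketch fetches several URLs on a pool of worker threads. It is an assumption about the general technique, not a description of how Data Splitter will implement it; `http_get`, `fetch_all`, and the `max_workers` default are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def http_get(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=http_get, max_workers=4):
    """Fetch every URL on a pool of worker threads.

    Results come back in the same order as `urls`. The `fetch` parameter
    can be swapped for another callable, for example in tests.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

While one thread waits on a slow server the others keep downloading, which is where the speed-up for multiple websites would come from. In this sketch an error raised by any single fetch propagates to the caller when the results are collected.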
Send a message at the Data Splitter contact page to find out more.
Thanks!
Jim