Html2Text
Version 4.130.1060 for Windows
Price:  $79.95

About Html2Text

HTML2Text converts HTML files into displayable text -- in milliseconds.  Perfect for both small and large applications:  for summarizing web pages, producing excerpts of pages, accurately detecting changes to pages, and more.  Machine-generated but hand-tuned for performance, HTML2Text can convert thousands of web pages to their text equivalents in a single, lightning-fast pass.  HTML2Text understands HTML 4.0+ together with all of its current extensions (scripts, style sheets, and Dynamic HTML).  It also performs character-set conversions, and can generate text in a variety of formats.

Program Features

  • Uses a machine-generated, hand-tuned FSM (Finite State Machine) for the fastest possible conversion speed:  HTML2Text can convert thousands of files to text in seconds
  • Understands HTML 4.0+ and all current extensions (scripts, style sheets, and Dynamic HTML), including their Microsoft and Netscape dialects
  • Can render text with faithful line spacing (as a browser would render it) or as a single paragraph (optimal for change detection and excerpt generation)
  • Can perform SGML and ISO-8859 entity conversions
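
The change-detection and excerpt uses named above can be sketched in a few lines of Python. This is only an illustration of what a downstream program might do with text that HTML2Text has already produced; the function names (normalize, text_digest, has_changed, make_excerpt) are hypothetical and are not part of HTML2Text.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not count as changes."""
    return re.sub(r"\s+", " ", text).strip()


def text_digest(text: str) -> str:
    """Return a SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def has_changed(old_text: str, new_text: str) -> bool:
    """Report whether two converted pages differ in their visible text."""
    return text_digest(old_text) != text_digest(new_text)


def make_excerpt(text: str, max_chars: int = 200) -> str:
    """Return the leading part of the text, cut at a word boundary."""
    flat = normalize(text)
    if len(flat) <= max_chars:
        return flat
    # Drop the partial word at the cut point, then mark the truncation.
    return flat[:max_chars].rsplit(" ", 1)[0] + "..."
```

Storing only the digest of each converted page keeps the comparison cheap: a page is reported as changed only when its displayable text differs, not when its markup does.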

Using HTML2Text

HTML2Text is a command-line program, so you execute it from a DOS box or from the Windows Run command.  As a command-line tool, it is designed for easy integration with other software or processes -- it can be run automatically from a batch file, or by executing it as a separate process from within another program.

Running Html2Text with no parameters displays a usage guide:

C:\>html2text
v4.130.1060 (c) 2006 Tennyson Maxwell Information Systems, Inc. 
Converts html files to displayable text

Usage: html2text [-ornastmighfpcwq] path
    path  path to file(s) to convert
      -o  output file or directory (default=convert in place) 
      -r  recurse into subdirectories
      -n  do not perform SGML and ISO-8859 entity conversions
      -a  always attempt conversion, even if file is not HTML
      -s  treat newlines as spaces (generate as single paragraph) 
      -t  rename converted files to have .txt extensions
      -m  create single file with multiple records (must also use -o) 
      -i  prepend records with id string (must also use -m) 
      -g  prepend records with page title (must also use -m) 
      -h  prepend records with original html (must also use -m) 
      -f  prepend records with filename (must also use -m) 
      -p  prepend records with complete path to file (must also use -m) 
      -c  append delimiter after table cell endings
      -w  word-wrap text at W characters per line
      -q  quiet output (no progress messages) 
C:\>
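
As a rough sketch of running Html2Text as a separate process from within another program, the Python below builds a command line from the -r, -s, -t, and -q options listed in the usage guide and launches it. Several details are assumptions rather than documented behavior: that each option may be passed as its own argument, that options come before the path, and that the executable is reachable as "html2text". The helper names build_command and convert are hypothetical.

```python
import subprocess
from typing import List


def build_command(path: str, recurse: bool = False,
                  single_paragraph: bool = False, rename_txt: bool = False,
                  quiet: bool = True, exe: str = "html2text") -> List[str]:
    """Assemble an argument list using flags from the usage guide.

    Assumption: flags are accepted as separate arguments placed before the path.
    """
    cmd = [exe]
    if recurse:
        cmd.append("-r")  # recurse into subdirectories
    if single_paragraph:
        cmd.append("-s")  # treat newlines as spaces
    if rename_txt:
        cmd.append("-t")  # rename converted files to .txt
    if quiet:
        cmd.append("-q")  # no progress messages
    cmd.append(path)
    return cmd


def convert(path: str, **options) -> int:
    """Run the converter and return its exit code unchanged.

    The usage guide does not document exit codes, so none are interpreted here.
    """
    return subprocess.run(build_command(path, **options)).returncode
```

Because the usage guide says the default is to convert in place, a cautious caller would point this at a copy of the files, for example convert(r"C:\work\pages_copy", recurse=True, rename_txt=True), where the directory name is only a placeholder.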


Updated August 13, 2008.  © 2008 Tennyson Maxwell Information Systems, Inc.  All Rights Reserved.