Miscellaneous Utilities for Text Processing

- and staging ground for useful pieces of future routines.
FreqList Builders now have their own area at ../freq.

1. Tag Stripper
Removes HTML tags.
2. Corpus Builder
Joins up to 25 files into a single corpus of up to about half a million words.
3. Sentence Extractor / T-Unit Calculator (+ Std. Dev.) *NEW!
Splits a file into sentences.
4. Proper Stripper
Under repair, summer 2010.
5. The Compleat Stripper
NEW JUNE 2010: Ten kinds of text clean-up for input to other routines, including JavaScript regexes and a regex checker.
6. Three useful off-site DBs (Collocations and Associations)

Notes

  • Some of these routines require TEXT files as their input. A text file is a simple file that contains no codes for emphasis, font sizes, etc. To convert a Word file to a text file, use Save As and choose plain text. This does not replace the original file; it creates an additional text file (identifiable by the .txt extension).

  • Most of these routines take their input from a file-selection menu that reads from the hard drive; they have not been adapted for copy-paste text entry. They have not been tested for French.

  • For complex jobs, combine routines (e.g., first strip the tags from an HTML file, save the result as a text file, then build a list or extract sentences).

  • T Cobb - UQAM
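
The combined job described in the notes (strip tags, then extract sentences) can be sketched offline in Python. This is only a rough illustration, not the code behind the routines on this page. The names `TagStripper`, `BLOCK_TAGS`, `strip_tags`, `split_sentences` and `sentence_length_stats` are made up for the example, and the sentence rule is a naive assumption (a sentence ends at `.`, `!` or `?` followed by whitespace). A true T-unit count needs syntactic analysis, so the sketch reports only the mean and standard deviation of sentence length in words.

```python
import re
import statistics
from html.parser import HTMLParser

# Hypothetical set of tags treated as text breaks; extend as needed.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "td", "h1", "h2", "h3", "h4", "h5", "h6"}


class TagStripper(HTMLParser):
    """Keeps the text between tags and drops the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")  # keep words apart across block boundaries

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    parser = TagStripper()
    parser.feed(html_text)
    parser.close()
    # Collapse the whitespace left behind where tags used to be.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def split_sentences(text):
    # Naive rule: split after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def sentence_length_stats(sentences):
    # Sentence length is measured in whitespace-separated words.
    lengths = [len(s.split()) for s in sentences]
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return mean, sd


html_text = "<p>The cat sat. It <b>purred</b> loudly!</p><p>Why did it stop?</p>"
plain = strip_tags(html_text)        # step 1: strip tags (save as .txt if needed)
sentences = split_sentences(plain)   # step 2: extract sentences
mean, sd = sentence_length_stats(sentences)
print(plain)
print(sentences)
print(round(mean, 2), round(sd, 2))
```

The sketch treats the contents of script and style elements as ordinary text, and abbreviations such as "e.g." will be split as sentence ends, so real pages may need extra clean-up first.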