Miscellaneous Utilities for Text Processing

A testing and staging ground for useful pieces of future Lextutor routines, and for pieces of existing routines that have independent uses.

Forwarding addresses:

    FreqList Builders have moved to ../freq; Randomizers have moved to ../rand

1. Tag Stripper

Removes HTML tags and, as of Jan '16, square brackets [bla bla] and curly braces {bla bla}.
2. Corpus Builder
Joins up to 25 files, to a total of about half a million words.
3. Random Wiki Entries by Subject
Build your own balanced corpus with modest labour
4. Sentence Extractor / T-Unit Calculator (+ Std. Dev.)
Splits a file into sentences.
5. Proper Stripper (under repair)

6. The Compleat Stripper (some elements under review Sept 2016)

Brought back on user demand Sept 2016 with problematic experiments removed
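
As a rough illustration of the kind of work a tag stripper like routine 1 performs, here is a minimal Python sketch. It is not the Lextutor code: the function name strip_markup and the three patterns are assumptions made for this example, and it assumes that bracketed and braced material is removed together with its contents. Regular expressions of this kind are only adequate for simple, well-formed markup.

```python
import re

# Hypothetical patterns for illustration; not Lextutor's actual rules.
TAG = re.compile(r"<[^>]*>")         # HTML tags such as <p> or </b>
SQUARE = re.compile(r"\[[^\]]*\]")   # square-bracketed spans: [bla bla]
CURLY = re.compile(r"\{[^}]*\}")     # curly-braced spans: {bla bla}


def strip_markup(text):
    """Replace tags and bracketed spans with a space, then tidy spacing."""
    for pattern in (TAG, SQUARE, CURLY):
        text = pattern.sub(" ", text)
    # Collapse runs of spaces and tabs left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


if __name__ == "__main__":
    print(strip_markup("<p>Hello [note] world {x}</p>"))
```

Each removed span is replaced by a space rather than by nothing, so that words on either side of a tag are not fused together.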




  • Some of these routines require TEXT files as their input. A text file is a simple file that contains no codes for emphasis, font sizes, etc. To transform a Word file into a text file, simply SAVE it AS text. You will not thereby lose the original file, but create an additional text file (identifiable by the .txt extension).

  • Most of these routines take their file inputs from a menu that accesses the hard drive of YOUR computer; they have not been adapted for copy-paste text entry.

  • For complex jobs, combine routines (e.g., first strip the tags from an HTML file, save the result as a text file, then build a list or extract sentences).
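
The combined job described in the last note (strip tags, then extract sentences) can be sketched in Python as below. This is an assumption-laden illustration, not the site's implementation: the function names are hypothetical, the sentence splitter is a naive one that breaks after ., ! or ? followed by whitespace, and the statistics are sentence lengths in words, not true T-units.

```python
import re
import statistics


def strip_tags(html):
    """Replace each HTML tag with a space (simple markup only)."""
    return re.sub(r"<[^>]*>", " ", html)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    text = " ".join(text.split())  # normalise all whitespace to single spaces
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]


def sentence_length_stats(sentences):
    """Return (mean, sample standard deviation) of sentence length in words."""
    lengths = [len(s.split()) for s in sentences]
    mean = statistics.mean(lengths)
    # The sample standard deviation needs at least two sentences.
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return mean, sd


if __name__ == "__main__":
    html = "<p>The cat sat. It purred loudly!</p><p>Why?</p>"
    sentences = split_sentences(strip_tags(html))
    mean, sd = sentence_length_stats(sentences)
    for s in sentences:
        print(s)
    print(f"mean length: {mean:.2f} words, SD: {sd:.2f}")
```

Abbreviations such as "e.g." or "Dr." will defeat a splitter this simple, so its output should be checked by eye before the numbers are relied on.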

    Tom Cobb - UQAM - and correspondents, users, code-bloggers