Miscellaneous Tools for Text Processing
Forwarding addresses: FreqList Builders have moved out to ../freq; Randomizers to ../rand

1. Tag Stripper
Removes HTML tags. And, since Jan '16, square brackets [bla bla] and curly braces {bla bla}.
2. Corpus Builder
Join up to 50 files - to >500,000 wds. NEW v.3 - ZIP upload, no known limit (APR 2020)
3. Random Wiki Entries by Subject
Build your own balanced corpus with modest labour.
4. Sentence Extractor / T-Unit Calculator (+ Std. Dev.)
File to sentences.
5. Proper Stripper BACK!
Eliminate proper nouns from the middles of sentences.
6. The Compleat Stripper (some elements under review)
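For readers who want to see what a tag stripper does in principle, here is a minimal Python sketch. It is not the code behind the Tag Stripper routine listed above; the function names strip_tags and strip_brackets are hypothetical, and the bracket removal assumes the bracketed spans are not nested.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text outside a tag.
        # Limitation: text inside <script> and <style> is kept too.
        self.parts.append(data)


def strip_tags(html_text):
    """Return html_text with HTML tags removed (hypothetical helper)."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    return "".join(collector.parts)


def strip_brackets(text):
    """Remove [bracketed] and {braced} spans (assumes no nesting)."""
    text = re.sub(r"\[[^\[\]]*\]", "", text)
    text = re.sub(r"\{[^{}]*\}", "", text)
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

A typical use would be to call strip_tags on the contents of a saved web page, then strip_brackets on the result, and save the output as a .txt file.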
Notes
Some of these routines require TEXT files as their input. A text file is a simple file that contains no codes for emphasis, font sizes, etc. To transform a Word file into a text file, simply SAVE it AS text. You will not thereby lose the original file, but create an additional text file (identifiable by the .txt extension on the name). Most of these routines take their input from a menu that accesses files on YOUR computer; they have not been adapted for copy-paste text entry. Complex jobs can involve combining routines (e.g., first strip the HTML tags, save the result as a text file, then combine files, build a list, extract sentences, or any of many other steps).
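The idea of combining routines can be sketched in Python as follows. This is an illustration under stated assumptions, not the code the routines on this page use: the function names are hypothetical, the input files are assumed to be plain .txt files in UTF-8, and sentences are split naively at ., ! or ? followed by whitespace (so abbreviations such as "e.g." will be split wrongly, and no T-unit analysis is attempted).

```python
import re
import statistics
from pathlib import Path


def combine_files(paths, encoding="utf-8"):
    """Join several plain-text files into one string (hypothetical helper)."""
    return "\n".join(Path(p).read_text(encoding=encoding) for p in paths)


def split_sentences(text):
    """Naive sentence splitter: break after . ! or ? followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in pieces if p]


def sentence_length_stats(sentences):
    """Return (mean, standard deviation) of sentence length in words."""
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return 0.0, 0.0
    mean = statistics.mean(lengths)
    # The sample standard deviation needs at least two sentences.
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return mean, sd
```

For example, sentence_length_stats(split_sentences(combine_files(["a.txt", "b.txt"]))) would chain the three steps, where a.txt and b.txt stand in for text files on your own computer.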