This routine deletes capitalized words that appear mid-sentence in a file, using the following Perl regular expression (regex):
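$file =~ s/(?<=[^.!?:'\s])[ \t]+[A-Z][A-Za-z]+\b//g;
The effect is to delete every word that starts with a capital letter [A-Z] followed by at least one more letter, capital or not, [A-Za-z]+. This includes all-caps words such as BBC but leaves single letters such as I and A alone. A word is kept if it begins a sentence or a line. The lookbehind (?<=[^.!?:'\s]) checks, without consuming anything, that the character just before the gap is not terminal punctuation (. ! ? :), an apostrophe ('), or whitespace, including the newline \n. [ \t]+ then matches only the spaces or tabs in the gap, so a match never crosses a line break, and the closing \b makes the word end at a word boundary. The gap and the word are replaced with nothing (//) everywhere they occur in $file, thanks to the g (global) modifier. $file is assumed to hold the whole text of the file.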
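The following is a minimal, self-contained sketch of the routine as a Perl script. The sub name strip_midsentence_caps and the command-line handling are assumptions for illustration, not part of the original. The regex uses a zero-width lookbehind, so the character before the capitalized word is only checked and never deleted, and a run of capitalized words such as New York is removed in full.

```perl
use strict;
use warnings;

# Hypothetical helper: delete words that start with a capital letter and have
# at least one more letter (so "BBC" and "London" qualify, "I" and "A" do not),
# unless the word opens a sentence or a line.
sub strip_midsentence_caps {
    my ($text) = @_;
    # (?<=[^.!?:'\s])   zero-width check: the character before the gap is not
    #                   . ! ? : or an apostrophe, and is not whitespace
    # [ \t]+            the gap itself: spaces/tabs only, never a newline
    # [A-Z][A-Za-z]+\b  the capitalized word, deleted together with its gap
    $text =~ s/(?<=[^.!?:'\s])[ \t]+[A-Z][A-Za-z]+\b//g;
    return $text;
}

# Only process input when file names are given on the command line, e.g.
#   perl strip_caps.pl input.txt > output.txt
if (@ARGV) {
    local $/;    # slurp mode: each read returns a whole file, not one line
    while (my $file = <>) {
        print strip_midsentence_caps($file);
    }
}
```

For example, "we visited New York today" becomes "we visited today", while "It rained. Then we left." is unchanged because Then follows a full stop. The script name strip_caps.pl in the comment is likewise only illustrative.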