[Chugalug] tag cloud a mailing list
seabre986 at gmail.com
Sat Dec 15 07:14:07 UTC 2012
Actually, you want to do something called lemmatisation, not stemming.
The two are related, but stemming does something slightly different.
Lemmatisation does what I described.
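To make the difference concrete, here is a toy sketch in plain Python. It is not how NLTK does either job: the suffix list and the lemma table are made-up stand-ins, only meant to show that a stemmer chops suffixes by rule (and can leave a non-word behind), while a lemmatiser looks up the dictionary form.

```python
# Hypothetical suffix rules; a real stemmer such as Porter's has many more.
SUFFIXES = ("ing", "ies", "es", "s")

# Hypothetical lookup table; a real lemmatiser consults a full dictionary.
LEMMAS = {"passing": "pass", "studies": "study", "was": "be", "mice": "mouse"}


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return the dictionary form if the table knows it, else the word."""
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for w in ("passing", "studies", "was", "mice"):
        print(w, crude_stem(w), toy_lemmatize(w))
```

Both agree on "passing", but they part ways on "studies", and irregular forms like "was" or "mice" are only handled by the lookup.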
I can probably whip up a dirty example with Python and NLTK.
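In the meantime, a rough sketch of the four steps quoted below, using only the standard library so it runs without any downloads. The stop word set and the lemma table are tiny hypothetical stand-ins; with NLTK installed you would presumably swap in its stop word corpus and its WordNet lemmatiser instead. The function names are my own as well.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real stop word list.
STOP_WORDS = {"a", "an", "the", "and", "is", "it", "of", "to", "in"}

# Hypothetical stand-in for a real lemmatiser.
LEMMAS = {"passing": "pass", "passed": "pass", "tests": "test", "lists": "list"}


def tokenize(text):
    """Step 1: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_counts(text, top=10):
    """Steps 2 to 4: drop stop words, lemmatise, rank by frequency."""
    kept = [t for t in tokenize(text) if t not in STOP_WORDS]
    lemmas = [LEMMAS.get(t, t) for t in kept]
    return Counter(lemmas).most_common(top)


if __name__ == "__main__":
    body = "The list is passing tests. A test passed, and the lists pass the test."
    for word, count in tag_counts(body):
        print(count, word)
```

The counts are what you would feed to whatever draws the cloud, with font size scaled by frequency. For an exported mbox file, Python's standard mailbox module should be able to hand you the message bodies to pass to tag_counts.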
On Fri, Dec 14, 2012 at 10:14 AM, Sean Brewer <seabre986 at gmail.com> wrote:
> If you can export the e-mails easily, the general algorithm is something like:
> 1. Tokenize the words in the e-mail body.
> 2. Remove stop words (a, an, the, etc. You can find word lists, and
> libraries like NLTK have them built in)
> 3. Use a stemming algorithm to reduce word tokens to their (I think the
> correct term is) free morpheme (e.g. convert the token "passing"
> to "pass")
> 4. Rank by frequency of result.
> That should get you in the neighborhood.
> On Fri, Dec 14, 2012 at 9:11 AM, Matthew Keys <mk6032 at yahoo.com> wrote:
>> Does anyone know how to create a tag cloud based on the body of an
>> email? The Google gods point me in the direction of Outlook plugins, but
>> I'm looking for something more Linux CLI scriptable; maybe something that
>> could parse through an exported mailbox/folder.