[Chugalug] tag cloud a mailing list

Sean Brewer seabre986 at gmail.com
Fri Dec 14 15:14:27 UTC 2012


If you can export the e-mails easily, the general algorithm is something like
this:
1. Tokenize the words in the e-mail body.
2. Remove stop words (a, an, the, etc.). You can find word lists, and
libraries like NLTK have them built in.
3. Use a stemming algorithm to reduce each word token to its stem, which I
think is the correct term (e.g. convert the token "passing" to "pass").
4. Rank the resulting stems by frequency.

That should get you in the neighborhood.
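
Here is a rough sketch of those four steps in Python, standard library only,
so it runs without installing anything. Treat it as a starting point rather
than a finished tool. The names (tokenize, naive_stem, tag_counts,
mbox_bodies) are made up for this example, the stop-word list is a tiny
stand-in for a real one such as NLTK's, and naive_stem is a crude suffix
stripper, not a real stemming algorithm like Porter's. It also assumes the
export is in mbox format.

```python
import mailbox
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real run would use a fuller list, e.g. the one shipped with NLTK.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it",
              "on", "for", "that", "this", "i", "you", "was", "be"}


def tokenize(text):
    """Step 1: lowercase the text and pull out runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Step 3 (crude stand-in): strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            if suffix == "s" and token.endswith("ss"):
                continue  # leave words like "pass" alone
            return token[: -len(suffix)]
    return token


def tag_counts(bodies, top=20):
    """Steps 2 and 4: drop stop words, stem, and rank by frequency."""
    counts = Counter()
    for body in bodies:
        for token in tokenize(body):
            if token not in STOP_WORDS:
                counts[naive_stem(token)] += 1
    return counts.most_common(top)


def mbox_bodies(path):
    """Yield the text/plain parts of every message in an mbox file."""
    for message in mailbox.mbox(path, create=False):
        for part in message.walk():
            if part.get_content_type() != "text/plain":
                continue
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                yield payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the header
                yield payload.decode("utf-8", errors="replace")
```

With a hypothetical export named exported.mbox, something like
`for word, count in tag_counts(mbox_bodies("exported.mbox")): print(count, word)`
would print the ranked list, and the counts can then be used as font weights
for the cloud. Swapping in a proper stemmer and stop-word list should only
require changing naive_stem and STOP_WORDS.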

On Fri, Dec 14, 2012 at 9:11 AM, Matthew Keys <mk6032 at yahoo.com> wrote:

> Does anyone know how to create a tag cloud based on the body of an email?
> The Google gods point me in the direction of Outlook plugins, but I'm
> looking for something more Linux CLI scriptable; maybe something that could
> parse through an exported mailbox/folder.