[Chugalug] tag cloud a mailing list

Sean Brewer seabre986 at gmail.com
Sat Dec 15 08:44:21 UTC 2012


I ran across this: https://github.com/larsmans/weighwords

It might make what you want to do even easier.
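
For reference, the four steps quoted further down (tokenize, drop stop
words, normalize, rank by frequency) can be roughed out in plain Python.
This is only a sketch under a few assumptions: the stop word list is a tiny
hand-picked sample, and crude_normalize is a made-up stand-in for a real
stemmer or lemmatiser such as the ones NLTK provides, so it will get plenty
of words wrong.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; use a fuller list in practice).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it",
              "on", "for", "you", "that", "this", "was", "with"}


def crude_normalize(word):
    """Hypothetical stand-in for a real stemmer/lemmatiser.

    Strips a few common English suffixes and leaves words ending in
    "ss" (e.g. "pass") untouched.
    """
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tag_counts(text, top=10):
    """Return the `top` most frequent normalized words as (word, count) pairs."""
    # 1. Tokenize: lowercase runs of letters, skipping very short tokens.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]
    # 2. Remove stop words.
    kept = [t for t in tokens if t not in STOP_WORDS]
    # 3. Normalize each token (crudely).
    normalized = [crude_normalize(t) for t in kept]
    # 4. Rank by frequency.
    return Counter(normalized).most_common(top)


if __name__ == "__main__":
    sample = "Passing the pass, passed passes. The mail list mails."
    for word, count in tag_counts(sample):
        print(count, word)
```

The counts could then be fed to whatever draws the cloud. To run this over
an exported mailbox, Python's standard mailbox module can read mbox files,
though pulling the plain-text body out of multipart messages takes a bit
more work than shown here.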

On Sat, Dec 15, 2012 at 2:14 AM, Sean Brewer <seabre986 at gmail.com> wrote:

> Actually, you want to do something called lemmatisation, not stemming.
> The two are related, but stemming does something slightly different;
> lemmatisation does what I described.
>
> I can probably whip up a dirty example with Python and NLTK.
>
>
> On Fri, Dec 14, 2012 at 10:14 AM, Sean Brewer <seabre986 at gmail.com> wrote:
>
>> If you can export the e-mails easily, the general algorithm is something
>> like this:
>> 1. Tokenize the words in the e-mail body.
>> 2. Remove stop words (a, an, the, etc.). You can find word lists, and
>> libraries like NLTK have them built in.
>> 3. Use a stemming algorithm to reduce word tokens to their stem (I think
>> the correct term is "free morpheme"), e.g. convert the token "passing"
>> to "pass".
>> 4. Rank the results by frequency.
>>
>> That should get you in the neighborhood.
>>
>> On Fri, Dec 14, 2012 at 9:11 AM, Matthew Keys <mk6032 at yahoo.com> wrote:
>>
>>> Does anyone know how to create a tag cloud based on the body of an
>>> email? The Google gods point me in the direction of Outlook plugins, but
>>> I'm looking for something more Linux CLI scriptable; maybe something that
>>> could parse through an exported mailbox/folder.
>>>
>>> _______________________________________________
>>> Chugalug mailing list
>>> Chugalug at chugalug.org
>>> http://chugalug.org/cgi-bin/mailman/listinfo/chugalug
>>>
>>>
>>
>