[Chugalug] tag cloud a mailing list

Sean Brewer seabre986 at gmail.com
Tue Dec 18 04:07:03 UTC 2012


Here's an example of what I'm thinking: https://gist.github.com/4324904

It's in Ruby, though. I found a neat stemmer/lemmatizer algorithm with a
Ruby implementation, but I couldn't find one in Python.

Here's example output: https://gist.github.com/4324904#comment-657626
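
For anyone who wants the same idea in Python, here is a minimal sketch of the
four steps from my earlier message quoted below (tokenize, drop stop words,
reduce each word to a base form, rank by frequency). It uses only the standard
library. The stop-word list is a tiny illustrative one, and crude_stem() is
naive suffix stripping standing in for a real stemmer or lemmatizer; both are
assumptions for the example and are not what the gist does. The function names
are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists (e.g. the ones shipped with
# NLTK) are much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Naive suffix stripping; a stand-in for a real stemmer/lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Leave words like "pass" alone instead of producing "pas".
        if suffix == "s" and word.endswith("ss"):
            continue
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rank_terms(text, top_n=10):
    """Return the top_n (term, count) pairs, most frequent first."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(crude_stem(t) for t in tokens).most_common(top_n)


if __name__ == "__main__":
    body = "Passing the pass. He passed the test and passes tests."
    for term, count in rank_terms(body, top_n=2):
        print(term, count)
```

The counts that come back could then drive the font size of each term in the
cloud. Swapping crude_stem() for a proper lemmatizer should give cleaner terms.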

On Sat, Dec 15, 2012 at 8:06 PM, Sean Brewer <seabre986 at gmail.com> wrote:

> I forgot to add that you could use all that stuff to find the probable
> topic of a conversation, which is basically a tag. I thought that might
> be the direction you were heading. I could be wrong.
>
>
> On Sat, Dec 15, 2012 at 1:00 PM, Sean Brewer <seabre986 at gmail.com> wrote:
>
>> Yeah, I think that's the difference. Code for a word cloud makes a
>> cloud of the most commonly used words.
>>
>>
>> On Sat, Dec 15, 2012 at 4:34 AM, Matt Keys <mk6032 at yahoo.com> wrote:
>>
>>> I ran across a few like that, too. I'm a bit confused about the
>>> difference between a word cloud and a tag cloud. I'm guessing tag clouds
>>> presume that you've attached some form of tag to an example text, which the
>>> code would then sort on, whereas with word clouds you just point the code
>>> at a pile of text that has not been tagged or grouped?
>>>
>>>
>>> On 12/15/2012 03:44 AM, Sean Brewer wrote:
>>>
>>> I ran across this: https://github.com/larsmans/weighwords
>>>
>>>  It might make what you want to do even easier.
>>>
>>> On Sat, Dec 15, 2012 at 2:14 AM, Sean Brewer <seabre986 at gmail.com> wrote:
>>>
>>>> Actually, you want to do something called lemmatisation, not stemming.
>>>> The two are related, but stemming does something slightly different.
>>>> Lemmatisation does what I described.
>>>>
>>>>  I can probably whip up a dirty example with python and nltk.
>>>>
>>>>
>>>> On Fri, Dec 14, 2012 at 10:14 AM, Sean Brewer <seabre986 at gmail.com> wrote:
>>>>
>>>>> If you can export the e-mails easily, the general algorithm is something
>>>>> like this:
>>>>> 1. Tokenize the words in the e-mail body.
>>>>> 2. Remove stop words (a, an, the, etc. You can find word lists, and
>>>>> libraries like NLTK have them built in).
>>>>> 3. Use a stemming algorithm to reduce word tokens to their free morpheme
>>>>> (I think that's the correct term), e.g. convert the token "passing"
>>>>> to "pass".
>>>>> 4. Rank the results by frequency.
>>>>>
>>>>>  That should get you in the neighborhood.
>>>>>
>>>>> On Fri, Dec 14, 2012 at 9:11 AM, Matthew Keys <mk6032 at yahoo.com> wrote:
>>>>>
>>>>>> Does anyone know how to create a tag cloud based on the body of an
>>>>>> email? The Google gods point me in the direction of Outlook plugins, but
>>>>>> I'm looking for something more Linux CLI scriptable; maybe something that
>>>>>> could parse through an exported mailbox/folder.
>>>>>>
>>>>>>  _______________________________________________
>>>>>> Chugalug mailing list
>>>>>> Chugalug at chugalug.org
>>>>>> http://chugalug.org/cgi-bin/mailman/listinfo/chugalug
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>