Yeah, I think that's the difference. Word cloud code just builds the cloud from the most commonly used words in the text. <br><br><div class="gmail_quote">On Sat, Dec 15, 2012 at 4:34 AM, Matt Keys <span dir="ltr">&lt;<a href="mailto:mk6032@yahoo.com" target="_blank">mk6032@yahoo.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>I ran across a few like that, too. I'm
a bit confused about the difference between a word cloud and a tag
cloud. I'm guessing a tag cloud presumes that you've attached some
form of tag to each example text, which the code then sorts on,
whereas for a word cloud you just point the code at a pile of text
that has not been tagged or grouped?<div><div class="h5"><br>
<br>
On 12/15/2012 03:44 AM, Sean Brewer wrote:<br>
</div></div></div><div><div class="h5">
<blockquote type="cite">I ran across this: <a href="https://github.com/larsmans/weighwords" target="_blank">https://github.com/larsmans/weighwords</a>
<div><br>
</div>
<div>It might make what you want to do even easier.<br>
<br>
<div class="gmail_quote">On Sat, Dec 15, 2012 at 2:14 AM, Sean
Brewer <span dir="ltr">&lt;<a href="mailto:seabre986@gmail.com" target="_blank">seabre986@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Actually,
you want to do something called lemmatization, not stemming.
The two are related, but stemming does something slightly
different. Lemmatization does what I described.
<div>
<br>
</div>
<div>I can probably whip up a quick and dirty example with Python and
NLTK.
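<div>A rough sketch of the stemming vs. lemmatization difference, using only the standard library rather than NLTK. The LEMMAS table and both function names are made up for illustration; a real lemmatizer works from a full dictionary, not a five-entry lookup.</div>
<pre>
```python
# Toy contrast between stemming and lemmatization.
# LEMMAS is a hand-written, hypothetical lookup table; a real lemmatizer
# uses a full dictionary instead of a handful of entries.
LEMMAS = {
    "ran": "run",
    "running": "run",
    "better": "good",
    "mice": "mouse",
    "passing": "pass",
}


def rule_stem(word):
    """Stemming: chop a known suffix off by rule, no dictionary involved."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Lemmatization: look up the dictionary form, else leave the word alone."""
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for word in ("passing", "running", "ran", "mice"):
        print(word, rule_stem(word), lemmatize(word))
```
</pre>
<div>Both approaches turn "passing" into "pass", but the rule-based stemmer leaves "running" as "runn" and cannot touch "ran" or "mice", while the lookup maps them to "run" and "mouse".</div>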
<div>
<div><br>
<br>
<div class="gmail_quote">On Fri, Dec 14, 2012 at 10:14
AM, Sean Brewer <span dir="ltr">&lt;<a href="mailto:seabre986@gmail.com" target="_blank">seabre986@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>If you can export the e-mails easily, the general
algorithm is something like this:</div>
<div>
<div>1. Tokenize the words in the e-mail body.</div>
<div>2. Remove stop words (a, an, the, etc.). You
can find word lists, and libraries like NLTK
have them built in.</div>
<div>3. Use a stemming algorithm to reduce word
tokens to their (I think the correct
term is) free morpheme, e.g. convert the
token "passing" to "pass".</div>
<div>4. Rank the results by frequency.</div>
<div><br>
</div>
<div>That should get you in the neighborhood.</div>
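<div>A minimal, stdlib-only sketch of the four steps above. The stop word list is a tiny made-up sample, and crude_stem is a toy suffix stripper standing in for a real stemming algorithm, so treat the output as a rough approximation only. All names here are illustrative.</div>
<pre>
```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in",
              "is", "it", "on", "for", "was", "i", "he"}

# Suffixes tried in order by the toy stemmer below.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Step 1: lowercase the text and pull out runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Step 3 (toy version): strip one common suffix, keeping 3+ letters."""
    if word.endswith("ss"):
        # Leave words like "pass" alone instead of chopping the final "s".
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rank_words(text, top_n=10):
    """Steps 1-4: tokenize, drop stop words, stem, rank by frequency."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    stems = [crude_stem(t) for t in kept]
    return Counter(stems).most_common(top_n)


if __name__ == "__main__":
    body = "He was passing the ball, and the pass was passed on."
    for word, count in rank_words(body):
        print(count, word)
```
</pre>
<div>The (word, count) pairs that come back are what a cloud renderer would scale font sizes by. Swapping crude_stem for a real stemmer, such as NLTK's PorterStemmer, should give better groupings.</div>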
<br>
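<div>For getting the e-mail bodies out of an export in the first place, here is a hedged sketch using Python's standard mailbox module. It assumes the export is a single mbox file and only looks at text/plain parts; iter_bodies is a made-up helper name, and any path passed to it is whatever your mail client wrote out.</div>
<pre>
```python
import mailbox


def iter_bodies(mbox_path):
    """Yield the plain-text body of every message in an mbox file."""
    # create=False: fail loudly instead of creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # walk() also visits the message itself when it is not multipart.
            for part in message.walk():
                if part.get_content_type() != "text/plain":
                    continue
                payload = part.get_payload(decode=True)
                if payload is None:
                    continue
                charset = part.get_content_charset() or "utf-8"
                yield payload.decode(charset, errors="replace")
    finally:
        box.close()


# Usage (the path is hypothetical):
#     text = "\n".join(iter_bodies("exported.mbox"))
```
</pre>
<div>The joined text can then go straight into the tokenizing step. A Maildir export could be opened with mailbox.Maildir instead of mailbox.mbox.</div>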
<div class="gmail_quote">
<div>On Fri, Dec 14, 2012 at 9:11 AM, Matthew
Keys <span dir="ltr">&lt;<a href="mailto:mk6032@yahoo.com" target="_blank">mk6032@yahoo.com</a>&gt;</span>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div style="font-size:12pt;font-family:times new roman,new york,times,serif">Does
anyone know how to create a tag cloud
based on the body of an email? The
Google gods point me in the direction
of Outlook plugins, but I'm looking
for something more Linux CLI
scriptable; maybe something that could
parse through an exported
mailbox/folder.<br>
</div>
</div>
<br>
</div>
_______________________________________________<br>
Chugalug mailing list<br>
<a href="mailto:Chugalug@chugalug.org" target="_blank">Chugalug@chugalug.org</a><br>
<a href="http://chugalug.org/cgi-bin/mailman/listinfo/chugalug" target="_blank">http://chugalug.org/cgi-bin/mailman/listinfo/chugalug</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div></div></div>
<br></blockquote></div><br>