<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">I ran across a few like that, too. I'm
      a bit confused about the difference between a word cloud and a tag
      cloud. I'm guessing a tag cloud presumes that you've attached some
      form of tag to each text, which the code then sorts on, whereas
      for a word cloud you just point the code at a pile of text that
      has not been tagged or grouped?<br>
      <br>
      On 12/15/2012 03:44 AM, Sean Brewer wrote:<br>
    </div>
    <blockquote
cite="mid:CANEHAud4mb_um1NqVKRUUueMwoN8HOn_bk8zgngaUpKY8ye5XQ@mail.gmail.com"
      type="cite">I ran across this: <a moz-do-not-send="true"
        href="https://github.com/larsmans/weighwords">https://github.com/larsmans/weighwords</a>
      <div><br>
      </div>
      <div>It might make what you want to do even easier.<br>
        <br>
        <div class="gmail_quote">On Sat, Dec 15, 2012 at 2:14 AM, Sean
          Brewer <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:seabre986@gmail.com" target="_blank">seabre986@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">Actually,
            you want to do something called lemmatisation, not stemming.
            The two are related, but stemming does something slightly
            different. Lemmatisation does what I described.
            <div>
              <br>
            </div>
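            <div>To illustrate the distinction, here is a rough sketch,
              not a real implementation. A stemmer chops suffixes by
              rule and may return something that is not a word, while a
              lemmatiser maps a word to its dictionary form. The lookup
              table LEMMAS and the functions crude_stem and
              toy_lemmatize are hypothetical stand-ins made up for this
              example; a real lemmatiser (for instance the WordNet-based
              one in NLTK) consults a proper lexical database instead.</div>
<pre>
```python
# Hypothetical mini-dictionary standing in for a real lemma database.
LEMMAS = {
    "studies": "study",
    "studying": "study",
    "went": "go",
    "mice": "mouse",
}


def crude_stem(word):
    """Rule-based suffix chopping; the result need not be a real word."""
    for suffix in ("ies", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def toy_lemmatize(word):
    """Dictionary lookup; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    # Print each word next to its crude stem and its toy lemma.
    for word in ("studies", "went", "mice"):
        print(word, crude_stem(word), toy_lemmatize(word))
```
</pre>
            <div>Irregular forms such as "went" are where the two
              approaches differ most: no suffix rule can recover "go".</div>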
            <div>I can probably whip up a quick and dirty example with
              Python and NLTK. 
              <div>
                <div class="h5"><br>
                  <br>
                  <div class="gmail_quote">On Fri, Dec 14, 2012 at 10:14
                    AM, Sean Brewer <span dir="ltr">&lt;<a
                        moz-do-not-send="true"
                        href="mailto:seabre986@gmail.com"
                        target="_blank">seabre986@gmail.com</a>&gt;</span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div>If you can export the e-mails easily, the
                        general algorithm is something like this:</div>
                      <div>
                        <div>1. Tokenize the words in the e-mail body.</div>
                        <div>2. Remove stop words (a, an, the, etc.). You
                          can find word lists, and libraries like NLTK
                          have them built in.</div>
                        <div>3. Use a stemming algorithm to reduce word
                          tokens to their free morpheme (I think that's
                          the correct term), e.g. convert the
                          token "passing" to "pass".</div>
                        <div>4. Rank the results by frequency.</div>
                        <div><br>
                        </div>
                        <div>That should get you in the neighborhood.</div>
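                        <div>The four steps above could be sketched in
                          plain Python roughly as follows. This is only a
                          minimal sketch under assumptions: the stop list
                          is a tiny hand-picked sample, and naive_stem is
                          a crude suffix stripper standing in for a real
                          stemmer such as the ones NLTK provides. The
                          names tokenize, naive_stem, and rank_words are
                          made up for illustration.</div>
<pre>
```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); NLTK ships a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "is", "it", "on", "for"}

# Suffixes tried in this order by the naive stand-in stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Step 1: lowercase the text and pull out runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Step 3 stand-in: strip one common suffix, keeping at least 3 letters."""
    if word.endswith("ss"):
        return word  # leave words like "pass" alone
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            return stem
    return word


def rank_words(text, top_n=10):
    """Steps 1 to 4: tokenize, drop stop words, stem, rank by frequency."""
    kept = [t for t in tokenize(text) if t not in STOP_WORDS]  # step 2
    stems = [naive_stem(t) for t in kept]
    return Counter(stems).most_common(top_n)  # step 4


if __name__ == "__main__":
    body = "Passing the test is nice. The tests passed, and it passes again."
    for word, count in rank_words(body, top_n=3):
        print(word, count)
```
</pre>
                        <div>The resulting (word, count) pairs are what a
                          cloud renderer would use to scale each word.</div>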
                        <br>
                        <div class="gmail_quote">
                          <div>On Fri, Dec 14, 2012 at 9:11 AM, Matthew
                            Keys <span dir="ltr">&lt;<a
                                moz-do-not-send="true"
                                href="mailto:mk6032@yahoo.com"
                                target="_blank">mk6032@yahoo.com</a>&gt;</span>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div>
                              <div>
                                <div
                                  style="font-size:12pt;font-family:times
                                  new roman,new york,times,serif">Does
                                  anyone know how to create a tag cloud
                                  based on the body of an email? The
                                  Google gods point me in the direction
                                  of Outlook plugins, but I'm looking
                                  for something more Linux CLI
                                  scriptable; maybe something that could
                                  parse through an exported
                                  mailbox/folder.<br>
                                </div>
                              </div>
                              <br>
                            </div>
_______________________________________________<br>
                            Chugalug mailing list<br>
                            <a moz-do-not-send="true"
                              href="mailto:Chugalug@chugalug.org"
                              target="_blank">Chugalug@chugalug.org</a><br>
                            <a moz-do-not-send="true"
                              href="http://chugalug.org/cgi-bin/mailman/listinfo/chugalug"
                              target="_blank">http://chugalug.org/cgi-bin/mailman/listinfo/chugalug</a><br>
                            <br>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>