<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Nice start! I started working on it in
      Python, pointing at an mbox source, but I keep getting hung up on
      the method of extraction. I can't decide whether I should focus on
      the subject, the body, or both. The subject is usually pretty
      condensed to begin with, so I'm thinking that'd be the smarter
      place to start, but it wouldn't be as thorough. The body throws in
      problems like possible multipart messages, strange encodings, etc.
      It would be interesting to compare results using a chugalug export
      of, say, the month of December.<br>
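      <br>
      A minimal sketch of pulling both the subject and the plain-text
      body out of an mbox with Python's standard library, so the two
      can be compared side by side. The function names and the latin-1
      fallback are assumptions made for illustration; they are not
      taken from the gist.<br>
      <br>

```python
import mailbox
from email import policy
from email.parser import BytesParser


def subject_and_body(msg):
    """Return (subject, plain-text body) for one parsed message."""
    # Under policy.default the header is already RFC 2047 decoded.
    subject = str(msg["subject"] or "")
    # get_body() walks multipart messages and picks the text/plain part.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return subject, ""
    try:
        body = part.get_content()
    except (LookupError, UnicodeError):
        # Unknown or mislabeled charset: fall back to a lossy decode
        # (assumption: latin-1 is an acceptable last resort).
        body = part.get_payload(decode=True).decode("latin-1", errors="replace")
    return subject, body


def iter_mbox(path):
    """Yield (subject, body) for every message in the mbox file at path."""
    # The factory makes mailbox hand back modern EmailMessage objects.
    box = mailbox.mbox(path, factory=BytesParser(policy=policy.default).parse)
    for msg in box:
        yield subject_and_body(msg)
```

      Messages with only an HTML part come back with an empty body
      here; stripping tags from HTML would be a separate step.<br>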
      <br>
      On 12/17/2012 11:07 PM, Sean Brewer wrote:<br>
    </div>
    <blockquote
cite="mid:CANEHAucoY5gJLPu6xh8DdxosScYDko3tfLBfuQqpVxYMsZk0Lw@mail.gmail.com"
      type="cite">Here's an example of what I'm thinking: <a
        moz-do-not-send="true" href="https://gist.github.com/4324904">https://gist.github.com/4324904</a>
      <div><br>
      </div>
      <div>It's in Ruby, though. I found a neat stemmer/lemmatizer
        algorithm and an implementation in Ruby, but not in Python.</div>
      <div><br>
      </div>
      <div>Here's example output: <a moz-do-not-send="true"
          href="https://gist.github.com/4324904#comment-657626">https://gist.github.com/4324904#comment-657626</a><br>
        <br>
        <div class="gmail_quote">On Sat, Dec 15, 2012 at 8:06 PM, Sean
          Brewer <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:seabre986@gmail.com" target="_blank">seabre986@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">I forgot
            to add that you could use all that stuff to find the probable
            topic of a conversation, which is basically a tag. I
            thought that might be the direction you were heading. I
            could be wrong.
            <div class="HOEnZb">
              <div class="h5"><br>
                <br>
                <div class="gmail_quote">
                  On Sat, Dec 15, 2012 at 1:00 PM, Sean Brewer <span
                    dir="ltr">&lt;<a moz-do-not-send="true"
                      href="mailto:seabre986@gmail.com" target="_blank">seabre986@gmail.com</a>&gt;</span>
                  wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    Yeah, I think that's the difference. The code for
                    the word cloud builds a cloud from the most commonly
                    used words.
                    <div>
                      <div><br>
                        <br>
                        <div class="gmail_quote">On Sat, Dec 15, 2012 at
                          4:34 AM, Matt Keys <span dir="ltr">&lt;<a
                              moz-do-not-send="true"
                              href="mailto:mk6032@yahoo.com"
                              target="_blank">mk6032@yahoo.com</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF">
                              <div>I ran across a few like that, too.
                                I'm a bit confused about the difference
                                between a word cloud and a tag cloud.
                                I'm guessing tag clouds presume that
                                you've attached some form of tag to each
                                text, which the code then sorts on,
                                whereas with word clouds you just point
                                the code at a pile of text that has not
                                been tagged/grouped?
                                <div>
                                  <div><br>
                                    <br>
                                    On 12/15/2012 03:44 AM, Sean Brewer
                                    wrote:<br>
                                  </div>
                                </div>
                              </div>
                              <div>
                                <div>
                                  <blockquote type="cite">I ran across
                                    this: <a moz-do-not-send="true"
                                      href="https://github.com/larsmans/weighwords"
                                      target="_blank">https://github.com/larsmans/weighwords</a>
                                    <div><br>
                                    </div>
                                    <div>It might make what you want to
                                      do even easier.<br>
                                      <br>
                                      <div class="gmail_quote">On Sat,
                                        Dec 15, 2012 at 2:14 AM, Sean
                                        Brewer <span dir="ltr">&lt;<a
                                            moz-do-not-send="true"
                                            href="mailto:seabre986@gmail.com"
                                            target="_blank">seabre986@gmail.com</a>&gt;</span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">Actually,
                                          you want to do something
                                          called lemmatisation, not
                                          stemming. They are related,
                                          but stemming does something
                                          slightly different.
                                          Lemmatisation does what I
                                          described.
                                          <div> <br>
                                          </div>
                                          <div>I can probably whip up a
                                            dirty example with Python
                                            and NLTK.
                                            <div>
                                              <div><br>
                                                <br>
                                                <div class="gmail_quote">On
                                                  Fri, Dec 14, 2012 at
                                                  10:14 AM, Sean Brewer
                                                  <span dir="ltr">&lt;<a
moz-do-not-send="true" href="mailto:seabre986@gmail.com" target="_blank">seabre986@gmail.com</a>&gt;</span>
                                                  wrote:<br>
                                                  <blockquote
                                                    class="gmail_quote"
                                                    style="margin:0 0 0
                                                    .8ex;border-left:1px
                                                    #ccc
                                                    solid;padding-left:1ex">
                                                    <div>If you can
                                                      export the e-mails
                                                      easily, the general
                                                      algorithm is
                                                      something like
                                                      this:</div>
                                                    <div>
                                                      <div>1. Tokenize
                                                        the words in the
                                                        e-mail body.</div>
                                                      <div>2. Remove
                                                        stop words (a,
                                                        an, the, etc.).
                                                        You can find
                                                        word lists, and
                                                        libraries like
                                                        NLTK have them
                                                        built in.</div>
                                                      <div>3. Use a
                                                        stemming
                                                        algorithm
                                                        to reduce word
                                                        tokens to their
                                                        base form (I
                                                        think the
                                                        correct term is
                                                        free morpheme),
                                                        e.g. convert
                                                        the token
                                                        "passing" to
                                                        "pass".</div>
                                                      <div>4. Rank the
                                                        results by
                                                        frequency.</div>
                                                      <div><br>
                                                      </div>
                                                      <div>That should
                                                        get you in the
                                                        neighborhood.</div>
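                                                      <div>A minimal sketch of the four steps
                                                        above in plain Python, standard library
                                                        only. The stop-word list and the
                                                        suffix-stripping rule are toy stand-ins
                                                        (assumptions for illustration); a real
                                                        stemmer or lemmatizer, such as the ones
                                                        NLTK provides, would replace
                                                        crude_stem.</div>

```python
import re
from collections import Counter

# Toy stop-word list (assumption); NLTK ships a much fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "on", "for"}

# Suffixes stripped by the toy stemmer, tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Step 1: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(word):
    """Step 3: strip one common suffix, keeping at least three letters."""
    if word.endswith("ss"):
        # Leave words like "pass" alone instead of producing "pas".
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rank_terms(text, top=10):
    """Steps 1 to 4: tokenize, drop stop words, stem, rank by frequency."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOP_WORDS]  # step 2
    stems = [crude_stem(t) for t in kept]
    return Counter(stems).most_common(top)  # step 4


if __name__ == "__main__":
    sample = "Passing the pass: passes and tag clouds in the cloud"
    print(rank_terms(sample, top=2))
```

                                                      <div>The ranked (term, count) pairs are
                                                        what a cloud renderer would then scale
                                                        font sizes from.</div>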
                                                      <br>
                                                      <div
                                                        class="gmail_quote">
                                                        <div>On Fri, Dec
                                                          14, 2012 at
                                                          9:11 AM,
                                                          Matthew Keys <span
                                                          dir="ltr">&lt;<a
moz-do-not-send="true" href="mailto:mk6032@yahoo.com" target="_blank">mk6032@yahoo.com</a>&gt;</span>
                                                          wrote:<br>
                                                        </div>
                                                        <blockquote
                                                          class="gmail_quote"
                                                          style="margin:0
                                                          0 0
                                                          .8ex;border-left:1px
                                                          #ccc
                                                          solid;padding-left:1ex">
                                                          <div>
                                                          <div>
                                                          <div
                                                          style="font-size:12pt;font-family:times
                                                          new roman,new
york,times,serif">Does anyone know how to create a tag cloud based on
                                                          the body of an
                                                          email? The
                                                          Google gods
                                                          point me in
                                                          the direction
                                                          of Outlook
                                                          plugins, but
                                                          I'm looking
                                                          for something
                                                          more Linux CLI
                                                          scriptable;
                                                          maybe
                                                          something that
                                                          could parse
                                                          through an
                                                          exported
                                                          mailbox/folder.<br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                        </blockquote>
                                                      </div>
                                                    </div>
                                                  </blockquote>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </blockquote>
                                      </div>
                                    </div>
                                  </blockquote>
                                </div>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>