preface
         1. introduction: hacking on twitter data
         installing python development tools
         collecting and manipulating twitter data
         tinkering with twitter's api
         frequency analysis and lexical diversity
         visualizing tweet graphs
         synthesis: visualizing retweets with protovis
         closing remarks
         2. microformats: semantic markup and common sense collide
         xfn and friends
         exploring social connections with xfn
         a breadth-first crawl of xfn data
         geocoordinates: a common thread for just about anything
         wikipedia articles + google maps = road trip?
         slicing and dicing recipes (for the health of it)
         collecting restaurant reviews
         summary
         3. mailboxes: oldies but goodies
         .mbox: the quick and dirty on unix mailboxes
         mbox + couchdb = relaxed email analysis
         bulk loading documents into couchdb
         sensible sorting
         map/reduce-inspired frequency analysis
         sorting documents by value
         couchdb-lucene: full-text indexing and more
         threading together conversations
         look who's talking
         visualizing mail "events" with simile timeline
         analyzing your own mail data
         the graph your (gmail) inbox chrome extension
         closing remarks
         4. twitter: friends, followers, and setwise operations
         restful and oauth-cladded apis
         no, you can't have my password
         a lean, mean data-collecting machine
         a very brief refactor interlude
         redis: a data structures server
         elementary set operations
         souping up the machine with basic friend/follower metrics
         calculating similarity by computing common friends and followers
         measuring influence
         constructing friendship graphs
         clique detection and analysis
         the infochimps "strong links" api
         interactive 3d graph visualization
         summary
         5. twitter: the tweet, the whole tweet, and nothing but the tweet
         pen : sword :: tweet : machine gun (?!?)
         analyzing tweets (one entity at a time)
         tapping (tim's) tweets
         who does tim retweet most often?
         what's tim's influence?
         how many of tim's tweets contain hashtags?
         juxtaposing latent social networks (or #justinbieber versus #teaparty)
         what entities co-occur most often with #justinbieber and #teaparty tweets?
         on average, do #justinbieber or #teaparty tweets have more hashtags?
         which gets retweeted more often: #justinbieber or #teaparty?
         how much overlap exists between the entities of #teaparty and #justinbieber tweets?
         visualizing tons of tweets
         visualizing tweets with tricked-out tag clouds
         visualizing community structures in twitter search results
         closing remarks
         6. linkedin: clustering your professional network for fun (and profit?)
         motivation for clustering
         clustering contacts by job title
         standardizing and counting job titles
         common similarity metrics for clustering
         a greedy approach to clustering
         hierarchical and k-means clustering
         fetching extended profile information
         geographically clustering your network
         mapping your professional network with google earth
         mapping your professional network with dorling cartograms
         closing remarks
         7. google buzz: tf-idf, cosine similarity, and collocations
         buzz = twitter + blogs (???)
         data hacking with nltk
         text mining fundamentals
         a whiz-bang introduction to tf-idf
         querying buzz data with tf-idf
         finding similar documents
         the theory behind vector space models and cosine similarity
         clustering posts with cosine similarity
         visualizing similarity with graph visualizations
         buzzing on bigrams
         how the collocation sausage is made: contingency tables and scoring functions
         tapping into your gmail
         accessing gmail with oauth
         fetching and parsing email messages
         before you go off and try to build a search engine.
         closing remarks
         8. blogs et al.: natural language processing (and beyond)
         nlp: a pareto-like introduction
         syntax and semantics
         a brief thought exercise
         a typical nlp pipeline with nltk
         sentence detection in blogs with nltk
         summarizing documents
         analysis of luhn's summarization algorithm
         entity-centric analysis: a deeper understanding of the data
         quality of analytics
         closing remarks
         9. facebook: the all-in-one wonder
         tapping into your social network data
         from zero to access token in under 10 minutes
         facebook's query apis
         visualizing facebook data
         visualizing your entire social network
         visualizing mutual friendships within groups
         where have my friends all gone? (a data-driven game)
         visualizing wall data as a (rotating) tag cloud
         closing remarks
         10. the semantic web: a cocktail discussion
         an evolutionary revolution?
         man cannot live on facts alone
         open-world versus closed-world assumptions
         inferencing about an open world with fuxi
         hope
         index