I've been playing with a new side project: wine data analytics mined from Twitter and stored in MongoDB.
I took a first stab at writing a Python Celery service that searches for wine-related Twitter posts and dumps them into MongoDB. The first lesson learned: a ton of people post about the word 'Wine' on Sunday afternoons. I wrote the service to pull 100 posts, then search again using the lowest post id from that batch as the max id to return, working backward in time. After storing 11K posts in MongoDB I looked at the earliest date and realized I had only a few hours' worth of data (vs. the several days' worth I had assumed).
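For anyone curious what that paging loop looks like, here is a minimal sketch rather than my actual service code. The names collect_posts and fake_search are made up for illustration, and fake_search is a stand-in for the real Twitter call so the example runs on its own. I'm also assuming the max id is inclusive, so the sketch passes the lowest id minus one to avoid fetching the boundary post twice.

```python
from typing import Any, Callable, Dict, List, Optional

Post = Dict[str, Any]
# Hypothetical signature for a search call: (query, count, max_id) -> posts
SearchFn = Callable[[str, int, Optional[int]], List[Post]]


def collect_posts(search: SearchFn, query: str,
                  page_size: int = 100, max_pages: int = 10) -> List[Post]:
    """Page backward through search results using the lowest id seen so far."""
    collected: List[Post] = []
    max_id: Optional[int] = None  # None means "start from the newest posts"
    for _ in range(max_pages):
        page = search(query, page_size, max_id)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        lowest_id = min(post["id"] for post in page)
        # Assumption: max_id is inclusive, so step just below the lowest id.
        max_id = lowest_id - 1
    return collected


def fake_search(query: str, count: int, max_id: Optional[int]) -> List[Post]:
    """Stand-in for the real search API: 250 fake posts, newest first."""
    ids = range(250, 0, -1)
    eligible = [i for i in ids if max_id is None or i <= max_id]
    return [{"id": i, "text": f"{query} post {i}"} for i in eligible[:count]]


posts = collect_posts(fake_search, "merlot")
print(len(posts), posts[0]["id"], posts[-1]["id"])
```

In the real service each page would be written to MongoDB instead of collected in a list, and the max_pages cap is just a guard against running forever on a busy search term.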
OK...so 'Wine' is just way too broad, and I changed my list of search terms so it no longer includes the single word 'Wine' on its own. Depending on how much of this data piles up, and how much disk space it takes, I might have to go with more of a streaming analytics approach, where I just read the data, add it to the aggregations, and let it go. We'll see.
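To make the streaming idea concrete, here is a rough sketch of what "read it, add it to the aggregations, and let it go" could look like. Nothing here is built yet: aggregate_stream is a hypothetical name, the search terms are examples, and I'm assuming each post is a dict with a 'text' field and an ISO-style 'created_at' string, which is not necessarily the format Twitter returns.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def aggregate_stream(posts: Iterable[Dict[str, str]],
                     terms: List[str]) -> Counter:
    """Count matching posts per (day, term) without keeping the posts."""
    counts: Counter = Counter()
    for post in posts:
        text = post["text"].lower()
        # Assumption: created_at looks like "YYYY-MM-DDTHH:MM:SS".
        day = post["created_at"][:10]
        for term in terms:
            if term.lower() in text:
                counts[(day, term)] += 1
        # The post is not stored anywhere; only the running counts survive.
    return counts


def sample_posts() -> Iterable[Dict[str, str]]:
    """Made-up posts, yielded one at a time like a live feed would be."""
    yield {"text": "Great Pinot Noir tonight", "created_at": "2013-01-06T15:00:00"}
    yield {"text": "pinot noir and merlot", "created_at": "2013-01-06T16:00:00"}
    yield {"text": "Merlot again", "created_at": "2013-01-07T10:00:00"}


totals = aggregate_stream(sample_posts(), ["pinot noir", "merlot"])
for key, count in sorted(totals.items()):
    print(key, count)
```

The tradeoff is that you can only answer the questions you decided to count up front. If I later want a new aggregation, the raw posts are gone.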
During all this, it was good to play with MongoDB. I have an instance running on my MacBook. Managing it via the terminal was OK, but I found a good Mac-based UI tool called MongoHub that works pretty well. There is also a JNLP browser-based tool called MongoBrowser. It's OK for very simple things when you don't have a lot of data, but it's not very functional.