This is the website for the Blogclub Twitter Data Processing Workshops, or as I like to call them, the Tworkshops. These workshops are meant to teach social scientists who are not well-versed in computer programming, UNIX environments, or large (non-statistical) datasets how to process Twitter data.
Topics we’re going to cover include the following:
- Breaking down the structure of a tweet
- How to use the Linux command line
- Python for non-programmers
- Parallel processing of data (Hadoop and MapReduce)
- And many more!
I’m still learning a lot of this myself, especially the parallel programming aspects, so any feedback is more than welcome. Feel free to post questions and comments on the pages.
For now, we’re going to start with Lesson 1.