Lesson 4: Python Modules and I/O

Hopefully the last lesson got you comfy with basic Python variables and data structures. This next lesson will be a bit of a lift if you are not familiar with object-oriented programming languages, but it should be mostly straightforward. It’ll also be structured a little differently. I’m not going to ask you to do anything while we go through the concepts. Instead, once the concepts have been introduced, I will give you a lab file that you must complete.

Modules

Remember in the last lesson when we said that Python was object-oriented? What that means is that there are things in Python called “objects”: data structures that act independently of other pieces of code. They have their own internal fields and functions. Let’s think of a common object, like a ball. Now maybe this ball has some properties: color, radius, weight, material, etc. And then you can probably do things with the ball: bounce it, catch it, etc. In this lesson we’re not going to focus on how to write objects, but rather how to use them. (If you want to look more into object-oriented programming, check out the Wikipedia page.)
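To make the ball idea a little more concrete, here is a minimal sketch of what such an object could look like. The Ball class, its fields, and its bounce function are all made up for illustration; you won’t need to write anything like this for the lab. The point is just to see fields (color, radius) and a function (bounce) living together on one object.

```python
# Hypothetical Ball class, for illustration only.
# print(...) is written with parentheses so this also runs on Python 3.
class Ball(object):
    def __init__(self, color, radius):
        # Fields: data that belongs to this particular ball
        self.color = color
        self.radius = radius
        self.bounces = 0

    def bounce(self):
        # A function that works on this ball's own data
        self.bounces += 1
        return self.bounces


ball = Ball("red", 5)
ball.bounce()
ball.bounce()
print(ball.color)    # the field set when the ball was created
print(ball.bounces)  # how many times bounce() has been called
```

Using an object always looks like this: a name, a dot, and then the field or function you want.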

Now, say we want to get a whole set of objects without implementing them ourselves. No use reinventing the wheel, yeah? Python modules solve this problem. Modules are like “libraries” or “packages” in other languages. Once you include them, you get a whole other bunch of functionality.

Today we’re just going to look at one module, called json. If you remember from the first lesson, JSON is the file format that tweets come in. The original tweet looks really messy, but it can be as basic as this:

jstring = '{ "key1": 16, "key2": 1, "key3": 28 }'

To import the json module, we just type this:

import json

Then if we want to parse that JSON string into something Python can understand, we do this:

import json
jstring = '{ "key1": 16, "key2": 1, "key3": 28 }'
jdict   = json.loads(jstring)

Then we can index it like a dictionary. So the following would print the value stored under key3, which is 28.

import json
jstring = '{ "key1": 16, "key2": 1, "key3": 28 }'
jdict   = json.loads(jstring)
print jdict["key3"]

Saved as jsonTest.py and run from the Terminal, it gives:

me@blogclub:~/sandbox/tworkshop/bin$ python jsonTest.py
28
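Since the parsed result is a regular dictionary, everything from the last lesson applies to it. Here is a small sketch using the same made-up keys, which also shows json.dumps, the function that goes the other way (Python object in, JSON string out). The variable names keys, total, and roundtrip are just mine.

```python
import json

# print(...) is written with parentheses so this also runs on Python 3.
jstring = '{ "key1": 16, "key2": 1, "key3": 28 }'
jdict = json.loads(jstring)

# The result is a regular dictionary, so the usual dictionary tools work
keys = sorted(jdict.keys())
total = sum(jdict.values())
print(keys)
print(total)

# json.dumps goes the other way: Python object in, JSON string out
roundtrip = json.dumps(jdict, sort_keys=True)
print(roundtrip)
```

The sort_keys=True argument only makes the order of the keys in the output predictable; it is optional.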

There are many other ways to use JSON, including reading directly from files and also turning Python objects into JSON. You can read more in the Python documentation for the json module. But for now we’ll just focus on this one way. Let’s move on to files.

File I/O

So of course you want to actually get to playing with data, right? Of course you do. That’s where files come in. There are a number of ways to work with files, but today we’re just going to look at two.

First, let’s try to open a file by its name. Say the name of the file is tweets.json. We open it for reading like this:

f = open('tweets.json', 'r')

Now there are some important things to note here. The r denotes reading. w would denote writing (which wipes out whatever the file already contained), and a would mean append to an existing file. Also, a bare file name like this is looked up in the directory you run the script from, so the file must be in that directory for this to work. Otherwise we have to give it a relative path. Say that the file is in your data directory, and your Python script is in the bin directory, which is also where you run it from. You would have to write this:

f = open('../data/tweets.json', 'r')

This is telling Python to go up one directory (..) then go into data to look for the file.
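If the difference between r, w, and a feels abstract, here is a minimal sketch that uses all three on a throwaway file. The file name modes_demo.txt is made up, and it is placed in a temporary directory so nothing real gets overwritten.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
# print(...) is written with parentheses so this also runs on Python 3.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")   # 'w' creates the file, or empties it if it already exists
f.write("first line\n")
f.close()

f = open(path, "a")   # 'a' keeps what is there and adds to the end
f.write("second line\n")
f.close()

f = open(path, "r")   # 'r' opens the file for reading only
contents = f.read()
f.close()

print(contents)
```

Had the second open used w instead of a, the first line would have been lost, which is why w deserves some care around data files.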

It’s not enough to just open the file; we want to read from it. Python makes this pretty easy to do. We just put it in a loop.

f = open('../data/tweets.json', 'r')
for line in f:
    print line

Python runs through each line from a file if you put it in a loop like this. There are other ways of dealing with files that don’t rely on going line-by-line, but this is the most common format of our data, so we’re just going to focus on this.
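Going line-by-line pairs nicely with the json module from earlier. What follows is only a sketch under some assumptions: it writes its own tiny sample file with one JSON object per line, and the keys are the made-up ones from before rather than real tweet fields.

```python
import json
import os
import tempfile

# Hypothetical sample file: one JSON object per line, made-up keys.
# print(...) is written with parentheses so this also runs on Python 3.
path = os.path.join(tempfile.mkdtemp(), "sample.json")
f = open(path, "w")
f.write('{"key1": 16, "key2": 1}\n')
f.write('{"key1": 4, "key2": 7}\n')
f.close()

values = []
f = open(path, "r")
for line in f:
    line = line.strip()          # drop the trailing newline
    if not line:                 # skip blank lines, which are not valid JSON
        continue
    record = json.loads(line)    # each line becomes its own dictionary
    values.append(record["key1"])
f.close()

print(values)
```

Each line read from a file still carries its newline character at the end, which is why the loop strips it before doing anything else.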

Now, the second way to deal with file data is to treat it like standard input from the system. Remember from lesson 2 we talked about file I/O and redirection on the command line. Python handles this very well. We can have Python read directly from standard input (called stdin).

import sys
for line in sys.stdin:
    print line

Notice the import? That’s how you get the file handle for stdin. Then, in the Terminal, you specify which file you want to read in with a pipe.

me@blogclub:~/sandbox/tworkshop/bin$ cat ../data/tweets.json | python readFile.py

Notice the use of the relative path to get the actual file.
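One handy consequence is that sys.stdin behaves like any other open file, so a loop written for one works on the other. Here is a rough sketch of that idea. The count_lines function is my own invention, and io.StringIO stands in for piped input so the example runs without a pipe.

```python
import io

def count_lines(stream):
    """Count the non-empty lines in any file-like object."""
    count = 0
    for line in stream:
        if line.strip():   # ignore lines that hold only whitespace
            count += 1
    return count

# Stand-in for piped input, so the sketch runs on its own.
# print(...) is written with parentheses so this also runs on Python 3.
fake_stdin = io.StringIO(u"first tweet\nsecond tweet\n\n")
print(count_lines(fake_stdin))

# In a real script you would import sys and pass sys.stdin instead:
#     print(count_lines(sys.stdin))
```

Writing the loop as a function that accepts any stream means the same code works whether the data comes from open() or from a pipe.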

Lab — Read information from a set of tweets

Now it’s your turn. You’re going to write a very useful and functional program for processing Twitter data.

First, you need to copy two files from my public directory. We did this in the first lesson, but here’s how you do it here. If you are in your ~/sandbox/tworkshop directory:

me@blogclub:~/sandbox/tworkshop$ cp ~ahanna/public/readJson.py bin
me@blogclub:~/sandbox/tworkshop$ cp ~ahanna/public/100tweets.json data

Then you should open the readJson.py file in jEdit. The rest of the directions are in that file.

You will also need to remember what the names of certain fields are in the tweet data structure. Refresh your memory by going back to the first lesson.