Lesson 2: More Terminal and your first Python script

Hopefully you’ve taken some time to master all the stuff in the previous lesson. If you haven’t done everything there, complete that lesson first because you’ll need it to move on.

Today we’re going to focus on a few commands that let you view and edit data and programs. By the end of the lesson you will have learned how to write your first Python script.

Viewing files in the Terminal

To simply look at what’s in files, you can use the more and less commands. They are slightly different — more allows you to only go forward in a file, while less allows you to go forward and backward.

You should be logged into the Terminal now. Navigate to the sandbox/tworkshop/data directory (hint: use the cd command). Now view the prettyExample.json file with more.

me@blogclub:~/sandbox/tworkshop/data$ more prettyExample.json 
{
    "contributors": null, 
    "truncated": false, 
    "text": "TeeMinus24's Shirt of the Day is Palpatine/Vader '12. Support the Sith. Change you can't stop. http:/
    "in_reply_to_status_id": null, 
    "id": 175090352598945794, 
    "entities": {
        "user_mentions": [], 
        "hashtags": [], 
        "urls": [
            {
                "indices": [
                    95, 
                    115
                ], 
                "url": "http://t.co/wFh1cCep", 
                "expanded_url": "http://fb.me/1isEdQJSq", 
                "display_url": "fb.me/1isEdQJSq"
            }
        ]
    }, 
    "retweeted": false, 
    "coordinates": null, 
    "source": "<a href=\"http://www.facebook.com/twitter\" rel=\"nofollow\">Facebook</a>", 
    "in_reply_to_screen_name": null, 
    "id_str": "175090352598945794", 
    "retweet_count": 0, 
    "in_reply_to_user_id": null, 
    "favorited": false, 

You can press Enter to browse the file line-by-line, or Space to go through it by screen. less is similar, but you can use the Up and Down arrow keys to go up and down, and the Page Up and Page Down keys to go up/down much faster. Once you reach the end of the file, you have to press Q to quit.

You can also view the first or last several lines of a file with the head and tail commands. Say you just want to get the first four lines of a file.

me@blogclub:~/sandbox/tworkshop/data$ head -4 prettyExample.json 
{
    "contributors": null, 
    "truncated": false, 
    "text": "TeeMinus24's Shirt of the Day is Palpatine/Vader '12. Support the Sith. Change you can't stop. http://t.co/wFh1cCep", 

Or perhaps the last four.

me@blogclub:~/sandbox/tworkshop/data$ tail -4 prettyExample.json 
    "possibly_sensitive_editable": true, 
    "in_reply_to_status_id_str": null, 
    "place": null
}

Pretty easy.
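
If you’re curious how head and tail might look in Python (the language we start writing later in this lesson), here’s a minimal sketch. It assumes Python 3, and the names head, tail and sample are made up for illustration; the real commands do a lot more than this.

```python
from collections import deque
from itertools import islice


def head(lines, n=10):
    """Return the first n items from an iterable of lines."""
    return list(islice(lines, n))


def tail(lines, n=10):
    """Return the last n items, holding at most n lines in memory."""
    return list(deque(lines, maxlen=n))


# Made-up sample standing in for the lines of a file.
sample = ["line %d" % i for i in range(1, 8)]

print(head(sample, 4))
print(tail(sample, 4))
```

With a real file you could pass an open file object instead of the list, since a file object hands back its lines one at a time.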

What if you want to display the file all at once without stopping? You can use the cat command. Let’s try it with the example.json file.

me@blogclub:~/sandbox/tworkshop/data$ cat example.json 
{"possibly_sensitive_editable":true,"text":"TeeMinus24's Shirt of the Day is Palpatine\/Vader '12. Support the Sith. Change you can't stop. http:\/\/t.co\/wFh1cCep","id_str":"175090352598945794","entities":{"urls":[{"indices":[95,115],"expanded_url":"http:\/\/fb.me\/1isEdQJSq","display_url":"fb.me\/1isEdQJSq","url":"http:\/\/t.co\/wFh1cCep"}],"hashtags":[],"user_mentions":[]},"retweeted":false,"place":null,"retweet_count":0,"in_reply_to_status_id_str":null,"coordinates":null,"source":"\u003Ca href=\"http:\/\/www.facebook.com\/twitter\" rel=\"nofollow\"\u003EFacebook\u003C\/a\u003E","in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"favorited":false,"geo":null,"in_reply_to_screen_name":null,"in_reply_to_user_id":null,"truncated":false,"created_at":"Thu Mar 01 05:29:27 +0000 2012","possibly_sensitive":false,"contributors":null,"user":{"geo_enabled":false,"profile_link_color":"009999","id_str":"281077639","listed_count":1,"lang":"en","notifications":null,"location":"","is_translator":false,"follow_request_sent":null,"statuses_count":461,"profile_background_color":"131516","followers_count":43,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1428484273\/TeeMinus24_logo_normal.jpg","default_profile":false,"profile_background_tile":true,"description":"We are a limited edition t-shirt company. We make tees that are designed for the fan; movies, television shows, video games, sci-fi, web, and tech. 
We have it!","following":null,"profile_sidebar_fill_color":"efefef","contributors_enabled":false,"profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme14\/bg.gif","verified":false,"profile_sidebar_border_color":"eeeeee","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1428484273\/TeeMinus24_logo_normal.jpg","default_profile_image":false,"protected":false,"show_all_inline_media":false,"profile_use_background_image":true,"favourites_count":0,"created_at":"Tue Apr 12 15:48:23 +0000 2011","name":"Vincent Genovese","friends_count":52,"profile_text_color":"333333","url":"http:\/\/www.teeminus24.com","id":281077639,"profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme14\/bg.gif","time_zone":"Eastern Time (US & Canada)","utc_offset":-18000,"screen_name":"TeeMinus24"},"id":175090352598945794}

It’ll look different on your screen — since the file is just one long line, it’ll wrap to the width of your window.
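
One way to turn a one-line file like this into something readable is to parse it and write it back out with indentation. I don’t know exactly how prettyExample.json was produced, so treat this as a sketch: it assumes Python 3 and uses a tiny made-up tweet rather than the real file.

```python
import json

# A tiny stand-in for the single long line in example.json (not the real data).
compact = '{"text": "Hello", "retweeted": false, "entities": {"hashtags": []}}'

tweet = json.loads(compact)            # parse the one-line JSON into a dict
pretty = json.dumps(tweet, indent=4)   # write it back out, one field per line

print(pretty)
```

The parsed data is identical either way; only the whitespace changes, which is why line-oriented tools like head and grep are so much more useful on the pretty version.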

You may ask why anyone would ever want to dump a whole file to the Terminal. Well, sometimes you know the file is small. But sometimes you want to use the contents of a file as the input to another command. I’ll get to that below when we look at I/O redirection and pipes. For now, let’s move on to searching through files.

Searching through files

Say you have a big file and you want to know some basic characteristics about it. Does it contain a specific word? How many lines and words are in it?

There are a number of UNIX commands that can accomplish this. For this lesson we’re just going to focus on the grep and wc commands. grep is a very powerful program that searches a file for a pattern and prints every line that matches it.

Here’s a basic example. Since we know from the last lesson that the field which holds the actual tweet is called “text”, we can get the tweet with grep.

me@blogclub:~/sandbox/tworkshop/data$ grep text prettyExample.json 
    "text": "TeeMinus24's Shirt of the Day is Palpatine/Vader '12. Support the Sith. Change you can't stop. http://t.co/wFh1cCep", 
        "profile_text_color": "333333",

Note that it got all of the lines that had “text” in them, including the “profile_text_color” line. So if the user’s name was “UNIXtextwizard”, you would pick up that line as well.

How about the inverse, i.e. we want everything BUT the lines with “text” in them? We would just use grep with the -v option to invert the match. Try this on your own.
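
To make the matching rule concrete, here is a rough Python sketch of what grep and grep -v are doing. It assumes Python 3 and plain substring matching (the real grep matches regular expressions), and the function name and sample lines are made up for illustration.

```python
def grep(pattern, lines, invert=False):
    """Keep lines that contain pattern; invert=True mimics grep -v."""
    return [line for line in lines if (pattern in line) != invert]


# Made-up lines in the style of prettyExample.json.
sample = [
    '"text": "Support the Sith.",',
    '"truncated": false,',
    '"profile_text_color": "333333",',
]

print(grep("text", sample))               # lines containing "text"
print(grep("text", sample, invert=True))  # everything else
```

Notice that the match is on the whole line, so “profile_text_color” counts as a hit for “text” just like it did in the Terminal.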

The next command is wc. This tells us some descriptive statistics about the file.

me@blogclub:~/sandbox/tworkshop/data$ wc prettyExample.json 
  77  203 3026 prettyExample.json

The first number is the number of lines, the second is words, and the third is bytes (the same as the number of characters for plain ASCII text). If you want just one of them, you can use an option after wc. For lines, this option is -l.

me@blogclub:~/sandbox/tworkshop/data$ wc -l prettyExample.json 
77 prettyExample.json

There are 77 lines in prettyExample.json.
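
For the curious, the three numbers can be approximated in a few lines of Python. This is a sketch assuming Python 3, with a made-up function name and a made-up three-line sample; by default the third number wc prints is a byte count, which matches the character count for plain ASCII text.

```python
def wc(data):
    """Return (lines, words, bytes) for a bytes object, like wc's three columns."""
    lines = data.count(b"\n")   # wc counts newline characters
    words = len(data.split())   # words are runs of non-whitespace
    return lines, words, len(data)


# Made-up three-line sample, not the real prettyExample.json.
sample = b'{\n    "place": null\n}\n'

print(wc(sample))
```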

You may have noticed that there are a lot of options for each of these commands. I’m just showing you the bare minimum of what you can do with all of this stuff. If you ever want to know more about what these commands can do, you can use the manual file for the command. Type man [command], where [command] is the command which you want to see more information for. Try man grep. The man command uses less to display the manual file, so you can browse the file in the exact same way you used less above. If you browse forums enough for information on how to do something technical, you’ll often run into the phrase “RTFM”, which means “Read The F%&* Manual.” It’s good advice that I’d highly recommend. 🙂

I/O Redirection

I/O redirection is a way to change where a command’s input comes from or where its output goes. In UNIX, there are three I/O “streams” that basically act like files: stdin (pronounced “standard in”), stdout (“standard out”), and stderr (“standard error”, not to be confused with the statistical concept). That’s probably too much information for now, but it helps to contextualize what is happening when we use certain commands.

The easier one to understand conceptually is output redirection. Say you ran a really cool search and you want to keep the results somewhere. Let’s go with the grep from above.

me@blogclub:~/sandbox/tworkshop/data$ grep text prettyExample.json > textMentions.txt
me@blogclub:~/sandbox/tworkshop/data$ cat textMentions.txt 
    "text": "TeeMinus24's Shirt of the Day is Palpatine/Vader '12. Support the Sith. Change you can't stop. http://t.co/wFh1cCep", 
        "profile_text_color": "333333",

We redirected the output of grep to textMentions.txt. Cool stuff.

How about giving a command standard input? Let’s use grep again.

me@blogclub:~/sandbox/tworkshop/data$ grep text < prettyExample.json 
    "text": "TeeMinus24's Shirt of the Day is Palpatine/Vader '12. Support the Sith. Change you can't stop. http://t.co/wFh1cCep", 
        "profile_text_color": "333333", 

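You may be asking how that’s different from above. In the first version, grep opened prettyExample.json itself because we gave it a file name. Here the shell opens the file and feeds its contents to grep through standard input, which is where grep reads from when it gets no file name. The output is the same, but some commands only read from the standard input stream (which is the keyboard unless you redirect it) and never take a file name.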
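
Here is a small Python sketch of a program that only cares about a stream of lines and not where they come from. It assumes Python 3; the function name is made up, and io.StringIO stands in for standard input so the example runs without any redirection.

```python
import io


def count_matches(stream, pattern):
    """Count the lines from any file-like stream that contain pattern."""
    return sum(1 for line in stream if pattern in line)


# io.StringIO is a stand-in for standard input; the lines are made up.
fake_stdin = io.StringIO(
    '"text": "hi",\n'
    '"place": null\n'
    '"profile_text_color": "333333",\n'
)

matches = count_matches(fake_stdin, "text")
print(matches)
```

In a real script you would import sys and pass sys.stdin instead of fake_stdin, then feed it a file with < just like the grep example.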

Finally, we can chain these two together. Let’s search for something different with grep, like “name”.

me@blogclub:~/sandbox/tworkshop/data$ grep name < prettyExample.json > nameMentions.txt 
me@blogclub:~/sandbox/tworkshop/data$ more nameMentions.txt 
    "in_reply_to_screen_name": null, 
        "screen_name": "TeeMinus24", 
        "name": "Vincent Genovese", 

Note that I/O redirection with > will overwrite the previous file. If you want to append to an existing file, use >>.
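
The same overwrite-versus-append distinction shows up when you open files in Python, which may help it stick. A minimal sketch, assuming Python 3 and a throwaway temporary file (the file name is made up):

```python
import os
import tempfile

# Throwaway file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "mentions.txt")

with open(path, "w") as f:   # "w" behaves like > : start from an empty file
    f.write("first\n")
with open(path, "w") as f:   # a second "w" wipes out what was there
    f.write("second\n")
with open(path, "a") as f:   # "a" behaves like >> : add to the end
    f.write("third\n")

with open(path) as f:
    contents = f.read()
print(contents)
```

Only “second” and “third” survive, because the second write replaced the file and the third one appended to it.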

Finally, the coolest kind of I/O redirection is with what are called pipes. A pipe takes the output of one command and uses it as the input for another. Pipes are represented by the | character, which is probably on the same key as \ on your keyboard.

Say you want to get the number of lines that contain the word “profile” in prettyExample.json. You can use a pipe to connect grep and wc. Check it.

me@blogclub:~/sandbox/tworkshop/data$ grep profile prettyExample.json | wc -l

Whoa! Awesome. We can get more complicated and chain a number of them together. For example, let’s find out how many user directories have the character “j” in them.

me@blogclub:~/sandbox/tworkshop/data$ ls -l /home | grep j | wc -l

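If you recall, ls -l prints the directory listing one entry per line, so grep can treat it as line-by-line input. Keep in mind that grep matches “j” anywhere on the line, including the owner and group columns, not just the directory name.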
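
The idea of a pipe, where each stage consumes the previous stage’s output, can be sketched in Python by nesting functions. This assumes Python 3; the function names and the list of directory names are made up, and it matches plain substrings rather than running the real commands.

```python
def grep(pattern, lines):
    """First stage: pass along only the lines that contain pattern."""
    for line in lines:
        if pattern in line:
            yield line


def count_lines(lines):
    """Last stage: consume the stream and count it, like wc -l."""
    return sum(1 for _ in lines)


# Made-up directory names standing in for the output of ls.
listing = ["jane", "raj", "me", "bob", "jo"]

result = count_lines(grep("j", listing))
print(result)
```

Because grep here is a generator, lines flow through one at a time instead of being collected first, which is roughly how data moves through a shell pipe as well.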

For an exercise, how would you merge the two files you created above (nameMentions.txt and textMentions.txt) into one file called nameTextMentions.txt?

Once you get the hang of pipes and I/O redirection, you will be using them all the time.

Editing and Writing Files

Now we get to the exciting part. This is where we start editing code.

First things first: get a text editor. Go to http://www.jedit.org/ and download jEdit. jEdit is a free, open-source text editor written in Java. It has a ton of cool features and makes editing files on remote servers a snap, which is why we are using it.

Once you have downloaded and installed it, go to Plugins->Plugin Manager in the menu. You should get a screen that looks like this.

Click the “Install” tab and find the “FTP” plugin. Select the checkbox and click Install. Once it installs, close the window. Now, select Plugins->FTP->Open from Secure FTP Server… from the menu. Type in all your information so it looks like the screenshot below.

Now, once it loads, navigate to the “data” directory like you would with a GUI file manager. Open up prettyExample.json.

Once you’ve opened the file, it should be displayed just like this.

Pretty cool, huh? Now leave this file alone — we don’t want to mess with it. Close the file. Now you have a blank file. We’re going to start coding. In Python.

However, before that, I need to give a little background on Python. Python is an interpreted scripting language. This is different from languages like Java or C, which are traditionally compiled before they run (C into machine code that the computer can read directly, Java into bytecode for a virtual machine). The Python interpreter instead reads a file like a script, executing its statements one after another from top to bottom.

To see this in action, go back to your Terminal. Type python.

me@blogclub:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 15:32:38) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

So you are getting a prompt here, just like you do with the UNIX Terminal. Let’s do the first thing you learn in every programming language: print “Hello World”.

>>> print "Hello World"
Hello World

It worked! Hopefully. Let’s get out of here. Press Ctrl+D to exit. Let’s go back to jEdit and type the same line into the blank file. Then, in the menu, go to Plugins->FTP->Save to FTP Server…. Type in your information like before, then navigate to ~/sandbox/tworkshop/bin. In the file name box at the bottom of the dialog, call the file hello.py. Click Save.

Finally, go back to the Terminal and get to the ~/sandbox/tworkshop/bin directory. (Protip: if you’re in the ~/sandbox/tworkshop/data directory you can just type cd ../bin.)

Now we get to run the program from the file.

me@blogclub:~/sandbox/tworkshop/bin$ python hello.py 
Hello World
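
One caveat if you try this on a newer machine: the session above is Python 2.6.5, where print is a statement. In Python 3, print is a function and needs parentheses. Here is a sketch of hello.py in that form (the variable name message is just for illustration):

```python
import sys

message = "Hello World"
print(message)  # print is a function in Python 3, so the parentheses are required

# Show which major version of Python is running this script.
print("Running Python %d" % sys.version_info[0])
```

The parenthesized form with a single string also happens to work in Python 2, so it is a safe habit either way.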

I’ll leave it there for now. If you want to jump ahead and try stuff on your own (you know, RTFM), the Python documentation is at http://docs.python.org/. We’ll get into some of the finer points of Python in the next lesson.