Mining Twitter with Python

You can also mine Twitter data with Python. Alex Hanna wrote an excellent step-by-step DIY manual on the BadHessian blog for collecting real-time Twitter data with the Streaming API using Python. If you already have some knowledge of Python, you are better off reading that blog post. As a beginner, however, I had trouble with really basic things, even installing the tweepy package. So here I want to go through the problems I ran into while following Alex’s instructions and how I resolved them.

A Tweet Has A Lot of Information

When you connect to the Twitter API, you get a lot of data back. Although a tweet is limited to 140 characters, the data you get do not look like the tweets you see on your iPhone; instead, they look like a page from an engineering student’s textbook. See below for what a tweet looks like.

Isn’t this scary (and awesome at the same time)? A single tweet carries that much information! And you can extract part of that information from a tweet with simple Python or R scripts (I’ll talk about this sometime soon).
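To give a feel for what "extracting part of the information" means, here is a minimal sketch. It assumes the tweet arrives as JSON text with the usual created_at, text, and user.screen_name fields; the sample tweet and the summarize_tweet helper are made up for illustration, and a real tweet carries far more fields.

```python
import json

# Hypothetical, heavily trimmed tweet; a real payload has many more fields.
raw_tweet = """
{
  "created_at": "Tue Jan 01 12:00:00 +0000 2013",
  "text": "Hello, Twitter!",
  "user": {"screen_name": "example_user", "followers_count": 42}
}
"""


def summarize_tweet(raw):
    """Return a small dict holding a few fields from one tweet's JSON."""
    tweet = json.loads(raw)
    return {
        "when": tweet.get("created_at"),
        "who": tweet.get("user", {}).get("screen_name"),
        "text": tweet.get("text"),
    }


summary = summarize_tweet(raw_tweet)
print("{0} said: {1}".format(summary["who"], summary["text"]))
```

Using .get() instead of square brackets means a missing field comes back as None rather than stopping the script, which is handy because not every message has every field.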

Collecting Data

  • Before getting started

Your computer may be missing some essentials that you need, such as Python setuptools and the components that let you run git commands.

First, let’s install Python setuptools. Check your Python version by typing

python --version

in your Terminal. Then go here (http://pypi.python.org/pypi/setuptools or http://pypi.python.org/pypi/setuptools#files) and download the file that matches your version. For example, if you are using Python 2.7.3 on a Mac, you need to download setuptools-0.6c11-py2.7.egg and run the following command.

sh setuptools-0.6c11-py2.7.egg

(If that does not work, try this:)

sudo sh setuptools-0.6c11-py2.7.egg

Then you are good to go.
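If you are not sure which file matches your Python, the small sketch below builds the expected file name from the version of the interpreter that is running. The naming pattern is only inferred from the one file name mentioned above, and setuptools_egg_name is a hypothetical helper, so check the result against the download page before relying on it.

```python
import sys


def setuptools_egg_name(version_info=sys.version_info, release="0.6c11"):
    """Build the egg file name for a given interpreter version.

    Assumption: files follow the pattern setuptools-<release>-py<X.Y>.egg.
    """
    major, minor = version_info[0], version_info[1]
    return "setuptools-{0}-py{1}.{2}.egg".format(release, major, minor)


# Python 2.7.3 only needs the "2.7" part of its version number.
print(setuptools_egg_name((2, 7, 3)))
```

Calling setuptools_egg_name() with no arguments uses whatever Python is running the script.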

Second, if you have trouble using git commands on your Mac, see this post.

  • Install the tweepy package

So, let’s collect tweets using Python! Basically, I will follow the steps Alex introduced in this post and address the problems I had while following them, which mostly came from my lack of programming knowledge.

First, we need to install the tweepy package, which is a set of scripts other people have already written to help you connect your computer to the Twitter API more easily. (If you are familiar with R, a package in Python is no different from a package/library in R.) So, let’s do it. Open Terminal.app and type the following:

easy_install tweepy

Wait, some of you might get an error like the following (if not, good for you):

error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: '/Library/Python/2.7/site-packages/test-easy-install-1334.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/Library/Python/2.7/site-packages/

Perhaps your account does not have write access to this directory? If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account. If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.

For information on other options, you may wish to consult the
documentation at:

http://peak.telecommunity.com/EasyInstall.html

Please make the appropriate changes for your system and try again.

However, don’t worry. This happens just because you are trying to install the package without administrator permission on your system. The problem is easily solved by putting sudo in front of the original command.

sudo easy_install tweepy

Or you can download the source archive here. After downloading, extract the archive and type the following to install the package.

cd tweepy-master

sudo python setup.py install

You can do the same with git, too.

git clone git://github.com/raynach/tweepy.git

cd tweepy

python setup.py build

sudo python setup.py install
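Whichever route you took, you can check from Python whether the install worked. This is a minimal sketch: is_installed is a hypothetical helper that simply tries to import the package and reports whether that succeeded.

```python
import importlib


def is_installed(package_name):
    """Return True if the named package can be imported."""
    try:
        importlib.import_module(package_name)
    except ImportError:
        return False
    return True


print("tweepy installed: {0}".format(is_installed("tweepy")))
```

If this prints False, the package probably went into a different Python than the one you are running, so check python --version again.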

  • Handling incoming data

To handle incoming data from the Twitter API, we need a Python script called StreamListener. Download the file by clicking here and change the file extension from .jpg to .py.
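The downloaded file is not reproduced here, but the general shape of such a listener can be sketched. This assumes the older tweepy interface, in which your class subclasses tweepy.StreamListener and tweepy calls on_data with each raw JSON message and on_error with an HTTP status code, and returning False from either stops the stream. TweetCollector and its limit parameter are hypothetical names, and the sketch leaves out the tweepy import so it runs on its own.

```python
import json


class TweetCollector(object):
    """Stand-in for a tweepy listener: gathers the text of incoming tweets.

    Assumption: with older tweepy you would subclass tweepy.StreamListener
    instead of object; the method names below follow that interface.
    """

    def __init__(self, limit=100):
        self.limit = limit  # stop after collecting this many tweets
        self.texts = []

    def on_data(self, raw_data):
        """Handle one raw JSON message; return False to stop streaming."""
        message = json.loads(raw_data)
        if "text" in message:  # skip non-tweet messages such as delete notices
            self.texts.append(message["text"])
        return len(self.texts) < self.limit

    def on_error(self, status_code):
        """Stop on any HTTP error reported by the stream."""
        return False


# Feed the collector two hand-written messages instead of a live stream.
collector = TweetCollector(limit=2)
keep_going = collector.on_data('{"text": "first tweet"}')
collector.on_data('{"delete": {"status": {"id": 1}}}')
```

Feeding it hand-written messages like this lets you test the handling logic before you connect to Twitter at all.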

 
