You can also mine Twitter data with Python. Alex Hanna wrote an excellent step-by-step DIY manual on the BadHessian blog for collecting real-time Twitter data with the Streaming API using Python. If you already have some knowledge of Python, you should just read that post. As a beginner, however, I had trouble with really basic things, like installing the tweepy package. So here I describe the problems I ran into while following Alex’s instructions and how I resolved them.
A Tweet Has A Lot of Information
When you connect to the Twitter API, you get a lot of data back. Although a tweet is limited to 140 characters, the data you receive looks nothing like the tweets you see on your iPhone; instead, it looks like a page from an engineering textbook. See below for what a single tweet looks like.
Isn’t this scary (and awesome at the same time)? A single tweet carries that much information! You can extract parts of it with simple Python or R scripts (I’ll talk about this soon).
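As a small taste of that, here is a minimal Python sketch that pulls a few fields out of one tweet’s raw JSON with the standard json module. The sample tweet is made up and heavily trimmed; real tweets from the v1.1 API carry many more fields, and the helper name summarize_tweet is just an illustrative choice.

```python
import json

# A made-up, heavily trimmed tweet; real tweets contain many more fields.
raw_tweet = '''
{
  "created_at": "Wed Aug 29 17:12:58 +0000 2012",
  "id_str": "123456789",
  "text": "Mining #Twitter data with #Python",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "entities": {"hashtags": [{"text": "Twitter"}, {"text": "Python"}]}
}
'''


def summarize_tweet(raw):
    """Return a small dict with a few fields from one tweet's JSON string."""
    tweet = json.loads(raw)
    return {
        "when": tweet["created_at"],
        "who": tweet["user"]["screen_name"],
        "text": tweet["text"],
        # Hashtags live under entities -> hashtags, each with a "text" field.
        "hashtags": [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])],
    }


summary = summarize_tweet(raw_tweet)
print(summary["who"], summary["hashtags"])  # example_user ['Twitter', 'Python']
```

The same idea scales up: once you have many tweets saved, you loop over them and keep only the fields you need.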
- Before getting started
In some cases your computer may be missing tools you need, such as Python setuptools and the components that let you run the git command.
First, let’s install Python setuptools. Check your Python version by typing
python --version
in your Terminal. Then go to http://pypi.python.org/pypi/setuptools (or http://pypi.python.org/pypi/setuptools#files) and download the file that matches your setup. For example, if you are using Python 2.7.3 on a Mac, download setuptools-0.6c11-py2.7.egg and run the following script.
sh setuptools-0.6c11-py2.7.egg
(If that does not work, try this:)
sudo sh setuptools-0.6c11-py2.7.egg
Then you are good to go.
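If you prefer to check from inside Python, the sketch below prints the interpreter version and reports whether a module such as setuptools can be imported. Treat it as a modern equivalent rather than part of the original steps: importlib.util.find_spec needs Python 3.4 or later (so it will not run on the Python 2.7 used above), and module_available is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    # find_spec returns None when no module with this name is on the path.
    return importlib.util.find_spec(name) is not None


# sys.version_info behaves like a tuple: (major, minor, micro, ...).
major, minor, micro = sys.version_info[:3]
print("Python %d.%d.%d" % (major, minor, micro))
print("setuptools available:", module_available("setuptools"))
```

Once tweepy is installed, module_available("tweepy") should return True as well.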
Second, if you have trouble using the git command on your Mac, see this post.
- Install tweepy package
So, let’s collect tweets using Python! I will follow the steps Alex introduced in his post and address the problems I ran into along the way, mostly because of my limited programming knowledge.
First, we need to install the tweepy package, a set of scripts other people have already written that make it easier to connect your computer to the Twitter API. (If you are familiar with R, a Python package is no different from a package/library in R.) So, let’s do it. Open Terminal.app and type the following:
easy_install tweepy
Some of you might get an error like the following (if not, good for you):
error: can't create or remove files in install directory
The following error occurred while trying to add or remove files in the
installation directory:
[Errno 13] Permission denied: '/Library/Python/2.7/site-packages/test-easy-install-1334.write-test'
The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:
Perhaps your account does not have write access to this directory? If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account. If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.
For information on other options, you may wish to consult the
Please make the appropriate changes for your system and try again.
Don’t worry, though. This happens simply because you are trying to install the package without administrator permission on your system. You can easily fix it by putting sudo in front of the original command:
sudo easy_install tweepy
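The Errno 13 message boils down to one question: can your account write to the directory where packages are installed? Here is a minimal sketch that asks that question with the standard library, assuming Python 3, where sysconfig reports the pure-Python install directory under the "purelib" key. The helper name install_dir_writable is hypothetical.

```python
import os
import sysconfig


def install_dir_writable():
    """Return (path, writable) for the default pure-Python install directory."""
    # "purelib" is where pure-Python packages such as tweepy get installed.
    path = sysconfig.get_paths()["purelib"]
    # os.access checks permissions for the real user ID of this process;
    # it also returns False if the directory does not exist.
    return path, os.access(path, os.W_OK)


path, writable = install_dir_writable()
if writable:
    print("You can install into", path, "without sudo.")
else:
    print("No write access to", path, "- use sudo or a different directory.")
```

If it reports no write access, that is exactly the situation sudo works around.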
Alternatively, you can download the source archive here. After downloading, extract the archive, change into the extracted directory, and type the following to install the package:
sudo python setup.py install
You can also do this with git:
git clone git://github.com/raynach/tweepy.git
cd tweepy
python setup.py build
sudo python setup.py install
- To Handle Incoming Data
To handle incoming data from the Twitter API, we need a Python script built around tweepy’s StreamListener. Download the file by clicking here and change its file extension from .jpg to .py.
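To give an idea of what such a listener does, here is a minimal, standalone sketch; it is not the file linked above. It does not import tweepy. Instead it mimics the shape of tweepy 3.x’s StreamListener, whose on_data method receives each raw JSON message and whose return value of False tells the stream to stop. The class name TweetSaver, the output file, and the tweet limit are hypothetical choices for illustration.

```python
import json
import os
import tempfile


class TweetSaver(object):
    """Standalone sketch of a stream listener (hypothetical, not tweepy's class).

    Mirrors the on_data hook of tweepy 3.x's StreamListener: it is called with
    one raw JSON message at a time, and returning False asks the stream to stop.
    """

    def __init__(self, out_path, max_tweets=100):
        self.out_path = out_path
        self.max_tweets = max_tweets
        self.count = 0

    def on_data(self, raw_data):
        message = json.loads(raw_data)
        # The stream also sends non-tweet messages (e.g. deletion notices);
        # keep only objects that look like tweets.
        if "text" not in message:
            return True
        # Append one tweet per line so the file can be read back line by line.
        with open(self.out_path, "a") as f:
            f.write(json.dumps(message) + "\n")
        self.count += 1
        # Keep going until we have collected max_tweets tweets.
        return self.count < self.max_tweets


# Usage: feed a few made-up messages, as the stream would.
out_path = os.path.join(tempfile.mkdtemp(), "tweets.json")
listener = TweetSaver(out_path, max_tweets=2)
messages = [
    '{"text": "first tweet", "user": {"screen_name": "a"}}',
    '{"delete": {"status": {"id_str": "1"}}}',
    '{"text": "second tweet", "user": {"screen_name": "b"}}',
]
for raw in messages:
    if listener.on_data(raw) is False:
        break
```

In a real run, tweepy would call on_data for you as messages arrive; the loop here simply feeds the listener three made-up messages. Note that tweepy 4.0 later folded StreamListener into its Stream class, so newer versions of the library are organized differently.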