Cassandra and PHP - Twissandra

In the previous post I described how to configure and start Cassandra (several nodes on the same machine), and this time I'd like to present some experiences with accessing Cassandra nodes from PHP. Cassandra is often demonstrated using Twissandra, a schema designed for a Twitter-like application, and this post uses the Twissandra schema too.

This post is in a way a follow-up to the article Cassandra By Example by Eric Evans, which describes basic Cassandra principles (what a keyspace and a column family are), compares the Cassandra schema to the corresponding relational schema, and demonstrates how to access Cassandra from Python (using the pycassa library).

I really don't want to steal ideas from that article or duplicate it - I'd like to update some of the information for the current version of Cassandra (0.7.x) and fill in information about access from PHP. I definitely recommend reading the article by Eric Evans, especially its first half, i.e. the sections "Twitter" (presenting a relational schema for Twitter) and "Twissandra" (presenting a corresponding schema for Cassandra).

Creating the schema

The original article by Eric Evans was written for Cassandra 0.6, which means the structure of the database (keyspaces, column families) was declared in an XML file. That changed in 0.7: the schema is no longer declared in XML but is defined using the command-line utility cassandra-cli. So run cassandra-cli and create the Twissandra keyspace and column families:

create keyspace Twissandra;

use Twissandra;

create column family User      with comparator = UTF8Type;
create column family Username  with comparator = BytesType;
create column family Friends   with comparator = BytesType;
create column family Followers with comparator = BytesType;
create column family Tweet     with comparator = UTF8Type;
create column family Userline  with comparator = LongType;
create column family Timeline  with comparator = LongType;

What a keyspace and a column family are, and what each of the column families created above is for, is explained in the previously mentioned article.

phpcassa library

There are two PHP client libraries - phpcassa and SimpleCassie. I've tried both and I definitely recommend the former. SimpleCassie is presented as the simpler one to use (which is very tempting, especially for newcomers), but its documentation is really sparse and I was unable to make some of its features work. phpcassa, on the other hand, worked fine right from the beginning and its documentation is great. Moreover, it's a "PHP version" of the pycassa library, so most of the Python examples can be rewritten very easily.

Installing it is really simple - just download a suitable package, unpack it and set the include_path at the beginning of a script (or in an .htaccess file).
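
As a minimal sketch of the include_path step: assuming the package was unpacked into a hypothetical directory /opt/phpcassa-lib (so that the library ends up at /opt/phpcassa-lib/phpcassa/connection.php; adjust this to your own layout), the path can be extended at the top of a script like this:

```php
<?php
// Hypothetical location of the unpacked package; adjust to your setup.
$phpcassa_parent = '/opt/phpcassa-lib';

// Append the directory to the existing include_path instead of replacing it,
// so that other libraries already on the path keep working.
set_include_path(get_include_path() . PATH_SEPARATOR . $phpcassa_parent);
```

With mod_php, the same setting can be made in an .htaccess file using a php_value include_path directive.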

Connecting to Cassandra and initialization

The first thing you need to do is open a connection to Cassandra (to at least one of the nodes) and initialize the objects representing the column families.

/* load phpcassa libraries */
require_once('phpcassa/connection.php');
require_once('phpcassa/columnfamily.php');

/* connect to the Twissandra keyspace using a pool of three nodes */
$conn = new ConnectionPool('Twissandra', array('10.0.0.1:9160',
                                               '10.0.0.2:9160',
                                               '10.0.0.3:9160'));

/* initialize column families */
$user_cf = new ColumnFamily($conn, 'User');
$username_cf = new ColumnFamily($conn, 'Username');
$friends_cf = new ColumnFamily($conn, 'Friends');
$followers_cf = new ColumnFamily($conn, 'Followers');
$tweet_cf = new ColumnFamily($conn, 'Tweet');
$userline_cf = new ColumnFamily($conn, 'Userline');
$timeline_cf = new ColumnFamily($conn, 'Timeline');

Now we can implement the functions. First, a function that creates a user:

function create_user($userid, $username, $password) {

    global $user_cf, $username_cf;

    $user_cf->insert($userid, array('id' => $userid,
                                    'username' => $username,
                                    'password' => $password));

    $username_cf->insert($username, array('userid' => $userid));

}

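Next, a function that makes one user ($userid) a "follower" of another ($friendid), i.e. a user who follows tweets posted by a friend:

function add_follower($userid, $friendid) {

    global $friends_cf, $followers_cf;

    $time = time();

    /* Friends row of the follower lists the users he follows */
    $friends_cf->insert($userid, array($friendid => $time));

    /* Followers row of the friend lists the users following him */
    $followers_cf->insert($friendid, array($userid => $time));

}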

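The next function creates a tweet. It stores the tweet itself, adds it to the author's userline and timeline, and then adds it to the timeline of each of the author's followers: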

function create_tweet($userid, $tweetid, $body) {

    global $tweet_cf, $timeline_cf, $userline_cf, $followers_cf;

    $time = time();

    $tweet_cf->insert($tweetid, array('id' => $tweetid,
                                      'userid' => $userid,
                                      'body' => $body,
                                      '_ts' => $time));

    $userline_cf->insert($userid, array($time => $tweetid));
    $timeline_cf->insert($userid, array($time => $tweetid));

    foreach ($followers_cf->get($userid) AS $fid => $ftime) {
        $timeline_cf->insert($fid, array($time => $tweetid));
    }

}

The last two functions are responsible for reading data - listing tweets posted by a particular user and listing a user's timeline (i.e. tweets posted by the user and by the users they follow):

function get_tweets($userid) {

    global $userline_cf, $tweet_cf;

    $timeline = $userline_cf->get($userid);

    return $tweet_cf->multiget($timeline);

}

function get_timeline($userid) {

    global $timeline_cf, $tweet_cf;

    $timeline = $timeline_cf->get($userid);

    return $tweet_cf->multiget($timeline);

}

Sure, these functions are not perfect and there are issues to fix: time() has only one-second resolution, so a more precise timestamp should be used; the global variables (used mostly for the sake of simplicity) should be eliminated; and exceptions are not handled, e.g. when an item with the given key is not found.
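
One possible way to address these points for create_tweet is sketched below. It is only an illustration under stated assumptions: the names timestamp_micros and TweetWriter are hypothetical (they are not part of phpcassa), the timestamp helper assumes a 64-bit PHP build, and the sketch assumes that get() throws cassandra_NotFoundException when the row does not exist.

```php
<?php
// Hypothetical helper (not part of phpcassa): microseconds since the Unix
// epoch. Assumes a 64-bit PHP build so the value fits into an integer.
function timestamp_micros() {
    return (int) round(microtime(true) * 1000000);
}

// Hypothetical wrapper: the column family objects are passed in through the
// constructor instead of being pulled from global variables.
class TweetWriter {

    private $tweet_cf;
    private $userline_cf;
    private $timeline_cf;
    private $followers_cf;

    public function __construct($tweet_cf, $userline_cf, $timeline_cf, $followers_cf) {
        $this->tweet_cf = $tweet_cf;
        $this->userline_cf = $userline_cf;
        $this->timeline_cf = $timeline_cf;
        $this->followers_cf = $followers_cf;
    }

    public function create_tweet($userid, $tweetid, $body) {

        $time = timestamp_micros();

        $this->tweet_cf->insert($tweetid, array('id' => $tweetid,
                                                'userid' => $userid,
                                                'body' => $body,
                                                '_ts' => $time));

        $this->userline_cf->insert($userid, array($time => $tweetid));
        $this->timeline_cf->insert($userid, array($time => $tweetid));

        // Assumption: get() throws cassandra_NotFoundException when the user
        // has no followers yet; treat that as an empty follower list.
        try {
            $followers = $this->followers_cf->get($userid);
        } catch (cassandra_NotFoundException $e) {
            $followers = array();
        }

        // Copy the tweet reference into the timeline of every follower.
        foreach ($followers as $fid => $ftime) {
            $this->timeline_cf->insert($fid, array($time => $tweetid));
        }

        // Return the column name used, so the caller can refer to it.
        return $time;
    }
}
```

With the column family objects created earlier, this would be used as $writer = new TweetWriter($tweet_cf, $userline_cf, $timeline_cf, $followers_cf); followed by $writer->create_tweet($userid, $tweetid, $body);. With microsecond column names, two tweets posted by the same user within the same second no longer share a column name in Userline and Timeline, so the second one does not overwrite the first (a collision is still possible in principle, just far less likely).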

More detailed information about how to use the phpcassa library may be found here, and the API docs are available here. The script implementing the functions listed above (and a brief example of how to use them) may be downloaded here.
