Tuesday, November 25, 2014

Trident-ML: Sentiment Analysis Classifier

Trident-ML ships with a pre-trained Twitter sentiment classifier. This post walks through a very basic example of using that classifier in Storm to classify the sentiment of a piece of text. The classifier returns true for positive sentiment and false for negative sentiment.

First, create a Maven project (e.g. with groupId="com.memeanalytics" and artifactId="trident-sentiment-classifier"). The complete source code of the project can be downloaded from the following link:

https://dl.dropboxusercontent.com/u/113201788/storm/trident-sentiment-classifier.tar.gz

To start, we need to configure the pom.xml file in the project.

Configure pom.xml:

First, add the clojars repository to the repositories section:

<repositories>
  <repository>
    <id>clojars</id>
    <url>http://clojars.org/repo</url>
  </repository>
</repositories>

Next, add the Storm dependency to the dependencies section:

<dependency>
  <groupId>storm</groupId>
  <artifactId>storm</artifactId>
  <version>0.9.0.1</version>
  <scope>provided</scope>
</dependency>

Next, add the trident-ml dependency to the dependencies section (for text classification):

<dependency>
  <groupId>com.github.pmerienne</groupId>
  <artifactId>trident-ml</artifactId>
  <version>0.0.4</version>
</dependency>

Next, add the exec-maven-plugin to the build/plugins section (for executing the Maven project):

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.2.1</version>
  <executions>
    <execution>
      <goals>
        <goal>exec</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <includeProjectDependencies>true</includeProjectDependencies>
    <includePluginDependencies>false</includePluginDependencies>
    <executable>java</executable>
    <classpathScope>compile</classpathScope>
    <mainClass>com.memeanalytics.trident_sentiment_classifier.App</mainClass>
  </configuration>
</plugin>

Next, add the maven-assembly-plugin to the build/plugins section (for packaging the Maven project into a jar that can be submitted to a Storm cluster):

<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.2.1</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <mainClass></mainClass>
      </manifest>
    </archive>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Sentiment Classification in Trident topology using Trident-ML implementation

Once pom.xml has been updated, we can build a Trident topology that uses Trident-ML's TwitterSentimentClassifier in a DRPC stream to classify the sentiment of text. This is implemented in the main class shown below:

package com.memeanalytics.trident_sentiment_classifier;

import com.github.pmerienne.trident.ml.nlp.TwitterSentimentClassifier;

import storm.trident.TridentTopology;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.LocalDRPC;
import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;

public class App
{
    public static void main( String[] args )
    {
        // In-process DRPC server and Storm cluster for local testing
        LocalDRPC drpc=new LocalDRPC();

        LocalCluster cluster=new LocalCluster();
        Config config=new Config();

        cluster.submitTopology("SentimentClassifierDemo", config, buildTopology(drpc));

        // Give the topology a moment to start before sending requests
        try{
            Thread.sleep(2000);
        }catch(InterruptedException ex)
        {
            ex.printStackTrace();
        }

        // Each call sends one text to the "classify" DRPC function and prints the result
        System.out.println(drpc.execute("classify", "Have a nice day!"));
        System.out.println(drpc.execute("classify", "I feel really bad!"));
        System.out.println(drpc.execute("classify", "Whatever, i don't really care"));
        System.out.println(drpc.execute("classify", "feel sleepy zzzz...."));

        // Tear down the local topology, cluster and DRPC server
        cluster.killTopology("SentimentClassifierDemo");
        cluster.shutdown();
        drpc.shutdown();
    }

    // Builds a topology with a single DRPC stream that classifies the incoming text
    private static StormTopology buildTopology(LocalDRPC drpc)
    {
        TridentTopology topology=new TridentTopology();

        // "args" holds the DRPC request text; the classifier emits a "sentiment" field
        topology.newDRPCStream("classify", drpc).each(new Fields("args"), new TwitterSentimentClassifier(), new Fields("sentiment"));

        return topology.build();
    }
}

The DRPC stream lets the caller pass a text string to the TwitterSentimentClassifier, which then emits a "sentiment" field containing the predicted label for that text (true for positive; false for negative).
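Since drpc.execute returns a String, the caller has to pull the boolean label out of it before using it in program logic. The following is only a rough sketch of one way to do that. It assumes the returned string is a JSON-style list of tuples that ends with the boolean label, for example of the shape [["some text",true]]. That format is an assumption and is not shown in this post. The class name DrpcResult and the method isPositive are hypothetical helpers, not part of Storm or Trident-ML.

```java
public class DrpcResult {

    // Hypothetical helper: extracts the trailing boolean label from a DRPC result.
    // Assumption: the result ends with the label followed only by closing
    // brackets, e.g. [["some text",true]].
    public static boolean isPositive(String drpcResult) {
        String trimmed = drpcResult.trim();

        // Drop the trailing ']' characters (and any whitespace between them)
        int end = trimmed.length();
        while (end > 0) {
            char c = trimmed.charAt(end - 1);
            if (c == ']' || Character.isWhitespace(c)) {
                end--;
            } else {
                break;
            }
        }
        String body = trimmed.substring(0, end);

        // The label is an unquoted JSON boolean, so it is the last token left
        if (body.endsWith("true")) {
            return true;
        }
        if (body.endsWith("false")) {
            return false;
        }
        throw new IllegalArgumentException("Unexpected DRPC result: " + drpcResult);
    }

    public static void main(String[] args) {
        // Hypothetical input, written by hand to match the assumed format
        String sample = "[[\"some text\",true]]";
        System.out.println(isPositive(sample));
    }
}
```

In the topology above, this could be applied as DrpcResult.isPositive(drpc.execute("classify", text)). For anything beyond a quick demo, a proper JSON parser would be a safer choice than string matching.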

Next, copy the following two files into the "src/main/resources" folder under the project root folder:

twitter-sentiment-classifier-classifier.json:
https://github.com/pmerienne/trident-ml/blob/master/src/main/resources/twitter-sentiment-classifier-classifier.json

twitter-sentiment-classifier-extractor.json:
https://github.com/pmerienne/trident-ml/blob/master/src/main/resources/twitter-sentiment-classifier-extractor.json

This step is important: without these two files, you may get a FileNotFoundException at runtime.
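A quick way to confirm the copy worked is to check that both files are visible on the classpath before starting the topology. The sketch below uses only the standard ClassLoader.getResource call. It assumes the files are looked up at the classpath root, which is where Maven places the contents of src/main/resources. The class name ResourceCheck and the method findMissing are hypothetical and not part of Trident-ML.

```java
import java.util.ArrayList;
import java.util.List;

public class ResourceCheck {

    // Hypothetical helper: returns the names of the given resources that are
    // NOT visible to the supplied class loader.
    public static List<String> findMissing(ClassLoader loader, String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the model files sit at the classpath root
        List<String> missing = findMissing(
                ResourceCheck.class.getClassLoader(),
                "twitter-sentiment-classifier-classifier.json",
                "twitter-sentiment-classifier-extractor.json");

        if (missing.isEmpty()) {
            System.out.println("Both model files found on the classpath.");
        } else {
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```

Running this check from within the same Maven project (so that it sees the same classpath as App) should report an empty list once the files are in place.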

Once the coding is complete, we can run the project by navigating to the project root folder and running the following command:

> mvn compile exec:java
