Recently I encountered a strange bug in Eclipse after a colleague included Selenium in the POM file of a Maven project. Eclipse complained "Error: archive required for library cannot be read". The detailed error read:
"Archive for required library: '[root]/.m2/repository/org/seleniumhq/selenium/selenium-support/2.46.0/selenium-support-2.46.0.jar' in project '[*]' cannot be read or is not a valid ZIP file"
However, the selenium-support-2.46.0.jar itself turned out to be a valid jar which could be opened as a zip archive.
After some searching, the following workaround (from https://bugs.eclipse.org/bugs/show_bug.cgi?id=364653#c3) worked in my case:
Workaround: for each affected project, set 'Incomplete build path' to 'Warning' on the Compiler > Building property page. Then shut down and restart Eclipse and update the Maven projects. After that, the problem was gone.
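Before blaming Eclipse, it is worth confirming programmatically that the jar really is a readable archive. A minimal sketch using the JDK's java.util.zip; the temp files here are stand-ins for the real jar path:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class JarCheck {
    // Returns true when the file can be opened as a zip archive,
    // which is essentially the check behind "is not a valid ZIP file".
    static boolean isReadableZip(File f) {
        try (ZipFile zf = new ZipFile(f)) {
            return zf.size() >= 0;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny valid zip in a temp file to demonstrate the check.
        File ok = File.createTempFile("demo", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(ok))) {
            out.putNextEntry(new ZipEntry("placeholder.txt"));
            out.write("hello".getBytes());
            out.closeEntry();
        }
        System.out.println(isReadableZip(ok)); // true for a valid archive

        // A file that is not a zip at all must be rejected.
        File bad = File.createTempFile("corrupt", ".jar");
        try (FileOutputStream out = new FileOutputStream(bad)) {
            out.write("not a zip".getBytes());
        }
        System.out.println(isReadableZip(bad)); // false
    }
}
```

If this prints true for the jar Eclipse complains about, the archive is fine and the workaround above is the way to go.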
Friday, November 13, 2015
Thursday, November 12, 2015
Run mvn package with multiple modules containing independent pom files without public repository
Recently I needed to build a Java application from several different modules, each with its own POM file that packages it into a jar. These jars are not distributed through a public repository such as Maven Central, and I was not using a repository manager such as Nexus in this case. The modules live in their own folders (named "SK-Utils", "SK-Statistics", "SK-DOM", "OP-Core", "OP-Search", "ML-Core", "ML-Tune", "ML-Clustering", "ML-Trees"). What complicates things is that the modules depend on each other; e.g., ML-Core depends on "SK-DOM" and "SK-Utils" to build and run its unit tests. Building these independent modules in the IntelliJ IDE works fine, but they fail to build from the command line with "mvn package". I therefore wrote a bash script, placed in the directory containing the modules, which runs "mvn package" against each module's POM in turn and then installs the resulting jar into the local repository. The script "mvn-build.sh" looks like the following:
#!/usr/bin/env bash
dirArray=( "SK-Utils" "SK-Statistics" "SK-DOM" "OP-Core" "OP-Search" "ML-Core" "ML-Tune" "ML-Clustering" "ML-Trees" )
for dirName in "${dirArray[@]}"
do
  echo "$dirName"
  # abort if the module folder is missing
  cd "$dirName" || exit 1
  jarPath="target/$dirName-0.0.1-SNAPSHOT-jar-with-dependencies.jar"
  # -f (not -d): the jar is a regular file, not a directory
  if [ -f "$jarPath" ]; then
    chmod 777 "$jarPath"
  fi
  mvn package
  mvn install:install-file -Dfile="$jarPath" -DgroupId=com.meme -DartifactId="$dirName" -Dpackaging=jar -Dversion=0.0.1-SNAPSHOT
  cd ..
done
Running the above script (e.g. with "sudo ./mvn-build.sh") from its folder should build the multi-module project.
Note that each module should have plugins like the ones below in its POM so that the "jar-with-dependencies" jar is generated.
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.6</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
One more note: if you are running on an environment such as CentOS Linux and encounter "mvn: command not found" when executing "sudo ./mvn-build.sh", it may be because the PATH environment variable is not passed through sudo. In that case, just run:
> sudo env "PATH=$PATH" ./mvn-build.sh
Wednesday, October 28, 2015
Unit Testing of AngularJS in Spring MVC maven project using Jasmine, PhantomJS, and Jenkins CI
This post collects some links on how to perform unit testing of AngularJS in a Spring MVC project. One typical setup uses Jasmine and PhantomJS.
Some general descriptions of these tools: Jasmine is a behavior-driven development framework for testing JavaScript; PhantomJS is a headless browser, which can be invoked on Jenkins CI to run the AngularJS unit tests (Jenkins CI is a continuous integration server that supports building and testing software projects).
POM setup
First, include the following in your Maven POM file:
<properties>
<angularjs.version>1.4.3-1</angularjs.version>
<phantomjs.outputDir>${java.io.tmpdir}/phantomjs</phantomjs.outputDir>
</properties>
<build>
<pluginManagement>
<plugins>
<!--This plugin's configuration is used to store Eclipse m2e settings
only. It has no influence on the Maven build itself. -->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>com.github.klieber</groupId>
<artifactId>phantomjs-maven-plugin</artifactId>
<versionRange>
[0.7,)
</versionRange>
<goals>
<goal>install</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore></ignore>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<groupId>com.github.klieber</groupId>
<artifactId>phantomjs-maven-plugin</artifactId>
<version>0.7</version>
<executions>
<execution>
<goals>
<goal>install</goal>
</goals>
</execution>
</executions>
<configuration>
<version>1.9.7</version>
</configuration>
</plugin>
<plugin>
<groupId>com.github.searls</groupId>
<artifactId>jasmine-maven-plugin</artifactId>
<version>2.0-alpha-01</version>
<executions>
<execution>
<goals>
<goal>test</goal>
</goals>
</execution>
</executions>
<configuration>
<additionalContexts>
<context>
<contextRoot>/lib</contextRoot>
<directory>${project.build.directory}/generated-resources/unit/ml/js</directory>
</context>
</additionalContexts>
<skipTests>true</skipTests>
<preloadSources>
<source>/webjars/jquery/2.1.3/jquery.min.js</source>
<source>/webjars/bootstrap/3.3.5/js/bootstrap.min.js</source>
<source>/webjars/angularjs/${angularjs.version}/angular.min.js</source>
<source>/webjars/angularjs/${angularjs.version}/angular-route.min.js</source>
<source>/webjars/angularjs/${angularjs.version}/angular-animate.min.js</source>
<source>/webjars/angularjs/${angularjs.version}/angular-mocks.js</source>
</preloadSources>
<jsSrcDir>${project.basedir}/src/main/resources/js</jsSrcDir>
<jsTestSrcDir>${project.basedir}/src/test/resources/js</jsTestSrcDir>
<webDriverClassName>org.openqa.selenium.phantomjs.PhantomJSDriver</webDriverClassName>
<webDriverCapabilities>
<capability>
<name>phantomjs.binary.path</name>
<value>${phantomjs.binary}</value>
</capability>
</webDriverCapabilities>
</configuration>
</plugin>
</plugins>
</build>
In the <plugins> section two plugins are added: the jasmine and phantomjs Maven plugins. The jasmine plugin runs the unit tests, and the phantomjs plugin downloads the PhantomJS executable into a tmp folder so that it can be invoked by Jenkins. The phantomjs plugin is very useful because the machine running Jenkins CI may not have PhantomJS pre-installed: with the "install" goal specified, the build first checks whether a copy of PhantomJS is already available in the configured phantomjs.outputDir folder and, if not, downloads one from the internet into that folder. The plugin then sets the phantomjs.binary property automatically, so the build knows where to find the PhantomJS executable.
The org.eclipse.m2e lifecycle-mapping entry in <pluginManagement> stops Eclipse from complaining that m2e cannot handle the "install" goal of the phantomjs plugin. It has no effect on Maven itself when building and running the project.
Implement Jasmine and spring unit testing codes
For this, there is already a nice article on how to do it here:
https://spring.io/blog/2015/05/19/testing-an-angular-application-angular-js-and-spring-security-part-viii
Therefore I won't repeat it.
Labels: AngularJS, Eclipse, Jasmine, Jenkins, Maven, PhantomJS, Spring, Spring MVC, Unit Testing
Wednesday, October 14, 2015
Nexus: Maven pulls everything but the main jar of drools and jboss through POM dependency
I encountered an error in Eclipse in which Maven was not able to pull the jar files specified in the POM file for drools 6.3.0.Final. The libraries are pulled from a local Nexus repository, for which the JBoss public proxy has been set up and added to the mirrors in the local Maven settings.xml file.
The problem turned out to be that the POM file in the folder '/root/.m2/repository/org/jboss/dashboard-builder/dashboard-builder-bom/6.3.0.Final/' was named "dashboard-builder-bom-6.3.0.Final.pom.lastUpdated" instead of "dashboard-builder-bom-6.3.0.Final.pom".
After renaming the POM file back and doing a Maven force update on the project in Eclipse, the error was gone.
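As an aside, a common cleanup for repository glitches like this (a general technique, not part of the fix above) is to delete the stale *.lastUpdated markers that failed downloads leave behind, then force an update. A sketch against a throwaway directory standing in for ~/.m2/repository:

```shell
# Create a throwaway repository layout to demonstrate (paths are illustrative).
repo=$(mktemp -d)
mkdir -p "$repo/org/jboss/dashboard-builder"
touch "$repo/org/jboss/dashboard-builder/dashboard-builder-bom-6.3.0.Final.pom.lastUpdated"
touch "$repo/org/jboss/dashboard-builder/some-artifact.jar"

# List, then delete, every stale .lastUpdated marker under the repository.
find "$repo" -name '*.lastUpdated' -print
find "$repo" -name '*.lastUpdated' -delete

# Only the real artifact remains.
find "$repo" -type f
```

Against the real local repository the same two find commands would be run with ~/.m2/repository in place of "$repo".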
Saturday, December 20, 2014
Maven: missing artifact org.eclipse.equinox.app.jar
Sometimes, if one sees errors such as "missing artifact org.eclipse.equinox.app.jar..." in Eclipse or Maven, it is probably because the repositories are not set correctly in the POM file. Below are repositories that can be added to the POM file and might resolve the error:
<repositories>
<repository>
<id>clojars</id>
<url>http://clojars.org/repo</url>
</repository>
<repository>
<id>mvnCentral</id>
<url>http://repo1.maven.org/maven2</url>
</repository>
</repositories>
Monday, December 15, 2014
Maven: generics are not supported in -source 1.3
Today I was integrating two projects and encountered a very intriguing bug whenever I tried to compile and run the Maven project, using either Eclipse's "Maven Build" or the "mvn clean compile" command. Initially I was using Maven 2, and the build complained that the "maven-compiler-plugin" plugin was not found in the POM file. I upgraded to Maven 3. The original error message disappeared, but I got some new error messages:
[ERROR] (use -source 5 or higher to enable for-each loops)
[ERROR] /.../PAClassifierJsonDecoder.java:[53,7] error: generics are not supported in -source 1.3
[ERROR]
[ERROR] (use -source 5 or higher to enable generics)
[ERROR] /.../drools/RulesEngineActions.java:[112,37] error: generics are not supported in -source 1.3
[ERROR]
[ERROR] (use -source 5 or higher to enable generics)
[ERROR] /.../KafkaSpout.java:[3,7] error: static import declarations are not supported in -source 1.3
[ERROR]
[ERROR] (use -source 5 or higher to enable static import declarations)
[ERROR] /.../KafkaSpout.java:[53,29] error: generics are not supported in -source 1.3
[ERROR]
[ERROR] (use -source 5 or higher to enable generics)
[ERROR] /.../KafkaSpout.java:[193,5] error: annotations are not supported in -source 1.3
[ERROR]
After some googling, it appears that the Maven compiler is by default configured with source level 1.3 (on Ubuntu at least), so if the project uses annotations such as "@Override" or generics, it will complain, as these are not supported by Java 1.3. The remedy is to configure the "maven-compiler-plugin". This was intriguing, as I had looked through the merged project's POM and it already included the "maven-compiler-plugin" in the build section, which should have worked.
After more detailed scrutiny and comparison with examples on the web, I noticed that the "maven-compiler-plugin" declaration in the POM file had an incorrect extra line, "<groupId>org.apache.maven</groupId>", as follows:
<plugin>
<groupId>org.apache.maven</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
Therefore I removed the line "<groupId>org.apache.maven</groupId>" (the correct groupId for the official plugins is org.apache.maven.plugins, which is also what Maven assumes when the groupId is omitted), and it worked after rerunning the Maven command.
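For reference, the working declaration looks like the following; the groupId line may also be left out entirely, since org.apache.maven.plugins is the default for plugins:

```xml
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
```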
Saturday, December 13, 2014
ElasticSearch: Filtered Query using JEST
While it is possible to query ElasticSearch using an HTTP client or an ES node client, it is not as convenient as JEST. This post explains the basics of using JEST for a filtered query against ElasticSearch. To start, create a Maven project and add the following dependencies to the POM file:
<dependency>
<groupId>io.searchbox</groupId>
<artifactId>jest</artifactId>
<version>0.1.3</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1.1</version>
</dependency>
Suppose ElasticSearch stores indexed documents with the following example mapping:
{ "name" : "xxx",
"age" : 11,
"address" : "xxxxx"}
Let's suppose ElasticSearch is running at 192.168.2.2:9200 and the indexed documents are stored under http://192.168.2.2:9200/myindex/mytype.
Let's create a simple Java class representing this indexed document:
public class SearchResultTuple {
public String address;
public String name;
public int age;
}
We want to retrieve 20 records of the indexed documents from ElasticSearch using the following query:
{"from": 0, "size" : 20,
"sort" : {
"age" : { "order" : "asc" }
},
"query" : {
"filtered" : {
"query" : {
"match" : { "name" : "James" }
},
"filter" : {
"range" : { "age" : { "lte" : 10, "gte" : 20 } }
}
}
}
}
The java implementation to execute the above filtered query using JEST on ElasticSearch is shown below:
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import io.searchbox.core.SearchResult.Hit;
import java.util.List;
import org.json.simple.JSONObject;
public class App
{
public static void main( String[] args )
{
JestClient client = openClient();
JSONObject json=new JSONObject();
json.put("from", 0);
json.put("size", 20);
JSONObject sortJson=new JSONObject();
json.put("sort", sortJson);
JSONObject sortDateJson=new JSONObject();
sortJson.put("age", sortDateJson);
sortDateJson.put("order", "asc");
JSONObject queryJson=new JSONObject();
json.put("query", queryJson);
JSONObject filteredJson=new JSONObject();
queryJson.put("filtered", filteredJson);
JSONObject queryMatchJson=new JSONObject();
filteredJson.put("query", queryMatchJson);
JSONObject matchJson=new JSONObject();
queryMatchJson.put("match", matchJson);
matchJson.put("name", "James");
JSONObject filterJson=new JSONObject();
filteredJson.put("filter", filterJson);
JSONObject rangeJson=new JSONObject();
filterJson.put("range", rangeJson);
JSONObject ageRangeJson = new JSONObject();
rangeJson.put("age", ageRangeJson);
ageRangeJson.put("gte", 10);
ageRangeJson.put("lte", 20);
String jsonString = json.toJSONString();
Search search = new Search.Builder(jsonString)
.addIndex("myindex")
.addType("mytype")
.build();
try {
SearchResult result = client.execute(search);
//System.out.println(result.getJsonString());
List<Hit<SearchResultTuple, Void>> hits = result.getHits(SearchResultTuple.class);
//System.out.println(hits.size());
for(Hit<SearchResultTuple, Void> hit : hits)
{
SearchResultTuple hitTuple = hit.source;
int age = hitTuple.age;
String name = hitTuple.name;
String address =hitTuple.address;
}
} catch (Exception e) {
// a real application should handle the search failure properly
e.printStackTrace();
}
client.shutdownClient();
}
private static JestClient openClient()
{
HttpClientConfig clientConfig = new HttpClientConfig.Builder("http://192.168.2.2:9200")
.multiThreaded(true).build();
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(clientConfig);
JestClient jestClient = factory.getObject();
return jestClient;
}
}
Friday, December 12, 2014
Storm: java.lang.NoClassDefFoundError: org/apache/curator/RetryPolicy
Recently I was working on a Trident topology with Trident-ML, which uses Nathan Marz's storm-kafka to push the results from the Trident topology to be read by another Storm topology. While the program worked perfectly in local-cluster testing, when it was deployed to a Storm cluster the following error showed up out of nowhere:
java.lang.NoClassDefFoundError: org/apache/curator/RetryPolicy
The error got me stuck for more than half an hour before I figured out a workable solution. It seems there was a version compatibility issue among the versions of Storm, Trident-ML, and storm-kafka I was using. Originally I had the following dependencies in the POM file:
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.9.2-incubating</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>net.wurstmeister.storm</groupId>
<artifactId>storm-kafka-plus-0.8</artifactId>
<version>0.4.0</version>
</dependency>
<dependency>
<groupId>com.github.pmerienne</groupId>
<artifactId>trident-ml</artifactId>
<version>0.0.4</version>
</dependency>
The error was gone after I changed the storm-kafka dependency from storm-kafka-plus-0.8 to the following:
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.9.2-incubating</version>
</dependency>
I figure the updated dependency resolves the Curator framework dependency declared in storm-kafka's POM file. Note that if you have a pom-assembly.xml, remember to include the following:
<include>org.apache.storm:*</include>
Monday, December 1, 2014
Maven: Include text or xml file as resources in a Maven project
Frequently we have text, XML, or other files that must be included in a Maven project. The way to do it is quite simple. First, create a subfolder under the project root, e.g. "src/main/resources", and place the files there. Next, declare the folder location in the build section of the POM file:
<build>
...
<resources>
<resource>
<directory>src/main/resources</directory>
</resource>
</resources>
...
</build>
Next run the following command in the project's root folder:
> mvn resources:resources
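At runtime the bundled files are then read from the classpath rather than the filesystem. A minimal sketch (config.txt and its content are hypothetical; a temp directory stands in for target/classes, where Maven copies src/main/resources):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;

public class ResourceDemo {
    public static void main(String[] args) throws Exception {
        // Simulate target/classes: Maven copies src/main/resources here,
        // putting the bundled files on the runtime classpath.
        File classesDir = Files.createTempDirectory("classes").toFile();
        try (FileWriter w = new FileWriter(new File(classesDir, "config.txt"))) {
            w.write("greeting=hello");
        }

        // Read the bundled file back as a classpath resource.
        String content;
        try (URLClassLoader cl = new URLClassLoader(new URL[]{classesDir.toURI().toURL()});
             InputStream in = cl.getResourceAsStream("config.txt")) {
            content = new String(in.readAllBytes());
        }
        System.out.println(content);
    }
}
```

In a real project the class loader is simply the application's own, e.g. getClass().getResourceAsStream("/config.txt").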
Saturday, November 29, 2014
Maven: Include a non-Maven standard library in pom
Suppose that our Maven project depends on a particular library such as ftp4j-1.2.7.jar, which is not available from online Maven repositories such as mvnrepository.com or findjar.com. We can include the library in the POM file in the following way.
First, designate fake coordinates for the jar, e.g.:
groupId: UNKNOWN
artifactId: ftp4j
version: 1.2.7
Next, create a folder "lib" under the project root directory and place ftp4j-1.2.7.jar in the "lib/UNKNOWN/ftp4j/1.2.7/" folder.
Now, in the POM, declare a repository whose URL points to the local "lib" folder:
<repositories>
...
<repository>
<id>pseudoRemoteRepo</id>
<releases>
<enabled>true</enabled>
<checksumPolicy>ignore</checksumPolicy>
</releases>
<url>file://${project.basedir}/lib</url>
</repository>
</repositories>
Finally we can add the ftp4j.jar as a dependency:
<dependencies>
...
<dependency>
<groupId>UNKNOWN</groupId>
<artifactId>ftp4j</artifactId>
<version>1.2.7</version>
</dependency>
</dependencies>
Maven: Analyze Dependency
When a Maven project has many dependencies, it is sometimes difficult to analyze their usage or conflicts without appropriate tools. Maven comes with the analyze goal, which lets the user analyze the dependencies. To run the goal, enter the following command in the terminal after navigating to the project root folder:
> mvn dependency:analyze
As the printout may be long, a good idea is to save the analysis results in a readable format. To do this, open the POM file of the Maven project and add the following maven-dependency-plugin to the build/plugins section:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.6</version>
<executions>
<execution>
<id>copy</id>
<phase>package</phase>
</execution>
</executions>
</plugin>
Now in the project root folder, run the following command:
> mvn dependency:analyze-report
The command will generate the report in the "target/analyze-report" folder in the project root folder.
Another useful goal is analyze-duplicate, which checks for and reports duplicated dependencies:
> mvn dependency:analyze-duplicate
To list all dependencies, run the following command:
> mvn dependency:list
To list all the repositories used by the current project, run the following command:
> mvn dependency:list-repositories
To clean the local repository, run the following command:
> mvn dependency:purge-local-repository
Maven: Scope of dependency
By default, every dependency defined in the POM file of a Maven project has the compile scope. In other words, Maven will download the dependency, use it for compilation, and propagate it when building the project. There are several other scopes, which I would like to discuss here:
Scope: provided
With this scope, the dependency is available for compiling the project's classes (e.g. for the "import" statements in the project's Java code) but is not packaged with the build; it is expected to be supplied by the runtime environment.
Scope: runtime
With this scope, the dependency's jar is not needed for compilation but is included in the runtime classpath during execution of the Maven project. To find what is included in the runtime classpath of a Maven project, navigate to the project's root folder and run the following command:
> mvn dependency:build-classpath
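For illustration, the scope is declared per dependency in the POM; a sketch with common real-world examples (the artifacts chosen are illustrative, not from this project):

```xml
<dependencies>
<!-- available at compile time, supplied by the servlet container at runtime -->
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>javax.servlet-api</artifactId>
<version>3.1.0</version>
<scope>provided</scope>
</dependency>
<!-- not needed to compile, but needed when the application runs, e.g. a JDBC driver -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
<scope>runtime</scope>
</dependency>
<!-- used only for unit testing -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>
```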
Scope: test
With this scope, the dependency is not included during build or runtime, but is included when compiling and running unit tests.
Maven: Transitive Dependency Version Conflict Resolution
To display the dependency tree of a project (which includes the project's dependencies as well as the dependencies of those dependencies, and so on), navigate to the project root folder (which contains the POM file) and run the following command:
> mvn dependency:tree
To save the dependency tree to a file, say dependency-tree.txt, run the following command:
> mvn dependency:tree > dependency-tree.txt
This command lets the user see which transitive dependencies are referenced by the project. A project may have quite a number of direct dependencies (at level 0), which may in turn depend on other libraries or components. It may happen that the same transitive dependency is referenced by the project with different versions; in this case a version conflict on the transitive dependency arises. The next section explains which version of the transitive dependency Maven will take when this happens.
Determine which version of the same dependency will be taken: Nearest First and First Found
If the project has two different dependencies A and B, and A and B both depend on C but on different versions of it, for example A depends on C:1.1.0 and B depends on C:1.1.2, then Maven uses the following rules to decide whether C:1.1.0 or C:1.1.2 is taken.
1. Nearest First: suppose A is a level 0 dependency of the project (i.e. the project depends on A directly) and B is a level 1 dependency of the same project (i.e. the project depends on another jar, say D, which in turn depends on B). Then C:1.1.0 is loaded, as A is "nearer" to the project.
2. First Found: if A and B are at the same dependency level but A is found first by the project (e.g. A and B are both at level 0, and A is declared in the POM file before B), then C:1.1.0 is taken.
<dependency>
<groupId>groupId.A</groupId>
<artifactId>A</artifactId>
<version>${A.version}</version>
<exclusion>
<groupId>groupId.C</groupId>
<artifactId>C</artifactId>
</exclusion>
Another way to to specify C as optional as A's pom file (i.e. include a tag <optional>true</optional>.in dependency section of C).
> mvn dependency:tree
To save the dependency true to a file said dependency-tree.txt, run the following command:
> mvn dependency:tree > dependency-tree.txt
This command allows user to detect which transitive dependencies are referenced by the project. As a project many have quite a number of direct dependencies (at level 0 dependency, i.e.), which may in turn has dependency on some other library or components. It may happens that same transitive dependency but with different versions but be referenced by the project. In this case, a version conflict on the transitive dependency will arise. The next section explains which version of the transitive dependency maven will take when this happens.
Determine which version of the same dependency will be taken: Nearest First and First Found
If the project has two different dependencies A and B, and A and B both depend on C but in different versions (for example, A depends on C:1.1.0 and B depends on C:1.1.2), then Maven uses the following rules to decide whether C:1.1.0 or C:1.1.2 is taken.
1. Nearest First: suppose A is a level-0 dependency of the project (i.e. the project directly depends on A) and B is a level-1 dependency of the same project (i.e. the project depends on another jar, say D, which in turn depends on B). Then C:1.1.0 will be loaded, as A is "nearer" to the project.
2. First Found: if A and B are at the same dependency level but A is found first by the project (e.g. A and B are both at level 0 and A is declared in the pom file before B), then C:1.1.0 will be taken.
Control which version of the same dependency is taken: exclusion and optional
Suppose that, following the above rules, Maven takes in C:1.1.0. The project may then crash because its dependency B requires C:1.1.2 (a common symptom is an exception such as "NoClassDefFoundError"). In such a scenario, we can ask A to exclude C:1.1.0, and Maven will take in C:1.1.2 instead:
<dependency>
    <groupId>groupId.A</groupId>
    <artifactId>A</artifactId>
    <version>${A.version}</version>
    <exclusions>
        <exclusion>
            <groupId>groupId.C</groupId>
            <artifactId>C</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Another way is to mark C as optional in A's pom file (i.e. include the tag <optional>true</optional> in the dependency section for C).
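A third, commonly used option (a general Maven feature, not specific to this example) is to pin the version of C for the whole project in a <dependencyManagement> section, which takes precedence over the mediation rules above for that artifact:

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>groupId.C</groupId>
            <artifactId>C</artifactId>
            <version>1.1.2</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```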
Thursday, November 27, 2014
Maven Project for Drools
This post summarizes my initial tryout of a Maven project for Drools.
Eclipse Plugin
Installing Eclipse plugins is optional, but it will make your job easier when using Drools in Eclipse. Firstly, you might want to install GEF (the Graphical Editing Framework) in your Eclipse or STS. To do so, select "Help-->Install New Software", and enter the following link to add:
http://download.eclipse.org/tools/gef/updates/releases/
Set GEF as the name, as shown in the figure below:
Click Next and Finish to complete the installation of GEF.
Next, install the Drools plugin in Eclipse. To do so, select "Help-->Install New Software", and enter the following link to add:
http://download.jboss.org/drools/release/5.5.0.Final/org.drools.updatesite/
Set DROOLS as the name, as shown in the figure below:
Click Next and Finish to complete the installation of DROOLS.
Configure pom.xml in Maven project
The following is the dependencies setup for a Maven project to use Drools:
<dependency>
<groupId>org.drools</groupId>
<artifactId>drools-core</artifactId>
<version>5.5.0.Final</version>
</dependency>
<dependency>
<groupId>org.drools</groupId>
<artifactId>drools-compiler</artifactId>
<version>5.5.0.Final</version>
</dependency>
To add maven plugins for compiling and executing the project:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.2.1</version>
<executions>
<execution>
<goals><goal>exec</goal></goals>
</execution>
</executions>
<configuration>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<executable>java</executable>
<classpathScope>compile</classpathScope>
<mainClass>com.memeanalytics.drools_hello_world.App</mainClass>
</configuration>
</plugin>
Where to add the Drools rule files in the project root folder
To create Drools rule (*.drl) files, first create a "src/main/resources" folder under the project root folder, then create and save the *.drl files into it. The package names in the *.drl files are up to the developer to define. Once the files are in that folder, they can be retrieved directly by name, e.g. via the ResourceFactory.newClassPathResource() method. For example, for "src/main/resources/simplerule.drl", we can write ResourceFactory.newClassPathResource("simplerule.drl").
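As a sketch of how such a rule file might be loaded and executed (from memory of the Drools 5.x API, not compiled here; Purchase stands in for whatever fact class the rules match):

```java
// Sketch: compile simplerule.drl from the classpath and run it (Drools 5.x API).
KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
kbuilder.add(ResourceFactory.newClassPathResource("simplerule.drl"), ResourceType.DRL);
if (kbuilder.hasErrors()) {
    throw new RuntimeException(kbuilder.getErrors().toString());
}
KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());
StatefulKnowledgeSession session = kbase.newStatefulKnowledgeSession();
session.insert(new Purchase(20)); // Purchase: hypothetical fact class matched by the rules
session.fireAllRules();
session.dispose();
```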
Tryout Maven Project Source Codes
Below is the source code for the tryout:
https://dl.dropboxusercontent.com/u/113201788/storm/drools-hello-world.zip
Further information can be found at the following link:
http://download.jboss.org/drools/release/5.5.0.Final/org.drools.updatesite/
Simple ways to understand Drools conditions
Drools conditions are written in a grammar named MVEL (MVFLEX Expression Language). Consider the following rule in a rule file:
rule "purchase greater than 15"
when
$p : Purchase ( total > 15)
then
System.out.println("there exists some purchase having total > 15");
end
This rule can be interpreted as "when there exists some purchase having a total > 15, then print the message". The condition specifies "there exists some purchase having a total > 15", while the consequence is a valid Java statement.
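To make the matching behavior concrete, here is a plain-Java sketch (independent of Drools) of what the engine effectively does for this rule: check each Purchase fact against the condition and run the consequence for every match.

```java
import java.util.Arrays;
import java.util.List;

public class RuleSketch {
    static class Purchase {
        final double total;
        Purchase(double total) { this.total = total; }
    }

    public static void main(String[] args) {
        // Two facts in working memory: one matches the condition, one does not.
        List<Purchase> facts = Arrays.asList(new Purchase(12), new Purchase(20));
        for (Purchase p : facts) {
            if (p.total > 15) { // condition: Purchase ( total > 15 )
                // consequence: an ordinary Java statement
                System.out.println("there exists some purchase having total > 15");
            }
        }
    }
}
```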
Sunday, November 23, 2014
Write and test a Trident non transactional topology in Storm
Trident provides a high-level abstraction data model for Storm, with the concepts of base functions, filters, projections, aggregates, grouping, etc. Though it adds overhead to Storm, it makes Storm topologies easier to implement and provides support such as at-least-once or exactly-once processing.
The Trident non-transactional topology to be implemented is extremely simple: a dummy spout (derived from IBatchSpout) emits batches (size: 10) of tuples having the form ["{CountryName}", "{Rank}"]. The emitted tuples contain illegal country names which need to be filtered away. Tuples having the same country, in partitions belonging to the same batch, are grouped together. Then the frequency with which a particular country appears in the same batch is counted and printed out.
The source codes of the project can be downloaded from the link below:
https://dl.dropboxusercontent.com/u/113201788/storm/trident-test.tar.gz
To start, create a Maven project (in my case, with groupId="memeanalytics", artifactId="trident-test"), and modify the pom.xml file as shown below:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.memeanalytics</groupId>
<artifactId>trident-test</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>trident-test</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<repositories>
<repository>
<id>clojars</id>
<url>http://clojars.org/repo</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>storm</groupId>
<artifactId>storm</artifactId>
<version>0.9.0.1</version>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.2.1</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<executable>java</executable>
<classpathScope>compile</classpathScope>
<mainClass>${main.class}</mainClass>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.2.1</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
The pom specifies where to download Storm, as well as the Maven plugins for building, executing (exec-maven-plugin), and packaging (maven-assembly-plugin) the Java project. Now let's create the spout which emits tuples in batches:
package com.memeanalytics.trident_test;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.IBatchSpout;
public class RandomWordSpout implements IBatchSpout {
private static final long serialVersionUID = 1L;
private static String[] countries=new String[]{
"China",
"USA",
"Ruassia",
"UK",
"France",
"Rubbish",
"Garbage"
};
private static Integer[] ranks=new Integer[]{
1,
2,
3,
4,
5
};
private Map<Long, List<List<Object>>> dataStore=new HashMap<Long, List<List<Object>>>();
private int batchSize;
public RandomWordSpout(int batchSize)
{
this.batchSize = batchSize;
}
public void open(Map conf, TopologyContext context) {
// TODO Auto-generated method stub
}
public void emitBatch(long batchId, TridentCollector collector) {
// TODO Auto-generated method stub
List<List<Object>> batch=dataStore.get(batchId);
if(batch == null)
{
final Random rand=new Random();
batch=new ArrayList<List<Object>>();
for(int i=0; i < batchSize; ++i)
{
batch.add(new Values(
countries[rand.nextInt(countries.length)],
ranks[rand.nextInt(ranks.length)]
));
}
dataStore.put(batchId, batch);
}
for(List<Object> tuple : batch)
{
collector.emit(tuple);
}
}
public void ack(long batchId) {
// TODO Auto-generated method stub
dataStore.remove(batchId);
}
public void close() {
// TODO Auto-generated method stub
}
public Map getComponentConfiguration() {
return null;
}
public Fields getOutputFields() {
return new Fields("Country","Rank");
}
}
The spout basically creates a batch based on the batchId and emits the tuples in that batch. When an acknowledgement is received, the acknowledged batch having that batchId is removed from the spout. Note that the spout will emit tuples containing non-country names such as "Rubbish" and "Garbage". Now we will create a set of Trident operations, including CountryFilter (which filters away tuples containing non-country names) and Print (which prints the count of tuples containing a particular country in a batch). The code is shown below:
package com.memeanalytics.trident_test;
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFilter;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;
public class TridentComps {
public static class CountryFilter extends BaseFilter{
private static final long serialVersionUID = 1L;
public boolean isKeep(TridentTuple tuple) {
// TODO Auto-generated method stub
String country_candidate = tuple.getString(0);
return !country_candidate.equals("Garbage") && !country_candidate.equals("Rubbish");
}
}
public static class CountrySplit extends BaseFunction{
private static final long serialVersionUID = 1L;
public void execute(TridentTuple tuple, TridentCollector collector) {
// TODO Auto-generated method stub
String country_comps=tuple.getString(0);
for(String country_candidate : country_comps.split("\\s"))
{
collector.emit(new Values(country_candidate.trim()));
}
}
}
public static class Print extends BaseFilter{
private static final long serialVersionUID = 1L;
public boolean isKeep(TridentTuple tuple) {
// TODO Auto-generated method stub
System.out.println(tuple);
return true;
}
}
}
Now we are ready to implement the main class:
package com.memeanalytics.trident_test;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
public class App
{
public static void main( String[] args ) throws Exception
{
Config config=new Config();
config.setMaxSpoutPending(20);
if(args.length==0)
{
LocalCluster cluster=new LocalCluster();
cluster.submitTopology("TridentDemo", config, buildTopology());
try{
Thread.sleep(10000);
}catch(InterruptedException ex)
{
ex.printStackTrace();
}
cluster.killTopology("TridentDemo");
cluster.shutdown();
}
else
{
config.setNumWorkers(3);
try{
StormSubmitter.submitTopology(args[0], config, buildTopology());
}catch(AlreadyAliveException ex)
{
ex.printStackTrace();
}catch(InvalidTopologyException ex)
{
ex.printStackTrace();
}
}
}
private static StormTopology buildTopology()
{
RandomWordSpout spout=new RandomWordSpout(10);
TridentTopology topology=new TridentTopology();
topology.newStream("TridentTxId", spout).shuffle().each(new Fields("Country"), new TridentComps.CountryFilter()).groupBy(new Fields("Country")).aggregate(new Fields("Country"), new Count(), new Fields("Count")).each(new Fields("Count"), new TridentComps.Print()).parallelismHint(2);
return topology.build();
}
}
The static method buildTopology() creates a Trident non-transactional topology which uses the spout created above as the data source; the tuples are filtered by CountryFilter, grouped by the "Country" field value within each batch, and a frequency count is generated via the aggregate method. Finally the count is printed to the console. (Note that aggregate(new Fields("Country"), new Count(), new Fields("Count")) leads TridentComps.Print to print out only the "Count" value; if you want to print the country as well, change it to aggregate(new Fields("Country"), new Count(), new Fields("Country", "Count")).)
The main() method is quite straightforward: if there are arguments on the command line, the project should be packaged into a jar and submitted to a storm cluster; otherwise, run a local storm cluster and submit the topology there. To run locally, navigate to the project root folder and run the following command:
> mvn compile exec:java -Dmain.class=com.memeanalytics.trident_test.App
To run in a storm cluster, make sure the zookeeper cluster and storm cluster are running (following instructions at this link: http://czcodezone.blogspot.sg/2014/11/setup-storm-in-cluster.html), then run the following command:
> mvn clean install
After that a trident-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar will be created in the "target" folder under the project root folder.
Now upload the jar by running the following command:
> $STORM_HOME/bin/storm jar [projectRootFolder]/target/trident-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.memeanalytics.trident_test.App TridentDemo
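As an aside, the per-batch groupBy("Country") plus Count aggregation performed by the topology can be illustrated with plain Java collections (a sketch independent of Storm):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchCountSketch {
    // Equivalent of groupBy("Country").aggregate(Count) within one batch:
    // count how many tuples in the batch carry each country value.
    static Map<String, Integer> countPerBatch(List<String> batch) {
        Map<String, Integer> counts = new HashMap<>();
        for (String country : batch) {
            counts.merge(country, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("China", "USA", "China", "UK", "China");
        System.out.println(countPerBatch(batch).get("China")); // "China" appears 3 times
    }
}
```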
Saturday, November 22, 2014
Integrate Kafka with Storm
Create a Maven project (for example, with groupId = "com.memeanalytics" and artifactId = "kafka-consumer-storm"), and then modify the pom.xml file to include the following repositories:
<repositories>
...
<repository>
<id>github-releases</id>
<url>http://oss.sonatype.org/content/repositories/github-releases/</url>
</repository>
<repository>
<id>clojars</id>
<url>http://clojars.org/repo</url>
</repository>
</repositories>
In the dependencies section of the pom.xml, include storm-kafka-0.8-plus (which contains KafkaSpout, a spout in the storm cluster that acts as a consumer to Kafka) and storm-core:
<dependencies>
...
<dependency>
<groupId>net.wurstmeister.storm</groupId>
<artifactId>storm-kafka-0.8-plus</artifactId>
<version>0.4.0</version>
</dependency>
<dependency>
<groupId>storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.9.0.1</version>
</dependency>
<!--Utility dependencies-->
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>15.0</version>
</dependency>
</dependencies>
In the build/plugins section of the pom.xml, include the maven-assembly-plugin [version:2.2.1] (contains maven plugin for jar packaging of the storm topology project, which can be submitted into storm cluster, instruction can be found at this link: http://czcodezone.blogspot.sg/2014/11/maven-pom-configuration-for-maven-build.html) and exec-maven-plugin [version:1.2.1] (contains maven plugin to execute java program, instruction can be found at this link: http://czcodezone.blogspot.sg/2014/11/maven-add-plugin-to-pom-configuration.html).
Now we are ready to create a storm topology which pulls data from the Kafka messaging system. The topology will be very simple: it uses a KafkaSpout which reads messages from a Kafka broker and emits a tuple containing each message to a very simple bolt which prints the content of the tuple. The simple bolt, PrinterBolt, has the following implementation:
package com.memeanalytics.kafka_consumer_storm;
import org.apache.commons.lang.StringUtils;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
public class PrinterBolt extends BaseBasicBolt {
private static final long serialVersionUID = 1L;
public void execute(Tuple input, BasicOutputCollector collector) {
// TODO Auto-generated method stub
String word=input.getString(0);
if(StringUtils.isBlank(word))
{
return;
}
System.out.println("Word: "+word);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// TODO Auto-generated method stub
}
}
Below is the implementation of the main class com.memeanalytics.kafka_consumer_storm.App, which creates a KafkaSpout that emits tuples to the PrinterBolt above (LocalCluster is used here, but would be changed to StormSubmitter in a production environment):
package com.memeanalytics.kafka_consumer_storm;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
public class App
{
public static void main( String[] args ) throws AlreadyAliveException, InvalidTopologyException
{
ZkHosts zkHosts=new ZkHosts("192.168.2.4:2181");
String topic_name="test-topic";
String consumer_group_id="id7";
String zookeeper_root="";
SpoutConfig kafkaConfig=new SpoutConfig(zkHosts, topic_name, zookeeper_root, consumer_group_id);
kafkaConfig.scheme=new SchemeAsMultiScheme(new StringScheme());
kafkaConfig.forceFromStart=true;
TopologyBuilder builder=new TopologyBuilder();
builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 1);
builder.setBolt("PrinterBolt", new PrinterBolt()).globalGrouping("KafkaSpout");
Config config=new Config();
LocalCluster cluster=new LocalCluster();
cluster.submitTopology("KafkaConsumerTopology", config, builder.createTopology());
try{
Thread.sleep(60000);
}catch(InterruptedException ex)
{
ex.printStackTrace();
}
cluster.killTopology("KafkaConsumerTopology");
cluster.shutdown();
}
}
As can be seen, the main work consists of creating a SpoutConfig which configures the KafkaSpout object. The SpoutConfig in this case specifies the following information:
--zookeeper 192.168.2.4:2181
--topic test-topic
--consumer-group-id: id7
--from-beginning
--string-scheme
The complete source codes can be downloaded from the link below:
https://dl.dropboxusercontent.com/u/113201788/storm/kafka-consumer-storm.zip
Before executing the project, we must have the Kafka cluster running (follow instructions from this link: http://czcodezone.blogspot.sg/2014/11/setup-kafka-in-cluster.html) and a Kafka producer running (follow instructions from this link: http://czcodezone.blogspot.sg/2014/11/write-and-test-simple-kafka-producer.html). Now compile and run the kafka-consumer-storm project with the following command from the project's root directory in a terminal:
> mvn clean compile exec:java -Dmain.class=com.memeanalytics.kafka_consumer_storm.App
You will now see the words produced by the Kafka producer being printed by storm's PrinterBolt.
Write and test a simple Kafka producer
First, we need to start a zookeeper cluster.
Now create a Maven project in Eclipse or STS (e.g. groupId=com.memeanalytics artifactId=kafka-producer), and change the pom.xml to include the following dependencies and plugins:
<dependencies>
...
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.8.0</artifactId>
<version>0.8.1.1</version>
<exclusions>
<exclusion>
<groupId>javax.jms</groupId>
<artifactId>jms</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jdmk</groupId>
<artifactId>jmxtools</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jmx</groupId>
<artifactId>jmxri</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.2.1</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<executable>java</executable>
<classpathScope>compile</classpathScope>
<mainClass>com.memeanalytics.kafka_producer.App</mainClass>
</configuration>
</plugin>
</plugins>
</build>
The dependencies section includes the Kafka pom, and the plugins section includes the Maven exec plugin.
The implementation of a Kafka producer in Java is very simple and straightforward. We first create a ProducerConfig object with the following properties:
metadata.broker.list: 192.168.2.4:9092
serializer.class: kafka.serializer.StringEncoder
request.required.acks: 1
In the broker list, we only need to specify one broker and the rest will be automatically discovered. Since we are producing string data to the Kafka broker, StringEncoder is used as the data serializer. We also specify that we would like acknowledgements for the requests sent.
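Assuming the Kafka 0.8 Java producer API (where the configuration class is, to my recollection, ProducerConfig), the three properties above would be wired up roughly like this:

```java
// Sketch (kafka 0.8 producer API): wire the properties into a ProducerConfig.
Properties props = new Properties();
props.put("metadata.broker.list", "192.168.2.4:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
```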
Next, create a Producer object that is configured by the ProducerConfig object above:
Producer<String, String> kafkaProducer=new Producer<String, String>(config);
The actual sending of data is performed by the following line:
KeyedMessage<String, String> data=new KeyedMessage<String, String>("test-topic", "MyDataTextMessage");
kafkaProducer.send(data);
The "test-topic" is the name of the topic and the "MyDataTextMessage" is the actual data sent to kafka brokers. After all the data has been sent, the kafkaProducer needs to be closed:
kafkaProducer.close();
You can download the complete java project codes from the link below:
https://dl.dropboxusercontent.com/u/113201788/storm/kafka-producer.zip
To run, navigate to the root folder of the Maven project and run the following command in a terminal:
> mvn compile exec:java
To test, make sure that the Kafka cluster is set up and running (follow the instructions given in this link: http://czcodezone.blogspot.sg/2014/11/setup-kafka-in-cluster.html), then open another terminal and run a Kafka console consumer using the following command:
> $KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.2.2:2181 --topic test-topic --from-beginning
Maven: add plugin to pom configuration for Maven to compile and execute a Maven java project.
This post shows how to add a plugin section to the pom.xml of a Maven project so that the project can be compiled and executed with Maven. Create a Maven project using Eclipse or STS with groupId = "com.trying" and artifactId = "hello-world" (these are just made-up names; the generated project will use the namespace com.trying.hello_world). Say the project has a main class com.trying.hello_world.App containing the main() method, which looks like the following:
public static void main(String[] args)
{
System.out.println("Hello World");
}
Now open the pom.xml file created in the project and add the following section to the end:
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.2.1</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>java</executable>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<classpathScope>exec</classpathScope>
<mainClass>com.trying.hello_world.App</mainClass>
</configuration>
</plugin>
</plugins>
</build>
Save and close pom.xml. Now navigate to the project root folder, then compile and execute the program in the terminal:
> cd $PROJECT_HOME/hello-world
> mvn compile exec:java
You should see the output "Hello World" as well as messages indicating the project was built successfully (the compiled classes are in the hello-world/target subfolder).
Thursday, November 20, 2014
Maven: add plugin to pom configuration for Maven to build a jar package in Eclipse and STS
To use Maven to compile and build a jar package, create a Maven project in Eclipse or STS. After the project is created, add the following XML section to the end of the pom.xml file in the project:
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.2.1</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass />
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
After implementing the Java code, build and run it with Maven by navigating to the project folder in the terminal and running the following command:
> mvn compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=[mainClassFullPath]
where [mainClassFullPath] refers to the full name of the class containing the main() method.
Or
> mvn clean install
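With the assembly plugin bound to the package phase, `mvn clean install` also produces a jar-with-dependencies artifact in the target folder, named `<artifactId>-<version>-jar-with-dependencies.jar`. A sketch of running it follows; the artifact name and version are assumptions based on the hello-world example, and because `<mainClass />` is left empty above, no Main-Class manifest entry is written, so the main class must be named on the command line:

```shell
# Build the project; the assembly plugin's "single" goal runs in the package phase
mvn clean install

# Run the assembled jar (version 0.0.1-SNAPSHOT assumed here);
# -cp is needed because the manifest has no Main-Class entry
java -cp target/hello-world-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.trying.hello_world.App
```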
Tuesday, November 18, 2014
Setup Maven on Ubuntu and MacOS
This post shows how to set up Maven on Ubuntu and macOS, and how to find its installation location. On Ubuntu, first check whether Maven is already installed by running the following command in the terminal:
$ mvn
The printout will indicate whether Maven is installed or not. If not, run the following command to install it:
$ sudo apt-get install maven2
After Maven is installed, its home directory is at /usr/share/maven2 (depending on your version).
To install Maven on macOS, run "$ mvn --version" to check whether it is installed. If not, run the following command in the terminal to install it:
$ brew install maven
After the installation, to find the installation path of Maven, type the following command:
$ mvn --version