Wednesday, October 9, 2013

SQL Query to get the database size in MySQL

The following query will returns the the database name and corresponding database size in MySQL

SELECT table_schema "Data Base Name", sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB" 
FROM information_schema.TABLES GROUP BY table_schema ; 

Import large SQL File into MySQL Server in XAMP / WAMP

To import a large SQL file into MySQL Server in WAMP, copy the SQL file into "C:\wamp\bin\mysql\mysql5.5.24\bin", and navigate to the "C:\wamp\bin\mysql\mysql5.5.24\bin" in the command line window, then in the command line enter:

mysql -u [username] -p [DatabaseName] < [SQLFileName]

for example, is the [username] is root and [DatabaseName] is mixertruck, and the [SQLFilename] is mixertruck.sql, then the command line is 

mysql -u root -p mixertruck < mixertruck.sql

if the mysql is on a remote server, then use the following command line:

mysql -u [username] -h [HostName] -p [DatabaseName] < [SQLFileName]

Read very large text file


The Large Text File Viewer is a very good tool to read very large file such as very large SQL file, it can be downloaded from:


Use MS SQL Server Management Studio library to import a large SQL file in C#

Suppose you have a SQL File that needs to be imported into a MS SQL Database Server using C#, firstly import the following libraries as references in your C# project from the folder "C:\Program Files\Microsoft SQL Server\100\SDK\Assemblies"


  • Microsoft.SqlServer.Smo.dll
  • Microsoft.SqlServer.ConnectionInfo.dll
  • Microsoft.SqlServer.Management.Sdk.Sfc
  • Microsoft.SqlServer.SqlEnum
Next use the following C# codes to import the SQL file

using System.Data.SqlClient;
using Microsoft.SqlServer.Management.Smo;
using Microsoft.SqlServer.Management.Common;
using System.IO;

private static void RunUT_ImportSQLFile()
{
 string script = "";

 string filepath = @"D:\temp\mixertruck.sql";

 if (!File.Exists(filepath))
 {
  Console.WriteLine("File [{0}] does not exist!", filepath);
  return;
 }

 StringBuilder script_sb = new StringBuilder();
 using (StreamReader reader = new StreamReader(filepath))
 {
  script = reader.ReadToEnd();
 }

 SqlConnectionStringBuilder sb = new SqlConnectionStringBuilder();
 sb.DataSource = @".\SQLEXPRESS";
 sb.InitialCatalog = "mixertruck";
 sb.UserID = "[username]";
 sb.Password = "[password]";
 string connection_string = sb.ToString();
 Console.WriteLine(connection_string);
 try
 {
  Console.WriteLine("Start importing data from: {0}", filepath);
  using (SqlConnection conn = new SqlConnection(connection_string))
  {
   Server server = new Server(new ServerConnection(conn));
   server.ConnectionContext.ExecuteNonQuery(script);
  }
  Console.WriteLine("Completed importing data from: {0}", filepath);
 }
 catch (Exception ex)
 {
  Console.WriteLine(ex.ToString());
 }
}


If the SQL file is really large the line "script=reader.ReadToEnd()" will throw memory exception, an alternative is as follows:
public static void RunUT_ImportLargeSQLFile()
{
 string filepath = @"D:\temp\mixertruck.sql";

 if (!File.Exists(filepath))
 {
  Console.WriteLine("File [{0}] does not exist!", filepath);
  return;
 }

 StringBuilder script_sb = new StringBuilder();
 string line = null;
 string query = null;
 

 SqlConnectionStringBuilder sb = new SqlConnectionStringBuilder();
 sb.DataSource = @".\SQLEXPRESS";
 sb.InitialCatalog = "mixertruck";
 sb.UserID = "[username]";
 sb.Password = "[password]";
 string connection_string = sb.ToString();
 int query_count=queries.Count;
 try
 {

  using (SqlConnection conn = new SqlConnection(connection_string))
  {
   Server server = new Server(new ServerConnection(conn));
   
   using (StreamReader reader = new StreamReader(filepath))
   {
    while ((line = reader.ReadLine())!= null)
    {
     script_sb.AppendLine(line);
     if (line.Contains(";"))
     {
      query = script_sb.ToString();
      script_sb.Clear();
      server.ConnectionContext.ExecuteNonQuery(query);
     }
    }
   }
  }

 }
 catch (Exception ex)
 {
  Console.WriteLine("Error executing: {0}\r\n{1}", query, ex.ToString());
 }
}

Use SqlConnectionStringBuilder

The following code shows how to use SqlConnectionStringBuilder to conveniently construct the connection string to connect to MS SQL Server (Default SQL Server instance is assumed)


public static void RunUT_Connection()
{
 SqlConnectionStringBuilder sb = new SqlConnectionStringBuilder();
 sb.DataSource = @".\SQLEXPRESS";
 sb.InitialCatalog = "mixertruck";
 sb.UserID = "[username]";
 sb.Password = "[password]";
 string connection_string = sb.ToString();
 Console.WriteLine(connection_string);
 try
 {
  using (SqlConnection conn = new SqlConnection(connection_string))
  {
   conn.Open();

   conn.Close();
  }
 }
 catch (Exception ex)
 {
  Console.WriteLine(ex.ToString());
 }
}

Using command line to import a large SQL file into SQL Server Express 2008

Suppose you have a large SQL file (file with extension *.sql) that you want to import into a database in SQL Server Express 2008. you can use the following command line:

sqlcmd -S [ServerName]\[ServerInstanceName] S -d [DatabaseName] -i [SQLFilePath]

For example, suppose the SQL file is located at D\temp\mixertruck.sql, and you use default server instance .\SQLEXPRESS which already has a database mixertruck, then the command line is as follows:

sqlcmd -S .\SQLEXPRESS -d mixertruck -i D:\temp\mixertruck.sql

Tuesday, October 8, 2013

How to install SQL Server Express 2008

Follow the following link:
http://www2.blackthornesw.com/discus/messages/932/88726.html

If you encounter error such as "Object reference not set to an instance of an object" during installation or uninstallation, the most probable cause would be you have installed earlier version of SQL Server Express (e.g. 2000, 2005, etc). To solve the problem, uninstall earlier version of the SQL Server Expression version, then install the SQL Server Express 2008. If you need those earlier version of SQL Server Expression, you can install them after you install SQL Server Express 2008.

Tuesday, September 10, 2013

Vehicle Routing Scheduler Library Released

Just complete a set up the website for the Vehicle Routing Scheduler Library released:

http://www.memeanalytics.com/vrplib/

The library currently consists of 16 meta-heuristic frameworks (including state-of-the-art meta-heuristic such as memetic algorithm, adaptive MA, hyper-heuristic, simulated annealing, memeplex-based algorithms, etc) each of which can solve 6 different vehicle routing problem variants at the moment which include: Capacitated VRP, VRP with Time Windows, Multiple Depot VRP, Stochastic VRP, VRP with Backhauls, VRP with Pickup and Delivery, or any mixes between these 6 variants. Its primary objective is to provide a platform for researchers and developers to design, experiment, and use vehicle routing solvers for existing and new vehicle routing challenges. Therefore it is designed to be modular, domain-driven, and with variety of tools and interfaces such as algorithm analysis, performance analysis, problem analysis, geo-distance calculation, map-based routing, route sheet, and so on, which target for real-world vehicle routing problem implementation. The library is built from scratch and based on several research works undertaken by the developers of this library.

Wednesday, September 4, 2013

A simple Python script for renaming filenames in the current folder

Suppose you have a set of image files which contains a certain text that you want to remove from the image filenames, below is the Python script to do it:
import os

for filename in os.listdir("."):
 if " (Small)" not in filename: continue
 os.rename(filename, filename.replace(" (Small)", ""))

Saturday, August 24, 2013

VS IDE Post-build Event: Copy Files

The following line (specified in the "Post-build event command line" under "Build Events" of the project property) will copies the dll generated in a Visual Studio project to a folder "Libs" in the solution directory:


xcopy $(TargetDir)*.dll "$(SolutionDir)Libs" /D /Y

Saturday, August 17, 2013

Calculate Geo Distance in SQL query


The follow SQL query computes the geo distance between two points ([LAT_CENTER], [LON_CENTER]) and ([LAT1], [LON1]), where ([LAT1], [LON1]) is taken from a datatable zipcensus:
SELECT ACOS(COS(centerlatrad)*COS(latrad)*COS(centerlongrad - longrad) + SIN(centerlatrad)*SIN(latrad)) * radius as distcirc FROM (SELECT latitude-([LAT_CENTER]) as difflat, longitude - ([LON_CENTER]) as difflong, latitude*PI()/180 as latrad, [LAT_CENTER]*PI()/180 as centerlatrad, longitude*PI()/180 as longrad, [LON_CENTER]*PI()/180 as centerlongrad, 3949.9 as radius FROM zipcensus zc WHERE latitude = [LAT1] AND longitude = [LON1]) zc) zc 

The follow SQL query computes the top 10 nearby points to ([LAT_CENTER], [LON_CENTER])
SELECT TOP 10 zc.*, ACOS(COS(centerlatrad)*COS(latrad)*COS(centerlongrad - longrad) + SIN(centerlatrad)*SIN(latrad)) * radius as distcirc FROM (SELECT latitude-([LAT_CENTER]) as difflat, longitude - ([LON_CENTER]) as difflong, latitude*PI()/180 as latrad, [LAT_CENTER]*PI()/180 as centerlatrad, longitude*PI()/180 as longrad, [LON_CENTER]*PI()/180 as centerlongrad, 3949.9 as radius FROM zipcensus zc) zc) zc ORDER BY distcirc

The formula uses several trigonometric functions, so the innermost query converts all the latitudes and longitudes to radians. In addition, this method uses the radius of the earth, which is taken to be 3,949.9 miles, the unit of the distance returned (i.e. distcirc) is in miles

Thursday, August 15, 2013

SQLCE query alternative for SQL Server SQL

In SQL Server, you can use convenient features such as COUNT(DISTINCT db_field) or YEAR(orderdate). However, SQL functions such as COUNT(DISTINCT ...) and YEAR are not available for SQL Server Compact Edition, therefore, one has to write alternatives to get them to work. below are the example.

SQLCE Alternative to COUNT(DISTINCT ..)

//In SQL Server, will not work in SQLCE 
string query_SQLServer="SELECT COUNT(DISTINCT householdid) FROM customer c JOIN order o ON c.customerid=o.customerid";

//In SQLCE, will work for both SQL Sever and SQLCE
string query_SQLCE="SELECT COUNT(*) FROM (SELECT DISTINCT householdid FROM customer c JOIN order o ON c.customerid=o.customerid) a";


SQLCE Alternative to YEAR()


//In SQL Server, will not work in SQLCE 
string query2_SQLServer="SELECT YEAR(orderdate) FROM customer c JOIN order o ON c.customerid=o.customerid";

//In SQLCE, will work for both SQL Sever and SQLCE
string query2_SQLCE="SELECT DATEPART(year, orderdate) FROM customer c JOIN order o ON c.customerid=o.customerid";

SQLCE Alternative to SQUARE()

//In SQL Server, will not work in SQLCE 
string query_SQLCE="SELECT SQUARE(amount) as amountsqr FROM orders";

//In SQLCE, will work for both SQL Sever and SQLCE
string query_SQLCE="SELECT POWER(amount, 2) as amountsqr FROM orders";

Tuesday, August 13, 2013

How to return the list of column names about a datatable from SQL Compact Edition Database file

Suppose you have a datatable called "calendar" inside your SQLCE database file, but you don't know about the column names of the database, the following code can be used to retrieve the column headers:

string query="SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'calendar'";
string conn_string="[YOUR DB CONNECTION STRING]";
try
{
 using (SqlCeConnection conn = new SqlCeConnection(conn_string))
 {
  conn.Open();
  using (SqlCeCommand cmd = new SqlCeCommand(query, conn))
  {
   SqlCeDataReader reader=cmd.ExecuteReader();
   while (reader.Read())
   {
    int fc=reader.FieldCount;
    for (int i = 0; i < fc; ++i)
    {
     Console.WriteLine("DB Column Header: {0}", reader.GetValue(i));
    }
   }
  }
  conn.Close();
 }
}
catch (Exception ex)
{
 Console.WriteLine("Error executing: {0}", query);
 Console.WriteLine(ex.ToString());
}

One interesting observation is that if one has a column named "national" within the "calendar" table and tries to retrieve it using "SELECT national from calendar", the query will fail as it seems that "national" may be a keyword in the SQLCE (which i am not sure at this point).

Monday, August 12, 2013

Increase the SQL Compact Edition database file size

By default the size of the sqlce database file is only 2GB, which may not be enough for some application. the following shows the connection string which allows to increase the file size to 4GB.

using System.Data.SqlServerCe;
using System.Data;
using System.IO;

string db_file_path="[YOUR DB FILE LOCATION]"; 
string db_pass = "[YOUR DB PASSWORD]";
int db_size_in_MB = 4000; //The absolute maximum size of the database is 4gb.
string conn_string = string.Format("DataSource=\"{0}\";Password=\"{1}\";Max Database Size={2}", db_file_path, db_pass, db_size_in_MB);

if(!File.Exists(db_file_path))
{
 SqlCeEngine en = new SqlCeEngine(conn_string);
 en.CreateDatabase();
}

string query= @"CREATE TABLE [product](
         [PRODUCTID] [int] NULL,
         [PRODUCTNAME] [nvarchar](50) NULL,
         [PRODUCTGROUPCODE] [nvarchar](50) NULL,
         [PRODUCTGROUPNAME] [nvarchar](50) NULL,
         [INSTOCKFLAG] [nvarchar](50) NULL,
         [FULLPRICE] [money] NULL
        ) ";

try
{
 using (SqlCeConnection conn = new SqlCeConnection(conn_string))
 {
  conn.Open();
  using (SqlCeCommand cmd = new SqlCeCommand(query, conn))
  {
   cmd.ExecuteNonQuery();
  }
  conn.Close();
 }
}
catch (Exception ex)
{
 Console.WriteLine("Error executing: {0}", query);
 Console.WriteLine(ex.ToString());
}

Sunday, August 11, 2013

C# to MS SQL Compact Edition Data Type Mapping

SQLCE DataTypeC# DataType
numericdecimal
moneydecimal
smallintInt16
intInt32
bigintInt64
floatfloat
doubledouble
nvarchar, ncharstring

Bulk Insert into MS SQL Compact Edition Database from CSV files using C#

In MS SQL Server, bulk insert into database from CSV files can be easily done by calling 'LOAD' command from SQL query. However, this command is not supported in MS SQL compact edition. By inserting line by line into sqlce database is obviously too slow for most application that consumes large amount of data. Below is a source code written in C# for sqlce which allows users to obtain performance similar to bulk insert in MS SQL Server

public void BulkInsertFromCSV(string file_path, string table_name, string connection_string, Dictionary<string, Type> data_types)
{
 string line;
 List<string> column_names = new List<string>();
 using (StreamReader reader = new StreamReader(file_path))
 {
  line = reader.ReadLine();
  string[] texts = line.Split('\t');
  foreach (string txt in texts)
  {
   column_names.Add(txt);
  }

   using (SqlCeConnection conn = new SqlCeConnection(connection_string))
   {
    SqlCeCommand cmd = new SqlCeCommand();
    SqlCeResultSet rs;
    SqlCeUpdatableRecord rec;
    conn.Open();
    cmd.Connection = conn;
    cmd.CommandText = table_name;
    cmd.CommandType = CommandType.TableDirect;

    rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable);
    

    while ((line = reader.ReadLine()) != null)
    {
     texts = line.Split('\t');

     rec = rs.CreateRecord();

     for (int j = 0; j < column_names.Count; ++j)
     {
      string columnName = column_names[j];
      
      int ordinal = rec.GetOrdinal(columnName);
      string param_value = "";
      if (j < texts.Length)
      {
       param_value = texts[j];
      }

      Type data_type=data_types[columnName];
      if (data_type == typeof(Int32))
      {
       Int32 value=0;
       int.TryParse(param_value, out value);
       rec.SetInt32(ordinal, value);
      }
      else if (data_type == typeof(Int64))
      {
       Int64 value = 0;
       Int64.TryParse(param_value, out value);
       rec.SetInt64(ordinal, value);
      }
      else if (data_type == typeof(Int16))
      {
       Int16 value = 0;
       Int16.TryParse(param_value, out value);
       rec.SetInt16(ordinal, value);
      }
      else if (data_type == typeof(string))
      {
       rec.SetString(ordinal, param_value);
      }
      else if (data_type == typeof(double))
      {
       double value=0;
       double.TryParse(param_value, out value);
       rec.SetDouble(ordinal, value);
      }
      else if (data_type == typeof(float))
      {
       float value = 0;
       float.TryParse(param_value, out value);
       rec.SetFloat(ordinal, value);
      }
      else if (data_type == typeof(DateTime))
      {
       DateTime value;
       if(DateTime.TryParse(param_value, out value))
           rec.SetDateTime(ordinal, value);
      }
      else if (data_type == typeof(decimal))
      {
         decimal value;
         decimal.TryParse(param_value, out value);
         rec.SetDecimal(ordinal, value);
      }
     }

     rs.Insert(rec);
    }

    rs.Close();
    rs.Dispose();
    cmd.Dispose();
   
  }
 }
}

As can seen in the code above, the parameters for the method include:

file_path: full path to the CSV file, which is assumed to be tab-separated and has a header column which has the same names as the database column names
table_name: name of the database table into which the CSV file data will be populated
connection_string: the database connection string
data_types: a data type holder of Dictionary<string, Type>, for which the key specifies the name of the datatable column while the type specifies the type of data held by the column

The usage of the method is very simple, support the datatable schema looks like the following:
CREATE TABLE [product](
 [PRODUCTID] [int] NULL,
 [PRODUCTNAME] [nvarchar](50) NULL,
 [PRODUCTGROUPCODE] [nvarchar](50) NULL,
 [PRODUCTGROUPNAME] [nvarchar](50) NULL,
 [INSTOCKFLAG] [nvarchar](50) NULL,
 [FULLPRICE] [money] NULL
) 

And the CSV file C:\temp.csv looks like the following
PRODUCTID PRODUCTNAME PRODUCTGROUPCODE PRODUCTGROUPNAME INSTOCKFLAG FULLPRICE
10001  CA CALENDAR N 15
10002  CA CALENDAR N 10
10003  CA CALENDAR N 10
10004  BK BOOK Y 90
10005  BK BOOK Y 16
10006  BK BOOK N 10
10007  BK BOOK N 7
10008  BK BOOK N 11

The invoking of the method will be something like:
string connection_string="[YOUR DB CONNECTION STRING]";
Dictionary<string, Type> data_types=new Dictionary<string, Type>();
data_types["PRODUCTID"]=typeof(Int32);
data_types["PRODUCTNAME"]=typeof(string);
data_types["PRODUCTGROUPCODE"]=typeof(string);
data_types["INSTOCKFLAG"]=typeof(string);
data_types["FULLPRICE"]=typeof(float);
BulkInsertFromCSV(@"C:\temp.csv", "product", connection_string, data_types);

One more thing, i found that there is a bug that prevents insertion of float value to the datatable. In my implementation, the .net complains that "cannot insert Single to column (xxx). The column type is Float". Although i already specify using "SetFloat" but the errors simply do not go away, therefore, currently I am using "SetValue(ordinal, value)" in place of "SetFloat(ordinal, value)" which kind of solving my problem.

Javascript Video Player to play AVI, WMV, and other formats such as MP4

There are quite a number of javascript-based video player which leverage either HTML5 or flash player to player video files from the server. However, majority of them does not support playback of file format such as AVI and WMV. This post shows how one can create a simple page in javascript and PHP that support playback of most video formats including AVI and WMV.

Step 1: Download Flowplayer
Download the flow player from the following link:

http://flowplayer.org/download/

unzip the flowplayer to the folder "flowplayer" below your application root directory.

Step 2: Implement PHP video player page

Implement a video player page in PHP that playback the video, the source code of the PHP is shown below:

<?php

$folder_name=$_GET['dirno'];
$video_name=$_GET['vid'];

$path_parts = pathinfo($video_name);
$video_format= $path_parts['extension']; 
?>

<!DOCTYPE html>
<html>
<head>
  <title>Meme Analytics Video Demos</title>

  <?php if($video_format=='avi' || $video_format=='wmv'):?>
  <?php else: ?>
   <!-- player skin -->
   <link rel="stylesheet" type="text/css" href="flowplayer/skin/minimalist.css">

   <!-- site specific styling -->
   <style type="text/css">
   body { font: 12px "Myriad Pro", "Lucida Grande", sans-serif; text-align: center; padding-top: 5%; }
   .flowplayer { width: 80%; }
   </style>

   <!-- flowplayer depends on jQuery 1.7.1+ (for now) -->
   <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>

   <!-- include flowplayer -->
   <script type="text/javascript" src="flowplayer/flowplayer.min.js"></script>
  <?php endif; ?>

</head>
<body>
  <?php if($video_format=='avi' || $video_format=='wmv'):?>
  <object classid="clsid:67DABFBF-D0AB-41fa-9C46-CC0F21721616" width="320" height="260" codebase="http://go.divx.com/plugin/DivXBrowserPlugin.cab">

  <param name="custommode" value="none" />

   <param name="previewImage" value="image.png" />
   <param name="autoPlay" value="false" />
   <param name="src" value="<?php echo $folder_name.'/'.$video_name; ?>" />

 <embed type="video/divx" src="<?php echo $folder_name.'/'.$video_name; ?>" custommode="none" width="320" height="260" autoPlay="false"  previewImage="image.png"  pluginspage="http://go.divx.com/plugin/download/">
 </embed>
 </object>
 <br />No video? <a href="http://www.divx.com/software/divx-plus/web-player" target="_blank">Download</a> the DivX Plus Web Player.
  <?php else: ?>
  <!-- the player -->
   <div class="flowplayer" data-swf="flowplayer/flowplayer.swf" data-ratio="0.4167">
      <video>
          <source src="http://xxx-xxx.com/demo/videos/<?php echo $folder_name.'/'.$video_name; ?>" type='video/<?php echo $video_format; ?>' />
         
      </video>
   </div>
  <?php endif; ?>
</body>
<html>

The PHP player page accept to parameters from the URL: dirno and vid, which are the directory name and the video filename, it then checks the extension of the video file, if the file is a AVI or WMV file, it invokes the DIVX web player, otherwise, the flowplayer is used to play the video.

Step 3: Use VLC instead of DIVX to player AVI online

In case the DIVX web player does not work with your AVI files, you can also try other options, below is the same PHP script rewritten to use VLC player for AVI.

<?php

$folder_name=$_GET['dirno'];
$video_name=$_GET['vid'];

$path_parts = pathinfo($video_name);
$video_format= $path_parts['extension']; 
?>

<!DOCTYPE html>
<html>
<head>
  <title>Video Demos</title>

  <?php if($video_format=='avi' || $video_format=='wmv'):?>
  <?php else: ?>
   <!-- player skin -->
   <link rel="stylesheet" type="text/css" href="flowplayer/skin/minimalist.css">

   <!-- site specific styling -->
   <style type="text/css">
   body { font: 12px "Myriad Pro", "Lucida Grande", sans-serif; text-align: center; padding-top: 5%; }
   .flowplayer { width: 80%; }
   </style>

   <!-- flowplayer depends on jQuery 1.7.1+ (for now) -->
   <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>

   <!-- include flowplayer -->
   <script type="text/javascript" src="flowplayer/flowplayer.min.js"></script>
  <?php endif; ?>

</head>
<body>
  <?php if($video_format=='avi' || $video_format=='wmv'):?>
  
  <object type="application/x-vlc-plugin" pluginspage="http://www.videolan.org" data="http://xxx-xxx.com/demo/videos/<?php echo $folder_name.'/'.$video_name; ?>" width="400" height="300" id="video1">
     <param name="movie" value="http://xxx-xxx.com/demo/videos/<?php echo $folder_name.'/'.$video_name; ?>"/>
     <embed type="application/x-vlc-plugin" name="video1"
     autoplay="no" loop="no" width="400" height="300"
     target="http://xxx-xxx.com/demo/videos/<?php echo $folder_name.'/'.$video_name; ?>" />
  <br />
     <a href="http://xxx-xxx.com/demo/videos/<?php echo $folder_name.'/'.$video_name; ?>">Download Video</a>
  <br />
   Video Not Played? download <a href="http://www.videolan.org" target="_blank">VLC Player</a> 
</object>

  
  <?php else: ?>
  <!-- the player -->
   <div class="flowplayer" data-swf="flowplayer/flowplayer.swf" data-ratio="0.4167">
      <video>
          <source src="http://xxx-xxx.com/demo/videos/<?php echo $folder_name.'/'.$video_name; ?>" type='video/<?php echo $video_format; ?>' />
      </video>
   </div>
  <?php endif; ?>
</body>
</html>


Include Latex in Blogger

The original post is at:

http://webapps.stackexchange.com/questions/13053/mathjax-on-blogger

The basic step is to include the following javascript in the header section of the template (which can be accessed by Design→Edit HTML→Edit Template)

<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js">
MathJax.Hub.Config({
 extensions: ["tex2jax.js","TeX/AMSmath.js","TeX/AMSsymbols.js"],
 jax: ["input/TeX", "output/HTML-CSS"],
 tex2jax: {
     inlineMath: [ ['$','$'], ["\\(","\\)"] ],
     displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
 },
 "HTML-CSS": { availableFonts: ["TeX"] }
});
</script>

With the body, the blogger can write something like the one below:

$f(x, y)=\frac{x^2}{y}$

which will be turned into:

$f(x, y)=\frac{x^2}{y}$

Criteria to find the best rules in association rule mining

Criteria 1: Support Level
In association rule mining (ARM), we can rank the association rules by their support level. In this way, for the combination of two product or items, we may measure the fitness of an association rule (Ci, Nj) by its support level in the database Support(Ci, Nj).

Support(Ci, Nj) can be measured as ActualCount(Ci, Nj) or ActualCount(Ci, Nj) / TotalCount

Criteria 2: Chi-Square
However, from statistics, we know that it is possible the high support level may be by chance. Therefore, we want to know that the high support level, Support(Ci, Nj), of an association rule is a true phenomenon instead of being randomly created by chance. To this end, we can measure the ChiSquare(Ci, Nj). The details of ChiSquare(Ci, Nj) calculation can be found in the following blog:

http://czcodezone.blogspot.sg/2013/08/three-approaches-to-measure-how.html

The higher the ChiSquare(Ci, Nj), the least likely that Support(Ci, Nj) is a random sample by chance. With this, if we combine the Support(Ci, Nj) and ChiSquare(Ci, Nj), we can find the best rules by selecting association rules with both high Support(Ci, Nj) and ChiSquare(Ci, Nj).

Criteria 3: Lift
Another criteria is lift, which can be measure by:

Lift(Ci, Nj) = [ActualCount(Ci, Nj) / ActualCount(Ci)] / [ActualCount(Nj) / TotalCount]

Lift tells us how much better the rule does rather than guessing, the higher Life(Ci, Nj), the better the rule (Ci, Nj)

Criteria 4: Confidence
Confidence is measured by:

Confidence(Ci, Nj) = ActualCount(Ci, Nj) / ActualCount(Ci)

It measure how often the rule (Ci, Nj) is true, given the Ci is true, the higher Confidence(Ci, Nj), the better the rule


Three approaches to measure how different are the different samples

Approach 1: Measure how far a random sample's average value is from the average of another sample. 

Given the following two samples:

Distribution(sample 1) is ~N(mu1, sigma1)
Distribution(sample 2) is ~N(mu2, sigma2)

In above, sample 1 may be a random sample drawn from sample 2.Now suppose we want to know how far the sample 1's average value mu1 is from sample sample 2's average. The difference can be measured by the following equation.

<difference>=2 * MIN( 1- NORMSDIST(<z-score>), NORMSDIST(<z-score>))

where <z-score> = (mu1 - mu2) / sigma2. NORMSDIST(x) measure the area below the normal distribution curve from negative infinity to x. MIN(x, y) returns the minimum of (x, y).

Approach 2: Chi-Square Test

The chi-square test provides another method for addressing "how different is different?". The chi-square test is appropriate when there are multiple dimensions being compared to each other. In other words, two samples are compared across different dimensions. The chi-square test does not create confidence intervals, because confidence intervals do not make as much sense across multiple dimensions.

Suppose we have two samples by dividing an overral sample: sample1 and sample2. Sample1 has N1 items, and sample2 has N2 items. Each sample can divided into sub-groups: C1, C2, C3 using a single dimension (or feature), we have

C1(sample1)+C2(sample1)+C3(sample1)=N1
C1(sample2)+C2(sample2)+C3(sample2)=N2

C1(sample1)+C1(sample2)=C1
C2(sample1)+C2(sample2)=C2
C3(sample1)+C3(sample2)=C3

From above, we will have C1+C2+C3 = N1+N2

Now we calculate the expected value of C1(sample1), ... C3(sample2) as follows:

E[C1(sample1)]=(C1 * N1) / (N1+N2)
E[C2(sample1)]=(C2 * N1) / (N1+N2)
E[C3(sample1)]=(C3 * N1) / (N1+N2)
E[C1(sample2)]=(C1 * N2) / (N1+N2)
E[C2(sample2)]=(C2 * N2) / (N1+N2)
E[C3(sample2)]=(C3 * N2) / (N1+N2)

Next we calculate the deviation of the actual value (e.g. C1(sample1)) from the expected value (e.g. E[C1(sample1))], as follows:

D[C1(sample1)]=C1(sample1) - E[C1(sample1)]
....
D[C3(sample2)]=C3(sample2) - E[C3(sample2)]

Next we calculate the chi-square values, as follows:

ChiSquare[C1(sample1)]=D[C1(sample1)]^2 / E[C1(sample1)]

With the chi-squares value calculated, we can calculate the chi-square for each sample, as follows:

ChiSquare(sample1)=ChiSquare[C1(sample1)]+ChiSquare[C2(sample1)]+ChiSquare[C3(sample1)]
ChiSquare(sample2)=ChiSquare[C1(sample2)]+ChiSquare[C2(sample2)]+ChiSquare[C3(sample2)]

Note that the above calculation can be extended to arbitrary number of samples, each sample may be a sub-category (or sub-set) of an overall sample. To generalize this, given the overall sample is measured into two different dimensions or categories, with the first dimension measured in three different values (C1, C2, C3) and second dimension measured in two different values (N1, N2), then the chi-square for each cell (e.g. the cell (C1, N1)) can be calculated as

ChiSquare(C1, N1)=D(C1, N1)^2 / E(C1, N1)

Where
D(C1, N1)=ActualCount(C1, N1) - E(C1, N1)
E(C1, N1)=(ActualCount(C1)*ActualCount(N1)) / TotalCount)

TotalCount=(ActualCount(N1)+ActualCount(N2)=ActualCount(C1)+ActualCount(C2)+ActualCount(C3)
ActualCount(C1)=ActualCount(C1, N1)+ActualCount(C1, N2)
ActualCount(N1)=ActualCount(C1, N1)+ActualCount(C2, N1)+ActualCount(C3, N1)

The higher the value of ChiSquare(C1, N1), the further the actual count of combination (C1, N1) deviates from the expected count of (C1, N1). In other words, the higher the ChiSquare(C1, N1), the more unexpected the combination of (C1, N1) is. Or we can say, the higher the ChiSquare(C1, N1), the least likely that ActualCount(C1, N1) occurs by chance from the overall sample distribution (i.e. its distribution is more significantly different from the overall sample).

The chi-square values follows the distribution known as chi-square distribution. The degree of freedom for the chi-square distribution, k, is measured by the product of one-minus-count of each dimension, for example, the dimension 1 (C1, C2, C3) has 3 category, dimension 2 (N1, N2) has 2 category, the k value is therefore:

k=(3-1) * (2-1)=3



Approach 3: Student-t Test
This is described in another post.




Relationship between confidence level, confidence interval

Suppose a particular phenomenon follows a normal distribution, ~(mu, sigma), two points on the horizontal axis of the normal distribution curve define the ends of the confidence interval, and the area under the curve, between the two points, measures the confidence.

For normal distribution ~(mu, sigma), about 95.5% fall within two standard deviation (sigma * 2). By tradition, statistical significance is often taken at the 95% level, and this occurs at the (1.96 * sigma) deviations from the average.

The distance from a value to the average, measured in standard deviation, is called the z-score

z-score=(value - mu) / sigma

using z-score, the confidence level can be computed by:

<confidence level>=NORMSDIST(<z-score>) - NORMSDIST(-<z-score>)

where NORMSDIST(<z-score>) measures the area from negative inifiity to <z-score> under the normal distribution curve.

The 95% confidence level occurs at confidence interval (z-score=-1.96, z-score=1.96). The confidence of 95% implies that the confidence interval ranges 1.96 times the standard deviaion on either side of the average. In other words, the confidence interval for the phenomenon (e.g. a poll) with 95% confidence ranges from (mu - 1.96 * sigma) to (mu + 1.96 * sigma)


Saturday, August 10, 2013

Convert query from MS SQL Server to MS SQL Server Compact Edition 3.5

Issue 1: Data Type
In MS SQL Server, when creating a datatable, user can specify data type [char], [smalldatetime] and [varchar] as shown in the example below:

CREATE TABLE [product](
 [PRODUCTID] [int] NULL,
 [PRODUCTNAME] [varchar](50) NULL,
 [PRODUCTGROUPCODE] [varchar](50) NULL,
 [PRODUCTGROUPNAME] [varchar](50) NULL,
 [INSTOCKFLAG] [char](50) NULL,       
 [FULLPRICE] [money] NULL
) 

However, when port the above query to MS SQL Server Compact, the data types [char] and [varchar] must be changed to [nchar] and [nvarchar] (i.e. unicode-based [char] and [varchar], and the [smalldatetime] must be changed to [datetime] as shown in the example below:

CREATE TABLE [product](
 [PRODUCTID] [int] NULL,
 [PRODUCTNAME] [nvarchar](50) NULL,
 [PRODUCTGROUPCODE] [nvarchar](50) NULL,
 [PRODUCTGROUPNAME] [nvarchar](50) NULL,
 [INSTOCKFLAG] [nchar](50) NULL,
 [FULLPRICE] [money] NULL
) 

Issue 2:

Thursday, August 8, 2013

C#: Replace in Files

Below shows the method that replace a text in files within a directory which is iteratively searched:
public void ReplaceInFiles(string dir_path, Dictionary<string, Dictionary<int, string>> results, string old_text, string new_text, string extension=null)
{
 DirectoryInfo dir = new DirectoryInfo(dir_path);
 DirectoryInfo[] sub_dirs = dir.GetDirectories();
 FileInfo[] files = dir.GetFiles();

 foreach (FileInfo file in files)
 {
  string file_path=file.FullName;
  if (!string.IsNullOrEmpty(extension) && file.Extension != extension) continue;

  Dictionary<int, string> found_locations = null;
  using (StreamReader reader = new StreamReader(file.OpenRead()))
  {
   string line;
   int line_index=0;
   while ((line = reader.ReadLine()) != null)
   {
    line_index++;
    if (line.Contains(old_text))
    {
     if (results.ContainsKey(file_path))
     {
      found_locations = results[file_path];
     }
     else
     {
      found_locations = new Dictionary<int, string>();
      results[file_path] = found_locations;
     }
     found_locations[line_index] = line;
    }
   }
  }

  if(found_locations!=null)
  {
   string file_content=null;
   using(StreamReader reader =new StreamReader(file.OpenRead()))
   {
    file_content=reader.ReadToEnd();
   }
   file_content=file_content.Replace(old_text, new_text).TrimEnd(Environment.NewLine.ToCharArray());
   using(StreamWriter writer=new StreamWriter(file.OpenWrite()))
   {
    writer.WriteLine(file_content);
    writer.Flush();
   }
  }
  
 }

 foreach (DirectoryInfo sub_dir in sub_dirs)
 {
  string sub_dir_path=sub_dir.FullName;
  ReplaceInFiles(sub_dir_path, results, old_text, new_text, extension);
 }
}

Below is the code snippet which shows how to use the method above:
string dir_path="C:\temp";
string text_to_find="SimuKit.DomainModels\";
string text_to_replace="packages\SimuKit.DomainModels\";
string extension=".sln";
Dictionary<string, Dictionary<int, string>> results=new Dictionary<string, Dictionary<int, string>>();

ReplaceInFiles(dir_path, results, text_to_find, text_to_replace, extension);
foreach (string file_path in results.Keys)
{
 Dictionary<int, string> line_content = results[file_path];
 
 foreach (int line_index in line_content.Keys)
 {
  Console.writeline("{0} is replaced in file {1} in line {2}, original content: {3}", text_to_find, file_path, line_index, line_content[line_index]);
 }
}

C#: Find in Files

The following method returns the list of files in a directory and the line location within the files that contains a particular text string. it iteratively search the sub directories of the directory.

public void FindInFiles(string dir_path, Dictionary<string, Dictionary<int, string>> results, string text_to_find, string extension=null)
{
 DirectoryInfo dir = new DirectoryInfo(dir_path);
 DirectoryInfo[] sub_dirs = dir.GetDirectories();
 FileInfo[] files = dir.GetFiles();

 foreach (FileInfo file in files)
 {
  string file_path=file.FullName;
  if (!string.IsNullOrEmpty(extension) && file.Extension != extension) continue;

  using (StreamReader reader = new StreamReader(file.OpenRead()))
  {
   string line;
   int line_index=0;
   while ((line = reader.ReadLine()) != null)
   {
    line_index++;
    if (line.Contains(text_to_find))
    {
     Dictionary<int, string> found_locations = null;
     if (results.ContainsKey(file_path))
     {
      found_locations = results[file_path];
     }
     else
     {
      found_locations = new Dictionary<int, string>();
      results[file_path] = found_locations;
     }
     found_locations[line_index] = line;
    }
   }
  }
 }

 foreach (DirectoryInfo sub_dir in sub_dirs)
 {
  string sub_dir_path=sub_dir.FullName;
  FindInFiles(sub_dir_path, results, text_to_find, extension);
 }
}
The following code snippet shows how to use this method:
string dir_path="C:\temp";
string text_to_find="SimuKit.DomainModels\";
string extension=".sln";
Dictionary<string, Dictionary<int, string>> results=new Dictionary<string, Dictionary<int, string>>();

FindInFiles(dir_path, results, text_to_find, extension);
foreach (string file_path in results.Keys)
{
 Dictionary<int, string> line_content = results[file_path];
 
 foreach (int line_index in line_content.Keys)
 {
  Console.writeline("{0} appears in file {1} in line {2}, content: {3}", text_to_find, file_path, line_index, line_content[line_index]);
 }
}

Monday, August 5, 2013

Enumerate files in a directory using PHP

<?php

if ($handle = opendir('./files')) 
{
    while (false !== ($entry = readdir($handle))) 
 {
  echo $entry;
    }

    closedir($handle);
}
?>

Data Encryption in PHP

The encryption.inc includes the encryption utility which prevents the user from deciphering the token. The source codes of the encryption.inc is shown below:

<?php

class Encryption {
    var $skey = '[YOUR-SECRET-KEY]';

    public  function safe_b64encode($string) {
        $data = base64_encode($string);
        $data = str_replace(array('+','/','='),array('-','_',''),$data);
        return $data;
    }

    public function safe_b64decode($string) {
        $data = str_replace(array('-','_'),array('+','/'),$string);
        $mod4 = strlen($data) % 4;
        if ($mod4) {
            $data .= substr('====', $mod4);
        }
        return base64_decode($data);
    }

    public  function encode($value){ 
        if(!$value){return false;}
        $text = $value;
        $iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_256, MCRYPT_MODE_ECB);
        $iv = mcrypt_create_iv($iv_size, MCRYPT_RAND);
        $crypttext = mcrypt_encrypt(MCRYPT_RIJNDAEL_256, $this->skey, $text, MCRYPT_MODE_ECB, $iv);
        return trim($this->safe_b64encode($crypttext)); 
    }

    public function decode($value){
        if(!$value){return false;}
        $crypttext = $this->safe_b64decode($value); 
        $iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_256, MCRYPT_MODE_ECB);
        $iv = mcrypt_create_iv($iv_size, MCRYPT_RAND);
        $decrypttext = mcrypt_decrypt(MCRYPT_RIJNDAEL_256, $this->skey, $crypttext, MCRYPT_MODE_ECB, $iv);
        return trim($decrypttext);
    }
}

?>
To use the above facility, the following is the demo
$converter = new Encryption;
$raw='Hello World';
$encoded = $converter->encode($raw);
$decoded = $converter->decode($encoded);  

Send HTML Email in PHP

ini_set("SMTP","[YOUR-EMAIL-SERVER-ADDRESS]");  // this line 184 for me  

$to      = '[YOUR-TO-EMAIL-ACCOUNT]';
$subject = 'This is the subject line';
$message = '

Hello World!


'; $message .=''; $headers = "MIME-Version: 1.0" . "\r\n"; $headers .= "Content-type:text/html;charset=iso-8859-1" . "\r\n"; $headers .= 'From: <from@example.com>' . "\r\n"; $headers .= 'Cc: cc@example.com' . "\r\n"; mail($to, $subject, $message, $headers);

A simple tutorial on creating software registration and download page with encrypted one day valid token using PHP

This tutorial shows how to create a registration page that will automatically send user an automatically generated software download link using encrypted one day valid token after user register. For simplicity, the database logging is omitted.

Step 1: Create a registration form page
Create a simple registration form page registration.htm as in the follows:

<html>
<body>
Software Application Download Registration <br>

<form id="regForm" action="submit.php" method="post">

<table>
  <tbody>
  <tr>
    <td><label for="fname">First Name:</label></td>
    <td><div class="input-container"><input name="fname" id="fname" type="text" /></div></td>
  </tr>
  <tr>
    <td><label for="lname">Last Name:</label></td>
    <td><div class="input-container"><input name="lname" id="lname" type="text" /></div></td>
  </tr>
  <tr>
    <td><label for="email">Your Email:</label></td>
    <td><div class="input-container"><input name="email" id="email" type="text" /></div></td>
  </tr>
  <tr>
    <td><label for="company">Company:</label></td>
    <td><div class="input-container"><input name="company" id="company" type="text" /></div></td>
  </tr>
  <tr>
  <td> </td>
  <td><input type="submit" class="greenButton" value="Register and Download" /><img id="loading" src="img/ajax-loader.gif" alt="working.." />
</td>
  </tr>
  
  
  </tbody>
</table>

</form>
</body>
</html>

Step 2: Create registration submission processing server script in PHP
Create the submit.php page which processes the registration information from the  registration.htm page. the submit.php page sends the registered user an email with a link to download the software, the link is valid for the current date (using the concept of token), the submit.php also cc the email to [YOUR-EMAIL-CC-ACCOUNT] (As this is a simple demo, the validation of registration information is not included here)

<?php

require_once('encryption.inc');

$token="dfjsl@ew".date('Ymd')."oqiej".date('Y')."!#@#@1".date('m')."2331";

$converter = new Encryption;
$encoded = $converter->encode($token);

$mail_server="[YOUR-MAIL-SERVER-IP-ADDRESS]";

ini_set("SMTP",$mail_server);  

$to      = $_POST['email'];
$cc     = '[YOUR-EMAIL-CC-ACCOUNT]';

$subject = 'XXX Application Download';
$message = '<html><body><h2>User registration information: </h2><br />';
$message .= '<table>';
$message .='<tr><td><b>First Name:</b></td><td>'.$_POST['fname'].'</td></tr>';
$message .='<tr><td><b>Last Name: </b></td><td>'.$_POST['lname'].'</td></tr>';
$message .='<tr><td><b>Email: </b></td><td>'.$_POST['email'].'</td></tr>';
$message .='<tr><td><b>Company: </b></td><td>'.$_POST['company'].'</td></tr>';
$message .='</table>';

$elink = '[YOUR-WEBSITE-DOMAIN-AND-DIRECTORY]/download.php?token='.urlencode($encoded).'&attachment=vrpmac';
$message .= 'Thank you for your interest. Below is a one-day expired download link to the software.<br/>';
$message .= '<a href="'.$elink.'">'.'Download'.'</a><br>';
 
$message .='</body></html>';
 
$headers = "MIME-Version: 1.0" . "\r\n";
$headers .= "Content-type:text/html;charset=iso-8859-1" . "\r\n";

$headers .= 'From: <[YOUR-EMAIL-FROM-ACCOUNT]>' . "\r\n";
$headers .= 'Bcc: ' . $cc . "\r\n";

mail($to, $subject, $message, $headers);

echo msg(1,"registered.php?token=".urlencode($encoded).'&email='.urlencode($_POST['email']));


function msg($status,$txt)
{
 return '{"status":'.$status.',"txt":"'.$txt.'"}';
}
?>

The $token variable contains a token which makes the software download page generated valid for one day, to increase the security you may want to create more sophisticated token using encryption and random numbers.

The [YOUR-MAIL-SERVER-IP-ADDRESS] refers to your mail server address. The [YOUR-EMAIL-CC-ACCOUNT] refers to a cc account when the registration email is sent to the registered user. The [YOUR-WEBSITE-DOMAIN-AND-DIRECTORY] is the root directory which contains the download.php page (which is the page that provides the software download service), The [YOUR-EMAIL-FROM-ACCOUNT] is the from field in the email sent to the registered user.

The encryption.inc includes the encryption utility which prevents the user from deciphering the token. The source codes of the encryption.inc is shown below:

<?php

class Encryption {
    var $skey = '[YOUR-SECRET-KEY]';

    public  function safe_b64encode($string) {
        $data = base64_encode($string);
        $data = str_replace(array('+','/','='),array('-','_',''),$data);
        return $data;
    }

    public function safe_b64decode($string) {
        $data = str_replace(array('-','_'),array('+','/'),$string);
        $mod4 = strlen($data) % 4;
        if ($mod4) {
            $data .= substr('====', $mod4);
        }
        return base64_decode($data);
    }

    public  function encode($value){ 
        if(!$value){return false;}
        $text = $value;
        $iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_256, MCRYPT_MODE_ECB);
        $iv = mcrypt_create_iv($iv_size, MCRYPT_RAND);
        $crypttext = mcrypt_encrypt(MCRYPT_RIJNDAEL_256, $this->skey, $text, MCRYPT_MODE_ECB, $iv);
        return trim($this->safe_b64encode($crypttext)); 
    }

    public function decode($value){
        if(!$value){return false;}
        $crypttext = $this->safe_b64decode($value); 
        $iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_256, MCRYPT_MODE_ECB);
        $iv = mcrypt_create_iv($iv_size, MCRYPT_RAND);
        $decrypttext = mcrypt_decrypt(MCRYPT_RIJNDAEL_256, $this->skey, $crypttext, MCRYPT_MODE_ECB, $iv);
        return trim($decrypttext);
    }
}

?>

The registered.php is a page displayed to the user after user submit the registration form. The page informs user and he has successfully registered (You can include the download link here if you like). The page checks whether the token is included in the url, if not or if the token is not valid, then the user is redirected to the registration.htm page. The source code of the registered.php is as below:

<?php
 require_once('encryption.inc');
 
 if(!isset($_GET['token']))
 {
  header( 'Location: registration.htm' ) ;
 }
 $message="dfjsl@ew".date('Ymd')."oqiej".date('Y')."!#@#@1".date('m')."2331";
 $converter = new Encryption;
 $encode=$_GET['token'];
 $decoded = $converter->decode($encode);  
 if(strcmp($decoded, $message) != 0)
 {
  header( 'Location: registration.htm' ) ;
 }
?>


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>You've been registered!</title>
<link rel="stylesheet" type="text/css" href="reg.css">
</head>

<body>

<div class="registered">

<h1>The application link will be sent to your email account <?php echo $_GET['email']; ?></h1>
<br /><br /><br />

<br /><br /><br /><br />
<h2>Thank you for your registration. </h2>
</div>

</body>
</html>

Step 3: Create software download page
Below is the source code for the software download page, the download script first checks whether the token is valid or not, if not then redirect user to the registration page. Otherwise, it sends the software.

<?php


require_once('encryption.inc');
 
if(!isset($_GET['token']))
{
 header( 'Location: registration.htm' ) ;
}
$message="dfjsl@ew".date('Ymd')."oqiej".date('Y')."!#@#@1".date('m')."2331";
$converter = new Encryption;
$encode=$_GET['token'];
//$encoded = $converter->encode($message);
$decoded = $converter->decode($encode);  
if(strcmp($decoded, $message) != 0)
{
 header( 'Location: registration.htm' ) ;
}

if(isset($_GET['attachment']))
{
 if($_GET['attachment']=='vrpmac')
 { 
  $file = 'uyk0293fwie0.zip';
  if (file_exists($file)) 
  {
   header('Content-Description: File Transfer');
   header('Content-Type: application/octet-stream');
   header('Content-Disposition: attachment; filename=painter2.zip');
   header('Content-Transfer-Encoding: binary');
   header('Expires: 0');
   header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
   header('Pragma: public');
   header('Content-Length: ' . filesize($file));
   ob_clean();
   flush();
   readfile($file);
  }
 }
}

?>

Load local html file into the winform browser

string url = @"C:\temp.html";
url = url.Replace(@"\", @"/");
url = url.Replace(@" ", @"%20");
url = "file:///" + url;

webBrowser1.Navigate(url);

Sunday, August 4, 2013

Incorporate Google Translate into your C# program

Step 1: Download and add the Google .Net Client to your C# solution.

Download the google-api-dotnet-client-1.4.0-beta.samples from

https://code.google.com/p/google-api-dotnet-client/wiki/Downloads

Unzip the package and add the "GoogleApis.SampleHelper" and "Translate.TranslateText" projects to your C# solution,

Step 2: Configure your C# project for Google Translate.

Add the following libraries from the "google-api-dotnet-client-1.4.0-beta.samples/Lib" or "google-api-dotnet-client-1.4.0-beta.samples/Lib/ThirdParty" folder to your C# project:

DotNetOpenAuth
Google.Apis
Google.Apis.Authentication.OAuth2
Newtonsoft.Json
System.Net.Http
System.Net.Http.Exteions
System.Net.Http.Primitives
System.Net.Http.WebRequest
Zlib.Portable
System.Threading.Tasks

Add "System.Net" and "GoogleApis.SampleHelper" references to your C# project as well.

Next copy the ""Google.Apis.Translate.v2.cs" from the "Translate.TranslateText" project to your C# project.

Next right-click your C# project and selects "Manage NuGet Packages", in the "Manage NuGet Packages" dialog, search and install the following packages:

Microsoft BCL Build Components
Microsoft BCL Portability Package
Microsoft HTTP Client Libraries

Step 3: Activate your google translate services.
You need to activate your google translate service in order to use google translate in your C# program. go to the Google API Console at:

code.google.com/apis/console‎

Select the "Services" menu from the right panel, and turn on the Google translate service from there. now copy your google api access key by selecting the "API Access" from the right panel, and copying the key in the line starting with "API Key:"

Step 4: Implement the coding
Below i provide a simple singleton class for you to use the translation service.

using Google.Apis.Samples.Helper;
using Google.Apis.Services;
using Google.Apis.Translate.v2;
using Google.Apis.Translate.v2.Data;
using TranslationsResource = Google.Apis.Translate.v2.Data.TranslationsResource;

public sealed class TranslationManager
{
 private static TranslationManager mInstance = null;
 private static object mSyncObj = new object();

 public static TranslationManager Instance
 {
  get
  {
   if (mInstance == null)
   {
    lock (mSyncObj)
    {
     mInstance = new TranslationManager();
    }
   }
   return mInstance;
  }
 }

 private TranslationManager()
 {

 }

 private string GetApiKey()
 {
  return "[Your-API-Key]";
 }

 public KeyValuePair<string, string> Translate(string srcText, string target_language = "en")
 {
  string[] text = new string[1] { srcText };
  Dictionary<string, KeyValuePair<string, string>> translation=Translate(text, target_language);
  return translation[srcText];
 }

 public Dictionary<string, KeyValuePair<string, string>> Translate(string[] srcText, string target_language = "en")
 {

  // Create the service.
  var service = new TranslateService(new BaseClientService.Initializer()
  {
   ApiKey = GetApiKey(),
   ApplicationName = "Translate API Your App"
  });

  TranslationsListResponse response = service.Translations.List(srcText, target_language).Execute();
  Dictionary<string, KeyValuePair<string, string>> translations = new Dictionary<string, KeyValuePair<string, string>>();

  int counter = 0;
  foreach (TranslationsResource translation in response.Translations)
  {
   translations[srcText[counter]] = new KeyValuePair<string, string>(translation.TranslatedText, translation.DetectedSourceLanguage);
   counter++;
  }

  return translations;
 }
}

Below is a simple example of how to use the TranslationManager class:

string sourceText = "早上好";
string target_language = "en";

KeyValuePair<string, string> translation=TranslationManager.Instance.Translate(sourceText, target_language);
Console.WriteLine("Translated: {0}", translation.Key);
Console.WriteLine("Source Language: {0}", translation.Value);

Thursday, July 25, 2013

Expert System Shell in Agent-Based Modeling: Demo using Zombie Sim game

Have completed my expert system shell in Java and C#, so rewrote this simple Zombie Sim game (which is originally by http://wr3cktangle.wordpress.com/2008/05/29/zombie-sim-open-source/)  using agent-based modeling, in which instead of hand-coded C# rules, expert system is used for the agent (both human and zombie) decision making.


The demo video can be found at:
http://youtu.be/PWcW6ZuLLJE

The expert system shell support major features such as forward, backward chaining, etc. Below is the code snippet for the human agent expert shell code:

RuleInferenceEngine rie = new RuleInferenceEngine();

Rule rule = new Rule("Rule 1");
rule.AddAntecedent(new IsClause("IsTargetVisible", "yes"));
rule.AddAntecedent(new IsClause("IsTargetAvailable", "yes"));
rule.AddAntecedent(new IsClause("IsCourageous", "yes"));
rule.AddAntecedent(new IsClause("IsUnderAttack", "no"));
rule.setConsequent(new IsClause("Action", "PursuitTarget"));
rie.AddRule(rule);

rule = new Rule("Rule 2");
rule.AddAntecedent(new IsClause("IsUnderAttack", "yes"));
rule.AddAntecedent(new IsClause("IsTargetAvailable", "yes"));
rule.setConsequent(new IsClause("Action", "Attack"));
rie.AddRule(rule);

rule = new Rule("Rule 3");
rule.AddAntecedent(new IsClause("IsTargetAvailable", "yes"));
rule.AddAntecedent(new IsClause("IsCourageous", "no"));
rule.AddAntecedent(new IsClause("IsUnderAttack", "no"));
rule.setConsequent(new IsClause("Action", "RunawayFromTarget"));
rie.AddRule(rule);

rule = new Rule("Rule 4");
rule.AddAntecedent(new IsClause("IsTargetAvailable", "no"));
rule.AddAntecedent(new IsClause("IsFighting", "yes"));
rule.setConsequent(new IsClause("IsFighting", "no"));
rie.AddRule(rule);
Below is the partial code the invoke the expert system for decision making in the human agent
mExpertSystem.ClearFacts();

mExpertSystem.AddFact(new IsClause("IsUnderAttack", is_under_attack ? "yes" : "no"));
mExpertSystem.AddFact(new IsClause("IsTargetVisible", mTarget != null && closest < SpotDistance ? "yes" : "no"));
mExpertSystem.AddFact(new IsClause("IsTargetAvailable", mTarget != null ? "yes" : "no"));
mExpertSystem.AddFact(new IsClause("IsCourageous", mWorld.Random.Next(9) + 1 > mCourage ? "yes" : "no"));

List<clause> unproved_conditions = new List<clause>();

Clause conclusion = mExpertSystem.Infer("Action", unproved_conditions);

if (conclusion == null)
{
 System.Console.WriteLine(mExpertSystem.Facts.ToString());
 Console.WriteLine("/////////////////////////////////////");
 return;
}

if (conclusion.getValue() == "Attack")
{
 mFighting = true;
 mMode = MentalState.Aggressive;
 mTarget.IsAttacked((int)Math.Floor(mWorld.Random.NextDouble() * mStrength));
}
Below are some images showing the ZombieSim Game rewritten using Expert System Shell and Agent-Based Modeling:

Association Rule Learning: Job Posting Skill Set Learning and Employment Matching

Just completed an association rule learning application using job posted on websites as the data source, the current application can do the following:

1. Crawl data from job sites such as indeed, jobdb, etc.
2. Help answer: if you know a particular skill, what other skills you should possess to make you more employable in a particular region or country (e.g., Singapore)
3. Help answer: if you what a particular career, what skill sets you should learn or know to make you successful in employment in a particular region or country (e.g. Singapore)
4. Help answer: if you wish to join a particular company in a particular region or country (Singapore in the demo), what skill sets will increase your chance of being hired by that company
5. Help answer: if you has a particular skill, what companies in a particular region or country (e.g. Singapore) you can join
6. Help answer: if you are working in a particular region or country, what skill sets will make you highly employable here.

The demo video can be viewed at the following link:
http://youtu.be/G52pjiefI5c

Below is a some screenshots:
Crawl Job Sites

Job Post
Learned Skill Set Association

Learned Skill Set - Employment Matching

Association Rule Learning Applications



Sentiment Analysis: Singeat.com Comments Sentiment Analysis

Just completed a demo of sentiment analysis machine learning on comments posted on Singeat.com (新加坡美食网), the comments are in chinese, the sentiment analysis is done to be able to output for multiple criteria such as:

Service (good, bad, neutral, etc)
Taste (good, bad, neutral, etc)
Surrounding (good, bad, neutral, etc)
Overral Rating (good, bad, neutral, etc)

The language is auto-detected (therefore not limited to chinese and english), default model is based on Naive Bayes Classifier and Decision Tree, SVM and neural network , transfer learning are under progress.

The demo video can be viewed at:

http://youtu.be/XRANZezqu24

Below are some screenshots:
Crawl Data

View Comments

Sentiment Analysis (Chinese)

Sentiment Analysis (Chinese)

Sentiment Analysis (English)

Sentiment Analysis (Japanese)


Monday, July 22, 2013

Return the filename from the full file path in MFC

Below is the code snippet for a C++ function to return the filename from the full file path in MFC:

#include "StdAfx.h"

CString CAppUtilManager::ExtractName(CString strFullPath) const
{
 int pos=strFullPath.ReverseFind('\\');
 return strFullPath.Right(strFullPath.GetLength()-pos-1);
}

Return the list of files from a directory in MFC

Below is the code snippet that implements a C++ function to return the list of files in a directory in MFC:


#include "StdAfx.h"

void CAppUtilManager::GetFiles(const CString& parent_folder, const CString& filters, CStringArray& files, BOOL bRecursive) const
{
 CFileFind fFind;
 BOOL bWorking=fFind.FindFile(parent_folder+_T("\\")+filters);

 while(bWorking==TRUE)
 {
  bWorking=fFind.FindNextFile();
  if(fFind.IsDots())
  {
   continue;
  }
  if(fFind.IsDirectory())
  {
   if(bRecursive==TRUE)
   {
    GetFiles(fFind.GetFilePath(), filters, files, bRecursive);
   }
  }
  else
  {
   files.Add(fFind.GetFilePath());
  }
 }
 fFind.Close();
}

Return the list of sub folders from a directory in MFC

Below is the code snippet for a C++ function to return a list of sub folders from a directory in MFC:

#include "StdAfx.h"

void CAppUtilManager::GetSubFolders(const CString& parent_folder, CStringArray& sub_folders) const
{
 CFileFind fFind;
 BOOL bWorking=fFind.FindFile(parent_folder+_T("\\*.*"));

 while(bWorking==TRUE)
 {
  bWorking=fFind.FindNextFile();
  if(fFind.IsDots())
  {
   continue;
  }
  if(fFind.IsDirectory())
  {
   sub_folders.Add(fFind.GetFilePath());
  }
 }
 fFind.Close();
}

Open a Folder Browser with Create Button in MFC

Below is the code snippet that implements a C++ function to open a folder browser in MFC with a Create Folder button, and return the path to the selected folder.


#include "StdAfx.h"
#include "shlobj.h"

CString CAppUtilManager::BrowseForDirectory() const
{
 int MAX_PATH=256;
 TCHAR display_name[MAX_PATH];
 TCHAR path[MAX_PATH];
    BROWSEINFO bi = { 0 };
    bi.lpszTitle = _T("Select an existing or created folder");
 bi.pszDisplayName=display_name;
 bi.ulFlags |= BIF_NEWDIALOGSTYLE;
    LPITEMIDLIST pidl = SHBrowseForFolder(&bi);

 CString directory_path(_T(""));

    if(pidl != 0)
    {
        // get the name of the folder and put it in path
        SHGetPathFromIDList (pidl, path);

        //Set the current directory to path
        directory_path=path;

        // free memory used
        IMalloc * imalloc = 0;
        if ( SUCCEEDED( SHGetMalloc ( &imalloc )) )
        {
            imalloc->Free ( pidl );
            imalloc->Release ( );
        }
    }

 return directory_path;
}

Having TinyXML working with MFC

If you are using TinyXml with MFC, remember to add the following line

#include "stdafx.h"

to the top of the the source files tinystr.cpp, tinyxml.cpp, and tinyxmlerror.cpp, tinyxmlparser.cpp to prevent compilation error in MFC project.

Performing Statistical Hypothesis Test in C++ using AlgLilb: Student's t-test and Wilcoxon

Sometimes when comparing the performance of two algorithms such as in control, optimization, machine learning, etc. the comparison is done by running a number of simulations run on a set of benchmark problems for these algorithms, the statistical performance metrics are then derived from these simulation runs to compare their performances. However, it is usually not sufficient to claim one algorithm/method is better simply based on the average values of the performance metrics. In other words, performance comparison should also consider statistical hypothesis tests such as Student's t-test and Wilcoxon. Details of these methods can be found here:

https://en.wikipedia.org/wiki/Student's_t-test
http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test

This post is about how to do statistical hypothesis test in C++ using alglib

Step 1: Download AlgLib

Download the AlgLib from the following link:

http://www.4shared.com/zip/ZWXFztx-/alglib-250cpp.html

Step 2: Add AlgLib to C++ project

In this case, I am using VS2008 C++ IDE, unzip the downloaded AlgLib to the project solution folder, and add it to the C++ project by the following properties configuration:

1) Properties->Configuration Properties->C++->General->Additional Include Directories->$(ProjectDir)alglib-2.5.0.cpp\out
2) Properties->Configuration Properties->Linker->General->Additional Library Directories->$(ProjectDir)alglib-2.5.0.cpp\out
3) Properties->Configuration Properties->Linker->Input->libalglib.lib

Step 3: Student's t-test in C++
Suppose you implement your code in a source file main.cpp, define the Student's t-test as shown below in

#include "studentttests.h"
//data1: vector containing simulation results of a performance metric (say MetricA) for algorithm 1
//data2: vector containing simulation results of a performance metric (say MetricA) for algorithm 2
//if left-tail is less than the confidence threshold, left-tail rejected, and we have MetricA (algorithm 1) > MetricA (algorithm 2)
//if right-tail is less than the confidence threshold, right-tail rejected, and we have MetricA (algorithm 1) < MetricA (algorithm 2)
void ComputeStudentT(const std::vector<double>& data1, const std::vector<double>& data2, double& bothtails, double& lefttail, double& righttail)
{
 if(data1.empty() || data2.empty())
 {
  return;
 }

 ap::real_1d_array x;
 ap::real_1d_array y;
 
 int n=static_cast<int>(data1.size());
 x.setlength(n);
 for(int i = 0; i != n; i++)
 {
  x(i) = data1[i];
 }

 int m=static_cast<int>(data2.size());
 y.setlength(m);
 for(int i=0; i != m; ++i)
 {
  y(i)=data2[i];
 }

 /*************************************************************************
 Two-sample unpooled test

 This test checks three hypotheses about the mean of the given samples. The
 following tests are performed:
  * two-tailed test (null hypothesis - the means are equal)
  * left-tailed test (null hypothesis - the mean of the first sample  is
    greater than or equal to the mean of the second sample)
  * right-tailed test (null hypothesis - the mean of the first sample is
    less than or equal to the mean of the second sample).

 Test is based on the following assumptions:
  * given samples have normal distributions
  * samples are independent.
 Dispersion equality is not required

 Input parameters:
  X - sample 1. Array whose index goes from 0 to N-1.
  N - size of the sample.
  Y - sample 2. Array whose index goes from 0 to M-1.
  M - size of the sample.

 Output parameters:
  BothTails   -   p-value for two-tailed test.
      If BothTails is less than the given significance level
      the null hypothesis is rejected.
  LeftTail    -   p-value for left-tailed test.
      If LeftTail is less than the given significance level,
      the null hypothesis is rejected.
  RightTail   -   p-value for right-tailed test.
      If RightTail is less than the given significance level
      the null hypothesis is rejected.

   -- ALGLIB --
   Copyright 18.09.2006 by Bochkanov Sergey
 *************************************************************************/
 
 unequalvariancettest(x, n, y, m, bothtails, lefttail, righttail);
}

For ComputeStudentT() method, the parameter data1 is a vector containing simulation results of a performance metric (say MetricA) for algorithm 1, which is obtained from simulation runs on a benchmark problem (suppose there are 30 simulation runs, then data1 is a vector of length 30), while data2 is a vector containing results of MetricA for algorithm 2, which is obtained from simulation runs on the same benchmark problem.

Below shows how one can use the CompareStudentT() in the coding

RunSimulationsToObtainMetricAForAlgorithm1();
RunSimulationsToObtainMetricAForAlgorithm2();

std::vector<double> data1;
LoadMetricAForAlgorithm1IntoVector(data1);

std::vector<double> data2;
LoadmetricAForAlgorithm2IntoVector(data2);

double bothtails=0, lefttail=0, righttail=0;

double p_threshold=0.05; //set p threshold to 0.05 for 95% confidence level

ComputeStudentT(data1, data2, bothtails, lefttail, righttail);

/*
* two-tailed test (null hypothesis - the means are equal)
* left-tailed test (null hypothesis - the mean of the first sample  is
  greater than or equal to the mean of the second sample)
* right-tailed test (null hypothesis - the mean of the first sample is
  less than or equal to the mean of the second sample).
*/
if(bothtails < p_threshold)
{
 //null hypothesis rejected, the mean of data1 is either greater or less than tat of data2
 if(lefttail < p_threshold && righttail > p_threshold)
 {
  std::cout << "The true mean of MetricA(algorithm1) is smaller than tat of MetricA(algorithm2)" << std::endl;
 }
 else if(lefttail > p_threshold && righttail < p_threshold)
 {
  std::cout << "The true mean of MetricA(algorithm1) is greater than tat of MetricA(algorithm2)" << std::endl;
 }
 else
 {
  std::cerr << "error: t stat failed" << std::endl;
  exit(0);
 }
}


Step 4: Wilcoxon test in C++ 

Below shows the Wilcoxon test method in C++, the interface and usage of ComputeWilcoxon() method is same as ComputeStudentT() method

#include "wsr.h"

//if left-tail is less than the confidence threshold, left-tail rejected, and we have data1 > data2
//if right-tail is less than the confidence threshold, right-tail rejected, and we have data2 < data1
void ComputeWilcoxon(const std::vector<double>& data1, const std::vector<double<& data2, double& bothtails, double& lefttail, double& righttail)
{
 if(data1.empty() || data2.empty())
 {
  return;
 }

 ap::real_1d_array x;
 ap::real_1d_array y;
 
 int n=static_cast<int>(data1.size());
 int m=static_cast<int>(data2.size());

 if(n > m)
 {
  n=m;
 }

 x.setlength(n);
 for(int i = 0; i != n; i++)
 {
  x(i) = (data1[i] - data2[i]);
 }

 double assumed_median=0; //the given value

 /*************************************************************************
 Wilcoxon signed-rank test

 This test checks three hypotheses about the median  of  the  given sample.
 The following tests are performed:
  * two-tailed test (null hypothesis - the median is equal to the  given
    value)
  * left-tailed test (null hypothesis - the median is  greater  than  or
    equal to the given value)
  * right-tailed test (null hypothesis  -  the  median  is  less than or
    equal to the given value)

 Requirements:
  * the scale of measurement should be ordinal, interval or  ratio (i.e.
    the test could not be applied to nominal variables).
  * the distribution should be continuous and symmetric relative to  its
    median.
  * number of distinct values in the X array should be greater than 4

 The test is non-parametric and doesn't require distribution X to be normal

 Input parameters:
  X       -   sample. Array whose index goes from 0 to N-1.
  N       -   size of the sample.
  Median  -   assumed median value.

 Output parameters:
  BothTails   -   p-value for two-tailed test.
      If BothTails is less than the given significance level
      the null hypothesis is rejected.
  LeftTail    -   p-value for left-tailed test.
      If LeftTail is less than the given significance level,
      the null hypothesis is rejected.
  RightTail   -   p-value for right-tailed test.
      If RightTail is less than the given significance level
      the null hypothesis is rejected.

 To calculate p-values, special approximation is used. This method lets  us
 calculate p-values with two decimal places in interval [0.0001, 1].

 "Two decimal places" does not sound very impressive, but in  practice  the
 relative error of less than 1% is enough to make a decision.

 There is no approximation outside the [0.0001, 1] interval. Therefore,  if
 the significance level outlies this interval, the test returns 0.0001.

   -- ALGLIB --
   Copyright 08.09.2006 by Bochkanov Sergey
 *************************************************************************/
 wilcoxonsignedranktest(x, n, assumed_median, bothtails, lefttail, righttail);
}


Access SQLite using C++

This is a description of accessing SQLite using C++

Step 1: Download the CppSQLite

For me, I have been using the version which can be downloaded from:

http://www.4shared.com/zip/aNL2EdT_/sqlite.html

Step 2: Build the CppSQLite library

In this example, I am using VS2008 C++ IDE,

1) Extract the downloaded content to the folder sqlite
2) Open sqlite\SQLite_Static_Library\SQLite_Static_Library.sln
3) You may need to change the runtime libraries of the project to match those of the executable that will include the libraries, in order to avoid linker errors.
4) Build the library.

Step 3: Add the CppSQLite to your project

1) copy the sqlite folder to your solution folder
2) Include the library directory in your application (Project->Properties->Configuration Properties->Linker->General->Additional Library Directories->["$(ProjectDir)sqlite\SQLite_Static_Library\release"])
3) Include the static library built in step 2 (Project->Properties->Configuration Properties->Linker->Input->Additional Dependencies->[SQLite_Static_Library.lib])
4) Add the C++ wrapper to your project.  It's in the sqlite root dir:

      CppSQLite3.h
      CppSQLite3.cpp

Step 4: Implement code to access SQLite

The following a simple singleton class written in C++:

#ifndef _H_DB_MANAGER_H
#define _H_DB_MANAGER_H

#include <sstream>
#include <ctime>
#include "CppSQLite3.h"

class DBManager
{
public:
 virtual ~DBManager()
 {
  
 }

private:
 DBManager()
 {
  
 }

 DBManager(const DBManager& rhs) { }
 DBManager& operator= (const DBManager& rhs) { return *this; }

public:
 void record_data(int attr1_value, const std::string& attr2_value, int attr3_value)
 {
  std::ostringstream oss;
  oss << "INSERT INTO demo_table (attr1, attr2, attr3) VALUES (" << attr1_value << ", '" << attr2_value << "', " << attr3_value << ");" ;
  try{
   mDB.execDML(oss.str().c_str());
  }catch(CppSQLite3Exception& e)
  {
   std::cerr << e.errorCode() << ": " << e.errorMessage() << "\n";
  }
 }

public:
 static DBManager* getSingletonPtr()
 {
  static DBManager theInstance;
  return &theInstance;
 }

public:
 void open(const char* dbname)
 {
  try{
   remove(dbname);
   mDB.open(dbname);

   mDB.execDML("CREATE TABLE demo_table (id INTEGER PRIMARY_KEY, attr1 INTEGER, attr2 TEXT, attr3 INTEGER);");
  
  }catch (CppSQLite3Exception& e)
  {
   std::cerr << e.errorCode() << ":" << e.errorMessage() << "\n";
  }
 }
 void close()
 {
  mDB.close();
 }

private:
 CppSQLite3DB mDB;
};
#endif

Below is a simple explanation of the database manager class
  • The open() method recreate the database, and then create a datatable "demo_table" in the database file with three fields: attr1, attr2, attr3, attr1 and attr3 are INTEGER while attr2 is a TEXT. 
  • The record_data() method record a single row into the database "demo_table" 
  • The close() method should be called at the end of database operation to create database connection


To use the singleton class above, it is very easy, below is a simple example:
#include "DBManager.h"
DBManager::getSingletonPtr()->open("demo.db");
DBManager::getSingletonPtr()->record_data(1, "Hello World", 2);
DBManager::getSingletonPtr()->close();

Add Code Syntax Highlighter to your Blogger

Go to the "Template" menu and click "Edit Html" below the current template, add the following lines above the </head>
 
 

In your content, surround the code block by <pre> element, like to one shown below:

<pre class="brush: csharp">private void Download(string url)
{
    WebClient client = new WebClient();
    client.DownloadFileCompleted +=    new AsyncCompletedEventHandler(client_DownloadFileCompleted);
    client.DownloadFileAsync(new Uri(url), @"c:\temp.html");
}

void client_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
    //do something here
}
</pre>

which will be rendered as:
private void Download(string url)
{
    WebClient client = new WebClient();
    client.DownloadFileCompleted +=    new AsyncCompletedEventHandler(client_DownloadFileCompleted);
    client.DownloadFileAsync(new Uri(url), @"c:\temp.html");
}

void client_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
    //do something here
}
If your code block contains template such as the one below:

Dictionary<string, string> some_dict=new Dictionary<string, string>();

Change "Dictionary<string, string>" to "Dictionary&lt;string, string&gt;" so that the code syntax highlighter won't display errorly.

One further thing that i found out is that if you are using dynamic view template for your blogger, the code syntax highlighter probably won't work

Sunday, July 21, 2013

Singeat.com Map-based Desktop using Web Crawling Technique

Last friday, I decided to have a relaxed night with my younger brother by going out and rewarding ourselves a good meal. Being lazy and nerdy, my tendency for finding a nice place to eat is usually by searching online. We searched singeat.com (新加坡美食网)for a nice hotpot corner near the place we stay so that we only commute a short distance back and forth (yeah that lazy). singeat.com is nice, offering many comprehensive information, e.g. food category, search area, photos, comments, groupon, order, and other comprehensive information such as surrouding, main cuisines, parking, and so on.

But then when it comes to find a hotpot restaurant near the place we stay, it becomes troublesome. Firstly, the place we stay is far from the city down town, which means not a lot of restaurants around. Secondly, they don't offer a map which one can select the restaurants near a particular location user specifies (e.g. something like gothere.sg and nearby.sg). Thirdly, while navigating page by page certainly generate traffic flows for singeat.com, it makes impatient web users like me even more impatient. Therefore, while enjoying hotpot with my brother on friday night, I decided to do something, maybe save myself the trouble experienced when using singeat.com - by writing an map-based application which shows any type of restaurant with rating from nearby the place i indicated on the map. It took my saturday to come up with a desktop application that does what i want :) The following list down the steps i used to create this app.

Step 1: What I need to decide before creating the application
The first thing i need to decide is what type of application i am to create: since this is for my personal usage, and i want to do it quickly, i decided to write the application in C# Winform. 

The second thing i need is how am i going to obtain data from singeat.com: I am not aware of any web services or rss exposed by singeat.com that will allow me to retrieve most of the information from their website, which means i need to write a web crawler to help me crawl and organize information from their website. 

The third thing i need is to display those information on a map on the desktop which user can interact such as performing search, view rating and photos and so on.

Step 2: Technologies to deploy in my application
As the application development is going to be a RAD (yeah i completed in one day), I need to use tools and libraries that are already out there so as to help me shorten the time of development. The following are the tools that i ultimately used in the application:


  • Html Agility Pack: parsing the html web pages downloaded from singeat for information and photos
  • GMap.Net: display map on winform as well as performing geocoding and reverse geocoding.
  • Custom TabControl: a tab control that makes winform tab looks better (since i uses quite a number of tab controls in this app, i make some extra efforts to make them look nicer)

Step 3: Implement codes to crawl singeat.com
In order to retrieve information from singeat.com, my implementation code first go to their search page and crawl the search fields such as SearchFood, SearchAreas, etc. This is done using Html Agility Pack to parse the hidden input fields and <select> elements on the search page.

Once i obtained the list of search terms, the crawler codes crawl the search page by sending various queries using different combination of search terms (to minimize the traffic, pages are cached locally as well as limiting the total queries and interval for queries). 

Next for each queried search page, my implemented code parse the html link for the individual restaurant page, and the crawler uses these link to visit and download these individual restaurant pages (again, cache is used to prevent traffic)

When each individual restaurant page is downloaded, the parser parses html table and form inputs for further information such as address, restaurant cuisines, rooms, seats, comments, links to photos and so on. The links to photos and comments are then further extracted to download photos if any to the cache. 

Once all the information for a restaurant are obtained, the information is encapsulated into a restaurant object completed with photo locations, comments, original link, restaurant order link and so on.

Step 4: Implement code to display restaurant information on the map
Since the singeat.com restaurant page does not contains information such as (latitude, longitude), reverse geocoding must be performed to obtain the actual geo location, this is done using GMap.Net, which also provides a winform map control for display the restaurants. 

Another important step is to calculate nearby restaurant, this can be done easily using equation that caculates the geo distance between the current location and restaurant location. 

Step 5: Put every thing together
Below shows some screenshots from the application:

Figure 1: Set Search Query using SearchFood and SearchArea

Figure 2: Display restaurants of various types on the map
Figure 3: Display restaurant names
Figure 4: Change to Bing Map from Google Map
Figure 5: Display restaurant details when clicked
Figure 6: Display comments on the restaurant
Figure 7: Display photos related to the restaurant
Figure 8: Display Search by Region and Radius
Figure 9: Refine Search Further using Filter



Below is the short video: