Sunday, July 21, 2013

Singeat.com Map-based Desktop Application using Web Crawling Technique

Last Friday, I decided to have a relaxed night out with my younger brother and reward ourselves with a good meal. Being lazy and nerdy, I usually find a nice place to eat by searching online. We searched singeat.com (新加坡美食网) for a nice hotpot place near where we stay, so that we would only have to commute a short distance back and forth (yeah, that lazy). singeat.com is nice and offers a lot of information, e.g. food category, search area, photos, comments, groupon, ordering, and other details such as surroundings, main cuisines, parking, and so on.

But when it comes to finding a hotpot restaurant near the place we stay, it becomes troublesome. Firstly, the place we stay is far from downtown, which means there are not a lot of restaurants around. Secondly, the site doesn't offer a map on which one can select the restaurants near a location the user specifies (e.g. something like gothere.sg and nearby.sg). Thirdly, while navigating page by page certainly generates traffic for singeat.com, it makes impatient web users like me even more impatient. Therefore, while enjoying hotpot with my brother on Friday night, I decided to do something to save myself the trouble I experienced when using singeat.com: write a map-based application that shows restaurants of any type, with their ratings, near a place I indicate on the map. It took my Saturday to come up with a desktop application that does what I want :) The following lists the steps I used to create this app.

Step 1: What I need to decide before creating the application
The first thing I need to decide is what type of application to create: since this is for my personal use and I want to do it quickly, I decided to write the application in C# WinForms.

The second thing I need to decide is how to obtain data from singeat.com: I am not aware of any web service or RSS feed exposed by singeat.com that would let me retrieve most of the information from their website, which means I need to write a web crawler to crawl and organize the information from their pages.

The third thing I need is to display that information on a desktop map which the user can interact with, e.g. to perform searches, view ratings and photos, and so on.

Step 2: Technologies to use in my application
As this is going to be rapid application development (yeah, I completed it in one day), I need to use tools and libraries that are already out there to shorten the development time. The following are the tools that I ultimately used in the application:


  • Html Agility Pack: parses the HTML pages downloaded from singeat.com for information and photos
  • GMap.NET: displays the map on the WinForms window and performs geocoding and reverse geocoding
  • Custom TabControl: a tab control that makes WinForms tabs look better (since I use quite a number of tab controls in this app, I made some extra effort to make them look nicer)

Step 3: Implement code to crawl singeat.com
In order to retrieve information from singeat.com, my code first goes to their search page and crawls the search fields such as SearchFood, SearchAreas, etc. This is done using Html Agility Pack to parse the hidden input fields and <select> elements on the search page.
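The field-extraction idea can be illustrated with a small sketch. This is not the app's code (the app is C# with Html Agility Pack); it is a Python approximation using only the standard library, and the class and function names are made up for illustration. It assumes the search form uses ordinary hidden <input> fields and <select>/<option> elements, as described above.

```python
from html.parser import HTMLParser


class SearchFieldParser(HTMLParser):
    """Collect hidden <input> values and <select> options from a search form."""

    def __init__(self):
        super().__init__()
        self.hidden = {}     # input name -> value
        self.options = {}    # select name -> list of option values
        self._select = None  # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.hidden[a["name"]] = a.get("value") or ""
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self.options[self._select] = []
        elif tag == "option" and self._select is not None:
            value = a.get("value")
            if value:  # skip empty placeholder options such as "Any"
                self.options[self._select].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def parse_search_fields(html):
    """Return (hidden_fields, select_options) found in the given HTML text."""
    parser = SearchFieldParser()
    parser.feed(html)
    parser.close()
    return parser.hidden, parser.options
```

The option lists returned here are the search terms that the crawler can later combine into queries.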

Once I have the list of search terms, the crawler queries the search page using different combinations of those terms. To minimize the traffic sent to the site, pages are cached locally, and both the total number of queries and the interval between queries are limited.
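A rough sketch of the "cache plus throttle" idea is shown below, again in Python and not taken from the app. The names CachedFetcher and query_combinations, the 2-second interval and the 100-request budget are all assumptions for illustration; the actual download function is passed in by the caller so that the sketch itself never touches the network.

```python
import hashlib
import itertools
import time
from pathlib import Path


class CachedFetcher:
    """Fetch pages through a local file cache, with a request budget and delay."""

    def __init__(self, cache_dir, download, min_interval=2.0, max_requests=100,
                 sleep=time.sleep):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.download = download          # callable: url -> page text
        self.min_interval = min_interval  # seconds to wait between real requests
        self.max_requests = max_requests  # hard cap on real requests
        self.sleep = sleep
        self.requests_made = 0

    def _path(self, url):
        # One cache file per URL, named by the hash of the URL.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".html")

    def get(self, url):
        path = self._path(url)
        if path.exists():                 # cache hit: no traffic at all
            return path.read_text(encoding="utf-8")
        if self.requests_made >= self.max_requests:
            raise RuntimeError("request budget exhausted")
        if self.requests_made > 0:        # be polite between real requests
            self.sleep(self.min_interval)
        text = self.download(url)
        self.requests_made += 1
        path.write_text(text, encoding="utf-8")
        return text


def query_combinations(options):
    """Yield one dict per combination of search terms, e.g. food x area."""
    names = sorted(options)
    for values in itertools.product(*(options[n] for n in names)):
        yield dict(zip(names, values))
```

Each dict yielded by query_combinations would be turned into one search request; because of the cache, rerunning the crawler only hits the site for pages it has not seen yet.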

Next, for each search result page, my code parses out the links to the individual restaurant pages, and the crawler uses these links to visit and download those pages (again, the cache is used to avoid unnecessary traffic).
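Link extraction from a result page could look roughly like the following Python sketch. It is only an illustration: the "/restaurant/" marker used to recognize restaurant links is a hypothetical pattern, not the real URL scheme of singeat.com.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose href contains a marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:           # keep order, drop duplicates
                self.links.append(url)


def restaurant_links(html, base_url, marker="/restaurant/"):
    """Return de-duplicated restaurant page URLs found in a search result page."""
    collector = LinkCollector(base_url, marker)
    collector.feed(html)
    collector.close()
    return collector.links
```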

Once an individual restaurant page is downloaded, the parser reads its HTML tables and form inputs for further information such as address, cuisines, rooms, seats, comments, links to photos and so on. The links to photos and comments are then followed to download any photos into the cache.
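As an illustration of the table-parsing step, the Python sketch below turns two-column table rows into a label-to-value dictionary. It assumes, hypothetically, that the details are laid out as "label cell, value cell" rows; the real page layout may differ, and the app itself does this with Html Agility Pack.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_dict(html):
    """Map the first cell of each row (label) to the second cell (value)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return {row[0].rstrip(":："): row[1]
            for row in parser.rows if len(row) >= 2 and row[0]}
```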

Once all the information for a restaurant has been obtained, it is encapsulated into a restaurant object, complete with photo locations, comments, the original link, the restaurant's order link and so on.
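Such a restaurant object might be shaped like the Python dataclass below. The field names are guesses based on the items listed above, not the app's actual C# class, and the latitude/longitude fields are included on the assumption that they get filled in later by geocoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Restaurant:
    """One crawled restaurant; field names are illustrative only."""
    name: str
    address: str
    source_url: str                       # original page on the site
    order_url: Optional[str] = None       # link for placing an order, if any
    cuisines: List[str] = field(default_factory=list)
    rooms: Optional[int] = None
    seats: Optional[int] = None
    comments: List[str] = field(default_factory=list)
    photo_paths: List[str] = field(default_factory=list)  # files in the local cache
    latitude: Optional[float] = None      # unknown until the address is geocoded
    longitude: Optional[float] = None

    def has_location(self):
        """True once both coordinates are known."""
        return self.latitude is not None and self.longitude is not None
```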

Step 4: Implement code to display restaurant information on the map
Since the singeat.com restaurant page does not contain coordinates (latitude, longitude), the address must be geocoded to obtain the actual location. This is done using GMap.NET, which also provides a WinForms map control for displaying the restaurants.

Another important step is to find nearby restaurants. This can be done easily using an equation that calculates the geographic distance between the current location and each restaurant's location.
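The post does not say which distance equation the app uses; one common choice is the haversine formula, sketched below in Python as an assumption. The function names and the mean Earth radius of 6371 km are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearby(places, lat, lon, radius_km):
    """Return (distance_km, name) pairs within radius_km, nearest first.

    places is an iterable of (name, latitude, longitude) tuples.
    """
    hits = [(haversine_km(lat, lon, plat, plon), name)
            for name, plat, plon in places]
    return sorted((d, n) for d, n in hits if d <= radius_km)
```

For the short distances involved in a city-sized search, the spherical approximation is more than accurate enough to rank restaurants by distance.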

Step 5: Put everything together
Below are some screenshots from the application:

Figure 1: Set Search Query using SearchFood and SearchArea

Figure 2: Display restaurants of various types on the map
Figure 3: Display restaurant names
Figure 4: Change to Bing Map from Google Map
Figure 5: Display restaurant details when clicked
Figure 6: Display comments on the restaurant
Figure 7: Display photos related to the restaurant
Figure 8: Display Search by Region and Radius
Figure 9: Refine Search Further using Filter



Below is the short video:
