Hierarchical Clustering of Large-Scale Short Conversations Based on Domain Ontol

Started by aruljothi, Apr 11, 2009, 07:05 PM

aruljothi

With the rapid development of the internet and communication technology, huge volumes of data are being accumulated. Short texts, such as chat room conversations and emails, are common in this data. Clustering such short documents is useful for revealing the structure of the data and for building other data mining applications. However, most current clustering algorithms cannot achieve acceptable accuracy on short documents, because keywords appear in them with low frequency. It is also difficult to process high-dimensional text data in very large databases. In this paper, we propose a hierarchical clustering algorithm that uses a domain ontology to improve clustering accuracy. The algorithm is also parallel and frequent-concept based, which makes it scalable to very large, high-dimensional text data. Our experimental study shows that the algorithm is more accurate than other hierarchical clustering algorithms when clustering short conversations. Furthermore, it scales well and can be used to process very large data sets.
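
The abstract gives no implementation details, so the following is only a rough sketch of the general idea of ontology-assisted clustering, not the authors' algorithm. It assumes a hypothetical toy ontology (TOY_ONTOLOGY) that maps surface words to domain concepts, represents each message by its concept set, and merges clusters bottom-up by Jaccard similarity. The function names, the threshold value, and the choice of Jaccard similarity are all illustrative assumptions; the parallel and frequent-concept parts described in the abstract are not covered.

```python
from itertools import combinations

# Hypothetical toy ontology (an assumption for illustration): word -> concept.
TOY_ONTOLOGY = {
    "laptop": "computer", "notebook": "computer", "pc": "computer",
    "refund": "payment", "invoice": "payment", "billing": "payment",
}

def to_concepts(text, ontology=TOY_ONTOLOGY):
    """Map a short message to the set of ontology concepts it mentions."""
    return {ontology[w] for w in text.lower().split() if w in ontology}

def jaccard(a, b):
    """Jaccard similarity of two concept sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(messages, threshold=0.5):
    """Bottom-up clustering: repeatedly merge the most similar pair of
    clusters until the best similarity falls below `threshold`."""
    # Each cluster is (message indices, union of member concept sets).
    clusters = [([i], to_concepts(m)) for i, m in enumerate(messages)]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(ids) for ids, _ in clusters)

messages = [
    "my laptop is broken",
    "the notebook will not boot",
    "I need a refund",
    "wrong invoice amount",
]
groups = cluster(messages)
print(groups)
```

Here "laptop" and "notebook" never co-occur as words, but both map to the same concept, which is why the first two messages can end up together even though they share no keywords. That is the low keyword frequency problem the abstract describes.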
