Abstract:
With the increased confidence in the use of the Internet and the World Wide Web,
the number of electronic commerce (e-commerce) transactions is growing rapidly.
Therefore, finding useful patterns and rules in users’ behaviors has become a
critical issue for e-commerce; such patterns can be used to tailor e-commerce
services to meet customers’ needs. This paper proposes an approach to
integrate Web content mining into Web usage mining. The textual content of web
pages is captured through extraction of frequent word sequences, which are
combined with Web server log files to discover useful information and association
rules about users’ behaviors. The results of this approach can be used to facilitate
better recommendation, Web personalization, Web construction, Website
organization, and Web user profiling.
Machine summary:
The textual content of web pages is captured through extraction of frequent word sequences, which are combined with Web server log files to discover useful information and association rules about users' behaviors.
Keywords: Association Rule, Clustering, Frequent Word Sequence, Web Content Mining, Web Usage Mining
Introduction
Electronic commerce (e-commerce) is the use of computers and telecommunication technologies to share business information, maintain business relationships, and conduct business transactions.
For example, the classification of web pages is a typical application of content mining techniques (Shen, Cong, Sun, and Lu, 2003).
Therefore, the purpose of this paper is to propose a system to find useful association rules by integrating Web document analysis into Web usage mining.
Log file analysis techniques include association rule mining (Lin, Alvarez, and Ruiz, 2000), sequential pattern mining (Zhou, Hui, and Chang, 2004), clustering, and classification.
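As an illustration of association rule mining on usage data, the sketch below computes support and confidence for rules between pairs of pages that occur in the same session. It is a minimal example rather than the method of the cited works: the function name `pair_rules`, the thresholds, and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for page pairs.

    support    = fraction of sessions containing both pages
    confidence = sessions containing both / sessions containing the antecedent
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        unique = sorted(set(pages))  # count each page once per session
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical sessions, each a list of requested URLs.
sessions = [
    ["/home", "/books", "/cart"],
    ["/home", "/books"],
    ["/home", "/music"],
    ["/books", "/cart"],
]
rules = pair_rules(sessions)
for x, y, support, confidence in rules:
    print(f"{x} -> {y}  support={support:.2f}  confidence={confidence:.2f}")
```

In this toy data every session containing /cart also contains /books, so the rule /cart -> /books has confidence 1.0, while the reverse rule is weaker.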
Clustering and classification of Web server log files group users, Web pages, or user requests on the basis of similarities in their access requests.
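One simple way to picture grouping users by access similarity is to compare the sets of pages they requested. The sketch below uses Jaccard similarity and a greedy single-pass grouping. Both choices, along with the names `jaccard` and `group_users`, the threshold, and the sample data, are assumptions for illustration and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_users(user_pages, threshold=0.5):
    """Greedily assign each user to the first cluster whose representative
    (its first member) is at least `threshold` similar; otherwise start a
    new cluster."""
    clusters = []
    for user, pages in user_pages.items():
        for cluster in clusters:
            representative = cluster[0]
            if jaccard(pages, user_pages[representative]) >= threshold:
                cluster.append(user)
                break
        else:
            clusters.append([user])
    return clusters


# Hypothetical users mapped to the set of pages each one requested.
user_pages = {
    "u1": {"/books", "/cart"},
    "u2": {"/books", "/cart", "/home"},
    "u3": {"/music"},
    "u4": {"/music", "/home"},
}
groups = group_users(user_pages)
print(groups)
```

A greedy pass like this depends on the order of users; it is meant only to make the notion of access request similarity concrete.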
This clustering algorithm is based on CFWS (Document Clustering Based on Frequent Word Sequences), an algorithm proposed by Li, Chung, and Holt (2007).
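To make the notion of a frequent word sequence concrete, the sketch below counts contiguous word sequences by the fraction of documents in which they occur and keeps those above a minimum support. This is a simplified stand-in, not the CFWS algorithm itself; the function name `frequent_word_sequences`, the parameters, and the sample documents are assumptions for illustration.

```python
from collections import Counter


def frequent_word_sequences(docs, min_support=0.5, max_len=3):
    """Return {word sequence: document support} for contiguous sequences of
    2..max_len words that appear in at least `min_support` of the documents."""
    n = len(docs)
    doc_freq = Counter()
    for text in docs:
        words = text.lower().split()
        seen = set()  # count each sequence at most once per document
        for size in range(2, max_len + 1):
            for i in range(len(words) - size + 1):
                seen.add(tuple(words[i:i + size]))
        doc_freq.update(seen)
    return {seq: c / n for seq, c in doc_freq.items() if c / n >= min_support}


# Hypothetical page texts.
docs = [
    "cheap flight tickets online",
    "book cheap flight tickets today",
    "hotel deals online",
]
fws = frequent_word_sequences(docs, min_support=0.6)
for seq in sorted(fws):
    print(" ".join(seq), round(fws[seq], 2))
```

Pages that share the same frequent word sequences could then be treated as candidates for the same document cluster.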
Integration
The integration step adds the Web document cluster information to the log file.
Table 3. Integrated Log File with Web Page Cluster Label
IP address | Date | Time | URL | Document Cluster
213.
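The integration step can be sketched as a lookup that attaches a cluster label to each log entry by its URL. The example below is a minimal illustration using the column names of Table 3; the function name `integrate`, the "unclustered" fallback label, the cluster labels, and all sample rows (including the documentation-range IP addresses) are hypothetical and do not come from the paper's data.

```python
import csv
import io


def integrate(log_rows, url_to_cluster, unknown="unclustered"):
    """Return copies of the log rows with a 'Document Cluster' column added."""
    integrated_rows = []
    for row in log_rows:
        merged = dict(row)
        # Pages that were never clustered get the fallback label.
        merged["Document Cluster"] = url_to_cluster.get(row["URL"], unknown)
        integrated_rows.append(merged)
    return integrated_rows


# Hypothetical log excerpt in CSV form.
LOG_CSV = """IP address,Date,Time,URL
192.0.2.10,2005-03-01,10:15:02,/books/index.html
192.0.2.11,2005-03-01,10:16:40,/music/new.html
192.0.2.10,2005-03-01,10:17:05,/about.html
"""
log_rows = list(csv.DictReader(io.StringIO(LOG_CSV)))

# Hypothetical output of the document clustering step: URL -> cluster label.
url_to_cluster = {"/books/index.html": "C1", "/music/new.html": "C2"}

integrated = integrate(log_rows, url_to_cluster)
for row in integrated:
    print(row)
```

With the cluster label stored next to each request, association rules can be mined over clusters as well as over individual URLs.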
Text document clustering based on frequent word meaning sequences.