Based on Web Usage Mining
About 84 pages, DOC format
Abstract
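In recent years, data mining (DM) has received wide attention from the international artificial intelligence and database communities. With the arrival of the network era, however, the objects of traditional data mining have changed, which poses new challenges for data mining and knowledge discovery; Web mining was proposed against this background. Web mining is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in the various kinds of data found on the Web. It has become an important means of supporting Web information decision-making, and Web usage mining, because the data it mines can be obtained conveniently and accurately, has become one of the important research directions within Web mining.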
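The Internet is now very widespread in China and has become one of the main ways in which people obtain information. The Internet and the real economy continue to converge: using the Internet to transform and upgrade traditional industries has driven the restructuring of those industries and a shift in the mode of economic growth, and the Internet has become a new strategic industry for China's development of a low-carbon economy. Internet industry data released by the Ministry of Industry and Information Technology show that, by the end of 2009, the number of domestic websites had reached 3.23 million, an annual growth rate of 12.3%; the number of Internet users had reached 404 million; and the information industry accounted for about 10% of GDP. As the Internet industry keeps developing, competition among websites has become white-hot, and how to stand out in this increasingly fierce competition is the main problem facing website decision-makers. Building websites around the user has become the trend. Website operators therefore need to understand how users experience their visits and to improve the site promptly and sensibly according to users' needs, so as to win their favor. Log files are the most comprehensive record of user visits that a website can obtain directly, and they record all the information about the visiting process. Web usage mining discovers users' visiting habits and visiting patterns from Web log files, so that the layout and structure of the website can be optimized and user satisfaction raised.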
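As an illustration of the log-driven analysis described above, the following minimal Python sketch shows one common way to turn raw log lines into user sessions. It assumes that the server writes the Common Log Format, that a visitor can be approximated by the client host address, and that a gap of more than 30 minutes starts a new session. The function names, the list of static-file suffixes, and the timeout are illustrative assumptions and are not taken from the thesis.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumed list of embedded resources that do not count as page views.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
# Assumed session timeout; not a value from the thesis.
SESSION_GAP = timedelta(minutes=30)


def parse_line(line):
    """Return (host, timestamp, path) for a page request, or None to discard it."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # malformed line
    if m.group("method") != "GET" or not m.group("status").startswith("2"):
        return None  # keep only successful GET requests
    path = m.group("path").split("?", 1)[0]  # drop the query string
    if path.lower().endswith(STATIC_SUFFIXES):
        return None  # images, style sheets, scripts
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, path


def sessionize(lines):
    """Group cleaned requests into (host, [paths]) sessions split by SESSION_GAP."""
    by_host = {}
    for rec in filter(None, map(parse_line, lines)):
        by_host.setdefault(rec[0], []).append(rec)
    sessions = []
    for host, recs in by_host.items():
        recs.sort(key=lambda r: r[1])  # order each host's requests by time
        current = [recs[0][2]]
        for prev, cur in zip(recs, recs[1:]):
            if cur[1] - prev[1] > SESSION_GAP:
                sessions.append((host, current))
                current = []
            current.append(cur[2])
        sessions.append((host, current))
    return sessions
```

Calling `sessionize` on an iterable of log lines returns one entry per session, each holding the host and the ordered list of pages requested. Identifying users by host address alone is a simplification, since proxies and shared addresses blur the picture.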
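Using real operating data from the Jiangsu enrollment examination website (江苏招生考试网), this thesis applies Web usage mining to carry out a comprehensive analysis of the website's log files. From them it discovers users' visiting habits and visiting patterns, and from these the current operating status of the website and the associations and temporal order among its pages. The mining results are then used to help website decision-makers formulate optimization strategies, which is of real practical value for a website that must adapt to future trends, accelerate its own development, and respond to competition and challenges.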
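The associations between pages mentioned above can be illustrated with a small sketch. Assuming each session has already been reduced to the list of pages it visited, the code counts how often pages occur together and reports two-page rules with their support and confidence. It covers only pairwise rules rather than a full Apriori search, and the function name, thresholds, and page paths are hypothetical, chosen for illustration. The thesis's own algorithm and parameters are not given in this abstract.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return sorted rules (antecedent, consequent, support, confidence).

    support    = share of sessions that contain both pages
    confidence = share of sessions with the antecedent that also contain
                 the consequent
    """
    n = len(sessions)
    page_sets = [set(s) for s in sessions]  # ignore repeats within a session
    page_count = Counter(p for s in page_sets for p in s)
    pair_count = Counter(
        pair for s in page_sets for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / page_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical sessions, for illustration only.
example_sessions = [
    ["/index.html", "/scores.html"],
    ["/index.html", "/scores.html", "/policy.html"],
    ["/index.html", "/policy.html"],
    ["/index.html"],
]
for rule in pair_rules(example_sessions):
    print(rule)
```

A rule with high confidence from one page to another suggests that a direct link between the two pages may be worth considering. Temporal order, which the thesis also examines, would need sequence-aware counting that this sketch leaves out.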
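The contributions of this thesis are as follows. It comprehensively reviews the theory related to Web usage mining. It examines the whole Web usage mining process in depth and, in particular, proposes solutions to the main problems in data preprocessing. On the basis of this theoretical work, and drawing on computer technology, database technology, and data mining, it builds a "Website Optimization System Based on Web Usage Mining", a useful attempt at applying Web usage mining in practice.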
Keywords: data mining, Web usage mining, data preprocessing, association rules
Abstract
In recent years, data mining has attracted considerable attention from the international artificial intelligence and database communities. With the coming of the Web age, the objects of traditional data mining have changed, which brings new challenges to data mining and knowledge discovery. Web mining, introduced against this background, is the process of recognizing valid, novel, potentially useful, and comprehensible patterns in Web data. It has become a significant means of Web information decision-making, and Web usage mining has become an important research interest within Web mining because its data are convenient to obtain and accurate.
The prevalence of the Internet in China has made it one of the main ways for people to obtain all kinds of information. Through the gradual convergence of the Internet and the real economy, and the use of the Internet to transform and advance traditional industries, it has brought about the structural readjustment of traditional industries and a change in the mode of economic growth. The Internet has become a new strategic industry for China's low-carbon economic development. Internet industry data announced by the Ministry of Industry and Information Technology show that, by the end of 2009, the number of domestic websites had reached 3,230,000, with an annual growth rate of 12.3%; the number of netizens had reached 404 million; and the information industry accounted for about 10% of GDP. With the development of the Internet industry, competition among websites has become fierce, and how to come out on top is the main problem confronting website decision-makers. The "user-centered" trend in website building requires website operators to understand how users experience their visits and to make appropriate improvements accordingly, so as to win users' satisfaction. Log files are the most direct and complete records of user visits and contain all the information about the visiting process. Hence, Web usage mining discovers users' visiting habits and visiting patterns from log files in order to optimize the layout and structure of the website and thereby raise user satisfaction.
Based on real data from the "Jiangsu enrollment examination website", this paper carries out a comprehensive mining analysis of the website's running log files. By discovering the website's current operating status and the relevance and temporal sequence among its pages, the mining results help website decision-makers formulate optimization strategies. This is of real practical value for the website in adapting to future trends, accelerating its own development, and confronting competition and challenges.
The contributions of this paper are as follows: it comprehensively reviews the theoretical knowledge related to Web usage mining; it explores the overall process of Web usage mining, in particular proposing solutions to the main problems in data preprocessing; and, on the basis of this theoretical study and by applying computer technology, database technology, and data mining, it builds a "Web mining optimization system", making a useful attempt at the practical application of Web usage mining.