Social… Based on Bayesian Theory
摘要 (Abstract)
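With the continuing development and maturation of Web 2.0 technology, social tagging systems have emerged. Social tagging carries forward the user freedom and initiative that Web 2.0 advocates. In a social tagging environment, users add the tags they consider appropriate based on their own understanding of an information resource, and they can also consult the tags other users have already applied. This mechanism lets users select resources according to their own information needs and organize them according to their own understanding, which reflects the proactive and personalized character of social tagging systems.
Because social tagging is inherently bottom-up, no unified rules constrain what counts as an "appropriate" tag. Even when a handful of phrases would suffice to describe a resource clearly, differences in users' knowledge backgrounds and levels of understanding often produce tags that exhibit ambiguity, synonymy, and polysemy (one word form with several meanings). Meanwhile, web resources that have rarely been tagged tend to be overlooked by the users currently browsing, so many valuable resources go unnoticed. Together, these phenomena make it considerably harder for new users to search for and obtain information resources.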
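To address these problems, this thesis combines Bayesian theory with topic clustering algorithms to mine the topics of information resources in a social tagging environment. The tag sets that many users produce for a given resource are filtered and grouped, so that each resource is ultimately described by a small set of representative tags. The main contributions are as follows:
(1) Because social tagging exhibits polysemy and synonymy, latent semantic analysis from text mining is applied to social tagging. A resource-tag matrix is constructed to uncover the latent semantic space linking resources and tags, which effectively resolves the semantic confusion that arises during tagging;
(2) A three-level Bayesian network is used to build a topic assignment based on latent Dirichlet allocation (LDA). On that basis, latent topics are mined and then effectively classified and summarized;
(3) Drawing on the prior knowledge and sample space of Bayesian theory, a topic-space classification is proposed. It further refines the identification of resource attributes and improves on the first two parts of the work.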
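Contribution (1) applies latent semantic analysis (LSA) to a resource-tag matrix. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation. The resources, tags, and counts are hypothetical; raw counts are used without any weighting; and the rank `k = 2` is simply chosen to suit the toy data. It uses NumPy's SVD to show how two tags that never appear on the same resource ("python" and "py") can still land close together in the latent space, because both co-occur with a shared context tag.

```python
import numpy as np

# Hypothetical toy resource-tag matrix (not data from the thesis).
# Rows are resources, columns are tags; entry [r, t] counts how many
# users applied tag t to resource r.
TAGS = ["python", "py", "programming", "cooking", "recipe", "food"]
X = np.array([
    [3, 0, 2, 0, 0, 0],  # resource 0: "python" + "programming"
    [0, 3, 2, 0, 0, 0],  # resource 1: "py" + "programming" (never "python")
    [2, 0, 1, 0, 0, 0],  # resource 2: "python" + "programming"
    [0, 0, 0, 3, 2, 1],  # resource 3: cooking-related tags
    [0, 0, 0, 1, 3, 2],  # resource 4: cooking-related tags
], dtype=float)


def lsa_tag_vectors(matrix: np.ndarray, k: int) -> np.ndarray:
    """Project each tag (column) into a k-dimensional latent semantic space.

    Computes the SVD matrix = U diag(s) Vt, keeps the top-k singular
    triplets, and returns (diag(s_k) @ Vt_k).T: one k-dim vector per tag.
    """
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


latent = lsa_tag_vectors(X, k=2)
py, python, recipe = TAGS.index("py"), TAGS.index("python"), TAGS.index("recipe")

# In the raw matrix "python" and "py" never co-occur, so their cosine is 0 ...
raw_sim = cosine(X[:, python], X[:, py])
# ... but both co-occur with "programming", so LSA places them close together,
# while tags from the unrelated cooking block stay roughly orthogonal.
latent_sim = cosine(latent[python], latent[py])
cross_sim = cosine(latent[python], latent[recipe])
print(f"raw python~py: {raw_sim:.2f}, latent python~py: {latent_sim:.2f}, "
      f"latent python~recipe: {cross_sim:.2f}")
```

For a realistic, large and sparse tag matrix, a sparse truncated SVD (for example scikit-learn's `TruncatedSVD`) would be the more practical choice. Term weighting such as TF-IDF is a common option, but the source does not say whether the thesis uses it.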
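This research not only enriches the theory of information organization and retrieval but also offers an effective way to identify information topics and user preferences.
Keywords Social tagging; Topic clustering; Latent semantics; Hierarchical Bayes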
Abstract
With the development and improvement of Web 2.0 technology, social tagging has emerged. Social tagging adheres to the freedom and initiative in user behavior that Web 2.0 advocates. In a social tagging environment, users add the tags they consider appropriate according to their own understanding of the relevant information resources, and they can refer to the tags other people have used. This mechanism lets users select resources according to their own needs and organize them according to their own knowledge of those resources, embodying the initiative and personalization of social tagging systems.
However, because social tagging is inherently bottom-up, no uniform rules govern what counts as a "right" tag. Although a few phrases could clearly describe a given resource, differences in users' knowledge backgrounds and understanding often produce tags that exhibit ambiguity, synonymy, and polysemy. At the same time, network resources that have rarely been tagged are often ignored by the users currently browsing, so many valuable resources go unnoticed. These phenomena make it very difficult for new users to search for and access information resources.
To address these problems, this paper combines Bayesian theory with topic clustering algorithms to mine the topics of information resources in a social tagging environment. The tag sets generated when large numbers of users annotate a particular resource are cleaned and classified. The result for each resource is a set containing only a small number of representative tags. The main contributions of this paper are as follows:
(1) Given the polysemy and synonymy present in social tagging, latent semantic analysis from text mining theory is applied to social tagging. Building a resource-tag matrix to mine the semantic space between resources and tags effectively resolves users' semantic confusion during annotation;
(2) A three-level Bayesian network is used to build a topic model based on latent Dirichlet allocation. On this basis, latent topics are mined and effectively classified and summarized;
(3) Combining the prior knowledge and sample space of Bayesian theory, topic-space classification is proposed. It further refines the identification of resource attributes, so that the first two aspects are further improved.
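Contribution (2) builds on latent Dirichlet allocation, the standard three-level (corpus, document, word) hierarchical Bayesian topic model. The abstract does not say how inference is performed. The sketch below therefore assumes collapsed Gibbs sampling and treats the bag of tags on each resource as a "document." The function name `lda_gibbs`, the toy tag data, the number of topics, the Dirichlet hyperparameters `alpha` and `beta`, and the iteration count are all illustrative assumptions, not values from the thesis.

```python
import numpy as np


def lda_gibbs(docs, n_tags, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    docs: one list of tag ids per resource (the resource's bag of tags).
    Returns (theta, phi): theta[d] is resource d's topic mixture and
    phi[k] is topic k's distribution over tags.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))  # tag tokens in resource d assigned to topic k
    nkw = np.zeros((n_topics, n_tags))     # times tag w is assigned to topic k
    nk = np.zeros(n_topics)                # total tag tokens assigned to topic k

    def update(d, w, k, delta):
        ndk[d, k] += delta
        nkw[k, w] += delta
        nk[k] += delta

    # Random initial topic for every tag token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # exclude the current token
                # Unnormalized full conditional p(z_i = k | other assignments, tags)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_tags * beta)
                z[d][i] = rng.choice(n_topics, p=p / p.sum())
                update(d, w, z[d][i], 1)

    # Posterior-mean estimates under the Dirichlet priors.
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi


# Hypothetical toy data: each resource is represented by the tags users gave it.
TAGS = ["python", "py", "programming", "cooking", "recipe", "food"]
resources = [
    ["python", "programming", "python"],
    ["py", "programming"],
    ["python", "py", "programming"],
    ["cooking", "recipe", "food"],
    ["recipe", "food", "food"],
    ["cooking", "recipe"],
]
docs = [[TAGS.index(t) for t in r] for r in resources]
theta, phi = lda_gibbs(docs, n_tags=len(TAGS), n_topics=2)

for k in range(phi.shape[0]):
    top = np.argsort(phi[k])[::-1][:3]
    print(f"topic {k}: {[TAGS[i] for i in top]}")
print("dominant topic per resource:", theta.argmax(axis=1))
```

One possible way to reach the "small set of representative tags" described above is to take the highest-probability tags of a resource's dominant topic. That step is an assumption here; the thesis's own procedure, including its topic-space classification, may differ.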
The above research not only enriches the theory of information organization and retrieval but also provides an effective way to recognize information topics and user preferences.
Keywords Social tagging; Topic Clustering; Latent Semantic Analysis; Bayesian hierarchical model
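目录 (Contents)
摘要 (Abstract)
Abstract
目录 (Contents)
CONTENTS
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 State of Research
1.2.1 Research on Social Tagging in China and Abroad
1.2.2 Research on Web Text Topic Mining Techniques
1.3 Research Content, Technical Approach, and Thesis Organization
1.3.1 Research Content
1.3.2 Technical Approach
1.3.3 Thesis Organization
1.4 Innovations
Chapter 2 Overview of Social Tagging Systems and Related Bayesian Algorithms
2.1 Overview of Social Tagging
2.1.1 The Concept of Social Tagging
2.1.2 Elements of Social Tagging
2.1.3 Social…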