MapReduce: Processing Data on Large Clusters [Original Paper and Translation].rar
This document is a compressed archive.
Description
Original document published by member xiaowei
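MapReduce: Processing Data on Large Clusters [Original Paper and Translation]
Includes the Chinese translation and the English original. The content is detailed and complete; recommended for download and reference.
Chinese: 16,573 characters
English: 34,600 characters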
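Abstract (translated from the Chinese version)
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to produce a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. As this paper shows, many real-world tasks can be expressed in this model. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system handles the following: partitioning the input data; scheduling the program's execution across a set of machines; handling machine failures; and managing inter-machine communication. This allows programmers with no experience in parallel and distributed systems to easily use the resources of a large distributed system. Our MapReduce implementation runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented, and more than one thousand MapReduce jobs are executed on Google's clusters every day ......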
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day ......
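The map and reduce roles described in the abstract can be illustrated with a minimal single-process sketch. Word counting is used here only as an illustrative task, and the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical helpers, not part of Google's implementation. The sketch leaves out everything the run-time system does on a real cluster (partitioning, scheduling, failure handling, inter-machine communication).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map: take one (document name, text) pair, emit (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_word_count(key: str, values: Iterable[int]) -> int:
    """Reduce: merge all intermediate values that share one key."""
    return sum(values)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver: run map, group by key, run reduce."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            # Group intermediate values by their intermediate key.
            intermediate[ikey].append(ivalue)
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}


documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(documents, map_word_count, reduce_word_count)
print(counts["the"], counts["fox"])
```

Because the user supplies only the two functions, the driver is free to decide where and in what order they run, which is what allows a real system to parallelize the work automatically.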