Design and Implementation of an MP3 Downloader
13,000 words, 36 pages; includes the proposal report, task statement, and code
Original document posted by member usactu
Abstract
Design and Implementation of an MP3 Downloader
A search engine, as the "portal" through which users reach the Internet, is a shortcut to fast and effective retrieval of information resources from the WWW. The web crawler is a key search engine technology: it is a program that automatically extracts, analyzes, and filters web pages, downloading pages from the World Wide Web on the search engine's behalf. File transfer, one of the most important functions among network applications, is also the basis of resource sharing on the Internet, and download tools have become indispensable. Protocols such as HTTP and FTP provide the main support for file transfer. In particular, P2P-based techniques together with multi-task, multi-threaded, multi-source, and resumable ("breakpoint") download mechanisms greatly improve download speed and maximize the sharing of network resources.
This paper first introduces the main theory and technology related to the topic, then analyzes in depth the principles of the web crawler and the mechanisms of downloading, and improves the web crawler algorithm to suit the application. Based on the improved algorithm, an MP3 downloader is designed and implemented. The crawler collects MP3 link resources and related information (title, artist, album, etc.) from the Internet and stores them in a local file in XML form, providing a basis for later queries and downloads. The downloader is based on the HTTP protocol and supports resumable download, multi-task download, and automatic renaming of downloaded files. The MP3 downloader is then tested, and the tests show that it achieves the expected results.
Finally, the paper reviews the work and gives an outlook on the topic.
Key Words: Search Engine, Web Crawler, HTTP, P2P, Resumable Download
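The pipeline summarized in the abstract (collect MP3 links, store their metadata as XML, resume interrupted HTTP downloads, rename on collision) might be sketched roughly as follows. This is only an illustration in Python, not the thesis's own code, which the listing does not reproduce. The regular expression, the XML element names, and all function names below are assumptions made for the example. The only protocol detail relied on is the standard HTTP Range request header.

```python
import os
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: absolute http(s) links ending in .mp3 inside href attributes.
MP3_LINK = re.compile(r'href=["\'](https?://[^"\']+?\.mp3)["\']', re.IGNORECASE)


def extract_mp3_links(html):
    """Return MP3 URLs found in href attributes, in page order, without duplicates."""
    seen = []
    for url in MP3_LINK.findall(html):
        if url not in seen:
            seen.append(url)
    return seen


def tracks_to_xml(tracks):
    """Serialize track dicts (url, title, artist, album) to an XML string.

    The element names are illustrative; the thesis's actual schema is not shown.
    """
    root = ET.Element("tracks")
    for track in tracks:
        node = ET.SubElement(root, "track", url=track["url"])
        for field in ("title", "artist", "album"):
            ET.SubElement(node, field).text = track.get(field, "")
    return ET.tostring(root, encoding="unicode")


def resume_headers(bytes_on_disk):
    """Build the HTTP Range header asking the server for the remaining bytes."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def unique_name(filename, existing):
    """Append (1), (2), ... before the extension until the name is unused."""
    stem, ext = os.path.splitext(filename)
    candidate, n = filename, 1
    while candidate in existing:
        candidate = f"{stem}({n}){ext}"
        n += 1
    return candidate
```

In this sketch a download would be resumed by sending `resume_headers(size_of_partial_file)` with the GET request and appending the response body to the partial file, provided the server answers with status 206 (Partial Content). A server that ignores the Range header returns the whole file with status 200, in which case the partial file should be overwritten rather than appended to.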
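Contents
1 Introduction
1.1 Background and Purpose of the Project
1.2 Research Status and Trends at Home and Abroad
1.2.1 Search Engines
1.2.2 File Downloading
1.3 Research Content and Significance
1.4 Structure of This Thesis
2 Technical Overview
2.1 Regular Expression Matching
2.2 XML
2.3 Principles of Search Engines
2.4 Threads
2.4.1 Threads
2.4.2 Multithreading
2.5 MP3 Tag Information
2.6 HTTP Protocol
2.7 PageRank Algorithm
2.8 Chapter Summary
3 System Design and Implementation
3.1 System Flowchart
3.2 MP3 Crawler Algorithm
3.2.1 Breadth-First Traversal Strategy
3.2.2 Crawler Algorithm Improvements for This Project
3.2.3 HTML Parsing
3.3 MP3 Tags
3.3.1 MP3 Tag Extraction
3.3.2 MP3 Tag Storage
3.4 File Downloading
3.4.1 Resumable Download
3.4.2 Batch Download
3.4.3 File Renaming
3.4.4 Calculating Download Speed, Progress, and Remaining Time
3.5 .ini Configuration File
3.6 delegate and event: Custom Events
3.7 Chapter Summary
4 Analysis of Experimental Results
4.1 Web Crawler
4.2 Query
4.3 File Downloading
4.4 Analysis of Results
4.5 Chapter Summary
5 Summary and Outlook
5.1 Summary
5.2 Outlook
Acknowledgements
References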