{"id":120,"date":"2020-07-08T08:30:28","date_gmt":"2020-07-08T00:30:28","guid":{"rendered":"http:\/\/www.gaoxigang.com\/?p=120"},"modified":"2020-07-08T08:30:28","modified_gmt":"2020-07-08T00:30:28","slug":"scrapy-%e7%88%ac%e8%99%ab%e6%a1%86%e6%9e%b6-%e7%ac%ac%e4%b8%80%e4%b8%aa%e7%88%ac%e8%99%ab%e9%a1%b9%e7%9b%ae","status":"publish","type":"post","link":"https:\/\/www.gaoxigang.com\/index.php\/2020\/07\/08\/scrapy-%e7%88%ac%e8%99%ab%e6%a1%86%e6%9e%b6-%e7%ac%ac%e4%b8%80%e4%b8%aa%e7%88%ac%e8%99%ab%e9%a1%b9%e7%9b%ae\/","title":{"rendered":"Scrapy Crawler Framework: The First Crawler Project"},"content":{"rendered":"<p><strong>Creating a Scrapy project<\/strong><br \/>\nThe project is created from the Windows command prompt (cmd). Run scrapy -h to list the available subcommands, then create the project with <span style=\"color: #ff0000;\"><strong>scrapy startproject douban<\/strong><\/span>.<\/p>\n<blockquote><p>C:\\Users\\Administrator\\Desktop&gt;scrapy -h<br \/>\nScrapy 1.7.3 - no active project<\/p>\n<p>Usage:<br \/>\nscrapy &lt;command&gt; [options] [args]<\/p>\n<p>C:\\Users\\Administrator\\Desktop&gt;scrapy startproject douban<br \/>\nNew Scrapy project 'douban', using template directory 'd:\\anaconda3\\lib\\site-packages\\scrapy\\templates\\project', created in:<br \/>\nC:\\Users\\Administrator\\Desktop\\douban<\/p>\n<p>You can start your first spider with:<br \/>\ncd douban<br \/>\nscrapy genspider example example.com<\/p><\/blockquote>\n<p><strong>Creating the Spider<\/strong><br \/>\nFollowing the hint printed above, cd douban, then run a command of the form <span style=\"color: #ff0000;\"><strong>scrapy genspider example example.com<\/strong><\/span> 
inside the project to create a Spider. The session below names the spider douban_spider and targets movie.douban.com.<\/p>\n<blockquote><p>C:\\Users\\Administrator\\Desktop&gt;cd douban<\/p>\n<p>C:\\Users\\Administrator\\Desktop\\douban&gt;scrapy genspider douban_spider movie.douban.com<br \/>\nCreated spider 'douban_spider' using template 'basic' in module:<br \/>\ndouban.spiders.douban_spider<\/p>\n<p>C:\\Users\\Administrator\\Desktop\\douban&gt;<\/p><\/blockquote>\n<p>Note that douban_spider is placed in the douban.spiders package by default. Opening the project in the PyCharm IDE shows the project structure, whose parts are described below.<\/p>\n<p><strong>Project modules<\/strong><br \/>\nscrapy.cfg: the project configuration file<\/p>\n<p>spiders: holds your Spider files, that is, the .py files that do the crawling<\/p>\n<p>items.py: defines Items, which act as containers for scraped data and behave much like dictionaries<\/p>\n<p>middlewares.py: defines the implementations of the Downloader Middlewares and Spider Middlewares<\/p>\n<p>pipelines.py: defines the Item Pipeline implementations, which clean, store, and validate the scraped data.<\/p>\n<p>settings.py: global settings<\/p>\n<p><strong>Setting the User-Agent to disguise requests<\/strong><br \/>\nSet the User-Agent in settings.py; otherwise the crawl is likely to fail.<\/p>\n<blockquote><p>DEFAULT_REQUEST_HEADERS = {<br \/>\n'Accept': 'text\/html,application\/xhtml+xml,application\/xml;q=0.9,*\/*;q=0.8',<br \/>\n'Accept-Language': 'en',<br \/>\n<span style=\"color: #ff0000;\">'User-Agent': 'Mozilla\/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko\/20100101 
Firefox\/68.0'<\/span><br \/>\n}<\/p><\/blockquote>\n<p><strong>Running the crawler to fetch data<\/strong><br \/>\nStart the crawler from the cmd window. By default it downloads the URLs listed in start_urls in douban_spider.<\/p>\n<p>C:\\Users\\Administrator\\Desktop\\douban&gt;<strong><span style=\"color: #ff0000;\">scrapy crawl douban_spider<\/span><\/strong><\/p>\n<p>&nbsp;<\/p>\n<blockquote><p>import scrapy<\/p>\n<p>class DoubanSpiderSpider(scrapy.Spider):<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;name = 'douban_spider'<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;allowed_domains = ['movie.douban.com']<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;# The first page to download when the crawler starts<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;start_urls = ['http:\/\/movie.douban.com\/top250']<\/p>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;def parse(self, response):<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Print the raw HTML of the downloaded page<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(response.text)<\/p><\/blockquote>\n<p>Copyright notice: this article is an original work by the CSDN blogger \"lsqzedu\" and is licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.<br \/>\nOriginal link: https:\/\/blog.csdn.net\/lsqzedu\/article\/details\/99697377<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Creating a Scrapy project: the project is created from the command prompt (cmd); scrapy -h lists the available subcommands, and finally 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-120","post","type-post","status-publish","format-standard","hentry","category-biji"],"_links":{"self":[{"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/posts\/120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/comments?post=120"}],"version-history":[{"count":0,"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/posts\/120\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/media?parent=120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/categories?post=120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gaoxigang.com\/index.php\/wp-json\/wp\/v2\/tags?post=120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}