| (venv) xadocker@xadocker-virtual-machine:~/PycharmProjects/untitled1/douban$ scrapy crawl movie_comment
2022-12-14 21:43:08 [scrapy.utils.log] INFO: Scrapy 2.7.1 started (bot: douban)
2022-12-14 21:43:08 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.7.0, w3lib 2.1.1, Twisted 22.10.0, Python 3.8.0 (default, Dec 9 2021, 17:53:27) - [GCC 8.4.0], pyOpenSSL 22.1.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 38.0.4, Platform Linux-5.4.0-132-generic-x86_64-with-glibc2.27
2022-12-14 21:43:08 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'douban',
 'NEWSPIDER_MODULE': 'douban.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'SPIDER_MODULES': ['douban.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
 'USER_AGENT': 'User-Agent=Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; '
               'rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0'}
2022-12-14 21:43:08 [asyncio] DEBUG: Using selector: EpollSelector
2022-12-14 21:43:08 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2022-12-14 21:43:08 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2022-12-14 21:43:08 [scrapy.extensions.telnet] INFO: Telnet Password: 1d1e99a9f0b3f857
2022-12-14 21:43:08 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2022-12-14 21:43:08 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-12-14 21:43:08 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-12-14 21:43:08 [scrapy.middleware] INFO: Enabled item pipelines:
['douban.pipelines.DoubanPipeline']
2022-12-14 21:43:08 [scrapy.core.engine] INFO: Spider opened
2022-12-14 21:43:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-12-14 21:43:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-12-14 21:43:09 [filelock] DEBUG: Attempting to acquire lock 140420844892016 on /home/xadocker/.cache/python-tldextract/3.8.0.final__venv__0354b0__tldextract-3.4.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-12-14 21:43:09 [filelock] DEBUG: Lock 140420844892016 acquired on /home/xadocker/.cache/python-tldextract/3.8.0.final__venv__0354b0__tldextract-3.4.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-12-14 21:43:09 [filelock] DEBUG: Attempting to release lock 140420844892016 on /home/xadocker/.cache/python-tldextract/3.8.0.final__venv__0354b0__tldextract-3.4.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-12-14 21:43:09 [filelock] DEBUG: Lock 140420844892016 released on /home/xadocker/.cache/python-tldextract/3.8.0.final__venv__0354b0__tldextract-3.4.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-12-14 21:43:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://movie.douban.com/subject/35675082/comments?limit=20&status=P&sort=new_score> (referer: None)
{'comment_date': '2022-08-08 23:11:30', 'comment_user': '陈', 'comment_vote': '775', 'comment_content': '尬住了,敷衍的打斗和过度的特效溢出屏幕的光污染。'}
{'comment_date': '2022-08-08 19:04:10', 'comment_user': 'Tei', 'comment_vote': '506', 'comment_content': 'ado个人演唱会'}
{'comment_date': '2022-09-25 19:19:57', 'comment_user': '次等水货', 'comment_vote': '272', 'comment_content': '作为民工漫里最长寿的一部是有道理的,对粉丝来说这是一场蓄谋已久的狂欢,红发动容,热血和激情澎湃,贝波打call真的可爱极了。对非粉来说也没有观看难度,剧情对每一个出场的角色都有照顾,乌塔是香克斯的女儿自始至终都不会变,这是一次温柔的家庭和解,也是对银幕内外泛滥的负面情绪的一场救赎,乌塔想要创造一个没有苦难的世界,毫不意外最终是梦境一场,但一次完整的、有起有兴的ADO演唱会也能让人心头一软。'}
{'comment_date': '2022-08-08 16:20:33', 'comment_user': '辣手修猫', 'comment_vote': '306', 'comment_content': '这是开了一场个人演唱会啊,我觉得这个很适合小朋友看,大人的话闭上眼睛听听音乐还是可以的,剧情几乎是为零。'}
{'comment_date': '2022-09-29 11:58:38', 'comment_user': '林微云', 'comment_vote': '233', 'comment_content': '缤纷的色彩,华丽的音符,仿佛在电影院听了一场Live演唱会,让人梦回大和歌姬时代过是阴谋的一体两面。你是愿意沉迷在甜美的歌声中死去,还是宁愿辛苦努力踏实过每一天?魔法音乐的这个哲思,要怎么回答才能安全地活下去'}
{'comment_date': '2022-09-21 23:46:44', 'comment_user': '犯罪嫌疑人', 'comment_vote': '463', 'comment_content': '这就是歌姬吧?'}
{'comment_date': '2022-09-22 20:32:11', 'comment_user': '桃桃林林', 'comment_vote': '169', 'comment_content': '等于看了一场演唱会,ADO的歌还是不错的。'}
{'comment_date': '2022-08-08 13:31:24', 'comment_user': '动物世界', 'comment_vote': '417', 'comment_content': '这也太粉丝向幼龄化了,海贼现在就疯狂过滤收集高浓缩粉丝吗?'}
{'comment_date': '2022-12-01 23:06:37', 'comment_user': 'Rocktemple', 'comment_vote': '116', 'comment_content': '又是被自我感动的东亚爹气死的一天'}
{'comment_date': '2022-12-02 21:27:24', 'comment_user': '问宝侠', 'comment_vote': '37', 'comment_content': '好漫长又随意的一部剧场版,槽点真的有比隔壁柯南少吗……加各种强行的设定也一定要促个三星吧。\n\n对池田秀一的声音都要有阴影了,又是这种被过度神话的装逼人物。另外,中文字幕强行翻译成航海王就很真的很能让人意识到,到底为什么这些不偷不杀不作恶的人要自称“海贼”。每次看乌塔和路飞就“为什么要当海贼”鸡同鸭讲地吵起来时,都很想打断他们,“其实他只是想当巡游世界的夺宝奇兵啦”。'}
{'comment_date': '2022-08-06 23:02:52', 'comment_user': '盛夏朝颜', 'comment_vote': '997', 'comment_content': '尾田这两年没少看女团吧'}
{'comment_date': '2022-08-09 16:47:33', 'comment_user': '血浆爱好者', 'comment_vote': '209', 'comment_content': '好烂的歌舞片。'}
{'comment_date': '2022-12-02 11:47:08', 'comment_user': 'dddd', 'comment_vote': '151', 'comment_content': '买red电影票送uta演唱会门票'}
{'comment_date': '2022-12-01 21:04:48', 'comment_user': 'Anything Goes!', 'comment_vote': '145', 'comment_content': '久违的在影院看电影,感谢海贼让我渡过近期最有意义的两个小时!\n乌塔那么出色,难怪路飞刚出海的时候,就嚷嚷着要找音乐家当伙伴😊'}
{'comment_date': '2022-08-06 10:45:48', 'comment_user': '几米米', 'comment_vote': '584', 'comment_content': '给香克斯个面子,第一次演电影啊!'}
{'comment_date': '2022-08-07 14:43:29', 'comment_user': '柠檬茶', 'comment_vote': '120', 'comment_content': '打斗还可以,剧情也就那么回事。'}
{'comment_date': '2022-12-01 23:01:10', 'comment_user': '星空', 'comment_vote': '39', 'comment_content': '还可以打磨的更好看,乌塔前面不用知道真相,后面知道想改变却被魔王吞噬,改成这样好能基本回归正常生活。'}
{'comment_date': '2022-12-01 21:30:02', 'comment_user': '一條魚佔滿了河', 'comment_vote': '33', 'comment_content': '★★☆ 一切自作主張的為你好,都是幼稚與傲慢的表現,以自由之名剝奪自由,不到記不得了,《海賊王:紅髮歌姬》的作畫算是最讓我驚艷的部分,但是在劇情上則太多意料之中,對於歌舞場面,在受到過《犬王》的全面震撼之後,就顯得平平無奇許多,對於熱血場面,劇情一直在用力頂,卻始終沒能讓我有熱血沸騰感,直到路飛和香克斯跨時空合力才算戳到了一下,遠沒有上一部劇場版後半段全程熱血衝腦的爽感。'}
{'comment_date': '2022-11-28 16:38:37', 'comment_user': '鬼腳七', 'comment_vote': '42', 'comment_content': '要不是最后想起来还要拍点战斗段落,我差点以为我又看了一遍龙与雀斑公主'}
{'comment_date': '2022-08-06 11:46:41', 'comment_user': '麻圆姬', 'comment_vote': '298', 'comment_content': '香克斯的面子必须要给'}
2022-12-14 21:43:09 [scrapy.core.engine] INFO: Closing spider (finished)
2022-12-14 21:43:09 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 344,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 13538,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 0.802649,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 12, 14, 13, 43, 9, 384361),
 'httpcompression/response_bytes': 68737,
 'httpcompression/response_count': 1,
 'log_count/DEBUG': 8,
 'log_count/INFO': 10,
 'memusage/max': 67321856,
 'memusage/startup': 67321856,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2022, 12, 14, 13, 43, 8, 581712)}
2022-12-14 21:43:09 [scrapy.core.engine] INFO: Spider closed (finished)
|