
Scraping 百变大侦探 script data with a Python crawler

Date: 2019-07-15 11:29:04

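I have been playing online murder-mystery games (剧本杀) lately and wanted to try writing a crawler that produces an Excel sheet of script data, so that I can see which scripts exist and pick among them more easily. The process and the code are described below.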

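The game has a share feature that sends a script's link to WeChat. Copying that link inside WeChat gives something like /dm/playbook_detail?id=397 (a script for the DM-led mode) or /store/bookdetail/1882 (a regular-mode script).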
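The two link shapes can be told apart and reduced to a script id with a few lines of standard-library code. This is only a sketch: the function name parse_share_link and the mode labels "dm" and "store" are made up for illustration, and it assumes the links always follow one of the two patterns quoted above.

```python
import re
from urllib.parse import urlparse, parse_qs


def parse_share_link(link):
    """Return (mode, script_id) for a shared link, or None if unrecognized."""
    parsed = urlparse(link)
    # DM-led scripts look like /dm/playbook_detail?id=397
    if parsed.path.rstrip("/") == "/dm/playbook_detail":
        ids = parse_qs(parsed.query).get("id", [])
        if ids and ids[0].isdecimal():
            return ("dm", int(ids[0]))
        return None
    # Regular scripts look like /store/bookdetail/1882
    match = re.fullmatch(r"/store/bookdetail/(\d+)/?", parsed.path)
    if match:
        return ("store", int(match.group(1)))
    return None


print(parse_share_link("/dm/playbook_detail?id=397"))
print(parse_share_link("/store/bookdetail/1882"))
```

Because urlparse also accepts full URLs, the same function should work whether or not the link still carries its scheme and host.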

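Take /store/bookdetail/1882 as an example. Right-click on the page and choose Inspect from the context menu. Open the Network tab and refresh the page; many requests appear. Filter by Fetch/XHR, click the request named detail, and look at its Response: it is the JSON data shown below. It contains most of the information displayed on the page, including the script id ("id"), name ("name"), price ("cost"), number of players ("num_players"), poster image ("image"), and so on.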

id: "1882"
name: "盖茨比庄园迷案"
series: "无"
series_id: 0
image: "/images/Fi9Q3psYwlDQnbMQKLDpbhJGs3v1.jpg"
club_id: 1880
estimated_time: "4.0"
story_text_length: "13000"
background_html: "<div>这是一个关于爵士乐时代的故事,就像菲茨杰拉德小说里描绘的一样。香烟、酒精、音乐、狐步舞、飞女郎的细高跟和羽毛披肩、别墅里一场接着一场上演的流动的盛宴、上流社会千金一掷的纸醉金迷。</div><div>而这一切都不过是爵士时代最为浅显的外表,真正能够定义爵士时代的,是在爵士乐停止之后那无尽的寂静和空旷。</div><div>此刻盖茨比庄园正在办一场不眠的宴会,在这里我们诚挚地邀请您一起入局。</div><div><br></div><div>开局必读:</div><div>剧本文本量较大,一共五幕请留足时间。</div><div>游戏中有两对CP(菲茨和泽尔达、威廉和阿丽塔),并设有亲密的互动环节,建议可熟人组队。</div><div>警告阿婆老粉,阿加莎味浓厚!</div>"
background: "这是一个关于爵士乐时代的故事,就像菲茨杰拉德小说里描绘的一样。香烟、酒精、音乐、狐步舞、飞女郎的细高跟和羽毛披肩、别墅里一场接着一场上演的流动的盛宴、上流社会千金一掷的纸醉金迷。而这一切都不过是爵士时代最为浅显的外表,真正能够定义爵士时代的,是在爵士乐停止之后那无尽的寂静和空旷。此刻盖茨比庄园正在办一场不眠的宴会,在这里我们诚挚地邀请您一起入局。\n开局必读:剧本文本量较大,一共五幕请留足时间。游戏中有两对CP(菲茨和泽尔达、威廉和阿丽塔),并设有亲密的互动环节,建议可熟人组队。警告阿婆老粉,阿加莎味浓厚!"
num_players: 5
max_player: 5
min_player: 5
editor_rec: ""
author_rec: ""
updated_time: "-08-11 23:06:52"
time: "西方"
style: "现实"
level: "困难"
price: "999999.00"
ori_price: "999999.00"
cost: 29
ori_cost: 39
share_cost: 139
ori_share_cost: 199
onsale: 5
share_price: "999999.00"
effect_at: null
chatroom_id: "5188372305"
single_mode: 0
user_level: 0
mark: "6.9"
mark_cnt: 134
publish_date: "-08-11"
age_level: 0
has_truth: true
parent_playbook_id: 0
chapter_id: 1
chapter_name: ""
chapter_image: ""
has_previou_story: 0
isbn: ""
price_info: Object { cost: 29, ori_cost: 39, share_cost: 139, … }
pay_type: 1
discount: "7.4"
vip_free: 0
presell: 0
series_name: "无"
series_uri: ""
authors: [ {…} ]
author_id: 310
author: "ZNJ"
signed: 1
characters: [ {…}, {…}, {…}, {…}, {…} ]
custom_tag: ""
adult_only: false
gift: 0
unlock_free_enable: 1
unlock_free_cost: 0
read_progress: 0
share: 5
own: 0
played: 0
share_total_cost: 139
purchase: 0
playbook_id: "1882"
comment: null
error_description: null
room_count: "2"
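To make the mapping from this response to a spreadsheet row concrete, here is a minimal sketch that parses the JSON text and picks out the fields of interest. The helper name to_row and the FIELDS list are illustrative, and the sample is an abbreviated response assembled from the values shown above rather than a live reply from the server.

```python
import json

# Keys taken from the response above, in the order they would appear as columns.
FIELDS = ["name", "cost", "level", "mark", "style", "time",
          "num_players", "story_text_length", "estimated_time"]


def to_row(response_text):
    """Return the selected values as a list, or None if any key is missing."""
    data = json.loads(response_text)
    if not isinstance(data, dict) or any(key not in data for key in FIELDS):
        return None
    return [data[key] for key in FIELDS]


# Abbreviated sample: only the keys needed here, with values copied from above.
sample = json.dumps({
    "id": "1882", "name": "盖茨比庄园迷案", "cost": 29, "level": "困难",
    "mark": "6.9", "style": "现实", "time": "西方", "num_players": 5,
    "story_text_length": "13000", "estimated_time": "4.0",
})

print(to_row(sample))
print(to_row("{}"))  # a reply without the expected keys yields None
```

Returning None when a key is absent lets the caller skip a script entirely instead of writing a partially filled row.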

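So all we need to do is iterate over the script ids, fetch this data for each one, and store it in an Excel sheet.

Environment: Python 3, with the HTTP client library urllib3 installed for making the requests and xlwt for writing the Excel file.

Code: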

# -*- coding: utf-8 -*-
import json
import random
import time

import urllib3
import xlwt

# The paths in this post omit the site's scheme and host; set them here before running.
BASE_URL = ""

# Column headers: name, price, difficulty, rating, style, era,
# players, word count, estimated time (h), link
HEADERS = ["剧本名", "价格", "难度等级", "评分", "风格", "发生时代",
           "人数", "剧本字数", "预计时间(h)", "链接"]

# JSON keys, in the same order as the first nine columns
FIELDS = ["name", "cost", "level", "mark", "style", "time",
          "num_players", "story_text_length", "estimated_time"]


def main():
    http = urllib3.PoolManager()
    # Create a new workbook (that is, a new Excel file)
    workbook = xlwt.Workbook(encoding="utf-8")
    # Create a new sheet
    worksheet = workbook.add_sheet("百变大侦探全剧本数据")
    for col, header in enumerate(HEADERS):
        worksheet.write(0, col, header)

    row = 1
    for book_id in range(1, 3000):
        url = BASE_URL + "/api/playbook/" + str(book_id) + "/detail"
        r = http.request("GET", url)
        print(book_id)
        try:
            d = json.loads(r.data.decode("utf-8"))
        except ValueError:
            continue  # the reply was not valid JSON
        # Skip the script unless every field is present. Checking before
        # writing avoids a half-filled row, which xlwt would refuse to
        # overwrite on the next iteration.
        if not isinstance(d, dict) or any(key not in d for key in FIELDS):
            continue
        for col, key in enumerate(FIELDS):
            worksheet.write(row, col, d[key])
        detail_url = BASE_URL + "/store/bookdetail/" + str(book_id)
        worksheet.write(row, len(FIELDS), detail_url)
        row += 1
        # time.sleep(random.randint(10, 30))

    workbook.save("百变大侦探全剧本数据.xls")


if __name__ == "__main__":
    main()
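The script leaves its time.sleep(random.randint(10, 30)) line commented out, so requests are sent back to back. If you prefer to pause between requests, the loop could be wrapped roughly as follows. This is a sketch, not part of the original script: fetch_all, its parameters, and the default delays are all hypothetical, and fetch stands for whatever function performs one request.

```python
import random
import time


def fetch_all(ids, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(book_id) for each id, pausing a random interval in between."""
    results = {}
    for index, book_id in enumerate(ids):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        try:
            results[book_id] = fetch(book_id)
        except Exception as exc:
            # One failed request should not abort the whole run.
            print(f"skipping {book_id}: {exc}")
    return results


if __name__ == "__main__":
    # Stand-in fetch function so the sketch runs without network access.
    demo = fetch_all(range(1, 4), lambda i: {"id": i}, min_delay=0.0, max_delay=0.0)
    print(demo)
```

Passing sleep as a parameter keeps the pause easy to replace in a test, and collecting results in a dict keyed by id makes it simple to see which ids were skipped.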

After the run finishes, open 百变大侦探全剧本数据.xls. Part of the result:

[Screenshot: part of the generated spreadsheet]

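At this point a problem shows up: some of the scripts are invalid because they are the publisher's test data. So I filter once more, dropping every script whose player count is less than 1 or whose word count is 0: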

# -*- coding: utf-8 -*-
import json
import random
import time

import urllib3
import xlwt

# The paths in this post omit the site's scheme and host; set them here before running.
BASE_URL = ""

# Column headers: name, price, difficulty, rating, style, era,
# players, word count, estimated time (h), link
HEADERS = ["剧本名", "价格", "难度等级", "评分", "风格", "发生时代",
           "人数", "剧本字数", "预计时间(h)", "链接"]

# JSON keys, in the same order as the first nine columns
FIELDS = ["name", "cost", "level", "mark", "style", "time",
          "num_players", "story_text_length", "estimated_time"]


def check(d):
    """Return True for a usable script, False for incomplete or test data."""
    if not isinstance(d, dict) or any(key not in d for key in FIELDS):
        return False
    try:
        # Test entries have a word count or a player count below 1.
        if int(d["story_text_length"]) < 1:
            return False
        if int(d["num_players"]) < 1:
            return False
    except (TypeError, ValueError):
        return False  # value was null or not a number
    return True


def main():
    http = urllib3.PoolManager()
    # Create a new workbook (that is, a new Excel file)
    workbook = xlwt.Workbook(encoding="utf-8")
    # Create a new sheet
    worksheet = workbook.add_sheet("百变大侦探全剧本数据")
    for col, header in enumerate(HEADERS):
        worksheet.write(0, col, header)

    row = 1
    for book_id in range(1, 3000):
        url = BASE_URL + "/api/playbook/" + str(book_id) + "/detail"
        r = http.request("GET", url)
        print(book_id)
        try:
            d = json.loads(r.data.decode("utf-8"))
        except ValueError:
            continue  # the reply was not valid JSON
        if not check(d):
            continue
        for col, key in enumerate(FIELDS):
            worksheet.write(row, col, d[key])
        detail_url = BASE_URL + "/store/bookdetail/" + str(book_id)
        worksheet.write(row, len(FIELDS), detail_url)
        row += 1
        # time.sleep(random.randint(10, 30))

    workbook.save("百变大侦探全剧本数据.xls")


if __name__ == "__main__":
    main()

Now the results look normal.
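The filtering rule can also be tried out on its own, without any network access. The sketch below is a slightly more defensive variant of the check, not the code from the script: the names to_int and is_valid are made up here, and it additionally assumes that an empty name or a value that cannot be read as a number should count as invalid.

```python
def to_int(value, default=0):
    """Convert a number or numeric string to int, falling back to default."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default


def is_valid(record):
    """A record is kept only if it has a name, at least 1 player and 1 word."""
    if not record.get("name"):
        return False
    if to_int(record.get("num_players")) < 1:
        return False
    return to_int(record.get("story_text_length")) >= 1


records = [
    {"name": "盖茨比庄园迷案", "num_players": 5, "story_text_length": "13000"},
    {"name": "test", "num_players": 0, "story_text_length": "13000"},
    {"name": "test", "num_players": 5, "story_text_length": None},
]
print([r["name"] for r in records if is_valid(r)])
```

Only the first record passes; the second has no players and the third has no usable word count.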
