{
"info": {
"author": "Eunhou Esther Song",
"author_email": "eunhou.song@gmail.com",
"bugtrack_url": null,
"classifiers": [],
"description": "Naver News Crawler\n------------------\n\nIntroduction\n------------\n\nThis script scrapes Naver news search results for one or more query\nwords over a range of dates. It only collects articles published on the\nNaver news portal, whose URLs begin with\nhttp://news.naver.com.\nResults whose URLs do not begin with this address are not scraped.\n\nEach scraped result includes the title, text, date, and media source of\nthe article.\n\nMotivation\n----------\n\nDemand for scraping online news has been rising, yet there has not been\na proper tool for scraping Korean news. Pre-existing tools only crawl the\nresults of a single query and cannot collect news published within a\nspecific period. This tool improves on them by scraping news published\non Naver, one of the largest web portals in South Korea, over a period\nof time, and by letting the user limit the number of result pages\nscraped per date. For instance, the news results for a single day may\nrun to more than 40,000 pages, but when searching for news about '\ub3c5\ub3c4'\n(Dokdo Island) on 2018-12-26 the user can narrow the scope by setting\nthe starting page and the ending page with command line options, and can\nalso set a date range (e.g. from 2018-12-26 to 2018-12-30).\n\nInstallation\n------------\n\nThis script only supports Python 3.7.\n\n    (sudo) pip3 install navernewscrawler\n\nOr download the source from the repository and install it directly with\nsetup.py.\n\n    python3.7 setup.py install\n\nCommands\n--------\n\n- -h or --help: Show the help message.\n- -bd or --begindate: Set the first date to scrape in 'Y-M-D' format,\n  with year, month, and day separated by '-'. ex: 2018-12-26,\n  2018-06-19. The default is 2018-12-26.\n- -ed or --enddate: Set the last date to scrape, in the same format. The\n  default is 2018-12-26.\n- -p or --page: Set the first result page to scrape. The default is 1.\n- -max\\_page or --max\\_page: Set the last result page to scrape. The\n  default is 5. Each result page lists 10 articles, so the defaults\n  scrape 50 articles per day.\n- -c or --csv: Save the scraped results to a CSV file.\n- -d or --dump: Print the scraped results to the console.\n\nExample\n-------\n\n- By default the output is stored in JSON format, in a file named\n  'begin date\\_news\\_scrape\\_end date.json'. For example, scraping news\n  from 2018-12-26 to 2019-12-26 stores the results in\n  20181226\\_news\\_scrape\\_20191226.json.\n\n- The command below scrapes result pages 1 to 3 for the query '\ub3c5\ub3c4'\n  (Dokdo Island) for two days, 2018-12-26 and 2018-12-27.\n\n\n\n    navernewscrawler \ub3c5\ub3c4 -bd 2018-12-26 -ed 2018-12-27 -p 1 -max_page 3\n\n- The command below scrapes result pages 1 to 3 for the query '\ub3c5\ub3c4'\n  (Dokdo Island) for one day, 2018-12-26, and stores the results in a\n  CSV file.\n\n\n\n    navernewscrawler \ub3c5\ub3c4 -bd 2018-12-26 -ed 2018-12-26 -p 1 -max_page 3 -c\n\n### Reading the JSON file\n\n    import json\n\n    # Replace 'filename.json' with the path to the scraped output file.\n    with open('filename.json', 'r', encoding='utf-8') as f:\n        news = json.load(f)\n\nResults\n-------\n\n [{'title': '\uc6b0\ub9ac\ub098\ub77c \uad6d\ud68c\uc758\uc6d0\ub4e4, \uc77c\ubcf8\uc5d0\uc11c \ubcf4\ub0b8 \ub3c5\ub3c4 \ubc29\ubb38 \ud56d\uc758 \uc11c\ud55c \ubc18\uc1a1\ud574',\n 'date': '2018-12-26 ',\n 'company': 'YTN',\n 'text': '\uc9c0\ub09c 10\uc6d4 22\uc77c \ub3c5\ub3c4\ub97c \ubc29\ubb38\ud55c \uad6d\ud68c \uad50\uc721\uc704\uc6d0\ud68c \uc18c\uc18d \uc758\uc6d0\ub4e4\uc774 \uc77c\ubcf8 \uc790\ubbfc\ub2f9 \uc18c\uc18d \uc911\uc758\uc6d0 \ub4f1\uc774 \ubcf4\ub0b8 \ud56d\uc758 \uc11c\ud55c\uc744 \ub418\ub3cc\ub824\ubcf4\ub0b8 \uc0ac\uc2e4\uc774 \ub4a4\ub2a6\uac8c \uc54c\ub824\uc84c\ub2e4.\uc77c\ubcf8 \uc5b8\ub860\uc5d0 \ub530\ub974\uba74, \uc77c\ubcf8 \uc790\ubbfc\ub2f9 \uc18c\uc18d \uc911\uc758\uc6d0\uc774\uc790 \\'\uc77c\ubcf8 \uc601\ud1a0\ub97c \uc9c0\ud0a4\uae30 \uc704\ud574 \ud589\ub3d9\ud558\ub294 \uc758\uc6d0 \uc5f0\ub9f9\\' \uc18c\uc18d \uc2e0\ub3c4 \uc694\uc2dc\ud0c0\uce74 \uc758\uc6d0\uc740 25\uc77c \uae30\uc790\ud68c\uacac\uc744 \uc5f4\uc5b4 \ud55c\uad6d \uad6d\ud68c\uc758\uc6d0\ub4e4\uc774 \ubc18\uc1a1\ud55c \uc11c\ud55c\uc744 
\uacf5\uac1c\ud588\ub2e4.\\'\uc77c\ubcf8 \uc601\ud1a0\ub97c \uc9c0\ud0a4\uae30 \uc704\ud55c \uc758\uc6d0 \uc5f0\ub9f9\\'\uc740 \"\ub3c5\ub3c4\uac00 \ud55c\uad6d \ub545\uc778 \uadfc\uac70\ub97c \ub300\ub77c\"\ub294 \ub0b4\uc6a9\uc744 \ub2f4\uc740 \ud56d\uc758\uc11c\ud55c 13\ud1b5 \uc911 10\ud1b5\uc740 \ub72f\uc5b4\uc9c4 \ucc44\ub85c, \ub098\uba38\uc9c0\ub294 \ubd09\ud22c \uc5c6\uc774, \ub2e4\ub978 \ud55c \ud1b5\uc740 \ubc18\uc1a1\ub418\uc9c0 \uc54a\uc558\ub2e4\uace0 \ubc1d\ud614\ub2e4.\uc9c0\ub09c 10\uc6d4, \ub3c5\ub3c4\ub97c \ubc29\ubb38\ud588\ub358 \uad6d\ud68c \uad50\uc721\uc758\uc6d0\uc774\uc5c8\ub358 \uc774\ucc2c\uc5f4 \ubc14\ub978\ubbf8\ub798\ub2f9 \uc758\uc6d0\uc740 \ub3c5\ub3c4 \ubc29\ubb38 \ud6c4\uc5d0 \uc774\ubbf8 \ud56d\uc758 \uc11c\ud55c\uc744 \ubc1b\uc9c0 \uc54a\uaca0\ub2e4\ub294 \uc758\uc9c0\ub97c \ud45c\uba85\ud55c \ubc14 \uc788\ub2e4. \uc774\ucc2c\uc5f4 \uc758\uc6d0\uc740 \ub3c5\ub3c4\uac00 \uc6b0\ub9ac \ub545\uc778 \uadfc\uac70\ub97c \ub300\ub77c\ub294 \uc9c8\ubb38\uc5d0 \ub300\ud574 \"\ub2f5\ubcc0\ud560 \uc774\uc720\uac00 \uc5c6\ub2e4\"\uace0 \uc798\ub77c \ub9d0\ud558\uae30\ub3c4 \ud588\ub2e4.\uc774\ucc2c\uc5f4 \uc758\uc6d0\uc740 CBS\uc640\uc758 \uc778\ud130\ubdf0\uc5d0\uc11c \"(\ubc18\ub300\ub85c) \ub2f9\uc2e0\ub4e4\uc774 \ub3c5\ub3c4\uac00 \uc77c\ubcf8 \ub545\uc774\ub77c\uace0 \uc8fc\uc7a5\ud558\ub294 \uadfc\uac70\ub97c \ub300\ubd10\ub77c\"\ub77c\uba70 \"\uc77c\ubcf8\uc774 \uad70\uad6d\uc8fc\uc758\uc758 \uc57c\uc2ec\ub9cc \ub4dc\ub7ec\ub0b4\uace0 \uc788\ub2e4\"\uace0 \ub9d0\ud55c \ubc14 \uc788\ub2e4.[\uc0ac\uc9c4 = \uc77c\ubcf8 \uc601\ud1a0\ub97c \uc9c0\ud0a4\uae30 \uc704\ud574 \ud589\ub3d9\ud558\ub294 \uc758\uc6d0 \uc5f0\ub9f9, \uc774\ucc2c\uc5f4 \uc758\uc6d0 \ud2b8\uc704\ud130]YTN PLUS \ucd5c\uac00\uc601 \uae30\uc790 (weeping07@ytnplus.co.kr) \u25b6 24\uc2dc\uac04 \uc2e4\uc2dc\uac04 \ub274\uc2a4 \uc0dd\ubc29\uc1a1 \ubcf4\uae30 \u25b6 \ub124\uc774\ubc84 \uba54\uc778\uc5d0\uc11c YTN\uc744 \uad6c\ub3c5\ud574\uc8fc\uc138\uc694! 
[\uc800\uc791\uad8c\uc790(c) YTN & YTN PLUS \ubb34\ub2e8\uc804\uc7ac \ubc0f \uc7ac\ubc30\ud3ec \uae08\uc9c0]'},\n {'title': \"\u65e5\uc758\uc6d0\uc774 \u97d3\uc758\uc6d0\uc5d0 \ubcf4\ub0b8 '\ub3c5\ub3c4 \uc601\uc720\uad8c' \uc9c8\ubb38\uc11c \ubc18\uc1a1\",\n 'date': '2018-12-26 ',\n 'company': '\uc5f0\ud569\ub274\uc2a4',\n 'text': '(\ub3c4\ucfc4=\uc5f0\ud569\ub274\uc2a4) \uae40\uc815\uc120 \ud2b9\ud30c\uc6d0 = \uc77c\ubcf8 \uc5ec\uc57c \uc758\uc6d0\ub4e4\ub85c \uad6c\uc131\ub41c \ubaa8\uc784\uc774 \uc9c0\ub09c 10\uc6d4 \ub3c5\ub3c4\ub97c \ubc29\ubb38\ud55c \uc6b0\ub9ac\ub098\ub77c \uad6d\ud68c\uc758\uc6d0\ub4e4\uc5d0\uac8c \ud55c\uad6d\uc758 \ub3c5\ub3c4 \uc601\uc720\uad8c \uc8fc\uc7a5 \uadfc\uac70\ub97c \uc81c\uc2dc\ud558\ub77c\uba70 \ubcf4\ub0c8\ub358 \uacf5\uac1c\uc9c8\ubb38\uc11c\uac00 \ubc18\uc1a1\ub41c \uac83\uc73c\ub85c \ub098\ud0c0\ub0ac\ub2e4. 26\uc77c NHK \ub4f1\uc5d0 \ub530\ub974\uba74 \\'\uc77c\ubcf8 \uc601\ud1a0\ub97c \uc9c0\ud0a4\uae30 \uc704\ud574 \ud589\ub3d9\ud558\ub294 \uc758\uc6d0\uc5f0\ub9f9\\'(\uc774\ud558 \uc758\uc6d0\uc5f0\ub9f9)\uc758 \uc2e0\ub3c4 \uc694\uc2dc\ud0c0\uce74(\u65b0\u85e4\u7fa9\u5b5d\u00b7\uc790\ubbfc\ub2f9) \ud68c\uc7a5\uc740 \uc804\ub0a0 \uae30\uc790\ud68c\uacac\uc5d0\uc11c \uc9c0\ub09c\ub2ec \ubc1c\uc1a1\ud55c \uc9c8\ubb38\uc11c\uac00 \uadf8\ub300\ub85c \ubc18\uc1a1\ub410\ub2e4\uace0 \ubc1d\ud614\ub2e4. \uc758\uc6d0\uc5f0\ub9f9\uc740 \uc9c0\ub09c 10\uc6d4 22\uc77c \ud55c\uad6d\uc758 \uad6d\ud68c \uad50\uc721\uc704\uc6d0\ud68c \uc18c\uc18d \uc758\uc6d0\ub4e4\uc774 \ub3c5\ub3c4\ub97c \ubc29\ubb38\ud558\uc790 \ub2e4\uc74c \ub2ec \uc774\ub97c \uc6a9\ub0a9\ud560 \uc218 \uc5c6\ub2e4\uba70 \ud55c\uad6d \uce21\uc758 \uc601\uc720\uad8c \uadfc\uac70 \ub4f1\uc744 \uc81c\uc2dc\ud558\ub77c\ub294 \uc9c8\ubb38\uc11c\ub97c \ubcf4\ub0c8\ub2e4. 
\\'\ub3c5\ub3c4\ub294 \uc6b0\ub9ac\ub545\\'(\uc11c\uc6b8=\uc5f0\ud569\ub274\uc2a4) \uae40\uc8fc\uc131 \uae30\uc790 = \uc77c\ubcf8 \uc2dc\ub9c8\ub124(\u5cf6\u6839)\ud604\uc774 \\'\ub2e4\ucf00\uc2dc\ub9c8(\u7af9\u5cf6\u00b7\uc77c\ubcf8\uc774 \uc8fc\uc7a5\ud558\ub294 \ub3c5\ub3c4 \uba85\uce6d)\uc758 \ub0a0\\' \ud589\uc0ac\ub97c \uc8fc\ucd5c\ud55c 2017\ub144 2\uc6d4 22\uc77c \uc624\ud6c4 \uc11c\uc6b8 \uc885\ub85c\uad6c \uc8fc\ud55c\uc77c\ubcf8\ub300\uc0ac\uad00 \uc61b\ud130 \uc55e\uc5d0\uc11c \ub098\ub77c\uc0b4\ub9ac\uae30\uad6d\ubbfc\uc6b4\ub3d9\ubcf8\ubd80 \ucc38\uac00 \ud559\uc0dd\ub4e4\uc774 \uc77c\ubcf8\uc758 \ub3c5\ub3c4 \uce68\ud0c8 \uc57c\uc695\uc744 \uaddc\ud0c4\ud55c \ub4a4 \ub9cc\uc138\uc0bc\ucc3d\uc744 \ud558\uace0 \uc788\ub2e4. 2017.2.22 utzza@yna.co.kr \uc758\uc6d0\uc5f0\ub9f9\uc740 \ud55c\uad6d \uad6d\ud68c\uc758\uc6d0 13\uba85\uc5d0\uac8c \uc9c8\ubb38\uc11c\ub97c \ubcf4\ub0c8\uc9c0\ub9cc 12\ud1b5\uc774 \ubc18\uc1a1\ub410\ub2e4\uace0 \uc0b0\ucf00\uc774\uc2e0\ubb38\uc740 \uc804\ud588\ub2e4. \uc2e0\ub3c4 \ud68c\uc7a5\uc740 \uae30\uc790\ud68c\uacac\uc5d0\uc11c \uc9c8\ubb38\uc11c\uac00 \ubc18\uc1a1\ub41c \uac83\uc5d0 \ub300\ud574 \"\ub9e4\uc6b0 \uc720\uac10\"\uc774\ub77c\uba70 \"\ub3c5\uc120\uc801 \ud589\ub3d9\ubc16\uc5d0 \ud558\uc9c0 \uc54a\ub294 \uad6d\uac00\uc758 \ubbf8\ub798\ub294 \ub9e4\uc6b0 \uac71\uc815\uc2a4\ub7fd\ub2e4\"\uace0 \uc8fc\uc7a5\ud588\ub2e4. \uc2e0\ub3c4 \uc758\uc6d0\uc740 \"\ud55c\uc77c\uad00\uacc4\ub294 \ub2e4\ucf00\uc2dc\ub9c8(\u7af9\u5cf6\u00b7\uc77c\ubcf8\uc774 \uc8fc\uc7a5\ud558\ub294 \ub3c5\ub3c4\uc758 \uba85\uce6d) \ubb38\uc81c\uac00 \uadfc\uc6d0\uc5d0 \ubc15\ud600 \uc788\uc5b4 \uc774\uac83\uc774 \ube60\uc9c0\uc9c0 \uc54a\ub294 \ud55c \uc9c4\uc815\ud55c \uc2e0\ub8b0\ub85c\ub294 \uc774\uc5b4\uc9c0\uc9c0 \uc54a\uc744 \uac83\"\uc774\ub77c\uace0 \ub9d0\ud588\ub2e4\uace0 \ubc29\uc1a1\uc740 \ub367\ubd99\uc600\ub2e4. jsk@yna.co.kr\u25b6\ubb50 \ud558\uace0 \ub180\uae4c? #\ud765 \u25b6\uc1fc\ubbf8\ub354\ub274\uc2a4! 
\uc624\ub298 \ub9ce\uc774 \ubcf8 \ub274\uc2a4\uc601\uc0c1 \u25b6\ub124\uc774\ubc84 \ud648\uc5d0\uc11c [\uc5f0\ud569\ub274\uc2a4] \ucc44\ub110 \uad6c\ub3c5\ud558\uae30'}]\n\n\n",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/ehsong/navernewscrawler",
"keywords": "",
"license": "MIT",
"maintainer": "",
"maintainer_email": "",
"name": "navernewscrawler",
"package_url": "https://pypi.org/project/navernewscrawler/",
"platform": "",
"project_url": "https://pypi.org/project/navernewscrawler/",
"project_urls": {
"Homepage": "https://github.com/ehsong/navernewscrawler"
},
"release_url": "https://pypi.org/project/navernewscrawler/0.0.3/",
"requires_dist": [
"bs4",
"requests"
],
"requires_python": "",
"summary": "Tool for crawling news on Naver",
"version": "0.0.3"
},
"last_serial": 4637635,
"releases": {
"0.0.1": [
{
"comment_text": "",
"digests": {
"md5": "1641104cb0c5574f29bd3c250f6c0e29",
"sha256": "25668292c5fc788170e2cbc9378bc4302508d957fae3d41352f823dd6c4581da"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1641104cb0c5574f29bd3c250f6c0e29",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 5332,
"upload_time": "2018-12-27T10:11:53",
"url": "https://files.pythonhosted.org/packages/d7/4f/3bb5faf751047aaf8f51cf221042b3206f36bff8b3e798f68d846acc7561/navernewscrawler-0.0.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "b261893763e9601935800c29f7aeae65",
"sha256": "86db68db0022d37856b721ac9d910184dc6b9ba7fbccd73aaf4ae2a9f44bc662"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "b261893763e9601935800c29f7aeae65",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6364,
"upload_time": "2018-12-27T10:11:56",
"url": "https://files.pythonhosted.org/packages/27/c5/dcae0c025d70e7518c64a5f263a2898c54fbfe74b83fff7b77ca3deca045/navernewscrawler-0.0.1.tar.gz"
}
],
"0.0.2": [
{
"comment_text": "",
"digests": {
"md5": "61d4730ce89cefc8767fdfcc29cf1bfd",
"sha256": "4e675e1100ff2fd472e20d058198ba0c409124a05b4cf147f35fa1c85b51f995"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "61d4730ce89cefc8767fdfcc29cf1bfd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9275,
"upload_time": "2018-12-27T10:35:22",
"url": "https://files.pythonhosted.org/packages/8e/2e/894b482240edd2a8edbb0db43e803ecceb847fe8367ece63bbcafe0c3a70/navernewscrawler-0.0.2-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "8e82055ab9183435e05056a6ac662683",
"sha256": "c11f0d3a7df3f5fd0b022eb5e4b0f6c1572785aae4918b408c9f8210a821956b"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "8e82055ab9183435e05056a6ac662683",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8769,
"upload_time": "2018-12-27T10:35:24",
"url": "https://files.pythonhosted.org/packages/83/ac/2e4b294fe39801c69b396d68015c0ccebc0d388926c560b477f4480121a0/navernewscrawler-0.0.2.tar.gz"
}
],
"0.0.3": [
{
"comment_text": "",
"digests": {
"md5": "02e56e8554665787d1e2ee57069928be",
"sha256": "702a0c54f03e95ce0448ac9ba967c01185fe85d101b92f3a789b3d897a235287"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "02e56e8554665787d1e2ee57069928be",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9275,
"upload_time": "2018-12-27T11:37:28",
"url": "https://files.pythonhosted.org/packages/bf/78/af02394898d2adaf32322b1ea18e07b0422377ac2056471c3b5270b90dbd/navernewscrawler-0.0.3-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "ad45c0bc473f423197be63234545e46b",
"sha256": "cf9110f692133da671d6f432e651aef4f453a398daf2130be92b0a3f95c17668"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "ad45c0bc473f423197be63234545e46b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8769,
"upload_time": "2018-12-27T11:37:31",
"url": "https://files.pythonhosted.org/packages/c9/74/f2a0b025cbd5fdc6f3b10904df52fa55e79e93fb0675ebc19ea24f78d8de/navernewscrawler-0.0.3.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "02e56e8554665787d1e2ee57069928be",
"sha256": "702a0c54f03e95ce0448ac9ba967c01185fe85d101b92f3a789b3d897a235287"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "02e56e8554665787d1e2ee57069928be",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9275,
"upload_time": "2018-12-27T11:37:28",
"url": "https://files.pythonhosted.org/packages/bf/78/af02394898d2adaf32322b1ea18e07b0422377ac2056471c3b5270b90dbd/navernewscrawler-0.0.3-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "ad45c0bc473f423197be63234545e46b",
"sha256": "cf9110f692133da671d6f432e651aef4f453a398daf2130be92b0a3f95c17668"
},
"downloads": -1,
"filename": "navernewscrawler-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "ad45c0bc473f423197be63234545e46b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8769,
"upload_time": "2018-12-27T11:37:31",
"url": "https://files.pythonhosted.org/packages/c9/74/f2a0b025cbd5fdc6f3b10904df52fa55e79e93fb0675ebc19ea24f78d8de/navernewscrawler-0.0.3.tar.gz"
}
]
}