{ "info": { "author": "Batista Harahap", "author_email": "batista@bango29.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Topic :: Text Processing :: Linguistic" ], "description": "# WordGraph\n\nThis is a word grapher made with [NLTK](http://www.nltk.org) and [TextBlob](http://textblob.readthedocs.org/).\n\nThe primary purpose of this module is to create a graph connection from a collection of words and documents. Each connection together with corresponding words will have its own weight relative to the whole set of documents.\n\nWeighting is performed by using [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) (Term Frequency - Inverse Document Frequency) algorithm. The algorithm basically figures out which words or set of words with most frequencies relative to the document itself and a set of documents.\n\n## Installation\n\n```bash\n$ sudo pip install wordgrapher\n$ python\n>> import nltk\n>> nltk.download()\n```\n\nPlease download the stopwords module for NLTK in order to have better accuracy.\n\nIf you need Indonesian stopwords, you can download from https://github.com/pebbie/pebahasa/blob/master/indonesian like so:\n\n```bash\n$ wget https://raw.github.com/pebbie/pebahasa/master/indonesian\n$ mkdir -p ~/nltk_data/corpora/stopwords/\n$ mv indonesian ~/nltk_data/corpora/stopwords/\n```\n\n## Examples\n\n```python\nfrom word.grapher import WordGrapher\nimport pickle\nimport os\nimport time\n\n\ndoc = \"\"\"Museum jakarta, banyak peninggalan Zaman dulu, trus barang2nya udah Tua dan rapuh.. banyak cerita d museum ini\n tentang kota jakarta..di museum ini seringkali foto-foto karena tempatnya bersejarah bgt jadi harus di\n abadikan. Kayaknya museum sejarah jakarta menjadi spot 'penting' belakangan ini. tiap weekend pasti PENUH sama\n orang-orang yang mau foto-foto. gue sampe sempet ngantri cuma buat foto gedung doang.\"\"\"\ndocs = [\n \"\"\"museumnya bagus.., kalo lagi liburan rame banget, ga cuma wisatawan dari Indonesia, tapi dari luar\n negeri juga... luas banget lagi di dalemnya.... seru..!! hehehehe\"\"\"\n\n \"\"\"mau weekend tanpa mesti ngabisin duit ya dateng aja ke sini... kalo yang hobi fotografi juga banyak spot2 yang\n menarik untuk difoto disini... kalo pengen moto arsitekturnya disarankan dateng pagi2 buta soalnya kalo udah agak\n siangan pasti bakalan rame banget\"\"\",\n\n \"\"\"ini dia yang jarang dilakukan anak muda, habis hang out bareng teman2 langsung cabut ke mall atau main ke rumah\n teman. cobain sensasi berbeda dengan datang ke museum apalagi sebentar lagi ada konser avril. pasti seru banget\n ngajakin si doi idola kita main ke situ, supaya lebih tau juga gimana sejarah kota yang tempati untuk konser\n itu. :)\"\"\",\n\n \"\"\"weekend seru juga kalau berkunjung ke museum sejarah jakarta, selain karena di halamannya biasanya ramai\n orang-orang dan kita bisa hunting foto, di museum ini juga kalau weekend ada pertunjukkan mystery of batavia..\n tambah asyik deh kalau jalan-jalan ke sini :)\"\"\",\n\n \"\"\"Terletak di kawasan kota tua tepatnya dengan luas lebih dari 1.300 meter persegi. Dahulu gedung ini merupakan\n balai kota lalu diresmikan menjadi Museum Fatahillah pada 30 Maret 1974. Kita dapat menemui berbagai objek di\n museum ini seperti perjalanan sejarah Jakarta, replika peninggalan masa Tarumanegara dan Pajajaran, hasil\n penggalian arkeologi, koleksi tentang kebudayaan Betawi, numismatic, dan becak. Ada juga patung Dewa Hermes dan\n meriam Si Jagur yang dianggap mempunyai kekuatan magis serta bekas penjara bawah tanah yang dulu sempat digunakan\n pada zaman penjajahan Belanda. Benar-benar museum yang patut dikunjungi, kita bisa hunting foto ataupun sekedar\n menambah pengetahuan, apalagi dengan arsitekturnya yang klasik bergaya Belanda, akan menciptakan aura yang berbeda\n ketika kita masuk ke dalamnya.\"\"\",\n\n \"\"\"Merupakan aset bersejarah kebanggan Jakarta. Berada di komplek Kota Tua yang terkenal indah banget, museum ini\n juga jadi salah satu daya tarik banyak pengunjung, ngga cuma lokal tapi juga mancanegara. Pas masuk ke dalamnya,\n atmosfernya langsung beda karena gaya arsitektur Belanda yang masih dipertahankan. Kalau mau keliling bisa\n ditemenin sama guide ataupun bisa juga sendiri. Mau wisata pendidikan sejarah Jakarta sekaligus hunting foto\n semuanya lengkap di sini. :)\"\"\",\n\n \"\"\"Weekend pagi seru juga datang kesini bisa hunting foto2 yang keren dari tempat ini. Arsitektur bangunan disini\n unik2 dan super jadul seru rasanya kesini sekali2 for a change from the bustling city view that we have every\n day.\"\"\",\n\n \"\"\"Dikawasan ini sering sekali dijadikan tempat untuk hunting foto modeling maupun prewedding. Hal tersebut\n dikarenakan setting tempat yang kuno dan klasik dan bernuansa jaman kolonial. Di tempat ini ada beberapa museum\n seperti museum fatahilah yang menceritakan sejarah jakarta dan museum wayang yang memamerkan semua koleksi wayang\n yang terdapat di Indonesia. Di Hall atau lapangan bagian tengah dari kawasan ini selalu ramai dikunjungi oleh\n masyarakat yang ingin bersantai dengan menyewa sepeda tua yang dapat digunakan untuk berkeliling dengan tarif\n perjam sebesar 20ribu saja. Jika kita lapar maka ada cafe Batavia yang selalu ramai dikunjungi. Kawasan kota tua\n ini juga sering dijadikan salah satu tempat yang wajib dikunjungi oleh orang asing yang sedang berkunjung ke\n jakarta jadi tidak heran terdapat banyak sekali bule yang berlalu-lalang disini. Bagian yang menjadi favorit gwa\n adalah saat malam hari terdapat wisata malam yang memanjakan mata. Di depan museum Fatahillah sering dibuat\n pertunjukan laser yang sangat indah dengan lampu-lampu sorot yang menawan. :)\"\"\"\n]\n\ndef pickle_get(filename):\n try:\n statinfo = os.stat(filename)\n\n if statinfo.st_size > 0:\n f = open(filename)\n result = pickle.load(f)\n f.close()\n\n return result\n\n raise OSError\n except OSError:\n return None\n\n\ndef pickle_set(filename, obj):\n f = open(filename, 'wb')\n pickle.dump(obj, f)\n f.close()\n\n_start = time.time()\n\nfn = \"wg.pickle\"\nwg = pickle_get(filename=fn)\nif wg is None:\n wg = WordGrapher()\n wg.set_documents(docs=docs)\n\n d = \"\"\n for item in docs:\n d = \"%s %s\" % (d, item)\n wg.set_document(doc=d)\n\n wg.analyze(count=1000, percentage=True)\n pickle_set(filename=fn, obj=wg)\n\ngraph = wg.graph(word=\"banget\")\nelapsed = time.time() - _start\n\nprint graph\n\nprint \"\\nElapsed Time: %.18f\" % (elapsed)\n```\n\n### Benchmark\n\nThe above code ran for __639 seconds__ on its initial run. Once the TF-IDF score is calculated and pickled, it should\ntake way faster to run with my laptop achieving __0.03xxxx second__ in subsequent runs. This is still 1 core only, now\nadding codes to let it run concurrently.\n\n## Development History\n\n__0.1.0__ - Basic TF-IDF Methods\n__0.2.0__ - Working graph method\n__0.3.0__ - Reworked graph method and added conpig for concurrency although yet to be implemented", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tistaharahap/WordGraph", "keywords": "tf-idf nlp graph machine learning", "license": "The MIT License (MIT)\n\nCopyright (c) 2013 Batista Harahap\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.", "maintainer": null, "maintainer_email": null, "name": "wordgrapher", "package_url": "https://pypi.org/project/wordgrapher/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/wordgrapher/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/tistaharahap/WordGraph" }, "release_url": "https://pypi.org/project/wordgrapher/0.3.1/", "requires_dist": null, "requires_python": null, "summary": "Word Graph utility built with NLTK and TextBlob", "version": "0.3.1" }, "last_serial": 841780, "releases": { "0.1.1": [ { "comment_text": "built for Darwin-12.4.0", "digests": { "md5": "bd3178929326a4250a547d361e307f36", "sha256": "36eb9d55f2d70aab0819da87d24d2a29668431596696a1f3de6be41cef3ae026" }, "downloads": -1, "filename": "wordgrapher-0.1.1.macosx-10.8-intel.tar.gz", "has_sig": false, "md5_digest": "bd3178929326a4250a547d361e307f36", "packagetype": "bdist_dumb", "python_version": "any", "requires_python": null, "size": 7882, "upload_time": "2013-08-11T20:16:24", "url": "https://files.pythonhosted.org/packages/6a/23/d24eb0f7f43a724d9fa802447f33588cd32efb8db02d86eefe81931ccc76/wordgrapher-0.1.1.macosx-10.8-intel.tar.gz" }, { "comment_text": "", "digests": { "md5": "9a4b9a56b3e9497a75fa871bb6986d94", "sha256": "758e8573e25589f8c7d93ee23d5a4bdc94ca083a4179a0a32bb594252c63d1bc" }, "downloads": -1, "filename": "wordgrapher-0.1.1.tar.gz", "has_sig": false, "md5_digest": "9a4b9a56b3e9497a75fa871bb6986d94", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6199, "upload_time": "2013-08-11T20:16:21", "url": "https://files.pythonhosted.org/packages/d8/39/6acf9976af664a3b75973c4a16d6594363f7f5c387b8f2f0e5dde3d5eb93/wordgrapher-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "built for Darwin-12.4.0", "digests": { "md5": "0380aed1a572b8d7ce9479ada5f6d89f", "sha256": "6a1125ec90726602f398d75f8da6df9c0a86922862aab0ae7f454c800f01efee" }, "downloads": -1, "filename": "wordgrapher-0.1.2.macosx-10.8-intel.tar.gz", "has_sig": false, "md5_digest": "0380aed1a572b8d7ce9479ada5f6d89f", "packagetype": "bdist_dumb", "python_version": "any", "requires_python": null, "size": 7900, "upload_time": "2013-08-12T03:48:00", "url": "https://files.pythonhosted.org/packages/f9/87/b7007cd160603805475b72a249d8a138d8c9941ba5dbee179b6b07d25583/wordgrapher-0.1.2.macosx-10.8-intel.tar.gz" }, { "comment_text": "", "digests": { "md5": "94653ee8cbd73a22d61cbb17553c2a7c", "sha256": "42a9cc8be2314ad7c8ad32c26b876322aa4d0a364b857406ee5e9d2450b7cab2" }, "downloads": -1, "filename": "wordgrapher-0.1.2.tar.gz", "has_sig": false, "md5_digest": "94653ee8cbd73a22d61cbb17553c2a7c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6208, "upload_time": "2013-08-12T03:47:57", "url": "https://files.pythonhosted.org/packages/2d/6d/51b70a1dba21287131d8340880f2b61bbf861732f22d9e9c23e3ce47d0c5/wordgrapher-0.1.2.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "67740f01b63f9c09007f249cf179e72f", "sha256": "888948db614b0e30c471440e405e336cb624dce75d2bc1a9bf89977b91cbe1f7" }, "downloads": -1, "filename": "wordgrapher-0.2.0.tar.gz", "has_sig": false, "md5_digest": "67740f01b63f9c09007f249cf179e72f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6680, "upload_time": "2013-08-12T07:13:25", "url": "https://files.pythonhosted.org/packages/5c/66/9786749618e69873c3e142cbf164ab1fd0b7a55ab42b9d97389ca9b5e911/wordgrapher-0.2.0.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "faf09ac5e84aadf5f2f0a440533b38a0", "sha256": "1a351985f7f0db337afce7e08ff36b5d60e3d3512f397ee7c12a562d4163ba60" }, "downloads": -1, "filename": "wordgrapher-0.3.0.tar.gz", "has_sig": false, "md5_digest": "faf09ac5e84aadf5f2f0a440533b38a0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8183, "upload_time": "2013-08-16T19:06:40", "url": "https://files.pythonhosted.org/packages/a2/ce/09745f1ba49e8d4eae3cd22985ebad8209eae8043caa1519dde17b2defed/wordgrapher-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "354050515102ad5de2bfa6959fb9b5a3", "sha256": "b1422c397c9e5e083099a247217fa2ddb7c62220cf9d6a948fb4f8c0671c49b6" }, "downloads": -1, "filename": "wordgrapher-0.3.1.tar.gz", "has_sig": false, "md5_digest": "354050515102ad5de2bfa6959fb9b5a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8543, "upload_time": "2013-08-16T19:12:20", "url": "https://files.pythonhosted.org/packages/84/74/ba8154229d1e1b23fc320351c737ae69bb0aa8abdac9ad98a71da4099332/wordgrapher-0.3.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "354050515102ad5de2bfa6959fb9b5a3", "sha256": "b1422c397c9e5e083099a247217fa2ddb7c62220cf9d6a948fb4f8c0671c49b6" }, "downloads": -1, "filename": "wordgrapher-0.3.1.tar.gz", "has_sig": false, "md5_digest": "354050515102ad5de2bfa6959fb9b5a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8543, "upload_time": "2013-08-16T19:12:20", "url": "https://files.pythonhosted.org/packages/84/74/ba8154229d1e1b23fc320351c737ae69bb0aa8abdac9ad98a71da4099332/wordgrapher-0.3.1.tar.gz" } ] }