{ "info": { "author": "Wouter Bolsterlee", "author_email": "uws@xs4all.nl", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: BSD License", "Programming Language :: Python", "Programming Language :: Python :: 3", "Topic :: Database" ], "description": "====\nWhip\n====\n\nOverview\n========\n\n*Whip*, the who, what, where and when about IP address data.\n\nWhip provides a fast IP information lookup service that scales to large data\nsets. Whip can efficiently answer queries for a specific IP address, optionally\nrefined by a timestamp.\n\nFeature summary:\n\n* Fast lookups for IP addresses, optionally limited to a specific timestamp.\n This means that data that (slowly) changes over time can be queried at any\n known point in history.\n\n* Transparent support for both IPv4 and IPv6 addresses and ranges\n\n* Support for incremental loading. This means new data can be added to an\n existing database. This is useful to ingest weekly data set snapshots.\n\n* Efficient storage format (data stored as ranges, history as diffs)\n\n* Transparent support for gzip compressed input data the file name ends with\n ``.gz``\n\nWhip can handle almost any data set containing information about IP address\nranges. Whip does not limit the type of data that is associated to an IP address\nor IP address range, so it can be used for a variety of data sets, for instance\nIP geolocation data sets. Note that Whip itself does *not* come with any data\nsets.\n\nA source file should be seen as a 'snapshot' of an evolving data set at a\nspecific point in time. During the loading phase, Whip combines all snapshots\n(with different timestamps) and constructs the history of all records in a way\nthat can be queried efficiently. It does so by building an index of IP ranges\nand associating data to each range (e.g. geolocation tags), and keeping history\nfor each distinct range.\n\n\nInstallation\n============\n\nUse a virtualenv to install Whip and its dependencies. To install from a source\ntree::\n\n $ pip install -r requirements.txt\n\n\nThese are the current dependencies:\n\n* *Python* 3.3+ (no Python 2 support!)\n* *Plyvel* to access *LevelDB* from Python\n* *Flask* for the REST API\n* *Aaargh* for the command line tool\n* *UltraJSON (ujson)* for fast JSON encoding and decoding\n* *Msgpack* for Msgpack encoding and decoding\n\n\nUsage\n=====\n\nCommand line interface\n----------------------\n\nMost functionality is provided by the `whip-cli` command line tool. Detailed\nusage information::\n\n $ whip-cli --help\n\nTo load input data into ``my.db`` (creating it when necessary)::\n\n $ whip-cli --db my.db load input-file-1.json.gz input-file-2.json.gz\n\nTo serve the database over a REST API::\n\n $ whip-cli --db my.db serve\n\nThe REST API can also be deployed using WSGI e.g. using gunicorn/nginx.\n\nREST API\n--------\n\nThe REST API supports these queries:\n\n* To retrieve the latest record for an IP address::\n\n GET /ip/1.2.3.4\n\n* To obtain a specific record for an IP address::\n\n GET /ip/1.2.3.4?datetime=2013-05-15\n\n* To obtain the complete history for an IP address::\n\n GET /ip/1.2.3.4?datetime=all\n\nIn each case, the response will be an ``application/json`` encoded document,\neven if no hit was found, in which case the result will be an empty JSON\ndocument. HTTP status codes are only used to signify errors.\n\nInput data format\n-----------------\n\nThe source file format is simple: it's just a a text file with one\nJSON-formatted document on each line. Each document can contain arbitrary\ninformation about a range of IP addresses.\n\nWhip itself requires three fields:\n\n* ``begin``: begin of IP address range (inclusive), e.g. ``1.0.0.0``\n* ``end``: end of IP address range (inclusive), e.g. ``1.0.255.255``\n* ``datetime``: time stamp for this record, e.g. ``2013-02-28``\n\nIn addition to these fields, each document may contain arbitrary key/value\npairs, e.g. ``\"country\"``.\n\nAn input data file must follow these rules:\n\n* Each range in the file must span at least 1 IP address, but can span an\n arbitrary number of consecutive IP addresses (not limited by net block/CIDR),\n specified in the ``begin`` and ``end`` fields.\n\n* Ranges in a single input file must not overlap, but there may be gaps between\n ranges (in case no information is available for that range).\n\n* Ranges in a single input file must be sorted by IP address. IPv4 ranges must\n sort *before* IPv6 ranges, since Whip uses RFC3493 IPv4-mapped IPv6 addresses\n internally.\n\n* Timestamps, specified in ``datetime`` field, should be in ISO8601 format,\n since Whip depends on the lexicographic ordering of the input strings.\n\n* All timestamps in a single input file must be the same. Yes, this adds\n redundancy, but avoids the need for header lines or out-of-band metadata.\n\n.. warning::\n\n If the input data does not follow the above rules, bad things may happen,\n including database corruption.\n\nAn example input document looks like this (formatted on multiple lines for\nclarity)::\n\n {\n \"begin\": \"1.0.0.0\",\n \"end\": \"1.255.255.255\",\n \"datetime\": \"2013-02-28\",\n \"location\": \"Amsterdam\",\n \"some-other-data\": \"anything-you-like\"\n }\n\nA single input file with many of these documents looks like this::\n\n {\"begin\": \"1.0.0.0\", \"end\": \"1.255.255.255\", \"datetime\": \"2013-02-28\", \"...\": \"...\"}\n {\"begin\": \"2.0.0.0\", \"end\": \"2.255.255.255\", \"datetime\": \"2013-02-28\", \"...\": \"...\"}\n {\"begin\": \"11.0.0.0\", \"end\": \"11.0.255.255\", \"datetime\": \"2013-02-28\", \"...\": \"...\"}\n\nWhip can load many of these input files (e.g. weekly snapshots for a longer\nperiod of time) in a single loading pass.\n\n\nIdeas / TODO\n============\n\n* Perform range scan on in-memory structure instead seeking on a DB iterator.\n This means Whip must load all keys in memory on startup (in an `array.array`);\n use `bisect.bisect_right` to find the right entry, then simply `db.get()` for\n the actual value. To avoid scanning the complete database on startup, the\n keyspace should be split using a key prefix: one part keeps both the keys and\n values (full database), the other part only keeps the keys. The latter will be\n scanned and loaded into memory upon startup. For ~25000000 IPv4 addresses,\n keeping the index in memory only requires 100MB of RAM, and lookups would only\n issue `db.get()` for keys that are known to exist.\n\n Update: experiments on a big database containing most IP ranges in use show\n this is *not* any faster than doing the actual seek, since `it.seek()` takes\n just as long as `db.get()`. This means a lot of memory will be used to improve\n performance for non-hits (in which case no DB calls are made).\n\n* Try out LMDB instead of LevelDB\n\n* Pluggable storage backends (e.g. HBase)", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/wbolster/whip", "keywords": null, "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "whip", "package_url": "https://pypi.org/project/whip/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/whip/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/wbolster/whip" }, "release_url": "https://pypi.org/project/whip/0.1/", "requires_dist": null, "requires_python": null, "summary": "Whip, the who, what, where and when about IP address data", "version": "0.1" }, "last_serial": 1394859, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "03fe9650cad2bb0ddee78bedf0f12065", "sha256": "b1938b7da96616ffcc7ff08b6322f7461feceaf3abbb6d83075841d1c0ba55b2" }, "downloads": -1, "filename": "whip-0.1.tar.gz", "has_sig": false, "md5_digest": "03fe9650cad2bb0ddee78bedf0f12065", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14145, "upload_time": "2015-01-24T16:38:42", "url": "https://files.pythonhosted.org/packages/1a/a6/a4cf9dbe303f317a020b9b6b6cf7aebbf773926cd970be6cffc21e00c88c/whip-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "03fe9650cad2bb0ddee78bedf0f12065", "sha256": "b1938b7da96616ffcc7ff08b6322f7461feceaf3abbb6d83075841d1c0ba55b2" }, "downloads": -1, "filename": "whip-0.1.tar.gz", "has_sig": false, "md5_digest": "03fe9650cad2bb0ddee78bedf0f12065", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14145, "upload_time": "2015-01-24T16:38:42", "url": "https://files.pythonhosted.org/packages/1a/a6/a4cf9dbe303f317a020b9b6b6cf7aebbf773926cd970be6cffc21e00c88c/whip-0.1.tar.gz" } ] }