{
"info": {
"author": "Michael Palumbo",
"author_email": "michael.palumbo87@gmail.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Utilities"
],
"description": "=====================================================================\nDataSource - A simple wrapper for fetching data no matter where it is\n=====================================================================\n\n.. image:: https://secure.travis-ci.org/YAmikep/datasource.png\n :target: https://travis-ci.org/YAmikep/datasource\n\n.. image:: https://coveralls.io/repos/YAmikep/datasource/badge.png\n :target: https://coveralls.io/r/YAmikep/datasource \n\n.. image:: https://pypip.in/v/datasource/badge.png\n :target: https://crate.io/packages/datasource/\n\n.. image:: https://pypip.in/d/datasource/badge.png\n :target: https://crate.io/packages/datasource/\n\n\nDataSource is a simple wrapper for fetching data with a uniform API regardless how the data should be fetched.\n\nThe purpose of this library is to remove the pain of thinking of the required approach based on where the data is:\n\n- a file on disk\n- a remote file reachable via an URL\n- a string in memory\n- or even an iterable / a callable that generates data.\n\nIt takes care of fetching the data and provides \"readers\" (file-like objects) to actually get the data.\nIt can handle huge amount of data while keeping a low memory footprint. (See the section \"Working with huge amount of data\")\n\n\n\nDocs\n----\n\n.. http://datasource.readthedocs.org/en/latest/\n\n\n\nInstall\n-------\n\nFrom PyPI (stable)::\n\n pip install datasource\n\nFrom Github (unstable)::\n\n pip install git+git://github.com/YAmikep/datasource.git#egg=datasource\n\nDataSource currently requires the `requests `_ library to fetch URLs.\n\n\n\nMain API\n---------\n\n- ``datasource.DataSource(target, **kwargs)``: a DataSource object\n\nThe only mandatory argument is ``target`` which defines where the data is. It can be a string, a callable or an iterable.\n\nThere are a couple of possible keyword arguments:\n\n- ``is_file``: a boolean to ensure that the string target is actually considered as a filepath so that an error is raised if the file cannot be found. By default, a string target is considered as a raw string data if the file does no exist.\n\n\nThe following four parameters are useful only when the data is not locally available, i.e. has to be downloaded or generated.\n\n- ``preload``: a boolean to trigger data loading when the object is created (Default: False). A DataSource object is lazy by default.\n- ``max_memory``: the maximum size of the data to store in memory. This threshold triggers the creation of a temporary file. (Default: 5MB)\n- ``buffer_size``: the buffer size (Default: 512 KB)\n- ``dir_tmp``: the directory where to create temporary files\n\n\n\nDataSource objects\n------------------\n\nA ``DataSource`` object has a simple API:\n\n- ``is_loaded`` property: tells whether the source is loaded, meaning that the data is available locally\n- ``load(self)``: will load the data if it has not been done yet\n- ``size(self, force_load=False)``: the size of the data. If the data is not loaded yet, the size is 0, it will not load it unless you set ``force_load`` to True to make sure it is loaded before it returns the size.\n- ``get_reader(self)``: returns a \"reader\" which is a file-like object from which you will actually get the data with the read method.\n\n\nA DataSource object is lazy by default (preload=False). For example, when providing a URL, the data will not be downloaded when the object is created but when the data is first needed, which is to say when calling get_reader().\n\n\nWorking with huge amount of data will not make you run out of memory\n--------------------------------------------------------------------\nWhen using an in-memory string or a file, the data is already stored locally so there is no extra work to fetch and store the data.\nHowever, when fetching a remote file or using a generator, you do not know the size of the data.\nThe data is by default downloaded into memory for a performance concern. However, to avoid running out of memory because of a huge amount of data, you can tell the maximum memory to use (max_memory kwarg) so that a temporary file will automatically be created when reaching this threshold (default: 5 MB), freeing the memory. This features ensures that you can keep control of your memory while fetching data of unknown size. The temporary file is managed behind the scene and will be deleted when the DataSource object is deleted.\n\nYou can also have control over the buffer (buffer_size kwarg, default: 512 KB) and the directory where temporary files should be created (dir_tmp kwarg, default: the default temporary folder of the OS).\n\nNote that a DataSource object is lazy so, unless you set preload to True at the creation of the object, the data will actually be fetched and stored only when you call get_reader for the first time.\n\n\nUsage and examples\n------------------\n\n.. code:: python\n\n >>> import datasource as ds\n\n # Directly an in-memory string\n >>> data = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'\n >>> source = ds.DataSource(data)\n >>> print source.is_loaded, source.size()\n True 26\n >>> reader = source.get_reader()\n >>> print reader.read()\n ABCDEFGHIJKLMNOPQRSTUVWXYZ\n\n # A string as a filepath: absolute path, relative path or path defined with \"file://\"\n >>> filepath = 'tests/data/afile.txt'\n >>> source = ds.DataSource(filepath)\n >>> print source.is_loaded, source.size()\n True 26\n >>> reader = source.get_reader()\n >>> print reader.read()\n ABCDEFGHIJKLMNOPQRSTUVWXYZ\n\n # A string as a filepath: use is_file to make sure it is considered as a file to raise an Error if the file does not exist\n >>> filepath = 'file_does_not_exist.txt'\n >>> try:\n ... source = ds.DataSource(filepath, is_file=True)\n ... except Exception as e:\n ... print e\n File not found: file_does_not_exist.txt\n\n # A callable\n >>> f = lambda: (chr(c) for c in xrange(65, 91))\n >>> callable(f)\n True\n >>> source = ds.DataSource(f)\n >>> print source.is_loaded, source.size() # A DataSource is lazy by default so it is not loaded yet\n False 0\n >>> reader = source.get_reader() # get_reader triggers data loading\n >>> print source.is_loaded, source.size()\n True 26\n >>> print reader.read()\n ABCDEFGHIJKLMNOPQRSTUVWXYZ\n >>> source = ds.DataSource(f, preload=True) # Set preload to True to load the data at the creation\n >>> print source.is_loaded, source.size()\n True 26\n\n # A generator\n >>> gen = (chr(c) for c in xrange(65, 91))\n >>> type(gen)\n \n >>> source = ds.DataSource(gen)\n >>> print source.size(force_load=True), source.is_loaded # A DataSource is lazy so use force_load to make sure it is loaded\n 26 True\n >>> reader = source.get_reader()\n >>> print source.is_loaded, source.size()\n True 26\n >>> print reader.read()\n ABCDEFGHIJKLMNOPQRSTUVWXYZ\n\n # An URL\n >>> url = 'https://bitbucket.org/YAmikep/datasource/raw/master/tests/data/afile.txt'\n >>> source = ds.DataSource(url)\n >>> print source.is_loaded, source.size() # A DataSource is lazy by default\n False 0\n >>> reader = source.get_reader() # get_reader triggers data loading\n >>> print source.is_loaded, source.size()\n True 26\n >>> print reader.read()\n ABCDEFGHIJKLMNOPQRSTUVWXYZ\n\n\n\nContribute\n----------\n\nClone and install testing dependencies::\n\n $ python setup.py develop \n $ pip install -r requirements_tests.txt\n\nEnsure tests pass::\n\n $ ./scripts/runtests.sh\n\nIf the result displays ``Ran 0 tests``, ran the below on a directory and look at the output::\n\n $ nosetests -vv --collect-only\n\nIf Nose skips executable files, you'll need to change the file mode to non-executable. On Mac OS X or Linux, this can be accomplished with::\n\n $ chmod 644 *.py\n\nIf you cannot change the file mode (e.g.: vagrant shared folder with Windows), you can run Nose with the --exe option to \"look for tests in python modules that are executable\" (from ``man nosetests``)::\n\n $ ./scripts/runtests.sh --exe\n\nOr you can try using tox::\n\n $ tox",
"description_content_type": null,
"docs_url": null,
"download_url": "UNKNOWN",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://bitbucket.org/YAmikep/datasource",
"keywords": null,
"license": "Copyright (C) 2013, Michael Palumbo\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of\nthis software and associated documentation files (the \"Software\"), to deal in\nthe Software without restriction, including without limitation the rights to\nuse, copy, modify, merge, publish, distribute, sublicense, and/or sell copies\nof the Software, and to permit persons to whom the Software is furnished to do\nso, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.",
"maintainer": null,
"maintainer_email": null,
"name": "datasource",
"package_url": "https://pypi.org/project/datasource/",
"platform": "UNKNOWN",
"project_url": "https://pypi.org/project/datasource/",
"project_urls": {
"Download": "UNKNOWN",
"Homepage": "https://bitbucket.org/YAmikep/datasource"
},
"release_url": "https://pypi.org/project/datasource/0.2.0/",
"requires_dist": null,
"requires_python": null,
"summary": "DataSource is a simple wrapper for fetching data with a uniform API regardless how the data should be fetched.",
"version": "0.2.0"
},
"last_serial": 1050658,
"releases": {
"0.1.0": [
{
"comment_text": "",
"digests": {
"md5": "d33f067722cbad795e09f98237612bf4",
"sha256": "78ffb4f7d4c44e815a8acd85e2f234a7222a685697bc1c044e5c13efec8e9e74"
},
"downloads": -1,
"filename": "datasource-0.1.0.zip",
"has_sig": false,
"md5_digest": "d33f067722cbad795e09f98237612bf4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 24231,
"upload_time": "2013-08-17T14:18:05",
"url": "https://files.pythonhosted.org/packages/02/b4/f919ad2f271f725afc4d99979dba852ebe7782eba973c4dad7c2d7c425e2/datasource-0.1.0.zip"
}
],
"0.2.0": [
{
"comment_text": "",
"digests": {
"md5": "5f3745077aff22c122b17deeb5e39c15",
"sha256": "57c4fc4bb419c2a00fe16652c0c442d110fb204ba2999de44e208a6f5f9ed67d"
},
"downloads": -1,
"filename": "datasource-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "5f3745077aff22c122b17deeb5e39c15",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13592,
"upload_time": "2014-04-03T19:57:18",
"url": "https://files.pythonhosted.org/packages/2b/c0/eb72a8f9df5f507d5c60672d78adb516c107bf84baaa72e30b994aba9b08/datasource-0.2.0.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "5f3745077aff22c122b17deeb5e39c15",
"sha256": "57c4fc4bb419c2a00fe16652c0c442d110fb204ba2999de44e208a6f5f9ed67d"
},
"downloads": -1,
"filename": "datasource-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "5f3745077aff22c122b17deeb5e39c15",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13592,
"upload_time": "2014-04-03T19:57:18",
"url": "https://files.pythonhosted.org/packages/2b/c0/eb72a8f9df5f507d5c60672d78adb516c107bf84baaa72e30b994aba9b08/datasource-0.2.0.tar.gz"
}
]
}