{ "info": { "author": "Emin Martinian", "author_email": "emin.martinian@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Programming Language :: Python :: 3" ], "description": "Introduction\n============\n\nThe python ``oxtie`` package is a system for tying together data by\nsaving and loading data to various backends (especially backends in the\ncloud). It fits somewhere above low-level serialization/deserialization\ntools like pickle, json, msgpack, etc., but below a full fledged\ndatabase like postgres or MongoDB.\n\nThe goal of ``oxtie`` is to be a lightweight system for loading/saving\ndata in a backend agnostic way.\n\nInstallation\n------------\n\nYou can install in the usual way via something like\n\n.. code:: sh\n\n $ pip install oxtie\n\nExample Usage\n-------------\n\nThe main question ``oxtie`` tries to answer is \"which backend should I\nuse to store my python objects in an easy and efficient way?\". The\nanswer is \"You don't have to choose; use oxtie and you can easily change\nthe backend when you like.\"\n\nSimplest Example\n~~~~~~~~~~~~~~~~\n\nAs a simple example of how you can use ``oxtie``, consider the following\nexample:\n\n.. code:: python\n\n\n >>> from oxtie.backs import simple # Import some simple backends\n >>> tfbackend = simple.TempFileBackend() # Choose a temp file backend\n >>> from oxtie.fronts import base # Import front ends\n >>> class MyClass(base.Frontend): # Define your own class\n ... 'Simple sub-class of Frontend so it can save data'\n ... \n >>> f = MyClass(facets={'my_name': 'test_item'}, backend=tfbackend)\n >>> f.arbitrary_data = 'you can set whatever data you want'\n >>> f.save() # Backend will save\n\nSimply by inheriting from base.Frontend and choosing a backend, you can\ncall the save method of your object and it will save itself on the\ndesired backend. You could have used S3 or a file system or any other\nsupported backend simply by changing one line.\n\nYou can later load the object via something like:\n\n.. code:: python\n\n >>> g = tfbackend.load({'my_name': 'test_item'}, front_cls=MyClass)\n >>> f.arbitrary_data == g.arbitrary_data\n True\n\nThe key which ``oxtie`` uses to save/load is the ``facets`` dictionary.\nBy default, all keys in ``facets`` which do not start with an underscore\nare used but you can customize this as described later.\n\nAnother Example\n~~~~~~~~~~~~~~~\n\nYou can use ``oxtie`` to handle complicated data structures. To\nillustrate, imagine you want to save your Pandas DataFrame somehow.\nFirst, you would do the usual thing to import pandas and create a\nDataFrame:\n\n.. code:: python\n\n\n >>> import pandas # So we can make a dataframe.\n >>> data = {'estimate': [.17, None]} # Make example data\n >>> frame = pandas.DataFrame(data, index=['2017-07-01', '2017-10-01'])\n\nNext, you can create an instance of the ``SimpleFrame`` class from\n``oxtie`` to store your DataFrame as a temporary file via something\nlike:\n\n.. code:: python\n\n\n >>> from oxtie.fronts import specials # Illustrate special example\n >>> from oxtie.backs import simple # Import simple backends\n >>> backend=simple.TempFileBackend() # Choose a temp file backend\n >>> f = specials.nums.SimpleFrame({'name': 'test'},backend=backend,frame=frame)\n >>> f.arbitrary = 'You can also save arbitrary data in SimpleFrame'\n >>> f.save() # Now save the frame.\n\nYou could instead have done something like\n``from oxtie.backs import aws`` to get a different backend and use\n``backend = aws.S3Backend('prefix', 'bucket')`` to save to Amazon S3. Or\nif you prefer DynamoDB, you could use\n``backend = aws.DynamoBackend(table_name, key_fields={'primary':['name']})``\nto have your data saved to the table ``table_name`` with a primary key\n``primary`` built from the ``name`` facet instead. The key idea is that\n``oxtie`` handles dealing with the different backends so you don't have\nto worry about it or change your code (apart from choosing the backend\nof course).\n\nContinuing our example, once you want to load your data you can do the\nfollowing (possibly in another python session):\n\n.. code:: python\n\n >>> g = backend.load({'name': 'test'}, allow_load=True)\n >>> g.__class__.__name__\n 'SimpleFrame'\n >>> g.frame.to_csv() == f.frame.to_csv() # Compare CSVs to handle Nones\n True\n >>> g.arbitrary == f.arbitrary # Aribitrary properties also match\n True\n\nNote that in the above example we provide ``allow_load=True`` indicating\nthat the backend can dynamically figure out which of your python classes\nto load the data into. If you don't like dynamically loading, there are\nvarious ways to specify the class to load into.\n\nWhy Frontends?\n--------------\n\nYou may wonder why we need a special front-end class like\n``SimpleFrame``. If you really wanted to, you could just use the\nbackends by themselves as a more flexible version of ``pickle`` or a\nmore limited version of a database object relational manager (ORM). In\ngeneral, though, it is nice to have both a back-end protocol and a\nfront-end class so that you can more intelligently do things like\ncontrol serialization/deserialization, deal with headers, and so on.\n\nAs a very simple example of this, you could modify the previous example\nvia\n\n.. code:: python\n\n\n >>> facets = f.get_facets_dict()\n >>> facets['_timezone'] = 'US/Eastern'\n >>> f.save()\n\nThe dictionary returned by ``f.get_facets_dict()`` functions as a header\nto store simple \"facets\" (A \"facet\" is like an \"attribute\" but we use a\ndifferent word so as not to confuse oxtie \"facets\" with python\n\"attributes\"). One advantage these special facets have over the usual\npython properties is that you can do something like.\n\n.. code:: python\n\n\n >>> h = backend.load({'name': 'test'}, only_hdr=True)\n >>> print('name:tz = %s:%s' % (\n ... h['_facets']['name'], h['_facets']['_timezone']))\n name:tz = test:US/Eastern\n\nIn the above, we use the ``only_hdr=True`` option to ``backend.load`` to\nfirst load only the header. This is generally a much cheaper and faster\noperation than de-serializing and loading the full object. Among other\nthings, this header dictionary contains a ``'_facets`` key containing\nthe dictionary provided by ``get_facets_dict()``. As a result, we can\nlook at the header to see the timezone and do things like:\n\n1. Skip the full load for objects with the wrong timezone.\n2. Deserialize differently depending on things in the facets such as the\n timezone.\n\nIndeed, the ``SimpleFrame`` class does just that. If you load the full\nobject and print the frame, you will see that although we saved a pandas\nDataFrame with no timezone information, the ``SimpleFrame`` class looks\nfor the ``'_timezone'`` in the facets and localize appropriately on\nloading:\n\n::\n\n >>> print(backend.load({'name': 'test'},allow_load=True).frame.index[0])\n 2017-07-01 00:00:00-04:00\n\nKey Features\n------------\n\nThe ``oxtie`` package is designed to simplify loading and saving data\ninto different backends with the following key features:\n\n1. Built-in ``oxtie`` backends can easily store data out to local\n databases, local files, Amazon S3, Amazon DynamoDB, and other cloud\n providers.\n2. Support for serializing and storing Pandas DataFrame objects.\n3. Ability to easily implement your own backend storage to save/load\n to/from.\n4. Ability to separate backend from serialization.\n\n - You may want to *serialize* in a format like JSON, BSON, msgpack,\n CSV, pickle, etc., but *store* on backends like local files, S3,\n and so on. With ``oxtie`` you can mix and match these as you like.\n\n5. Ability to quickly load/scan header information before loading the\n full object so you can cheaply scan through stored data deciding\n what/how to load.\n6. Ability to build intelligent front ends to do validation or other\n actions on loading or saving.\n\nThese features may seem somewhat standard and generic so it may be worth\nemphasizing the one which most influenced creation of ``oxtie``: quickly\nload/scan header information. In the model of a modern major data system\nused by many organizations you generally want to do 3 things well:\n\n1. Save data.\n2. Get data for which you know the key.\n3. Look for a key based on some simple search criteria.\n\nIn the early days, SQL databases implemented #3 very well. You can could\nwrite arbitrarily complicated queries which let you use all kinds of\nconditions to find which data to load. Later, people realized that you\noften have binary large objects (BLOBs) which you don't need to search\nplus header information which you do want to search. This led to NoSQL\ndatabases which greatly improved performance at the cost of more limited\nsearch criteria. Since NoSQL was successful, cloud storage moved things\neven further providing even more limited search capability (e.g., Amazon\nS3 or Amazon DynamoDB).\n\nIn many cloud storage systems, you effectively divide your data into\nsome simple header information or facets such as a name, or last update\ntime, or region along with your non-searchable data. Your storage and\nsearch operations essentially only involve the header so that everything\ncan be efficient.\n\nAssuming you subscribe to this small header + big body paradigm, you may\nstart to realize that the simple save/load/scan methodology works on\nlots of different systems so why should you hard-code your application\nto depend on a particular backend? The answer is you shouldn't! Use\n``oxtie``!\n\nIdeally, one could even go a step further and define a simple language\nindependent protocol for search-load-scan (SLS) operations similar to\nthe SQL standard which could be implemented in various languages. With\nan SLS standard, you could flexibly save something like a Pandas\nDataFrame from python to S3 (or some other backend) and then load into\ninto the appropriate structure in some other language. The ``oxtie``\npackage is the first step on that path.\n\nFrequently Asked Questions (FAQ)\n================================\n\nDo We Need Yet Another Serialization System?\n--------------------------------------------\n\nUnfortunately, yes. If you try to naively store things like pandas\nDataFrames its easy to run into multiple issues:\n\n1. JSON doesn't support certain types (even simple things like datetime\n let alone DataFrames).\n2. JSON is not so efficient in terms of storage.\n3. You could use BSON but that has similar issues in type support.\n4. You could use msgpack but you need a little help to support things\n like pandas DataFrames. Even with msgpack, though, you probably may\n want additional features as described below.\n5. Most existing serialization formats don't provide support for\n automatic validation on deserialization.\n6. Often you want to serialize a time series or data table (e.g., a\n pandas DataFrame) but also include additional facets such as when the\n data was collected, the source of the data, and so on). This\n generally means you want a class that holds your time series along\n with some other facets.\n7. Sometimes you want to mix and match your serialization formats. For\n example, you may want to serialize an object in both msgpack and CSV\n formats or you may want the option to sometimes\n serializes/deserialize to/from one or the other.\n\nFor these reasons and more, it seems useful to have a lightweight \"cloud\nserialization\" system like ``oxtie`` to manage persisting data in a\nflexible way.\n\nWhat determines an objects key for saving/loading?\n--------------------------------------------------\n\nBy default, the facets dictionary provided to the ``Frontend`` class on\n``__init__`` is used to determine the object's key. By default, we use\nall the keys in the facets dict that do **NOT** start with an\nunderscore. This lets you easily set facets to contain both key\ninformation (i.e., those not starting with underscore) as well as other\nmeta-data which you do not want to be part of the key (i.e., by starting\nthe key with an underscore such as ``_timezone``).\n\nIf desired, you can override what the key is by inheriting from the\n``Frontend`` class in the ``oxtie.fronts.base`` module and overriding\neither ``get_key`` or preferably ``facets2key``.\n\nHow do you deal with special key requirements?\n----------------------------------------------\n\nSome back-ends like Amazon's DynamoDB have specific requirements for the\nkey. For example, DynamoDB requires a single primary key and an optional\nsecondary key. If you have a front-end with a facets dictionary with\nthree keys like ``{'k1': 'a', 'k2': 'b', 'k3': 'c'}`` you can define\nthings on the *backend* to translate the *frontend* key into the form\nthe backend can work with.\n\nIn the case of DynamoDB, we might have a *frontend* key like\n``{'k1': 'a', 'k2': 'b', 'k3': 'c'}``. The backend could be told that\n``k1`` and ``k2`` are part of the primary key called ``pkey`` and ``k3``\nis part of the secondary key called ``skey``. The backend would then\nproduce a *backend* key of the form ``{'pkey': 'a#b', 'skey': 'c'}`` by\njoining together the fields for ``pkey``. See the documentation for the\n``DynamoBackend`` class in the ``oxtie.backs.aws`` module for more\ndetails.\n\nBasically, the frontend key is always a simple dict with string keys and\nthe backend can then transform this further as required.\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/emin63/oxtie", "keywords": "nosql persistence database", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "oxtie", "package_url": "https://pypi.org/project/oxtie/", "platform": "", "project_url": "https://pypi.org/project/oxtie/", "project_urls": { "Homepage": "http://github.com/emin63/oxtie" }, "release_url": "https://pypi.org/project/oxtie/0.2.1/", "requires_dist": null, "requires_python": "", "summary": "Tools for saving NoSQL like data to various backends.", "version": "0.2.1" }, "last_serial": 3457528, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "e077a7a067c2a90fa66bb7dd5b675947", "sha256": "b3207c29e909ed5f81c555cb28012e4638ddbae84b53025240e9320f1973ecdc" }, "downloads": -1, "filename": "oxtie-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e077a7a067c2a90fa66bb7dd5b675947", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22622, "upload_time": "2017-12-04T22:01:20", "url": "https://files.pythonhosted.org/packages/cf/8e/bf43b83bf3bd85435305b8d22684730fa45b383455bc6e952f5c95977506/oxtie-0.1.1.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "46f3900ac7c40861fa9c1bea1e1e8f42", "sha256": "29d0fa712f75d4fd373f0f4b4630dd7fd50a00138086b02c0c83c8987936b968" }, "downloads": -1, "filename": "oxtie-0.2.1.tar.gz", "has_sig": false, "md5_digest": "46f3900ac7c40861fa9c1bea1e1e8f42", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25586, "upload_time": "2018-01-02T22:22:55", "url": "https://files.pythonhosted.org/packages/ee/6a/45f09bd5422ab37eb709071d4257b6918604dc36013c2f41c4b66471c6d6/oxtie-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "46f3900ac7c40861fa9c1bea1e1e8f42", "sha256": "29d0fa712f75d4fd373f0f4b4630dd7fd50a00138086b02c0c83c8987936b968" }, "downloads": -1, "filename": "oxtie-0.2.1.tar.gz", "has_sig": false, "md5_digest": "46f3900ac7c40861fa9c1bea1e1e8f42", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25586, "upload_time": "2018-01-02T22:22:55", "url": "https://files.pythonhosted.org/packages/ee/6a/45f09bd5422ab37eb709071d4257b6918604dc36013c2f41c4b66471c6d6/oxtie-0.2.1.tar.gz" } ] }