{ "info": { "author": "The Scoota Engineering Team", "author_email": "engineering@scoota.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Web Environment", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3 :: Only" ], "description": "# Events Delivery System\n\nThe **Events Delivery System (EDS)** library provides a suite of high-level Python utilities to\nwork with the *Google Cloud Platform*.\n\nThe main purpose of this library is to provide a unified interface for dealing with the same\nDomain across the range of GCP Python APIs.\n\n> **Supported Python versions**: 3.6, 3.7\n\n## Dimensional Modeling\n\nEDS is built around a dimensional model, a data design technique optimized for data querying.\nIt is particularly well suited for the querying of very large sorted data sets.\n\nThe dimensional model differs from a traditional entity-relationship model in its normalized form.\nDimensional models are denormalized to a 2nd-normal form, each fact table (the metrics) having a\none-to-many relationship with its context (the dimensions).\n\nThe dimensional model records a series of facts, surrounded by their contexts which are known\nto be true at the time of measurement.\n\nDimensions are of 3 types:\n\n* **Model Dimensions**: these reflect the operational entities defined in the _Domain Design_.\n For instance, software dealing with the leasing of cars would probably define a `Car` entity,\n and some sort of `User` entity to record customers. These classes are typically Model Dimensions,\n or simply Dimensions (D). _They live in BigQuery, CloudSQL, and Datastore._\n* **Degenerate Dimensions**: a dimension that only exists alongside the fact is called a Degenerate\n Dimension (DD). 
For instance, upon returning a leased car to the station, its `State` can only be\n known when the car is returned. _These DDs only live in Datastore._\n* **Generated Dimensions**: these are static dimensions for which we generate the set of data.\n For instance, the `Time` or `Day` are deterministic and never change. _Generated Dimensions (GD)\n are pre-populated in BigQuery, CloudSQL, and Datastore._\n\n## Schema Definition\n\nThe dimensions and facts are defined using a YAML-formatted `schema.yaml` file.\n\n> By default, this file is looked up at the root of the calling script.\n> This can be overridden by setting the `SCHEMA_YAML_PATH` environment variable.\n\nTo load the `Dimensions` and `Facts`:\n\n```python\nfrom eds.config import schema\n\nschema.DIMENSIONS\nschema.FACTS\n```\n\n### Dimensions\n\nThe dimensions are defined under the `dimensions` global section:\n\n```yaml\ndimensions:\n  -\n    name: DimensionKind\n    key: dimension_id\n    ancestor:\n      kind: ParentKind\n      lookup: \"{parent_id}\"\n    relatives:\n      - kind: RelativeKind\n        lookup: \"{relative_id}\"\n      - kind: SecondRelativeKind\n        lookup: \"{second_relative_id}\"\n    model_based: true\n    key_field: dimension_key\n    id_fields:\n      - dimension_id\n    index_fields:\n      - dimension_id\n    schema:\n      - {name: dimension_key, type: INTEGER, mode: REQUIRED}\n      - {name: dimension_id, type: INTEGER}\n      - {name: dimension_name, type: STRING}\n      - {name: archived, type: BOOLEAN}\n      - {name: timestamp, type: TIMESTAMP}\n```\n\nA dimension has the following parameters:\n\n - `name` - The Datastore `Kind`.\n - `key` - The id used to uniquely identify the dimension. It is also used as a base name for the various\n tables created in BigQuery and CloudSQL.\n - `ancestor` - To ensure strong consistency in Datastore, each `Kind` is created within an\n `EntityGroup`. 
This field defines this group's `Kind`, and how to locate it.\n The lookup field may be a simple string (when the ancestor is unique), a simple lookup pattern\n (e.g.: `{property}`, to locate the entity based on the property value), or a multiple lookup\n pattern (e.g.: `{property_1},{property_2}`, to locate the entity based on multiple properties).\n __This parameter is optional__.\n - `relatives` - For dimensions that rely on other relations to exist, the dependents are defined\n as relatives. Each relative has a `kind` and `lookup` field, whose meaning is the same as for\n the `ancestor` lookup.\n __This parameter is optional__.\n - `model_based` - A flag that defines whether the dimension is based on an operational entity.\n Setting this parameter to `False` will flag the dimension as _Generated_.\n __This parameter is optional__.\n - `key_field` - The name of the dimension's surrogate key field.\n __This parameter is required for model-based dimensions__.\n - `id_fields` - The operational ID field(s) of the dimension.\n __This parameter is required for model-based dimensions__.\n - `index_fields` - A list of dimension fields to index, both in Datastore and in CloudSQL.\n __This parameter is optional__.\n - `schema` - The fields of the dimension.\n __This parameter is required for model-based dimensions__.\n\n> Note that a `model_based` dimension without a `schema` is effectively a DD.\n> GDs, by contrast, are typically non-`model_based` and do have a `schema`.\n> When `model_based` is `True`, and a `schema` is provided, it is a Model Dimension.\n\n### Facts\n\nThe facts are defined under the `facts` global section:\n\n```yaml\nfacts:\n  -\n    name: fact\n    schema:\n      - {name: day_key, type: INTEGER, required: Yes}\n      - {name: time_key, type: INTEGER, required: Yes}\n      - {name: dimension_key, type: INTEGER, required: Yes}\n      - {name: timestamp, type: TIMESTAMP, required: Yes}\n      - {name: fact, type: FLOAT, required: Yes}\n```\n\nA fact has the following parameters:\n\n - `name` - 
The name used in BigQuery.\n - `schema` - The fields of the fact.\n\n> Note that facts only exist in BigQuery.\n\n## Dimensions Data Lake\n\nAll _Model_ and _Generated_ Dimensions are stored in Datastore.\nHow they end up there is taken care of by another sub-system, though they must use the same schema\nas their BigQuery and CloudSQL counterparts. Each field is represented by a Datastore Entity attribute.\n\nThese Dimensions are kept up to date by the operational system, and changes need to be\nperiodically propagated to BigQuery and CloudSQL.\n\n### Datastore Ancestors & Indexes\n\nIn order to guarantee strong write consistency, all the Datastore Entities are inserted with\na parent Entity. When using parents, a Datastore ancestor query can then be performed.\n\n> See [ancestor queries](https://cloud.google.com/datastore/docs/concepts/queries#ancestor_queries) and\n> [data consistency](https://cloud.google.com/datastore/docs/concepts/structuring_for_strong_consistency)\n> for more info.\n\nIndexes are created by default for all entities, attributes, and ancestors, ascending and descending.\n\n## BigQuery\n\nThe `BigQuery` class is a simple wrapper around the `google.cloud.bigquery.Client` class, so\ninstantiating it doesn't require any extra parameter.\n\n`BigQuery` provides helpers to create or re-create tables based on an `eds` schema, as well as\nupdate a table's schema.\n\n```python\nfrom eds.config import schema\nfrom eds.bigquery import BigQuery\n\nbigquery = BigQuery()\ndimension = schema.DIMENSIONS['dimension_id']\n\nbigquery.create_table(dataset='dimensions', recreate=False, schema=dimension.schema,\n table_name=dimension.key)\n\nbigquery.update_table(dataset='dimensions', schema=dimension.schema, table_name=dimension.key)\n```\n\n## CloudSQL\n\n> Note that `eds` only supports CloudSQL with MySQL.\n\nThe `CloudSQL` class must be instantiated with a `Connection`.\nA `Connection` is a **SQLAlchemy** wrapper class that uses the default pooling settings 
recommended\nby GCP.\n\n```python\nfrom eds.db import Connection\nfrom eds.cloudsql import CloudSQL\n\nconnection = Connection(db_engine_url='DB_ENGINE_URL')\ncloudsql = CloudSQL(connection=connection)\n```\n\n`CloudSQL` provides helpers to create or re-create tables based on an `eds` schema.\n\n> Updating an existing table's schema is currently not supported.\n\n```python\nfrom eds.config import schema\n\ndimension = schema.DIMENSIONS['dimension_id']\ncloudsql.create_table(table=dimension.tables[dimension.key], recreate=False)\n```\n\n> The `tables` property of a Dimension includes 2 SQLAlchemy `Table` definitions: one for a table\n> named after the Dimension's `key`, and another named using the `_staging` suffix.\n\n### CloudSQL/MySQL Charset\n\nUTF-8 support is only available since MySQL 5.7 with the `utf8mb4` charset.\n\nAs this isn't the default charset used by MySQL, it must be set as a flag:\n\n```\ncharacter_set_server = utf8mb4\n```\n\nIn order to display all characters properly via the **Cloud Shell**, remember to set the client\nconnection charset to `utf8mb4` from the `mysql` console, with:\n\n```\nSET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;\n```\n\n## Loading Data to a Dimension\n\nA data payload (usually from a Dimension stored in Datastore) can be loaded to BigQuery and CloudSQL\nusing the `BigQuery` and `CloudSQL` classes.\n\n```python\npayload = [\n    {\n        'dimension_key': 123,\n        'dimension_id': 364577,\n        'dimension_name': 'D123',\n        'archived': False,\n        'timestamp': '2018-11-02T12:01:05.571694+00:00',\n    },\n]\n\nbigquery.load(dataset='dimensions', table_name=dimension.key, schema=dimension.schema,\n              payload=payload)\ncloudsql.load(table=dimension.tables[dimension.key], payload=payload)\n```\n\n**Note on CloudSQL**\n\nAny schema field ending with `_name` will be encoded to UTF-8 if it is received as `bytes`,\nbefore writing to CloudSQL.\n\nAll `timestamp` fields will be stripped of any trailing timezone information, again before loading\ninto 
CloudSQL. The `timestamp` field is expected to be in ISO 8601.\n\n## Dimension Update\n\nA *Dimension Update* will update the records in a table with the records from its staging table.\n\n```python\nbigquery.update(table_name=dimension.key, key_field=dimension.key_field, schema=dimension.schema)\ncloudsql.update(table_name=dimension.key, key_field=dimension.key_field, schema=dimension.schema)\n```\n\n## Dimension Insert\n\nA *Dimension Insert* will insert records from its staging table that do not exist in its own table.\n\n```python\nbigquery.insert(table_name=dimension.key, key_field=dimension.key_field, schema=dimension.schema)\ncloudsql.insert(table_name=dimension.key, key_field=dimension.key_field, schema=dimension.schema)\n```\n\n# Development\n\nTo contribute to this project, install and run `tox`:\n\n```\npip install -r tests/setup.txt\ntox\n```\n\n## PyPI\n\nThe project is the root PyPI namespace `eds` for the public index.\n\nThe PyPI package is available at [pypi.python.org/pypi/eds](https://pypi.python.org/pypi/eds/).\n\nTo publish a new release:\n\n```\npython setup.py sdist\ntwine upload [--repository-url https://test.pypi.org/legacy/] dist/*\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/rockabox/eds", "keywords": "", "license": "Apache 2.0", "maintainer": "", "maintainer_email": "", "name": "eds", "package_url": "https://pypi.org/project/eds/", "platform": "", "project_url": "https://pypi.org/project/eds/", "project_urls": { "Homepage": "https://github.com/rockabox/eds" }, "release_url": "https://pypi.org/project/eds/1.1.0/", "requires_dist": null, "requires_python": ">=3.6", "summary": "A series of high-level utilities on top of the google-cloud-python libraries.", "version": "1.1.0" }, "last_serial": 5889640, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": 
"93beeb2302b4217f8877dd80b3e618e3", "sha256": "7e6c0c45f30e1ab63405265a41e75fdf07e09299a653edfc8f8e91652bce524d" }, "downloads": -1, "filename": "eds-0.2.0.tar.gz", "has_sig": false, "md5_digest": "93beeb2302b4217f8877dd80b3e618e3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1317, "upload_time": "2019-01-30T13:45:01", "url": "https://files.pythonhosted.org/packages/27/13/285fc221e932fb27332c8dc3e8c58f7365c272cbc33088145537cc3bf829/eds-0.2.0.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "aa116bf600fcfc52e7e261cddf6d9f03", "sha256": "ffbd0e9c10ba5d665f3344347546e9a54f1dd61b326c7cd9daa7434ac3c2855b" }, "downloads": -1, "filename": "eds-1.0.0.tar.gz", "has_sig": false, "md5_digest": "aa116bf600fcfc52e7e261cddf6d9f03", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 19150, "upload_time": "2019-02-13T13:24:30", "url": "https://files.pythonhosted.org/packages/33/76/e59ded3e80921e86f5ca16ced7abb0771696bb0b2e377cfed7f2e8764e59/eds-1.0.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "ff171c226ff4397fcfdeef02afd60220", "sha256": "f0d66706bb8f683fefb81c349c89c872567d34af2375a3a050d85654f9b567b4" }, "downloads": -1, "filename": "eds-1.0.1.tar.gz", "has_sig": false, "md5_digest": "ff171c226ff4397fcfdeef02afd60220", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23009, "upload_time": "2019-02-21T14:08:32", "url": "https://files.pythonhosted.org/packages/ce/63/e4debfddfddb16adddbf184e5cf2d05db81b8cd48ebd7f12f89508660a57/eds-1.0.1.tar.gz" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "40cb11201380eb8283c183d1816034cb", "sha256": "cc4a52cc2bbf8c628fe65a5597d37be3b405b26724d8880424d7e23e0173f5e1" }, "downloads": -1, "filename": "eds-1.0.2.tar.gz", "has_sig": false, "md5_digest": "40cb11201380eb8283c183d1816034cb", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23231, 
"upload_time": "2019-04-05T12:51:01", "url": "https://files.pythonhosted.org/packages/bd/52/11a0ef10ccbbdc3107569a1d3eaa08e8fca680e13eca2f94919bf3dd242c/eds-1.0.2.tar.gz" } ], "1.0.3": [ { "comment_text": "", "digests": { "md5": "3747f1d6607c6bde72360a07cf09a6de", "sha256": "21b7e113b531aa4390e50dced62431dfd67574abbe6c049b867547023cbb9f06" }, "downloads": -1, "filename": "eds-1.0.3.tar.gz", "has_sig": false, "md5_digest": "3747f1d6607c6bde72360a07cf09a6de", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23244, "upload_time": "2019-04-08T14:11:47", "url": "https://files.pythonhosted.org/packages/2f/90/c0feeb43ba4ffaca2e34663ed461e194d46ff3da610581058b5ee0b32ed3/eds-1.0.3.tar.gz" } ], "1.0.4": [ { "comment_text": "", "digests": { "md5": "00a25c45a93e7c40b63a1dfbeb7f7941", "sha256": "712bf2aa1ca9a456667473cf36539d2fdba7adc4302ca84c50a574de217290f3" }, "downloads": -1, "filename": "eds-1.0.4.tar.gz", "has_sig": false, "md5_digest": "00a25c45a93e7c40b63a1dfbeb7f7941", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23260, "upload_time": "2019-07-08T13:32:26", "url": "https://files.pythonhosted.org/packages/3f/09/4df89a26f5c0338928c4fc8dcc0ddfbb27bdb8dc6b63139ae549c04e3a53/eds-1.0.4.tar.gz" } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "6be6e6823cebe341354f922f931cc3ad", "sha256": "5af5cf535dd6c334ffe04fc0dc9c1ac7668d0ae36327507a4c2facaea8500fbc" }, "downloads": -1, "filename": "eds-1.0.5.tar.gz", "has_sig": false, "md5_digest": "6be6e6823cebe341354f922f931cc3ad", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23375, "upload_time": "2019-07-10T14:41:11", "url": "https://files.pythonhosted.org/packages/41/0c/144ab87f5f4d94786f71410e41bdb4ecfa9a349e71987f688263af1ecf94/eds-1.0.5.tar.gz" } ], "1.0.5.dev206": [ { "comment_text": "", "digests": { "md5": "e07ff86cad822f09536612fec67ba6fb", "sha256": 
"1a6ed62a90881e821949bdaad135c80566b1f90a4b7180816853b106a990af42" }, "downloads": -1, "filename": "eds-1.0.5.dev206.tar.gz", "has_sig": false, "md5_digest": "e07ff86cad822f09536612fec67ba6fb", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23611, "upload_time": "2019-09-25T09:37:57", "url": "https://files.pythonhosted.org/packages/01/27/8014f44cbbd6f0e16d10e1aaade33f267e61082ca71707ed2de2dbd2e6a1/eds-1.0.5.dev206.tar.gz" } ], "1.0.5.dev207": [ { "comment_text": "", "digests": { "md5": "793a6fcdb28ea9f894df6986cd3ac045", "sha256": "21f6eda7303640bdb5abb39997d09dd0daade79cd7bf621f061db2487feffa64" }, "downloads": -1, "filename": "eds-1.0.5.dev207.tar.gz", "has_sig": false, "md5_digest": "793a6fcdb28ea9f894df6986cd3ac045", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23646, "upload_time": "2019-09-25T15:30:14", "url": "https://files.pythonhosted.org/packages/4c/cd/04d5fcb1a06fca32d40ebc63116e536f9b9edc8e1a2c465502d776ed649e/eds-1.0.5.dev207.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "af80854ab775363d4e28eea02b670f78", "sha256": "acc0772464329f9ae784a3ac862f1abb7abed1d4321fb87890489b955fdda487" }, "downloads": -1, "filename": "eds-1.1.0.tar.gz", "has_sig": false, "md5_digest": "af80854ab775363d4e28eea02b670f78", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23633, "upload_time": "2019-09-26T09:24:58", "url": "https://files.pythonhosted.org/packages/65/e2/8225529e3c610216ced4a78514dc2cd0eeda8068f419a60b4ef795951f9b/eds-1.1.0.tar.gz" } ], "1.1.0.dev208": [ { "comment_text": "", "digests": { "md5": "411194135560598c357396795269f6cc", "sha256": "9cece31de811d6d42c1162ee8e92b875222715c7e3aab2e8e709a4f2dc7f661c" }, "downloads": -1, "filename": "eds-1.1.0.dev208.tar.gz", "has_sig": false, "md5_digest": "411194135560598c357396795269f6cc", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 
23651, "upload_time": "2019-09-26T09:10:15", "url": "https://files.pythonhosted.org/packages/b5/50/dc232c410088371fc825bde6fb6421601f83b1d956ae2fb88f4202a298b0/eds-1.1.0.dev208.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "af80854ab775363d4e28eea02b670f78", "sha256": "acc0772464329f9ae784a3ac862f1abb7abed1d4321fb87890489b955fdda487" }, "downloads": -1, "filename": "eds-1.1.0.tar.gz", "has_sig": false, "md5_digest": "af80854ab775363d4e28eea02b670f78", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 23633, "upload_time": "2019-09-26T09:24:58", "url": "https://files.pythonhosted.org/packages/65/e2/8225529e3c610216ced4a78514dc2cd0eeda8068f419a60b4ef795951f9b/eds-1.1.0.tar.gz" } ] }