{ "info": { "author": "Frank Bertsch", "author_email": "frank@mozilla.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.7" ], "description": "# Mozilla Schema Generator\n\nA library for generating full representations of Mozilla telemetry pings.\n\nSee [Mozilla Pipeline Schemas](https://www.github.com/mozilla-services/mozilla-pipeline-services)\nfor the more generic structure of pings. This library takes those generic structures and fills in\nall of the probes we expect to see in the appropriate places.\n\n## Telemetry Integration\n\nThere are two pings we are targeting for integration with this library:\n\n1. [The Main Ping](http://gecko-docs.mozilla.org.s3.amazonaws.com/toolkit/components/telemetry/telemetry/data/main-ping.html)\n is the historical Firefox Desktop ping, and contains well over ten thousand pieces of data.\n2. [The Glean Ping](https://github.com/mozilla/glean_parser) is the new ping type being created for\n more generic data collection.\n\nThis library takes the information about what should be in those pings from the [Probe Info Service](https://www.github.com/mozilla/probe-scraper).\n\n## Data Store Integration\n\nThe primary use of the schemas is for integration with the\n[Schema Transpiler](https://www.github.com/mozilla/jsonschema-transpiler).\nThe schemas that this repository generates can be transpiled into Avro and BigQuery schemas. They define\nthe schema of the Avro and BigQuery tables that the [BQ Sink](https://www.github.com/mozilla/gcp-ingestion)\nwrites to.\n\n### BigQuery Limitations and Splitting\n\nBigQuery has a hard limit of ten thousand columns on any single table. This library\ncan take that limitation into account by splitting schemas into multiple tables. Each\ntable has some common information that is duplicated in every table, and then a set\nof fields that are unique to that table. 
The join of these tables gives the full\nset of fields available from the ping.\n\nTo decide on a table split, we include the `table_group` key in the configuration\nfile. For example, `payload/histograms` has `table_group: histograms`; this indicates that\na table containing only histograms will be output.\n\nCurrently, the generator creates tables for:\n- Histograms\n- Keyed Histograms\n- Scalars\n- Keyed Scalars\n- Everything else\n\nIf a single table expands beyond 9000 columns, we move the new fields to the next table.\nFor example, `main_histograms_1` and `main_histograms_2`.\n\nNote: Tables are only split if the `--split` parameter is provided.\n\n## Validation\n\nA secondary use case of these schemas is validation. The generated schemas are more precise\nthan the generic ones, since they include explicit definitions of every metric and probe.\n\n## Usage\n\n### Main Ping\n\nGenerate the full Main Ping schema:\n\n```\nmozilla-schema-generator generate-main-ping\n```\n\nGenerate the Main Ping schema divided among tables (for BigQuery):\n```\nmozilla-schema-generator generate-main-ping --split --out-dir main-ping\n```\n\nThe `out-dir` parameter will be the namespace for the pings.\n\nTo see a full list of options, run `mozilla-schema-generator generate-main-ping --help`.\n\n\n### Glean\n\nGenerate all Glean ping schemas, one for each ping that each application sends:\n\n```\nmozilla-schema-generator generate-glean-ping\n```\n\nWrite schemas to a directory:\n```\nmozilla-schema-generator generate-glean-ping --out-dir glean-ping\n```\n\nTo see a full list of options, run `mozilla-schema-generator generate-glean-ping --help`.\n\n\n## Configuration Files\n\nBy default, configuration files are found in `/config`. You can also specify your own when running the generator.\n\nConfiguration files match certain parts of a ping to certain types of probes or metrics. The nesting\nof the config file matches the ping it is filling in. 
For example, Glean stores probe types under\nthe `metrics` key, so the nesting looks like this:\n```\n{\n  \"metrics\": {\n    \"string\": {\n      <metric_name>: {...}\n    }\n  }\n}\n```\n\nWhile the generic schema doesn't include information about the specific `<metric_name>`s being included,\nthe schema-generator does. To include the correct metrics that we would find in that section of the ping,\nwe would organize the `config.yaml` file like this:\n\n```\nmetrics:\n  string:\n    match:\n      type: string\n```\n\nThe `match` key indicates that we should fill in this section of the ping schema with metrics,\nand the `type: string` makes sure we only put string metrics in there. You can do an exact\nmatch on any field available in the ping info from the [probe-info-service](https://probeinfo.telemetry.mozilla.org/glean/glean/metrics),\nwhich also contains the [Desktop probes](https://probeinfo.telemetry.mozilla.org/firefox/all/main/all_probes).\n\nThere are a few additional keywords allowed under any field:\n* `contains` - e.g. `process: contains: main`, indicates that the `process` field is an array\n and it should only match those that include the entry `main`.\n* `not` - e.g. `send_in_pings: not: glean_ping_info`, indicates that we should match\n any value of `send_in_pings` _except_ `glean_ping_info`.\n\n### `table_group` Key\n\nThis field indicates which table group a section of the ping belongs to when\nthe schema is split. Currently, only the Main ping is split; the Glean ping is not. 
See the section on [BigQuery\nLimitations and Splitting](#bigquery-limitations-and-splitting) for more info.\n\n## Development and Testing\n\nInstall requirements:\n```\nmake install-requirements\n```\n\nRun tests:\n```\nmake test\n```\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mozilla/mozilla-schema-generator", "keywords": "mozilla-schema-generator", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "mozilla-schema-generator", "package_url": "https://pypi.org/project/mozilla-schema-generator/", "platform": "", "project_url": "https://pypi.org/project/mozilla-schema-generator/", "project_urls": { "Homepage": "https://github.com/mozilla/mozilla-schema-generator" }, "release_url": "https://pypi.org/project/mozilla-schema-generator/0.1.3/", "requires_dist": [ "click", "pyyaml", "requests" ], "requires_python": "", "summary": "Create full representations of schemas using the probe info service.", "version": "0.1.3" }, "last_serial": 5183960, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "7ceb637d00e46751f979cf86da8a290c", "sha256": "eed8677363485177bc92b0e8e84e80e7181e8516b8f94a67c07374c9f507e4f1" }, "downloads": -1, "filename": "mozilla_schema_generator-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "7ceb637d00e46751f979cf86da8a290c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 21712, "upload_time": "2019-03-29T17:43:14", "url": "https://files.pythonhosted.org/packages/bd/48/73606185605bf19ea1cd99f24ff941b87793b1897889ec24b47852ffe460/mozilla_schema_generator-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "86eaef0db12116e75bdf9cac5f71516f", "sha256": "687e3365f639fcc268171031f461db11596df44112935af578b1fdd70a4db98a" }, "downloads": -1, "filename": "mozilla-schema-generator-0.1.0.tar.gz", "has_sig": false, 
"md5_digest": "86eaef0db12116e75bdf9cac5f71516f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19869, "upload_time": "2019-03-29T17:43:17", "url": "https://files.pythonhosted.org/packages/69/a0/948ae5dbd4151d9176a8cd2b16e8b0b9f1ce2ff56b2d940f26c632922d79/mozilla-schema-generator-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "ff84a543c1a77609e1cd8ced1c4ebbbd", "sha256": "6c97b9b54ab4687e20d05d69b8e36f22bd09f829f29e73b56bd6cf51684abc71" }, "downloads": -1, "filename": "mozilla_schema_generator-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "ff84a543c1a77609e1cd8ced1c4ebbbd", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 21711, "upload_time": "2019-03-29T17:48:51", "url": "https://files.pythonhosted.org/packages/60/32/e59ce49bcefff058b74acf72514e67ad188d4594e8e3db638adda86ae8a3/mozilla_schema_generator-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "272341e76efc9efbd464bd58ba190589", "sha256": "a59d6cf73ea350e7434158c484a12d9b9a18ad9d3e5fa47283d5ae2e9311d047" }, "downloads": -1, "filename": "mozilla-schema-generator-0.1.1.tar.gz", "has_sig": false, "md5_digest": "272341e76efc9efbd464bd58ba190589", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19858, "upload_time": "2019-03-29T17:48:52", "url": "https://files.pythonhosted.org/packages/84/53/663c85d364227266ea842ab4ecef53b0201ca3469ca9bbd774c965567ccf/mozilla-schema-generator-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "15a6f469f6700ddc937afabea63621d4", "sha256": "1ed11eb4b2326218c300e1384b6c10a376c18a3dbcbde440bb32c73ab8931368" }, "downloads": -1, "filename": "mozilla_schema_generator-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "15a6f469f6700ddc937afabea63621d4", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 22699, "upload_time": 
"2019-03-29T18:02:34", "url": "https://files.pythonhosted.org/packages/3b/ad/aa9ecf6db69fc75495d1ec2bab5b1dc4f50e5ad0cbdb3b2b4159c1b02db0/mozilla_schema_generator-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0dea72fa8f2748e6ffa0cf1f0211651f", "sha256": "7981aa0dab93f84158c30d1a3736cab9bb0b42e7d3f2f5af1fde78827d852d91" }, "downloads": -1, "filename": "mozilla-schema-generator-0.1.2.tar.gz", "has_sig": false, "md5_digest": "0dea72fa8f2748e6ffa0cf1f0211651f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20545, "upload_time": "2019-03-29T18:02:35", "url": "https://files.pythonhosted.org/packages/d2/ac/208eaf28d067a90b8c9dfe076bd82d947449535f34a71a6189141bf840e1/mozilla-schema-generator-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "fc3548094b99df1b568e3b73addf6918", "sha256": "047e6cfbed33fcf0bd4156d3e48eaeb77f2a240c73964297f80db08dcec5b87a" }, "downloads": -1, "filename": "mozilla_schema_generator-0.1.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fc3548094b99df1b568e3b73addf6918", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20022, "upload_time": "2019-04-24T18:52:42", "url": "https://files.pythonhosted.org/packages/13/2d/78d6aca508c72717b8ac6f0266afaff70bb93ac6384925ee2ea1e5935c4b/mozilla_schema_generator-0.1.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "efc228ed1bd5ff5ff365931ca775562e", "sha256": "55833a4e332338fb4bad37a09b030ff9d3628939ba900d812ed11a9b2b9e6f24" }, "downloads": -1, "filename": "mozilla-schema-generator-0.1.3.tar.gz", "has_sig": false, "md5_digest": "efc228ed1bd5ff5ff365931ca775562e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20801, "upload_time": "2019-04-24T18:52:44", "url": "https://files.pythonhosted.org/packages/dc/de/2b0fef0873cce3154d78e36e5ea665b06130ad0523f0cbc509c7a81adaab/mozilla-schema-generator-0.1.3.tar.gz" } ] }, 
"urls": [ { "comment_text": "", "digests": { "md5": "fc3548094b99df1b568e3b73addf6918", "sha256": "047e6cfbed33fcf0bd4156d3e48eaeb77f2a240c73964297f80db08dcec5b87a" }, "downloads": -1, "filename": "mozilla_schema_generator-0.1.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fc3548094b99df1b568e3b73addf6918", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20022, "upload_time": "2019-04-24T18:52:42", "url": "https://files.pythonhosted.org/packages/13/2d/78d6aca508c72717b8ac6f0266afaff70bb93ac6384925ee2ea1e5935c4b/mozilla_schema_generator-0.1.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "efc228ed1bd5ff5ff365931ca775562e", "sha256": "55833a4e332338fb4bad37a09b030ff9d3628939ba900d812ed11a9b2b9e6f24" }, "downloads": -1, "filename": "mozilla-schema-generator-0.1.3.tar.gz", "has_sig": false, "md5_digest": "efc228ed1bd5ff5ff365931ca775562e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20801, "upload_time": "2019-04-24T18:52:44", "url": "https://files.pythonhosted.org/packages/dc/de/2b0fef0873cce3154d78e36e5ea665b06130ad0523f0cbc509c7a81adaab/mozilla-schema-generator-0.1.3.tar.gz" } ] }