{ "info": { "author": "Fishtown Analytics", "author_email": "info@fishtownanalytics.com", "bugtrack_url": null, "classifiers": [], "description": "## dbt-spark\n\n### Documentation\nFor more information on using Spark with dbt, consult the [dbt documentation](https://docs.getdbt.com/docs/profile-spark).\n\n### Installation\nThis plugin can be installed via pip:\n```\n$ pip install dbt-spark\n```\n\n### Configuring your profile\n\n**Connection Method**\n\nConnections can be made to Spark in two different modes. The `http` mode is used when connecting to a managed service such as Databricks, which provides an HTTP endpoint; the `thrift` mode is used to connect directly to the master node of a cluster (either on-premise or in the cloud).\n\nA dbt profile can be configured to run against Spark using the following configuration:\n\n| Option | Description | Required? | Example |\n|---------|----------------------------------------------------|-------------------------|--------------------------|\n| method | Specify the connection method (`thrift` or `http`) | Required | `http` |\n| schema | Specify the schema (database) to build models into | Required | `analytics` |\n| host | The hostname to connect to | Required | `yourorg.sparkhost.com` |\n| port | The port to connect to the host on | Optional (default: 443 for `http`, 10001 for `thrift`) | `443` |\n| token | The token to use for authenticating to the cluster | Required for `http` | `abc123` |\n| cluster | The name of the cluster to connect to | Required for `http` | `01234-23423-coffeetime` |\n| user | The username to use to connect to the cluster | Optional | `hadoop` |\n| connect_timeout | The number of seconds to wait before retrying to connect to a Pending Spark cluster | Optional (default: 10) | `60` |\n| connect_retries | The number of times to try connecting to a Pending Spark cluster before giving up | Optional (default: 0) | `5` |\n\n**Usage with Amazon EMR**\n\nTo connect to Spark running on an Amazon EMR 
cluster, you will need to run `sudo /usr/lib/spark/sbin/start-thriftserver.sh` on the master node of the cluster to start the Thrift server (see https://aws.amazon.com/premiumsupport/knowledge-center/jdbc-connection-emr/ for further context). You will also need to connect to port `10001`, which connects to the Spark backend Thrift server; port `10000` will instead connect to a Hive backend, which will not work correctly with dbt.\n\n**Example profiles.yml entries:**\n```\nyour_profile_name:\n  target: dev\n  outputs:\n    dev:\n      method: http\n      type: spark\n      schema: analytics\n      host: yourorg.sparkhost.com\n      port: 443\n      token: abc123\n      cluster: 01234-23423-coffeetime\n      connect_retries: 5\n      connect_timeout: 60\n```\n\n```\nyour_profile_name:\n  target: dev\n  outputs:\n    dev:\n      method: thrift\n      type: spark\n      schema: analytics\n      host: 127.0.0.1\n      port: 10001\n      user: hadoop\n      connect_retries: 5\n      connect_timeout: 60\n```\n\n### Usage Notes\n\n**Model Configuration**\n\nThe following configurations can be supplied to models run with the dbt-spark plugin:\n\n| Option | Description | Required? | Example |\n|---------|----------------------------------------------------|-------------------------|--------------------------|\n| file_format | The file format to use when creating tables | Optional | `parquet` |\n\n**Incremental Models**\n\nSpark does not natively support `delete`, `update`, or `merge` statements. As such, [incremental models](https://docs.getdbt.com/docs/configuring-incremental-models) are implemented differently in this plugin than in other dbt adapters. To use incremental models, specify a `partition_by` clause in your model config. dbt will use an `insert overwrite` query to overwrite the partitions included in your query. 
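\n\nFor illustration, an incremental run of a model like the example below results in Spark SQL along these lines (the table name and exact statement shape are illustrative, not the literal SQL this plugin emits):\n\n```\ninsert overwrite table analytics.users_by_day\npartition (date_day)\nselect date_day, count(*) as users\nfrom analytics.events\ngroup by 1\n```\n\nBecause the statement overwrites every partition returned by the select, a partition that is only partially re-selected will lose its remaining rows.\n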
Be sure to re-select _all_ of the relevant data for a partition when using incremental models.\n\n```\n{{ config(\n    materialized='incremental',\n    partition_by=['date_day'],\n    file_format='parquet'\n) }}\n\n/*\n    Every partition returned by this query will be overwritten\n    when this model runs\n*/\n\nselect\n    date_day,\n    count(*) as users\n\nfrom {{ ref('events') }}\nwhere date_day::date >= '2019-01-01'\ngroup by 1\n```\n\n### Reporting bugs and contributing code\n\n- Want to report a bug or request a feature? Let us know on [Slack](http://slack.getdbt.com/), or open [an issue](https://github.com/fishtown-analytics/dbt-spark/issues/new).\n\n## Code of Conduct\n\nEveryone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the [PyPA Code of Conduct](https://www.pypa.io/en/latest/code-of-conduct/).", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/fishtown-analytics/dbt-spark", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "dbt-spark", "package_url": "https://pypi.org/project/dbt-spark/", "platform": "", "project_url": "https://pypi.org/project/dbt-spark/", "project_urls": { "Homepage": "https://github.com/fishtown-analytics/dbt-spark" }, "release_url": "https://pypi.org/project/dbt-spark/0.13.0/", "requires_dist": null, "requires_python": "", "summary": "The SparkSQL plugin for dbt (data build tool)", "version": "0.13.0" }, "last_serial": 5482736, "releases": { "0.13.0": [ { "comment_text": "", "digests": { "md5": "b4985cce5174703043df23a701f7cce3", "sha256": "65d8d9ccfd5185cfaba1652bb732d69e25eda12dbafcdb67943615d3255e6242" }, "downloads": -1, "filename": "dbt_spark-0.13.0-py3.7.egg", "has_sig": false, "md5_digest": "b4985cce5174703043df23a701f7cce3", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 
26236, "upload_time": "2019-07-03T17:12:07", "url": "https://files.pythonhosted.org/packages/6a/79/686f13b7bfa55ff80abc40c3db0a61f59fafba6c17e9b8fcebb153eed6bf/dbt_spark-0.13.0-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "aba7d7199a6f4f76fcc8c0933cbc5a4d", "sha256": "d0c3255edadec5a2d423ca7fd20a4d2b0ba45c75fc0b73b554121a98f74c72c6" }, "downloads": -1, "filename": "dbt-spark-0.13.0.tar.gz", "has_sig": false, "md5_digest": "aba7d7199a6f4f76fcc8c0933cbc5a4d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12799, "upload_time": "2019-07-03T17:12:04", "url": "https://files.pythonhosted.org/packages/bb/37/fe34166ef27c5d71022ae27ec2445c8c0227b3f17bd5999e5893e6012ca8/dbt-spark-0.13.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b4985cce5174703043df23a701f7cce3", "sha256": "65d8d9ccfd5185cfaba1652bb732d69e25eda12dbafcdb67943615d3255e6242" }, "downloads": -1, "filename": "dbt_spark-0.13.0-py3.7.egg", "has_sig": false, "md5_digest": "b4985cce5174703043df23a701f7cce3", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 26236, "upload_time": "2019-07-03T17:12:07", "url": "https://files.pythonhosted.org/packages/6a/79/686f13b7bfa55ff80abc40c3db0a61f59fafba6c17e9b8fcebb153eed6bf/dbt_spark-0.13.0-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "aba7d7199a6f4f76fcc8c0933cbc5a4d", "sha256": "d0c3255edadec5a2d423ca7fd20a4d2b0ba45c75fc0b73b554121a98f74c72c6" }, "downloads": -1, "filename": "dbt-spark-0.13.0.tar.gz", "has_sig": false, "md5_digest": "aba7d7199a6f4f76fcc8c0933cbc5a4d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12799, "upload_time": "2019-07-03T17:12:04", "url": "https://files.pythonhosted.org/packages/bb/37/fe34166ef27c5d71022ae27ec2445c8c0227b3f17bd5999e5893e6012ca8/dbt-spark-0.13.0.tar.gz" } ] }