{ "info": { "author": "sashgorokhov", "author_email": "sashgorokhov@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# pyspark-sugar\n\nSet python traceback on dataframe actions, enrich spark UI with actual business logic stages of spark application.\n\n\n## Installation\n\nVia pip:\n```\npip install pyspark-sugar\n```\n\nDirectly from github:\n```\npip install git+https://github.com/sashgorokhov/pyspark-sugar.git@master\n```\n\n## Usage\n\n\nConsider this synthetic example\n```python\nimport random\n\nimport pyspark\nimport pyspark_sugar\nfrom pyspark.sql import functions as F\n\n\n# Set verbose job description through decorator\n@pyspark_sugar.job_description_decor('Get nulls after type casts')\ndef get_incorrect_cast_cols(sdf, cols):\n \"\"\"\n Return columns with non-zero nulls amount across its values.\n\n :param cols: Subset of columns to check\n \"\"\"\n cols = set(cols) & set(sdf.columns)\n if not cols:\n return {}\n null_counts = sdf.agg(*[F.sum(F.isnull(col).cast('int')).alias(col) for col in cols]).collect()[0]\n return {col: null_counts[col] for col in cols if null_counts[col]}\n\n\ndef main():\n sc = pyspark.SparkContext.getOrCreate() # type: pyspark.SparkContext\n ssc = pyspark.sql.SparkSession.builder.getOrCreate()\n \n # Define a job group. All actions inside job group will be grouped.\n with pyspark_sugar.job_group('ETL', None):\n sdf = ssc.createDataFrame([\n {'CONTACT_KEY': n, 'PRODUCT_KEY': random.choice(['1', '2', 'what', None])}\n for n in range(1000)\n ]).repartition(2).cache()\n\n sdf = sdf.withColumn('PRODUCT_KEY', F.col('PRODUCT_KEY').cast('integer'))\n \n # Collect action inside this function will have nice and readable description\n print(get_incorrect_cast_cols(sdf, ['PRODUCT_KEY']))\n\n sdf = sdf.dropna(subset=['PRODUCT_KEY'])\n \n # You can define several job groups one after another.\n with pyspark_sugar.job_group('Analytics', 'Check dataframe metrics'):\n sdf.cache()\n \n # Set job description for actions made inside context manager.\n with pyspark_sugar.job_description('Get transactions count'):\n print(sdf.count())\n\n sdf.show()\n\n sc.stop()\n\n\nif __name__ == '__main__':\n # Patch almost-all dataframe actions with code that will set stack trace to job details description\n with pyspark_sugar.patch_dataframe_actions():\n main()\n\n```\n\nThis how SparkUI pages will look like:\n\n![SparkUI jobs page screenshot](docs/screenshots/screenshot_jobs.png)\n![SparkUI sql page screenshot](docs/screenshots/screenshot_sql.png)\n\nNow SparkUI is full of human-readable descriptions so even you manager can read it! Sweet.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sashgorokhov/pyspark-sugar", "keywords": "pyspark", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pyspark-sugar", "package_url": "https://pypi.org/project/pyspark-sugar/", "platform": "", "project_url": "https://pypi.org/project/pyspark-sugar/", "project_urls": { "Homepage": "https://github.com/sashgorokhov/pyspark-sugar" }, "release_url": "https://pypi.org/project/pyspark-sugar/0.4.1/", "requires_dist": null, "requires_python": "", "summary": "SparkUI enchancements with pyspark", "version": "0.4.1" }, "last_serial": 5118104, "releases": { "0.3": [ { "comment_text": "", "digests": { "md5": "bed98482e2fd45301e1d47f17dd73bba", "sha256": "277ac5541fa5520f18795bd4547e618764346817bb7cfa1a63b65c96f105c077" }, "downloads": -1, "filename": "pyspark_sugar-0.3.tar.gz", "has_sig": false, "md5_digest": "bed98482e2fd45301e1d47f17dd73bba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3832, "upload_time": "2019-04-03T09:06:56", "url": "https://files.pythonhosted.org/packages/3e/70/9348d7283c80e6889dde84e1bb97e8b665e0e5b8e579184e99335f135fc1/pyspark_sugar-0.3.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "5c6bf01cfe78eab23284b133d2f26778", "sha256": "cbe7b25651ab20c266a90575af0c10dadc99df8cff00073826090e6173f0d0a2" }, "downloads": -1, "filename": "pyspark_sugar-0.3.1.tar.gz", "has_sig": false, "md5_digest": "5c6bf01cfe78eab23284b133d2f26778", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3792, "upload_time": "2019-04-04T10:54:43", "url": "https://files.pythonhosted.org/packages/ad/7c/3ca38320029dd7931a562bba64f128833b770bea3276ea4a5093c67a7d6e/pyspark_sugar-0.3.1.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "8b8612f386f58f47888dd25aae29551b", "sha256": "81c505ab86d71128d71844632b789d2f3bf24af524219cf03414b86dc4dba75a" }, "downloads": -1, "filename": "pyspark_sugar-0.4.tar.gz", "has_sig": false, "md5_digest": "8b8612f386f58f47888dd25aae29551b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4138, "upload_time": "2019-04-09T10:23:04", "url": "https://files.pythonhosted.org/packages/66/06/fd4b9941b901901d0595664f0dba006d7c1ad3c7e194dc06ccfb8e471969/pyspark_sugar-0.4.tar.gz" } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "2bf1bb32456b68b702d5e7e808d3312e", "sha256": "9947d83ef3b2afade5e954e4ca75c1c4ca073015bd8f31f4037831c5ec10e6dd" }, "downloads": -1, "filename": "pyspark_sugar-0.4.1.tar.gz", "has_sig": false, "md5_digest": "2bf1bb32456b68b702d5e7e808d3312e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4178, "upload_time": "2019-04-09T10:39:06", "url": "https://files.pythonhosted.org/packages/9a/07/e120c8b32926f958b5d36e67a97e83580a30cfc2cc68c0dadb9528604e18/pyspark_sugar-0.4.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2bf1bb32456b68b702d5e7e808d3312e", "sha256": "9947d83ef3b2afade5e954e4ca75c1c4ca073015bd8f31f4037831c5ec10e6dd" }, "downloads": -1, "filename": "pyspark_sugar-0.4.1.tar.gz", "has_sig": false, "md5_digest": "2bf1bb32456b68b702d5e7e808d3312e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4178, "upload_time": "2019-04-09T10:39:06", "url": "https://files.pythonhosted.org/packages/9a/07/e120c8b32926f958b5d36e67a97e83580a30cfc2cc68c0dadb9528604e18/pyspark_sugar-0.4.1.tar.gz" } ] }