{ "info": { "author": "Tomas Farias", "author_email": "tomasfariassantana@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Software Development :: Testing" ], "description": "==========\nspark-test\n==========\n\nA collection of assertion functions to test Spark Collections like DataFrames.\n\n----\n\nAs you develop Spark applications, you will likely end up writing functions that apply transformations to Spark DataFrames. To test the results, you can create ``pandas`` DataFrames from them and use the testing functions provided by ``pandas``, since ``pyspark`` does not provide any functions to assist with testing.\n\n``spark-test`` provides testing functions similar to those in ``pandas``, but geared towards Spark Collections.\n\nLet's say you have a function that applies some transformations to a Spark DataFrame (the full code for this example can be found in ``tests/test_example.py``):\n\n::\n\n    def transform(df):\n        \"\"\"\n        Fill nulls with 0, add 10 to the Age column and only return distinct rows\n        \"\"\"\n\n        df = df.na.fill(0)\n        df = df.withColumn('Age', df['Age'] + 10)\n        df = df.distinct()\n\n        return df\n\nWe can then write a test case with as many test inputs as we need and check the results with ``assert_dataframe_equal``:\n\n::\n\n    from spark_test.testing import assert_dataframe_equal\n\n\n    def test_transform(spark, transform):\n\n        input_df = spark.createDataFrame(\n            [['Tom', 25], ['Tom', 25], ['Charlie', 24], ['Dan', None]],\n            schema=['Name', 'Age']\n        )\n\n        expected = spark.createDataFrame(\n            [['Tom', 35], ['Charlie', 34], ['Dan', 10]],\n            schema=['Name', 'Age']\n        )\n        result = transform(input_df)\n\n        assert_dataframe_equal(expected, result)\n\nOf course, tests are more interesting when they fail, so let's introduce a bug 
in our ``transform`` function:\n\n::\n\n    def bugged_transform(df):\n        \"\"\"\n        Fill nulls with 0, add 10 to the Age column and only return distinct rows\n        \"\"\"\n\n        df = df.na.fill(1)  # Whoops! Should be 0!\n        df = df.withColumn('Age', df['Age'] + 10)\n        df = df.distinct()\n\n        return df\n\nPassing both functions to our test with ``pytest.mark.parametrize`` yields the following output, with a clear message describing what failed:\n\n::\n\n    $ pytest tests/example.py\n    ============================= test session starts =============================\n    platform linux -- Python 3.7.3, pytest-5.0.0, py-1.8.0, pluggy-0.12.0\n    rootdir: /home/tfarias/repos/spark-test\n    collected 2 items\n\n    tests/example.py .F [100%]\n\n    ================================== FAILURES ===================================\n    _______________________ test_transform[bugged_transform] ________________________\n\n    assert left_d[key] == right_d[key], msg.format(\n    > field=key, l_value=left_d[key], r_value=right_d[key]\n    )\n    E AssertionError: Values for Age do not match:\n    E Left=10\n    E Right=11\n\n\nLicense\n-------\n\nDistributed under the MIT License.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tomasfarias/spark-test", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "spark-test", "package_url": "https://pypi.org/project/spark-test/", "platform": "", "project_url": "https://pypi.org/project/spark-test/", "project_urls": { "Homepage": "https://github.com/tomasfarias/spark-test" }, "release_url": "https://pypi.org/project/spark-test/0.2.8/", "requires_dist": null, "requires_python": "", "summary": "Assertion functions to test Spark Collections like DataFrames.", "version": "0.2.8" }, "last_serial": 5594582, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "3c76d399eab2b375573ca6c759bc2be3", "sha256": 
"b9b44bf2e192ec187f8c19a3191a4d29f74c0bd4cc5118e5a74951b88250a9b0" }, "downloads": -1, "filename": "spark-test-0.1.tar.gz", "has_sig": false, "md5_digest": "3c76d399eab2b375573ca6c759bc2be3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4734, "upload_time": "2019-07-11T02:50:23", "url": "https://files.pythonhosted.org/packages/81/91/a8d1bd2b54c78c1a96cc960a43c891035d50ab379489a8de1e1e9f20e76b/spark-test-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "a0084b3489fd1d510814040cdd8d4a86", "sha256": "62ead6bcc02cea4aefb8016e710e6ab0a81a85399b977297b40043f71da24bcf" }, "downloads": -1, "filename": "spark-test-0.2.tar.gz", "has_sig": false, "md5_digest": "a0084b3489fd1d510814040cdd8d4a86", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4787, "upload_time": "2019-07-11T03:44:29", "url": "https://files.pythonhosted.org/packages/c6/5c/8db571e0a21633710523a5d98d83b9a8e43f9e1eafdf77415d8f5a325dfc/spark-test-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "0513219425a465612e63715c947dd4ff", "sha256": "da587c95adb2be4c83989307ff6a8c6319a60246c376bfff27a9c147f92b7824" }, "downloads": -1, "filename": "spark-test-0.2.1.tar.gz", "has_sig": false, "md5_digest": "0513219425a465612e63715c947dd4ff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4787, "upload_time": "2019-07-11T03:49:50", "url": "https://files.pythonhosted.org/packages/9c/20/1eff60cde58738f46c3a800c67ef4de5666673cb6cae72d5179255da0dcd/spark-test-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "0fe8f814c3c08bd7202387763f786397", "sha256": "5e72df11b9075db020b6adb5dd7f329199a3955c42ed9243196a998d9f7f9e89" }, "downloads": -1, "filename": "spark-test-0.2.2.tar.gz", "has_sig": false, "md5_digest": "0fe8f814c3c08bd7202387763f786397", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4801, "upload_time": 
"2019-07-12T03:08:28", "url": "https://files.pythonhosted.org/packages/c5/c2/69461851c50ecc37ee6ee8cf7418c35b2f218940345c54f38adedf80a189/spark-test-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "b5ed020bcec956a16e25352aa581fbee", "sha256": "a8663b9c6eaf20070aa0cacc419a1a576b3aad3123577b61b71d6f518ec555bb" }, "downloads": -1, "filename": "spark-test-0.2.3.tar.gz", "has_sig": false, "md5_digest": "b5ed020bcec956a16e25352aa581fbee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4837, "upload_time": "2019-07-12T03:20:23", "url": "https://files.pythonhosted.org/packages/e4/69/f8c593ab77889d565dcd3a8e8d888fffe5c6a223d2fdacd78b23399a679c/spark-test-0.2.3.tar.gz" } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": "8db8eca6c64c78bc5eb853cd6a15a232", "sha256": "a9003c0a7c6a55d1428dcdb6807b99d65e3e9cb8af9c2a390762ebc7a138f932" }, "downloads": -1, "filename": "spark-test-0.2.4.tar.gz", "has_sig": false, "md5_digest": "8db8eca6c64c78bc5eb853cd6a15a232", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4829, "upload_time": "2019-07-12T04:03:38", "url": "https://files.pythonhosted.org/packages/de/6f/8316a872d3002364c7adf8209deb2922a804970d90ad9deeaa8b9fbf9320/spark-test-0.2.4.tar.gz" } ], "0.2.5": [ { "comment_text": "", "digests": { "md5": "b3be0b631e20a3d0822c1d64c0ed4176", "sha256": "64a4615c4645a629faa3de4d2e91e5660004be734a5888ae8d03724ca0e3c2d5" }, "downloads": -1, "filename": "spark-test-0.2.5.tar.gz", "has_sig": false, "md5_digest": "b3be0b631e20a3d0822c1d64c0ed4176", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4832, "upload_time": "2019-07-15T23:24:26", "url": "https://files.pythonhosted.org/packages/dc/50/b6cdc6f5b70d05852c2738cb9eeb753ba523589cac43ae211a8e0c5bee81/spark-test-0.2.5.tar.gz" } ], "0.2.6": [ { "comment_text": "", "digests": { "md5": "e9f5d60f4565d017ed48df41afcf8c44", "sha256": 
"cc6807c64d66f26b1bf7950ae27a020dd4948da50561145270902e095f1fa875" }, "downloads": -1, "filename": "spark-test-0.2.6.tar.gz", "has_sig": false, "md5_digest": "e9f5d60f4565d017ed48df41afcf8c44", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4827, "upload_time": "2019-07-15T23:29:40", "url": "https://files.pythonhosted.org/packages/cd/0c/66799369d70ecf392c082fb496be828121bbf4f598d6eb241d407552823d/spark-test-0.2.6.tar.gz" } ], "0.2.7": [ { "comment_text": "", "digests": { "md5": "5f3fe33efdec54a35954d65b9e894b33", "sha256": "5c986fbd4ac908382189d84189da3baf84d12818da7359914c100e9210828a23" }, "downloads": -1, "filename": "spark-test-0.2.7.tar.gz", "has_sig": false, "md5_digest": "5f3fe33efdec54a35954d65b9e894b33", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4831, "upload_time": "2019-07-15T23:32:21", "url": "https://files.pythonhosted.org/packages/84/e1/1c214e8e6f22611895419063041eab8949b0d8db07b81519f67a96172e9c/spark-test-0.2.7.tar.gz" } ], "0.2.8": [ { "comment_text": "", "digests": { "md5": "98019ceba47d54266d9928a7cfb3c839", "sha256": "8985df2f8af522a0b48e3c082542dedc62c1b406778d882252a85d573af1f16d" }, "downloads": -1, "filename": "spark-test-0.2.8.tar.gz", "has_sig": false, "md5_digest": "98019ceba47d54266d9928a7cfb3c839", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4855, "upload_time": "2019-07-28T02:34:27", "url": "https://files.pythonhosted.org/packages/34/ac/f5642f46a789cbb421ba28188656ad09b01b6a07ef02f6791f478818c915/spark-test-0.2.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "98019ceba47d54266d9928a7cfb3c839", "sha256": "8985df2f8af522a0b48e3c082542dedc62c1b406778d882252a85d573af1f16d" }, "downloads": -1, "filename": "spark-test-0.2.8.tar.gz", "has_sig": false, "md5_digest": "98019ceba47d54266d9928a7cfb3c839", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4855, 
"upload_time": "2019-07-28T02:34:27", "url": "https://files.pythonhosted.org/packages/34/ac/f5642f46a789cbb421ba28188656ad09b01b6a07ef02f6791f478818c915/spark-test-0.2.8.tar.gz" } ] }