{ "info": { "author": "BriteCore", "author_email": "", "bugtrack_url": null, "classifiers": [], "description": "diffino\n====\n[![Build Status](https://travis-ci.com/IntuitiveWebSolutions/diffino.svg?branch=master)](https://travis-ci.com/IntuitiveWebSolutions/diffino)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)\n\nDiffing tools for comparing datasets in CSV, XLSX and other formats, available as a CLI app, API, web app and module. Powered by the awesome Pandas library for Python.\n\n### Done\n- Install as CLI app\n- Install and use as Python module\n- Compare two CSV datasets using Pandas where you can output differences row by row\n- Use the following inputs for your datasets:\n  - Local CSV file (pandas mode)\n  - File in S3 (pandas mode)\n- Define a subset of columns to use for comparing/diffing (only works with pandas mode, not supported for MD5 comparison)\n- Output differences to:\n  - Console (print)\n  - CSV file\n\n### To-Do (ROADMAP)\n- Compare one or more CSV datasets using MD5 hash of the files\n- Compare one or more XLSX datasets using Pandas where you can output differences row by row\n- Use the following inputs for your datasets:\n  - Local CSV file (MD5 mode)\n  - Local file in XLSX (only for pandas mode)\n  - Local directory with CSV or XLSX files (for both MD5 and pandas modes)\n  - ZIP file with CSV or XLSX files (only for pandas mode)\n  - File in S3 for MD5\n  - Bucket in S3 (for both MD5 and pandas modes)\n- Output differences to:\n  - XLSX file\n  - JSON file\n\n## Install\n\nTo install as module and CLI:\n\n```\npip install diffino\n```\n\n## CLI\n\nDiffino will try its best to guess your input storage mechanism. For that, you need to include `s3://` in the input argument and/or use the `.csv`, `.xls` or `.xlsx` extensions.\n\n### Compare using pandas\n\nMD5 is only useful for knowing that two CSV datasets are not the same, but it's not useful for knowing which are the actual differences among 
those. For that, you can use pandas mode, which outputs the differences row by row.\nTo use it, pass the `--mode pandas` argument. **Pandas mode is the default, so this argument can be omitted**:\n\n```\ndiffino before_dataset.csv after_dataset.csv --mode pandas\n```\n\nIn pandas mode, Diffino tries to convert numeric columns by default. You can change this behavior with:\n\n```\ndiffino before_dataset.csv after_dataset.csv --convert-numeric false\n```\n\nYou can define the columns to be used for checking the diffs:\n\n```\ndiffino before_dataset.csv after_dataset.csv --cols id name\n```\n\n#### Compare two CSV files in an S3 bucket using pandas mode\n\n```\ndiffino s3://bucket/before_dataset.csv s3://bucket/after_dataset.csv --mode pandas\n```\n\n### Output diff results to file\n\nDiffino will try its best to guess your output storage mechanism. For that, you need to include `s3://` in the output argument or use the `.csv`, `.xls` or `.xlsx` extensions.\n\n#### Output to a local CSV file\n```\ndiffino file_1.csv file_2.csv --output diff.csv\n```\n\nNote: Two files are going to be generated, comparing the left argument file to the right argument file. 
For the example above, 2 files are going to be created:\n\n* `diff_left.csv`\n* `diff_right.csv`\n\n#### Avoid creating unnecessary files\n\nIf you want to avoid unnecessary noise, you can use the `--output-only-diffs` option to prevent diffino from creating result files when there are no actual differences:\n```\ndiffino file_1.csv file_2.csv --output diff.csv --output-only-diffs\n```\n\nFor the above example, if `file_1` has some extra rows that are not present in `file_2`, but `file_2` only has rows that are present in `file_1`, then we are going to end up with only a resulting `diff_left.csv` file.\n\n\n#### Output to a local Excel file\n\nWhen using Excel, output will contain different sheets as well as one summary sheet containing all differences:\n\n```\ndiffino file_1.csv file_2.csv --output diff.xlsx\n```\n\n#### Output to a local JSON file\n\n```\ndiffino file_1.csv file_2.csv --output diff.json\n```\n\n#### Output to a CSV file in S3\n\n```\ndiffino file_1.csv file_2.csv --output s3://bucket/diff.csv\n```\n\n#### Output to an Excel file in S3\nWhen using Excel, output will contain different sheets as well as one summary sheet containing all differences:\n\n```\ndiffino file_1.csv file_2.csv --output s3://bucket/diff.xlsx\n```\n\n#### Output to a JSON file in S3\n\n```\ndiffino file_1.csv file_2.csv --output s3://bucket/diff.json\n```\n\n## Python module\n\nUseful if you want to integrate Diffino as part of your ETL or as part of your Continuous Integration (CI) builds.\n\n### Get a dictionary with differences using pandas mode\nFor using all columns:\n\n```python\nfrom diffino.models import Diffino\n\ndiffino = Diffino(left='s3://bucket/one.csv', right='s3://bucket/two.csv', mode='pandas')\nresults = diffino.build_diff()\n```\n\nIn the above example, the `results` variable contains a tuple with the first index containing\nthe left differences count and the second index with the right differences count:\n\n```python\nresults[0]  # left differences count\nresults[1]  # right differences count\n```\n\nAnd for using a subset of columns you can 
pass a Python list of the column names you want to include:\n\n```python\nfrom diffino.models import Diffino\n\ndiffino = Diffino(\n    left='one.csv',\n    right='two.csv',\n    mode='pandas',\n    cols=['id', 'name']\n)\nresults = diffino.build_diff()\n```\n\n## COMING SOON\nDifferent column names? No problemo, that works too!\n\n```python\nfrom diffino.models import Diffino\n\ndiffino = Diffino(\n    left='one.xlsx',\n    right='two.xlsx',\n    mode='pandas',\n    left_cols=['myColumn'],\n    right_cols=['my_column'],\n)\nresults = diffino.build_diff()\n```\n\n## Web App\n\nComing soon\n\n## API\n\nComing soon\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/IntuitiveWebSolutions/diffino", "keywords": "diffing comparing csv excel json", "license": "", "maintainer": "", "maintainer_email": "", "name": "diffino", "package_url": "https://pypi.org/project/diffino/", "platform": "", "project_url": "https://pypi.org/project/diffino/", "project_urls": { "Homepage": "https://github.com/IntuitiveWebSolutions/diffino" }, "release_url": "https://pypi.org/project/diffino/0.2.1/", "requires_dist": [ "pandas (==0.19.2)", "boto3 (==1.7.3)" ], "requires_python": "", "summary": "Diffing tools for comparing datasets in CSV, XLSX and other formats", "version": "0.2.1" }, "last_serial": 5304465, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "b06fe4c45da6c6777f679dd898754b56", "sha256": "549f2830534edc197912fb135013caeb5d9ead3a8c667406033acdd3db9af0c6" }, "downloads": -1, "filename": "diffino-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "b06fe4c45da6c6777f679dd898754b56", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6616, "upload_time": "2019-04-23T23:14:25", "url": 
"https://files.pythonhosted.org/packages/16/46/f93564c90d3d7bb5a4d233b0ad7f52ebe42edba5c2dc32df4fe412f915ef/diffino-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3a7269e86166bee39814b67cc2db4761", "sha256": "25b49cd02f98cdaf5b96a9c305d8156613c5a6c794c29f486cf1593eeb512138" }, "downloads": -1, "filename": "diffino-0.1.tar.gz", "has_sig": false, "md5_digest": "3a7269e86166bee39814b67cc2db4761", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5307, "upload_time": "2019-04-23T23:14:32", "url": "https://files.pythonhosted.org/packages/d2/de/909d9e2f919a16eae5c65fc1050214be4d2a27c26f50cf840a43f5bab6de/diffino-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "5319ae81606de969b4b6ae86a78a2cd0", "sha256": "9d6b6d4c7ff6db8b92e54c19e46ed97d2406cfd9e33529ae20e6ad97d1d4602d" }, "downloads": -1, "filename": "diffino-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5319ae81606de969b4b6ae86a78a2cd0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6646, "upload_time": "2019-04-23T23:14:27", "url": "https://files.pythonhosted.org/packages/00/3b/a55bf8ebda5f8b88b4423a0aeda32c846dc5787a7845b4bc43c18e85d37c/diffino-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "592cd5ab39f23f481d554b0f2161acd6", "sha256": "980d5e7fa29508ac9140873684e8df3fb91f8f249d60f6070b7c1c5946115e4a" }, "downloads": -1, "filename": "diffino-0.1.1.tar.gz", "has_sig": false, "md5_digest": "592cd5ab39f23f481d554b0f2161acd6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5323, "upload_time": "2019-04-23T23:14:30", "url": "https://files.pythonhosted.org/packages/ff/1e/01426f1d2d8a5616bd07ec51f61041bfd91c4529599ced02a3158985b239/diffino-0.1.1.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "41a3f9e708b9b07158e2a11a97199154", "sha256": "494f9b078ea6e5127e298d5a36939dda5996368084aaec4f786c98c5fee9736f" }, 
"downloads": -1, "filename": "diffino-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "41a3f9e708b9b07158e2a11a97199154", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6652, "upload_time": "2019-04-23T23:14:28", "url": "https://files.pythonhosted.org/packages/7e/1f/1d692c85945b0876e325f1caee0b639b98b0f89859cd2c060000e83b09af/diffino-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4a4942f74a2e4a53d9250a5e92fd4beb", "sha256": "e655efac3be1933c7b4866880e05d110b702ef1d2fde625a61228e5429f8d95f" }, "downloads": -1, "filename": "diffino-0.1.3.tar.gz", "has_sig": false, "md5_digest": "4a4942f74a2e4a53d9250a5e92fd4beb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5303, "upload_time": "2019-04-23T23:14:31", "url": "https://files.pythonhosted.org/packages/76/b9/ab09994936af6fd085135df9fba642649b7d71967199ef06a128625762f5/diffino-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "f4acaf8ddb66ba280a1f03c3d40fc592", "sha256": "ddc240fc976b980f0ff5444378e620365c44e378a955a981bf16704cee763f08" }, "downloads": -1, "filename": "diffino-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "f4acaf8ddb66ba280a1f03c3d40fc592", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6653, "upload_time": "2019-04-23T23:44:03", "url": "https://files.pythonhosted.org/packages/a0/a5/3134be172d1e48fe834e169414027d646eafb4446451e0a9111fe02d5d72/diffino-0.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "edb8b28e5d5a9995080165541c94803c", "sha256": "369c2f90b77952e74463a9c7527849f506a5af4f5b1436926bf520305805e030" }, "downloads": -1, "filename": "diffino-0.1.4.tar.gz", "has_sig": false, "md5_digest": "edb8b28e5d5a9995080165541c94803c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5393, "upload_time": "2019-04-23T23:44:04", "url": 
"https://files.pythonhosted.org/packages/b4/f6/d3ce5703b1ffc76e0b121554bed8291ff7f35c8e27c87f89e81f8c2815cb/diffino-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "203248c10132d8b821f806f9e39ccb3d", "sha256": "7e844fc1b0b54136cf90531a6acf41426b54f49863c48f27650e938ed4625fec" }, "downloads": -1, "filename": "diffino-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "203248c10132d8b821f806f9e39ccb3d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6653, "upload_time": "2019-04-23T23:49:10", "url": "https://files.pythonhosted.org/packages/0f/71/748b5fce5f3290c97f91abde4294e8a06265589bb3a84bc64925528079b5/diffino-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f0bb20417063bdbcde19a1a911899490", "sha256": "775599b759499f9daa09493370ec2cd4560b3ba3ae7e1ac45f8af064f5e68d1b" }, "downloads": -1, "filename": "diffino-0.1.5.tar.gz", "has_sig": false, "md5_digest": "f0bb20417063bdbcde19a1a911899490", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5317, "upload_time": "2019-04-23T23:49:11", "url": "https://files.pythonhosted.org/packages/b1/b2/f5da16133e67b681d669a0bafd8bd4ae4894c77b074d7f2691c222e6f54e/diffino-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "d8bb4955017b020e7f1e6ccc67386adf", "sha256": "3eef203abfeebd7c9f6ef919b086384d886406097418a1b0d8f4f36e3e19beb6" }, "downloads": -1, "filename": "diffino-0.1.6-py2-none-any.whl", "has_sig": false, "md5_digest": "d8bb4955017b020e7f1e6ccc67386adf", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 6656, "upload_time": "2019-04-29T16:22:55", "url": "https://files.pythonhosted.org/packages/59/75/b7a95462702d79236676b74c7162214957ea811f023bfd350aee013eb481/diffino-0.1.6-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5a804388952c00651346b7d5a3710223", "sha256": 
"a36a9eecf431950e900d1eac1c99b880a070d9d2728cc4f5334df1b38d67db65" }, "downloads": -1, "filename": "diffino-0.1.6.tar.gz", "has_sig": false, "md5_digest": "5a804388952c00651346b7d5a3710223", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5321, "upload_time": "2019-04-29T16:22:58", "url": "https://files.pythonhosted.org/packages/9a/52/c13422947841931f0dd9307eb824d205949d60e471d685387998f10bbb4b/diffino-0.1.6.tar.gz" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "d33d11506a65cd81811697761d3d2ca2", "sha256": "df233f23b90c0ea328db1f5a7ba43551fdfe46cdd783db74543b63b16f4b4d6d" }, "downloads": -1, "filename": "diffino-0.1.7-py2-none-any.whl", "has_sig": false, "md5_digest": "d33d11506a65cd81811697761d3d2ca2", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 6794, "upload_time": "2019-05-08T22:06:01", "url": "https://files.pythonhosted.org/packages/d6/05/2d711e3fd00842b2c459efe77d1d5cd7acebe28690a24c8023fec12aba81/diffino-0.1.7-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2cbab000dae71a53227280fc15abe36a", "sha256": "48b2ce3751c6c4f317ed5845a0aef679782a6f7e7fdb5bc50918897d77aff9e6" }, "downloads": -1, "filename": "diffino-0.1.7.tar.gz", "has_sig": false, "md5_digest": "2cbab000dae71a53227280fc15abe36a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5463, "upload_time": "2019-05-08T22:06:02", "url": "https://files.pythonhosted.org/packages/77/95/5fee5500d74cfcc594a011f15cf9a993aec04fd9b215aa41c407cefef997/diffino-0.1.7.tar.gz" } ], "0.1.8": [ { "comment_text": "", "digests": { "md5": "513b153a3906bfee12af692ed46994e1", "sha256": "0c41cb65e7090c8f0745696ad30fb8b2ace353760bbb496d177a1be19aad024f" }, "downloads": -1, "filename": "diffino-0.1.8-py2-none-any.whl", "has_sig": false, "md5_digest": "513b153a3906bfee12af692ed46994e1", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 6815, 
"upload_time": "2019-05-08T22:50:14", "url": "https://files.pythonhosted.org/packages/91/9d/d3fa8ecf030c8bb3e7e7dcfcc3fab46bbdd78b2d1560320d361fded85bca/diffino-0.1.8-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6d1addaac2a27b1d83fdfaeae5e90138", "sha256": "a99f5b723b37d0d7f2481da2a6271d606a218039b88bb7fbeb95c7e003514f4d" }, "downloads": -1, "filename": "diffino-0.1.8.tar.gz", "has_sig": false, "md5_digest": "6d1addaac2a27b1d83fdfaeae5e90138", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5484, "upload_time": "2019-05-08T22:50:21", "url": "https://files.pythonhosted.org/packages/d7/5d/e30949e4fe8266da810659c699563863523ac4af51aac865287b332b0ec6/diffino-0.1.8.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "488a25e80d9d5247db183f55610b48a7", "sha256": "adfa9dda9bea39d2d9ec2cfae603084d902739bcc900930e80c75a9edeaea5ca" }, "downloads": -1, "filename": "diffino-0.2.0-py2-none-any.whl", "has_sig": false, "md5_digest": "488a25e80d9d5247db183f55610b48a7", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 7278, "upload_time": "2019-05-21T17:48:19", "url": "https://files.pythonhosted.org/packages/89/eb/13c322d718c7ce843df388f2b6925df0475ec99eedc61189ce4597a8cefa/diffino-0.2.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "632d504bfe3c81e0dadccc9407e9b777", "sha256": "5ba40998b04cb4a7e923d50c930b38abde346c7422dfc98c62a2898f7e4b9d36" }, "downloads": -1, "filename": "diffino-0.2.0.tar.gz", "has_sig": false, "md5_digest": "632d504bfe3c81e0dadccc9407e9b777", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5930, "upload_time": "2019-05-21T17:48:20", "url": "https://files.pythonhosted.org/packages/19/da/eb268a3ddbec6efc239be0a41069ca9c9f80d1abfb16005ceeca3901e7d1/diffino-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "7918b87cb2e205d03260817ccef0159e", "sha256": 
"d7393054f7d33de02f444fdd1b6ab2e358bb75f592c561b35ad97643d0eadc7c" }, "downloads": -1, "filename": "diffino-0.2.1-py2-none-any.whl", "has_sig": false, "md5_digest": "7918b87cb2e205d03260817ccef0159e", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 7274, "upload_time": "2019-05-22T20:31:51", "url": "https://files.pythonhosted.org/packages/4f/de/08ef2aa08d3c2224e8d9d78bb2535e7849c110a3741bf99bfb9668b34d0a/diffino-0.2.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ae4447e8e0aa74e8549013aa55a03f5", "sha256": "14b625007e96626a12ffdafbb0832849a5fc6783b0844e9f2fa16b97d9c630f8" }, "downloads": -1, "filename": "diffino-0.2.1.tar.gz", "has_sig": false, "md5_digest": "0ae4447e8e0aa74e8549013aa55a03f5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5932, "upload_time": "2019-05-22T20:31:53", "url": "https://files.pythonhosted.org/packages/69/5a/8f6c379905d6c466748baad39a7444f42091603c5853a84a005521294a3a/diffino-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "7918b87cb2e205d03260817ccef0159e", "sha256": "d7393054f7d33de02f444fdd1b6ab2e358bb75f592c561b35ad97643d0eadc7c" }, "downloads": -1, "filename": "diffino-0.2.1-py2-none-any.whl", "has_sig": false, "md5_digest": "7918b87cb2e205d03260817ccef0159e", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 7274, "upload_time": "2019-05-22T20:31:51", "url": "https://files.pythonhosted.org/packages/4f/de/08ef2aa08d3c2224e8d9d78bb2535e7849c110a3741bf99bfb9668b34d0a/diffino-0.2.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ae4447e8e0aa74e8549013aa55a03f5", "sha256": "14b625007e96626a12ffdafbb0832849a5fc6783b0844e9f2fa16b97d9c630f8" }, "downloads": -1, "filename": "diffino-0.2.1.tar.gz", "has_sig": false, "md5_digest": "0ae4447e8e0aa74e8549013aa55a03f5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5932, 
"upload_time": "2019-05-22T20:31:53", "url": "https://files.pythonhosted.org/packages/69/5a/8f6c379905d6c466748baad39a7444f42091603c5853a84a005521294a3a/diffino-0.2.1.tar.gz" } ] }