{ "info": { "author": "Alexander L. Belikoff", "author_email": "abelikoff@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: End Users/Desktop", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Utilities" ], "description": "# classify_bills\n\n_[Almost] automatically sort and archive PDF bills\nand statements._\n\nI get tons of electronic statements each month: from bank statements\nto credit cards statements to updated insurance policy documents. I\nstore all of them - they don't take much space and you never know when\nyou need to use one of them (I did have a case quite recently, where a\nreceipt from a bank issued _18 years ago_ had made all the difference,\nbut that's a story for some other day).\n\nOne problem with so many receipts is storing them in a way that helps\nfinding things fast (or even finding things _at all_). Renaming each\nfile by hand sticking to some uniform convention and shuffling those\nfiles around gets mind numbingly boring pretty soon.\n\nThis is where `classify_bills` script comes handy. It goes over all\nnewly downloaded bills and does a couple of things:\n\n* It tries to figure out what account each document belongs to.\n* It then figures out what date this document should be associated\n with.\n* Finally, it names the document according to the set pattern and\n places it in the right directory.\n\n\n### What `classify_bills` is not\n\nIt is not a jack of all trades. It will not download your statements\nfor you (which is a much more complex task given different websites\nhosting those documents). Neither will it OCR those documents that\ndon't come with text embedded (some places give you a PDF which has no\ntext at all).\n\nIt is also not really intelligent. At its core, it is driven by a list\nof regular expressions.\n\nStill, given the number of bills I get monthly, over the last couple\nyears, it has saved me probably _hours_ of menial, incredibly boring\nwork, so I do consider it a win.\n\n\n## Installation\n\nInstall using `pip`:\n\n```shell\npip install classify_bills\n```\n\nOnce installed, `classify_bills` should be available in your path.\n\nWhen using the source code directly or from GitHub, use\n`run_classify_bills` shell driver that will invoke the Python code\nproperly.\n\nCurrently, the `classify_bills` has the following external\ndependencies:\n\n* Python 3.x\n* `pdftotext` program (usually comes as part of `poppler` package.)\n\n\n## Configuration\n\n`classify_bills` is configured via a set of XML files, where each file\ndefines a setup for a specific bill type. Those files should be placed\nin a configuration directory, which could be specified one of the\nfollowing ways:\n\n* Via `-c` command line option.\n* Via an environment variable `$CLASSIFY_BILLS_CONFIG_DIRECTORY`\n* Using a default location `~/.classify_bills.conf.d`\n\nEach file defines patterns that might be present in the bill's text in\norder to be considered for this account. A pattern to match bill date\nalso must be specified (and must be matched) as well as a pattern to\nextract the date (via `strptime()`).\n\nThe package directory `classify_bills.config.examples` contains\nseveral examples. In particular, see `0-Example.xml` in that\ndirectory, which describes all aspects of configuration.\n\nLastly, color in the output can be disabled by setting\n`$CLASSIFY_BILLS_DISABLE_COLOR` environment variable.\n\n\n## Creating new configuration files\n\nIn general, a process of adding support for a new kind of bill works\nthe following way:\n\n* Use `pdftotext` to examine text output of a couple bills for a given\n account to determine the following:\n * Patterns in the text that could be used to _uniquely identify this\n kind of bill_ (name of a bank or service provider, URLs,\n etc). It's beter to have several specific patterns to allow future\n disambiguation between multiple accounts from the same provider\n (e.g. separate banking and investment bills from the same bank).\n * Pattern that could be used to infer the date this bill should be\n associated with.\n * Format of that date.\n* Create a new XML file (use `0-Example.xml` as a boilerplate).\n\nThis process is unfortunately not easy to automate so it has to be\ndone manually and it is a pain. However, it only has to be done once\nper each account (that is, until the provider decides to change the\nformat of the bill thus breaking the patterns, but it also doesn't\nhappen often).\n\n\n## Using `classify_bills`\n\nRunning `classify_bills` is fairly simple. You can pass either\nindividual PDF files or directories as command-line arguments. By\ndefault, it runs in dry-run mode, not making any changes. To actually\nperform all actions, run it with `-f` flag.\n\nFiles that have been successfully detected are moved to the hierarchy\nunder the output directory (which is specified either with `-o` flag\nor via an environment variable `$CLASSIFY_BILLS_OUTPUT_DIRECTORY`. The\nscript will never overwrite any existing document in the destination\ndirectory (unless it is forced to via `-w` flag).\n\n\n## Future work\n\nCurrently, the most painful aspect of using the tool is manual\nconfiguration of patterns for each bill type. There are several ideas\non how to try to make it easier: from finding and parsing all dates in\nthe bill to using neural networks to infer those facts from the\nbill. This might be an interesting direction for future work.\n\n\n# Contributing\n\nYou are welcome to contribute to this project either by submitting\nbill configuration for different institutions or by improving the\ncode and adding features. :-)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/abelikoff/classify_bills", "keywords": "bills management organize documents", "license": "", "maintainer": "", "maintainer_email": "", "name": "classify-bills", "package_url": "https://pypi.org/project/classify-bills/", "platform": "", "project_url": "https://pypi.org/project/classify-bills/", "project_urls": { "Bug Reports": "https://github.com/abelikoff/classify_bills/issues", "Homepage": "https://github.com/abelikoff/classify_bills", "Source": "https://github.com/abelikoff/classify_bills/" }, "release_url": "https://pypi.org/project/classify-bills/1.0.1/", "requires_dist": [ "check-manifest ; extra == 'dev'", "coverage ; extra == 'test'" ], "requires_python": ">=3.5", "summary": "Automatically sort and archive PDF bills and statements", "version": "1.0.1" }, "last_serial": 5408459, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "8adeac1b7645ad88f7ed9fefb6ee8e5c", "sha256": "61cbb7f2808f0cd9ea3e4d41bcb930534a29948410e99f460133563b01081f7e" }, "downloads": -1, "filename": "classify_bills-1.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "8adeac1b7645ad88f7ed9fefb6ee8e5c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 24250, "upload_time": "2019-06-16T20:51:43", "url": "https://files.pythonhosted.org/packages/20/69/ac2bdbc793b2da5ee0eeab8fe6a2e581e7c19f1cfd5c71244059635f3ede/classify_bills-1.0.0-py2.py3-none-any.whl" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "fd10162200d1a43ddbeca70d59565caa", "sha256": "277cb5e99ab5bfd907436af40670c56f4827097d6af6beef059aa0f0f25e2001" }, "downloads": -1, "filename": "classify_bills-1.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fd10162200d1a43ddbeca70d59565caa", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 24245, "upload_time": "2019-06-17T04:41:56", "url": "https://files.pythonhosted.org/packages/cb/e5/b31a75811fa8fd120c9a585a3b4e267bd9032805c8d66220001333f7f78a/classify_bills-1.0.1-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "fd10162200d1a43ddbeca70d59565caa", "sha256": "277cb5e99ab5bfd907436af40670c56f4827097d6af6beef059aa0f0f25e2001" }, "downloads": -1, "filename": "classify_bills-1.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fd10162200d1a43ddbeca70d59565caa", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 24245, "upload_time": "2019-06-17T04:41:56", "url": "https://files.pythonhosted.org/packages/cb/e5/b31a75811fa8fd120c9a585a3b4e267bd9032805c8d66220001333f7f78a/classify_bills-1.0.1-py2.py3-none-any.whl" } ] }