{ "info": { "author": "", "author_email": "seiji.armstrong@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "# seipy\n\nHelper functions for the python data science stack as well as spark, AWS, jupyter.\n\n## What is it\n\nThis library contains helpers and wrappers for common data science libraries in the python stack:\n- pandas\n- numpy\n- scipy\n- sklearn\n- matplotlib\n- pyspark\n\nThere are also functions that simplify common manipulations for machine learning and data science\nin general, as well as interfacing with the following tools:\n- s3\n- jupyter\n- aws\n- spark SQL\n\n## Installation\n```\n# PyPI\npip install seipy\n```\n\n## Here are some examples\n\n### pandas\n\n#### Apply function to unique DataFrame entries only (for speedup)\n```\nfrom seipy import apply_uniq\ndf2 = apply_uniq(df, orig_col, new_col, _func)\n```\nThis will return the same DataFrame as performing:\n`df[new_col] = df[orig_col].apply(_func)`\nbut is much more performant when there are many duplicate entries in `orig_col`.\n\nIt works by performing the function `_func` only on the unique entries and then merging with the original DataFrame.\nOriginally answered on stack overflow:\nhttps://stackoverflow.com/questions/46798532/how-do-you-effectively-use-pd-dataframe-apply-on-rows-with-duplicate-values/\n\n#### Filtering DataFrame with multiple conditions\n```\nfrom seipy import filt\n# example with keyword arguments\nfilt(df,\n season=\"summer\",\n age=(\">\", 18),\n sport=(\"isin\", [\"Basketball\", \"Soccer\"]),\n name=(\"contains\", \"Armstrong\")\n )\n\n# example with dict notation\na = {'season': \"summer\", 'age': (\">\", 18)}\nfilt(df, **a)\n```\n\n### linear algebra\n\n```\nfrom seipy import distmat\ndistmat()\n```\nThis will prints possible distance metrics such as \"euclidean\" \"chebyshev\", \"hamming\".\n\n```\ndistmat(fframe, metric)\n```\nThis generates a distance matrix using `metric`.\nNote, this function is a wrapper of scipy.spatial.distance.cdist\n\n\n### jupyter\n\n```\nfrom seipy import notebook_contains\nnotebook_contains(search_str,\n on_docker=False,\n git_dir='~/git/experiments/',\n start_date='2015-01-01', end_date='2018-12-31')\n```\nPrints a list of notebooks that contain the str `search_str`.\nVery useful for these situations: \"Where's that notebook where I was trying that one thing that one time?\"\n\n### s3\n```\nfrom seipy import s3zip_func\ns3zip_func(s3zip_path, _func, cred_fpath=cred_fpath, **kwargs)\n```\nThis one's kinda nice. It allows one to apply a function `_func` to each subfile in a zip file sitting on s3.\nI use it to filter and enrich some csv files that periodically get zipped to s3, for example.\n\n\n### spark and s3 on jupyter\n\n```\nfrom seipy import s3spark_init\nspark = s3spark_init(cred_fpath)\n```\nReturns `spark`, a `SparkSession` that makes it possible to interact with s3 from jupyter notebooks.\n`cred_fpath` is the file path to the aws credentials file containing your keys.\n\n\n### Miscellaneous\n\n```\nfrom seiji import merge_two_dicts\nmerge_two_dicts(dict_1, dict_2)\n```\nReturns the merged dict `{**dict_1, **dict_2}`.\nAn extension for mulitple dicts is `reduce(lambda d1,d2: {**d1,**d2}, dict_args[0])`\n\n### Getting help\n\nPlease either post an issue on this github repo, or email the author `seiji dot armstrong at gmail` with feedback,\nfeature requests, or to complain that something doesn't work as expected.\n\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Seiji-Armstrong/seipy", "keywords": "pandas numpy spark jupyter data-science machine-learning s3", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "seipy", "package_url": "https://pypi.org/project/seipy/", "platform": "", "project_url": "https://pypi.org/project/seipy/", "project_urls": { "Homepage": "https://github.com/Seiji-Armstrong/seipy" }, "release_url": "https://pypi.org/project/seipy/1.3.2/", "requires_dist": [ "boto3", "ipython", "matplotlib", "numpy", "pandas", "pandas", "pyspark", "scapy-python3", "scikit-learn", "scipy", "seaborn" ], "requires_python": ">=3", "summary": "Helper functions for data science", "version": "1.3.2" }, "last_serial": 3962212, "releases": { "1.2.0b1": [ { "comment_text": "", "digests": { "md5": "344c56d13c648e995b12d6f39190b6d7", "sha256": "02c5146caf527e645273e019b8a573f01040cc56f0bbcf1fc2d668968a7f686d" }, "downloads": -1, "filename": "seipy-1.2.0b1-py3-none-any.whl", "has_sig": false, "md5_digest": "344c56d13c648e995b12d6f39190b6d7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 19780, "upload_time": "2017-12-28T06:20:40", "url": "https://files.pythonhosted.org/packages/a9/8f/c4bfc9495b122230ffb6c18b6bcde275ea3e64ff0fce88f451ce244c855b/seipy-1.2.0b1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "60e98f6f6036fc989b85e8d0de6ac4bc", "sha256": "a0c974e5394ba4f7197c34418c6778292bdd8cff6ff1f555d7250e62fe84dc26" }, "downloads": -1, "filename": "seipy-1.2.0b1.tar.gz", "has_sig": false, "md5_digest": "60e98f6f6036fc989b85e8d0de6ac4bc", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 15393, "upload_time": "2017-12-28T06:20:41", "url": "https://files.pythonhosted.org/packages/51/3c/3f2cbf8619ca55d7b74b413803d57510a940b21be301b30f79a142c300df/seipy-1.2.0b1.tar.gz" } ], "1.3.0": [ { "comment_text": "", "digests": { "md5": "31313602f2e256b402f9764a737449f7", "sha256": "e2f47a05f2aaf3230bfee99d1added87f84ced895005cf3c753b311c6b6bd591" }, "downloads": -1, "filename": "seipy-1.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "31313602f2e256b402f9764a737449f7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 21358, "upload_time": "2018-03-14T22:49:58", "url": "https://files.pythonhosted.org/packages/18/ba/85fc9a37675d60a96afdeb44574536b20fdaf97d8472cb418c7125cc701f/seipy-1.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d7556285a4a58e0045f748241aeff4d3", "sha256": "0cd0262e45325a908f93d1faba07d666a44a807b32a2a08e61816fa98fe31af0" }, "downloads": -1, "filename": "seipy-1.3.0.tar.gz", "has_sig": false, "md5_digest": "d7556285a4a58e0045f748241aeff4d3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 16995, "upload_time": "2018-03-14T22:50:01", "url": "https://files.pythonhosted.org/packages/d1/c9/062bef7214d267c924ef855279b9c24cb571ae2ed52864c294e2e0e7d68c/seipy-1.3.0.tar.gz" } ], "1.3.1": [ { "comment_text": "", "digests": { "md5": "4acef174d32beecc3eaa1eb221b98ef5", "sha256": "60780aaec8bdffbadbf552006e46ebd03cef70d051c00ec396c9d811bfd112b9" }, "downloads": -1, "filename": "seipy-1.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4acef174d32beecc3eaa1eb221b98ef5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 21425, "upload_time": "2018-06-14T17:13:56", "url": "https://files.pythonhosted.org/packages/88/d4/8c9583b53d173e5c4a2c897a3416162b8753c7ec6466707d98dc6dd65ee1/seipy-1.3.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fbd8b7e7d70a4d4448410644a3121aa7", "sha256": "42c6be1cef625f261d97dc1a5c7dfc5e7feb53b368213416b40f61cd01bd6e75" }, "downloads": -1, "filename": "seipy-1.3.1.tar.gz", "has_sig": false, "md5_digest": "fbd8b7e7d70a4d4448410644a3121aa7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 17058, "upload_time": "2018-06-14T17:13:58", "url": "https://files.pythonhosted.org/packages/04/e7/888dc276ee6139f4e85844d3d8b738232109819e714882a3dd3dc0182c3a/seipy-1.3.1.tar.gz" } ], "1.3.2": [ { "comment_text": "", "digests": { "md5": "ecf290c3e7d6204fc016cf50f5516c82", "sha256": "d4e2613e63d53f55c86f7c80eb62eb4942de0553d9b5107f626db31e2b49e1a7" }, "downloads": -1, "filename": "seipy-1.3.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ecf290c3e7d6204fc016cf50f5516c82", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 21424, "upload_time": "2018-06-14T17:31:46", "url": "https://files.pythonhosted.org/packages/78/e6/8b8501c38653dc998cdbdd4aca4acc04ef6dec53e8c7d18017b20e60cbf3/seipy-1.3.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1f4aaa1af5370122ee0952d3ccdda394", "sha256": "bc7aa12b6d1428bdd22fa7150ebee06be942113df73240b08deec3cb6fcbbc79" }, "downloads": -1, "filename": "seipy-1.3.2.tar.gz", "has_sig": false, "md5_digest": "1f4aaa1af5370122ee0952d3ccdda394", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 17054, "upload_time": "2018-06-14T17:31:48", "url": "https://files.pythonhosted.org/packages/97/eb/bcdcd82b6d3a74ba998ab9b0eefb9794ad7235c3a63e45427ddfdf80e451/seipy-1.3.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ecf290c3e7d6204fc016cf50f5516c82", "sha256": "d4e2613e63d53f55c86f7c80eb62eb4942de0553d9b5107f626db31e2b49e1a7" }, "downloads": -1, "filename": "seipy-1.3.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ecf290c3e7d6204fc016cf50f5516c82", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 21424, "upload_time": "2018-06-14T17:31:46", "url": "https://files.pythonhosted.org/packages/78/e6/8b8501c38653dc998cdbdd4aca4acc04ef6dec53e8c7d18017b20e60cbf3/seipy-1.3.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1f4aaa1af5370122ee0952d3ccdda394", "sha256": "bc7aa12b6d1428bdd22fa7150ebee06be942113df73240b08deec3cb6fcbbc79" }, "downloads": -1, "filename": "seipy-1.3.2.tar.gz", "has_sig": false, "md5_digest": "1f4aaa1af5370122ee0952d3ccdda394", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 17054, "upload_time": "2018-06-14T17:31:48", "url": "https://files.pythonhosted.org/packages/97/eb/bcdcd82b6d3a74ba998ab9b0eefb9794ad7235c3a63e45427ddfdf80e451/seipy-1.3.2.tar.gz" } ] }