{ "info": { "author": "Octopus Energy", "author_email": "nerds@octoenergy.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Typing :: Typed" ], "description": "[![CircleCI status](https://circleci.com/gh/octoenergy/tentaclio/tree/master.png?circle-token=df7aad11367f1ace5bce253b18efb6b21eaa65bc)](https://circleci.com/gh/octoenergy/tentaclio/tree/master)\n[![codecov](https://codecov.io/gh/octoenergy/tentaclio/branch/master/graph/badge.svg)](https://codecov.io/gh/octoenergy/tentaclio)\n\n\n# Tentaclio\n\nPython library that simplifies:\n* Handling streams from different protocols such as `file:`, `ftp:`, `sftp:`, `s3:`, ...\n* Opening database connections.\n* Managing the credentials in distributed systems.\n\nMain considerations in the design:\n* Easy to use: all streams are opened via `tentaclio.open`, all database connections through `tentaclio.db`.\n* URLs are the basic resource locator and db connection string.\n* Automagic authentication for protected resources.\n* Extensible: you can add your own handlers for other schemes.\n* Pandas interaction.\n\n# Quick Examples.\n\n## Read and write streams.\n```python\nimport tentaclio\ncontents = \"\ud83d\udc4b \ud83d\udc19\"\n\nwith tentaclio.open(\"ftp://localhost:2021/upload/file.txt\", mode=\"w\") as writer: \n writer.write(contents)\n\n# Using boto3 authentication under the hood.\nbucket = \"s3://my-bucket/octopus/hello.txt\"\nwith tentaclio.open(bucket) as reader:\n print(reader.read())\n```\n\n## Copy streams\n```python\nimport tentaclio\n\nwith tentaclio.open(\"/home/constantine/data.csv\") as reader, tentaclio.open(\n \"sftp://constantine:tentacl3@sftp.octoenergy.com/uploads/data.csv\", mode=\"w\"\n) as writer:\n 
writer.write(reader.read())\n```\n## List resources\n```python\nimport tentaclio\n\nfor entry in tentaclio.listdir(\"s3://mybucket/path/to/dir\"):\n print(\"Entry\", entry)\n```\n\n## Authenticated resources.\n```python\nimport os \n\nimport tentaclio \n\nprint(\"env ftp credentials\", os.getenv(\"TENTACLIO__CONN__OCTOENERGY_FTP\")) \n# This prints `sftp://constantine:tentacl3@sftp.octoenergy.com/` \n\n# Credentials get automatically injected. \n\nwith tentaclio.open(\"sftp://sftp.octoenergy.com/uploads/data.csv\") as reader: \n print(reader.read()) \n```\n\n## Database connections.\n```python\nimport os \n\nimport tentaclio \n\nprint(\"env TENTACLIO__CONN__DB\", os.getenv(\"TENTACLIO__CONN__DB\")) \n\n# This prints `postgresql://octopus:tentacle@localhost:5444/example` \n\n# hostname is a wildcard, the credentials get injected. \nwith tentaclio.db(\"postgresql://hostname/example\") as pg: \n results = pg.query(\"select * from my_table\") \n```\n\n## Pandas interaction.\n```python\nimport pandas as pd # \ud83d\udc3c\ud83d\udc3c \nimport tentaclio # \ud83d\udc19 \n\ndf = pd.DataFrame([[1, 2, 3], [10, 20, 30]], columns=[\"col_1\", \"col_2\", \"col_3\"]) \n\nbucket = \"s3://my-bucket/data/pandas.csv\" \n\nwith tentaclio.open(bucket, mode=\"w\") as writer: # supports more pandas readers \n df.to_csv(writer, index=False) \n\nwith tentaclio.open(bucket) as reader: \n new_df = pd.read_csv(reader) \n\n```\n\n# Installation\n\nYou can get tentaclio using pip\n\n```sh\npip install tentaclio\n```\nor pipenv\n```sh\npipenv install tentaclio\n```\n\n## Developing. \n\nClone this repo and install [pipenv](https://pipenv.readthedocs.io/en/latest/).\n\nIn the `Makefile` you'll find some useful targets for linting, testing, etc. 
e.g.:\n```sh\nmake test\n```\n\n\n## How to use\nThis is how to use `tentaclio` for your daily data ingestion and storing needs.\n\n### Streams\nTo open streams for loading or storing data, the universal function is `tentaclio.open`:\n\n```python\nimport tentaclio \n\nwith tentaclio.open(\"/path/to/my/file\") as reader:\n contents = reader.read()\n\nwith tentaclio.open(\"s3://bucket/file\", mode='w') as writer:\n writer.write(contents)\n\n```\nAllowed modes are `r`, `w`, `rb`, and `wb`. You can use `t` instead of `b` to indicate text streams, but that's the default.\n\n\nThe supported URL protocols are:\n\n* `/local/file`\n* `file:///local/file`\n* `s3://bucket/file`\n* `ftp://path/to/file`\n* `sftp://path/to/file`\n* `http://host.com/path/to/resource`\n* `https://host.com/path/to/resource`\n* `postgresql://host/database::table` will allow you to write CSV-formatted data into a database table with the same column names (note that the table goes after `::` :warning:).\n\nYou can add the credentials to any of the URLs in order to access protected resources.\n\n\nYou can use these readers and writers with pandas functions like:\n\n```python\nimport pandas as pd\nimport tentaclio \n\nwith tentaclio.open(\"/path/to/my/file\") as reader:\n df = pd.read_csv(reader) \n\n[...]\n\n# Parquet is a binary format, so open the writer in binary mode.\nwith tentaclio.open(\"s3://path/to/my/file\", mode='wb') as writer:\n df.to_parquet(writer) \n```\n`Readers`, `Writers` and their closeable versions can be used anywhere a file-like object is expected; pandas and pickle functions are examples.\n\n### File system like operations to resources\n#### Listing resources\nSome URL schemes allow listing resources in a pythonic way:\n```python\nimport tentaclio\n\nfor entry in tentaclio.listdir(\"s3://mybucket/path/to/dir\"):\n print(\"Entry\", entry)\n```\n\nWhile `listdir` is convenient, we also offer `scandir`, which returns a list of 
[DirEntry](https://github.com/octoenergy/tentaclio/blob/ddbc28615de4b99106b956556db74a20e4761afe/src/tentaclio/fs/scanner.py#L13)s, and `walk`. All functions follow their standard library definitions as closely as possible.\n\n\n### Database access \n\nTo open db connections you can use `tentaclio.db` and have instant access to postgres, sqlite, athena and mssql. \n\n```python\nimport tentaclio\n\n[...] \n\nquery = \"select 1\"\nwith tentaclio.db(POSTGRES_TEST_URL) as client:\n result = client.query(query)\n[...]\n```\n\nThe supported db schemes are:\n\n* `postgresql://`\n* `sqlite://`\n* `awsathena+rest://`\n* `mssql://`\n\n### Automatic credentials injection \n\n1. Configure credentials by using environment variables prefixed with `TENTACLIO__CONN__` (e.g. `TENTACLIO__CONN__DATA_FTP=sftp://real_user:132ldsf@ftp.octoenergy.com`).\n\n2. Open a stream:\n```python\nwith tentaclio.open(\"sftp://ftp.octoenergy.com/file.csv\") as reader:\n reader.read()\n```\nThe credentials get injected into the URL. \n\n3. Open a db client:\n```python\nimport tentaclio\n\nwith tentaclio.db(\"postgresql://hostname/my_data_base\") as client:\n client.query(\"select 1\")\n```\nNote that `hostname` in the URL to be authenticated is a wildcard that will match any hostname. 
So `authenticate(\"http://hostname/file.txt\")` will resolve to `http://user:pass@octo.co/file.txt` if a credential for `http://user:pass@octo.co/` exists.\n\nDifferent components of the URL are set differently:\n- Scheme and path will be set from the URL, and null if missing.\n- Username, password and hostname will be set from the stored credentials.\n- Port will be set from the stored credentials if it exists, otherwise from the URL.\n- Query will be set from the URL if it exists, otherwise from the stored credentials (so it can be overridden).\n\n#### Credentials file\n\nYou can also set a credentials file that looks like:\n```\nsecrets:\n db_1: postgresql://user1:pass1@myhost.com/database_1\n db_2: postgresql://user2:pass2@otherhost.com/database_2\n ftp_server: ftp://fuser:fpass@ftp.myhost.com\n```\nMake it accessible to tentaclio by setting the environment variable `TENTACLIO__SECRETS_FILE`. The name given to each URL is only for traceability and has no effect on functionality. \n\n\n## Quick note on protocols structural subtyping.\n\nIn order to abstract concrete dependencies from the implementation of data related functions (or in any part of the system really) we use typed [protocols](https://mypy.readthedocs.io/en/latest/protocols.html#simple-user-defined-protocols). This allows more flexible dependency injection than subclassing or [more complex approaches](http://code.activestate.com/recipes/413268/). This idea is heavily inspired by how this exact thing is done in [Go](https://www.youtube.com/watch?v=ifBUfIb7kdo). 
Learn more about this principle in our [tech blog](https://tech.octopus.energy/news/2019/03/21/python-interfaces-a-la-go.html).\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/octoenergy/tentaclio", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "tentaclio", "package_url": "https://pypi.org/project/tentaclio/", "platform": "", "project_url": "https://pypi.org/project/tentaclio/", "project_urls": { "Homepage": "https://github.com/octoenergy/tentaclio" }, "release_url": "https://pypi.org/project/tentaclio/0.0.1a3/", "requires_dist": [ "urllib3 (>=1.24.2)", "boto3 (<1.10,>=1.9.0)", "requests", "psycopg2-binary", "sqlalchemy (>1.3)", "PyAthena", "pysftp (<0.3,>=0.2.0)", "typing-extensions", "pandas", "click", "pyyaml" ], "requires_python": "", "summary": "Unification of data connectors for distributed data tasks", "version": "0.0.1a3" }, "last_serial": 5578290, "releases": { "0.0.1a0": [ { "comment_text": "", "digests": { "md5": "14fb95ede116a4fe20d051d597c96cd9", "sha256": "9babcc5151394b39779b400d6221478d1e219ea0f22ab5e278fdb04cd06c202d" }, "downloads": -1, "filename": "tentaclio-0.0.1a0.tar.gz", "has_sig": false, "md5_digest": "14fb95ede116a4fe20d051d597c96cd9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27208, "upload_time": "2019-04-24T15:31:01", "url": "https://files.pythonhosted.org/packages/14/76/30feb7fa0130746df5183507a84b40c23548169e898761765126630a7275/tentaclio-0.0.1a0.tar.gz" } ], "0.0.1a1": [ { "comment_text": "", "digests": { "md5": "eb586c6d0c33490db31a22de606dbc38", "sha256": "e0321c96862ab7fc11f00fd0994e4b165cb3176ffdf6905fd31c38d4eff4aca0" }, "downloads": -1, "filename": "tentaclio-0.0.1a1.tar.gz", "has_sig": false, "md5_digest": "eb586c6d0c33490db31a22de606dbc38", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 27240, "upload_time": "2019-04-24T15:43:59", "url": "https://files.pythonhosted.org/packages/5f/7e/be3d5e858d283f2f7b0fd6f21ed797f2f714d5f7bd9af2ca4c8f5969fcac/tentaclio-0.0.1a1.tar.gz" } ], "0.0.1a2": [ { "comment_text": "", "digests": { "md5": "a1ea4a69d969d68c850dbc2080fa59a6", "sha256": "db720544ea2f965dcd5c3b97f0d66fcce269f99fcaaa5f5a52bf826cb5976aed" }, "downloads": -1, "filename": "tentaclio-0.0.1a2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a1ea4a69d969d68c850dbc2080fa59a6", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 34374, "upload_time": "2019-05-22T11:23:11", "url": "https://files.pythonhosted.org/packages/d5/66/bc775d0bd8673139dbfe723b31ca765e0512d05d80f6e7de5233ef86c964/tentaclio-0.0.1a2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "46f4053d1ae4caa75d8b22ce80980cef", "sha256": "41aa401d4e2576dee5fe95d00110a4105a4d8622f8cf5ec220382166516e682a" }, "downloads": -1, "filename": "tentaclio-0.0.1a2.tar.gz", "has_sig": false, "md5_digest": "46f4053d1ae4caa75d8b22ce80980cef", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26505, "upload_time": "2019-05-22T11:23:13", "url": "https://files.pythonhosted.org/packages/45/d5/12b3986e27047fcebc857d1c011c6fbfe3394482469a88b34d252899b6d2/tentaclio-0.0.1a2.tar.gz" } ], "0.0.1a3": [ { "comment_text": "", "digests": { "md5": "177d7e7722a44c092984abb29cf915f3", "sha256": "deda5fdd3a004c5b1d56ac62308cb6059d2465635d259d160bccf4fff58449b7" }, "downloads": -1, "filename": "tentaclio-0.0.1a3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "177d7e7722a44c092984abb29cf915f3", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 39970, "upload_time": "2019-07-24T15:50:53", "url": "https://files.pythonhosted.org/packages/f0/96/591504d0cc87f0fc6dc4807f07df1790a2d013ba0abe750ff9b8ee51824d/tentaclio-0.0.1a3-py2.py3-none-any.whl" }, { 
"comment_text": "", "digests": { "md5": "0f8e52e858ed052b9530217531bd97ac", "sha256": "8c989cabfced65abbda6a18e7730a08097fac2827f217360b3858817f3cc32b0" }, "downloads": -1, "filename": "tentaclio-0.0.1a3.tar.gz", "has_sig": false, "md5_digest": "0f8e52e858ed052b9530217531bd97ac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29018, "upload_time": "2019-07-24T15:50:55", "url": "https://files.pythonhosted.org/packages/15/72/cb6af763616c8c160d7c8edf304a63ddd3d586b6b1359bb223000ce65634/tentaclio-0.0.1a3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "177d7e7722a44c092984abb29cf915f3", "sha256": "deda5fdd3a004c5b1d56ac62308cb6059d2465635d259d160bccf4fff58449b7" }, "downloads": -1, "filename": "tentaclio-0.0.1a3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "177d7e7722a44c092984abb29cf915f3", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 39970, "upload_time": "2019-07-24T15:50:53", "url": "https://files.pythonhosted.org/packages/f0/96/591504d0cc87f0fc6dc4807f07df1790a2d013ba0abe750ff9b8ee51824d/tentaclio-0.0.1a3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0f8e52e858ed052b9530217531bd97ac", "sha256": "8c989cabfced65abbda6a18e7730a08097fac2827f217360b3858817f3cc32b0" }, "downloads": -1, "filename": "tentaclio-0.0.1a3.tar.gz", "has_sig": false, "md5_digest": "0f8e52e858ed052b9530217531bd97ac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29018, "upload_time": "2019-07-24T15:50:55", "url": "https://files.pythonhosted.org/packages/15/72/cb6af763616c8c160d7c8edf304a63ddd3d586b6b1359bb223000ce65634/tentaclio-0.0.1a3.tar.gz" } ] }