{ "info": { "author": "Josh Dimarsky", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Financial and Insurance Industry", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: SQL", "Topic :: Database" ], "description": "# bcpandas\n\n[](https://github.com/psf/black)\n[](https://pypi.org/project/bcpandas/)\n[](https://anaconda.org/conda-forge/bcpandas)\n[](https://github.com/yehoshuadimarsky/bcpandas/blob/master/LICENSE)\n[](https://pypi.python.org/pypi/bcpandas/)\n[](https://pypi.python.org/pypi/bcpandas/)\n[](https://github.com/Naereen/badges)\n\n\nHigh-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No knowledge of BCP required!!\n\n## Quickstart\n\n```python\nIn [1]: import pandas as pd\n ...: import numpy as np\n ...: \n ...: from bcpandas import SqlCreds, to_sql, read_sql\n\nIn [2]: creds = SqlCreds(\n ...: 'my_server',\n ...: 'my_db',\n ...: 'my_username',\n ...: 'my_password'\n ...: )\n\nIn [3]: df = pd.DataFrame(\n ...: data=np.ndarray(shape=(10, 6), dtype=int), \n ...: columns=[f\"col_{x}\" for x in range(6)]\n ...: )\n\nIn [4]: df\nOut[4]: \n col_0 col_1 col_2 col_3 col_4 col_5\n0 4128860 6029375 3801155 5570652 6619251 7536754\n1 4849756 7536751 4456552 7143529 7471201 7012467\n2 6029433 6881357 6881390 7274595 6553710 3342433\n3 6619228 7733358 6029427 6488162 6357104 6553710\n4 7536737 7077980 6422633 7536732 7602281 2949221\n5 6357104 7012451 6750305 7536741 7340124 7274610\n6 7340141 6226036 7274612 7077999 6881387 6029428\n7 6619243 6226041 6881378 6553710 7209065 6029415\n8 6881378 6553710 7209065 7536743 7274588 6619248\n9 6226030 7209065 6619231 6881380 7274612 3014770\n\nIn [5]: to_sql(df, 'my_test_table', creds, index=False, if_exists='replace')\n\nIn [6]: df2 = read_sql('my_test_table', creds)\n\nIn [7]: df2\nOut[7]: \n col_0 col_1 col_2 col_3 col_4 col_5\n0 4128860 6029375 3801155 5570652 6619251 7536754\n1 4849756 7536751 4456552 7143529 7471201 7012467\n2 6029433 6881357 6881390 7274595 6553710 3342433\n3 6619228 7733358 6029427 6488162 6357104 6553710\n4 7536737 7077980 6422633 7536732 7602281 2949221\n5 6357104 7012451 6750305 7536741 7340124 7274610\n6 7340141 6226036 7274612 7077999 6881387 6029428\n7 6619243 6226041 6881378 6553710 7209065 6029415\n8 6881378 6553710 7209065 7536743 7274588 6619248\n9 6226030 7209065 6619231 6881380 7274612 3014770\n```\n\n## Requirements\n- BCP Utility\n - [Windows](https://docs.microsoft.com/en-us/sql/tools/bcp-utility)\n- SqlCmd Utility\n - [Windows](https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility)\n- python >= 3.6\n- pandas\n\n## Benchmarks\n_# TODO_\n\n## Installation\nYou can download and install this package from PyPI\n\n```\npip install bcpandas\n```\n\nor from conda\n```\nconda install -c conda-forge bcpandas\n```\n\n## Caveats and Limitations\n\nHere are some caveats and limitations of bcpandas. Hopefully they will be addressed in future releases\n* In the `to_sql` function:\n * If `replace` is passed to the `if_exists` parameter, the new SQL table created will make the columns all of `NVARCHAR(MAX)` type.\n * If `append` is passed to the `if_exists` parameter, if the dataframe columns don't match the SQL table columns exactly by both name and order, it will fail.\n * If there is a NaN/Null in the last column of the dataframe it will throw an error. This is due to a BCP issue. See my issue with Microsoft about this [here](https://github.com/MicrosoftDocs/sql-docs/issues/2689) .\n * Bcpandas attempts to use a delimiter that is not present in the dataframe. This is because BCP does __not__ ignore delimiter characters when surrounded by quotes, unlike CSVs - see [here](https://docs.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server#characters-supported-as-terminators) in the Microsoft docs. Therefore, if all possible delimiter characters are present in the dataframe and bcpandas cannot find a delimiter to use, it will throw an error.\n * Possible delimiters are specified in `constants.py` .\n* Currently the STDOUT stream from BCP and SqlCmd is not asynchronous.\n* Currently this is being built with Windows in mind. Linux support is definitely easily added, it's just not in the immediate scope of the project yet. PRs are welcome.\n\n## Motivations and Design\n### Overview\nReading and writing data from pandas DataFrames to/from a SQL database is very slow using the built-in `read_sql` and `to_sql` methods, even with the newly introduced `execute_many` option. For Microsoft SQL Server, a far far faster method is to use the BCP utility provided by Microsoft. This utility is a command line tool that transfers data to/from the database and flat text files.\n\nThis package is a wrapper for seamlessly using the bcp utility from Python using a pandas DataFrame. Despite the IO hits, the fastest option by far is saving the data to a CSV file in the file system and using the bcp utility to transfer the CSV file to SQL Server. **Best of all, you don't need to know anything about using BCP at all!**\n\n### Existing Solutions\n\n
| Name | \nGitHub | \nPyPI | \n
| bcpy | \nhttps://github.com/titan550/bcpy | \nhttps://pypi.org/project/bcpy | \n
| magical-sqlserver | \nhttps://github.com/brennoflavio/magical-sqlserver | \nhttps://pypi.org/project/magical-sqlserver/ | \n