{ "info": { "author": "Donald \"Max\" Ziff", "author_email": "ziff@verticloud.com", "bugtrack_url": null, "classifiers": [], "description": "s3copy\n======\n\n**s3copy** - multi-threaded, fault-tolerant, bucket-to-bucket copy for\ns3\n\nUsage\n-----\n\nSimple example:\n\n::\n\n s3copy s3://source/path s3://dest/path\n\nAll the files in the source bucket contained in path are copied to the\ndestination bucket at the specified path (paths may be omitted for\neither source or destination).\n\nSome common arguments:\n\n::\n\n s3copy s3://source/path s3://dest/path -n\n\nThe above example is a dry-run (-n), so it does no copying but indicates\nwhat copying it would do.\n\n::\n\n s3copy s3://source/path s3://dest/path -t 20 -l DEBUG -L log.txt\n\nThe above example uses 20 threads, prints log messages down to the DEBUG\nlevel, and appends its log to a file named log.txt.\n\n::\n\n s3copy --help\n\nThe above example prints help.\n\nSource and Destination Paths\n----------------------------\n\nPaths behave similarly to the unix \"cp -r\" command. Suppose for example\nthat the source bucket contains the following files:\n\n::\n\n folder-001/test-001\n folder-001/test-002\n\nand suppose in each case below that the destination folder is initially\nempty. 
Then these commands have these results:\n\n+------------------------------------------+-------------------------------------------------+\n| Command                                  | Result files in dest                            |\n+==========================================+=================================================+\n| s3copy s3://source s3://dest             | folder-001/test-001 folder-001/test-002         |\n+------------------------------------------+-------------------------------------------------+\n| s3copy s3://source s3://dest/foo         | foo/folder-001/test-001 foo/folder-001/test-002 |\n+------------------------------------------+-------------------------------------------------+\n| s3copy s3://source/folder-001 s3://dest  | folder-001/test-001 folder-001/test-002         |\n+------------------------------------------+-------------------------------------------------+\n|s3copy s3://source/folder-001/* s3://dest | test-001 test-002                               |\n+------------------------------------------+-------------------------------------------------+\n\nGlobbing support is limited to a single ``*`` at the end of a pattern.\nInternal ``*`` and other pattern characters are not supported.\n\nRestart and Sync\n----------------\n\ns3copy is fault-tolerant and restartable. If a copy run is interrupted\nor fails for any reason, you can repeat the command and it will complete\nthe operation. It does this by checking the md5s of the individual\nsource and destination files. In this way, s3copy can also be used to\nsync buckets. If an s3copy command succeeds and the source bucket\nsubsequently changes, with the addition of new files or changes to\nexisting files, the same command can be run again and only the changes\nwill be copied. 
Note that additions and changes are supported but\ndeletions are not: if a file is deleted from the source and the command\nis rerun, the corresponding file in the destination will not be deleted.\n\nMultipart File Handling\n-----------------------\n\nAmazon S3 can currently store individual files up to a limit of 5\nterabytes (http://aws.amazon.com/s3/faqs/); however, files above 5GB must\nbe stored as \"multipart\" files. Multipart files are created and copied\nin multiple transactions, using special APIs that specify a byte offset\nand length.\n\nMultipart files present special difficulties for copying:\n\n- S3 does not expose the md5 of the entire file, nor of its constituent\n parts. It does report the md5 of a part as it is copied. It also\n exposes an \"etag\" of the file, which resembles an md5 hash (discussed\n further below).\n- S3 does not permit creating a simple, non-multipart file directly\n from a portion of a multipart file. That is, when copying a portion,\n both the source and target must be multipart.\n\nIn addition, although files larger than 5GB *must* be multipart, files\nsmaller than 5GB *may* be. For example, it is faster and more reliable\nto create a 5GB file (the maximum non-multipart file) in 80 chunks of\n64MB each, rather than as a single 5GB file. For these reasons, s3copy\nhandles multipart files bigger and smaller than 5GB.\n\nMultipart files are copied in parts. The default part size is 64MB, but\nthis can be controlled by the ``-p`` option. For validation and\nrestartability, the part files are copied in three steps:\n\n1. Each part is copied to a multipart file with a single part in a temp\n area on the destination bucket. As noted above, one cannot create a\n single part file directly from a part of a multipart file. As this\n copy occurs, s3copy keeps the md5 of the part in memory (since, as\n noted above, S3 does not expose the md5 of a multipart file, even if\n it has only one part).\n2. 
Each part is then copied to a temporary simple file. S3 does expose\n the md5 of this file. s3copy validates that the md5 of the file\n matches the md5 of the portion which S3 reported in step 1.\n3. The final target file is created as a multipart file from the\n constituent simple files.\n\nThis process means that the data bytes of a multipart file are copied\nthree times, whereas in principle they could have been copied only once,\nas a multipart copy of the source to the target. The extra copies enable\nrestartability and validation, as follows:\n\n- Restartability: steps one and two create durable temp files which\n s3copy can observe to determine which parts of a multipart copy have\n already occurred.\n- Validation: During the multipart copy in step three, the md5 of each\n component can be revalidated against the stored md5s of the parts.\n\nIt should be noted that only step one involves data transfer between the\nsource and destination buckets. Steps two and three are copies from the\ndestination bucket to itself, and are thus inside the same S3 data\ncenter. Data transfer inside a data center is very fast and is\n(currently) free. Amazon only charges for data transfer between data\ncenters, and this methodology does the minimum cross-data-center\ncopying. The author's experience is that the restartability justifies\nthe longer throughput time.\n\nBy default, the temporary files are created under a directory named \"temp\",\nlaid out as follows. For each multipart source file, for example one named\n\"source/path/to/file-to-split\", we use a directory named\n\"dest/temp/path/to/file-to-split//parts\". The versionid is the first 7\ncharacters of the \"etag\" of the source multipart file. Inside that\ndirectory, the temporary 1-part-multipart files are named \"temp-00001\",\n\"temp-00002\", etc., and the single-part files are named \"part-00001\",\n\"part-00002\", etc. In some situations, such as subsequent transfer to\nHDFS, retaining the part files may be useful. 
By default, all these\nfiles are retained.\n\nSecurity and Access\n-------------------\n\nIf needed, you can supply two sets of credentials to s3copy: one for\naccessing the source bucket and copying to the destination and one for\nlisting the destination bucket. This is useful for cross-account copying\nwhen you are given a set of credentials to access the source, but those\ncredentials cannot list the destination. Supply a second set of\ncredentials by using the ``-d`` option.\n\nYou can grant access to files created on the destination by specifying\nthe ``--acl-grant`` option. The grantee must be an email address.\n\nTesting Notes\n-------------\n\nTo run the tests, edit the s3copy_test_settings.py for your local environment, then run this command:\n\n::\n\n python test_s3copy\n\nOn-line Help\n------------\n\nThis is the current built-in on-line help:\n\n::\n\n s3copy --help\n\n usage: s3copy [-h] [-n] [-f FILE [FILE ...]] [-p PREFIX [PREFIX ...]]\n [-F FILES] [-P PREFIXES] [-a AWS_ACCESS_KEY] [-k AWS_SECRET_KEY]\n [-c S3CFG_FILE] [-d DEST_S3CFG_FILE] [--acl-grant ACL_GRANT]\n [-t NUM_THREADS] [-l LOG_LEVEL] [-L LOG_DEST]\n source_bucket [dest_bucket]\n\n Multithreaded multipart copier for Amazon S3\n\n positional arguments:\n source_bucket source bucket/path\n dest_bucket destination bucket/path\n\n optional arguments:\n -h, --help show this help message and exit\n -n, --dry-run do no work but report what work would be done\n -f FILE [FILE ...], --file FILE [FILE ...]\n source file[s] to copy\n -p PREFIX [PREFIX ...], --prefix PREFIX [PREFIX ...]\n source prefix[es] to copy\n -F FILES, --files FILES\n file containing a list of files to copy\n -P PREFIXES, --prefixes PREFIXES\n file containing a list of prefixes to copy\n -a AWS_ACCESS_KEY AWS Access Key\n -k AWS_SECRET_KEY AWS Secret Key\n -c S3CFG_FILE, --config_file S3CFG_FILE\n s3cmd-format config file\n -d DEST_S3CFG_FILE, --dest-config DEST_S3CFG_FILE\n s3cmd-format config file for destination bucket only\n 
--acl-grant ACL_GRANT\n acl to grant as PERMISSION:EMAIL\n -t NUM_THREADS number of threads (default: 40)\n -l LOG_LEVEL logging level\n -L LOG_DEST logging file (appended)", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/S3Copy/", "keywords": null, "license": "LICENSE.txt", "maintainer": null, "maintainer_email": null, "name": "s3copy", "package_url": "https://pypi.org/project/s3copy/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/s3copy/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/S3Copy/" }, "release_url": "https://pypi.org/project/s3copy/0.0.2/", "requires_dist": null, "requires_python": null, "summary": "Multi-threaded, fault-tolerant, bucket-to-bucket copy for s3.", "version": "0.0.2" }, "last_serial": 799158, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "23cafbbb1a4d7e51cb1a90c7b75865f1", "sha256": "f67d70e732f903c06d5cbd271588b089fd48da63d5192061b6f78d55e080d098" }, "downloads": -1, "filename": "s3copy-0.0.2.tar.gz", "has_sig": false, "md5_digest": "23cafbbb1a4d7e51cb1a90c7b75865f1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16095, "upload_time": "2013-03-26T01:10:45", "url": "https://files.pythonhosted.org/packages/04/4b/1a6a9ca2eb1085d6522dbc840d47ef6efd329bf0ef6a58a83fda53b7a200/s3copy-0.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "23cafbbb1a4d7e51cb1a90c7b75865f1", "sha256": "f67d70e732f903c06d5cbd271588b089fd48da63d5192061b6f78d55e080d098" }, "downloads": -1, "filename": "s3copy-0.0.2.tar.gz", "has_sig": false, "md5_digest": "23cafbbb1a4d7e51cb1a90c7b75865f1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16095, "upload_time": "2013-03-26T01:10:45", "url": 
"https://files.pythonhosted.org/packages/04/4b/1a6a9ca2eb1085d6522dbc840d47ef6efd329bf0ef6a58a83fda53b7a200/s3copy-0.0.2.tar.gz" } ] }