{ "info": { "author": "Ezekiel Hendrickson", "author_email": "ezekielh@umich.edu", "bugtrack_url": null, "classifiers": [], "description": "# ODM\n\nODM is a set of tools for administratively downloading content from\nOneDrive to a local directory tree without the involvement of the end\nuser. It also includes tools for administratively uploading local\nfiles to OneDrive or Google Drive.\n\nAs a set of relatively low-level tools, ODM is not designed to be a\nturnkey solution for migrating data. Some examples of how you might\nglue these tools together are included in the contrib directory.\n\nODM has been used at the University of Michigan for several multi-TiB\nmigrations from OneDrive to Google Drive, and an ~35 TiB migration\nbetween different Microsoft 365 tenants.\n\n## Setting up your environment\n\nThis tool was mainly written and tested using Python 2.7 on Linux. Portions of\nthe code were also tested under various versions of Python >= 3.4.\n\nFor development, we recommend using a virtualenv to install ODM's Python\ndependencies.\n\n* Run `init.sh` to set up the virtualenv\n* When you want to use ODM, source env-setup.sh (`. env-setup.sh`) to set up the\n necessary environment variables.\n\n## Credentials\n\nThe odm command requires credentials for an authorized Azure AD 2.0\nclient. Uploading OneNote files requires additional Sharepoint API\npermissions and a client certificate.\n\nThe gdm command requires credentials for an authorized Google service\naccount.\n\n### Azure AD 2.0\n\n* Register your client at https://portal.azure.com/#blade/Microsoft_AAD_RegisteredApps/ApplicationsListBlade\n * Click `New registration`\n * Give your client a name and click `Register`\n * Use the displayed `Application (client) ID` as the `client_id` in your\n ODM config.\n * Under `Certificates & secrets` select `New client secret`; use this as\n the `client_secret` in your ODM config.\n * Create a certificate\n * `openssl req -x509 -days 3650 -nodes -newkey rsa:2048 -keyout key.pem -out cert.pem -subj '/CN=odm'`\n * Use the contents of key.pem as the `client_cert_key` in your ODM config.\n * Upload the certificate, then use the displayed thumbprint as the `client_cert` in your ODM config.\n * Under `API Permissions` add the necessary application\n permissions for Microsoft Graph:\n * User.Read.All\n * Files.ReadWrite.All\n * Notes.ReadWrite.All\n * Sites.FullControl.All\n * and the necessary application permissions for Sharepoint:\n * Sites.FullControl.All\n * Grant admin consent for your tenant by clicking on the button.\n\n\n### Google Service Account\n\n* Create a project at\n https://console.developers.google.com/cloud-resource-manager\n* Inside the project, create a service account.\n * Name the account something meaningful.\n * The account does not require any roles.\n * Select `CREATE KEY` and the JSON key type; use the downloaded file as the `credentials` in your ODM config\n* Inside the project, enable the Google Drive API.\n * Open the navigation menu by clicking the three bars icon at the top left.\n * Click `APIs & Services`\n * Click `ENABLE APIS AND SERVICES`\n * Find `Google Drive API`\n * Click `ENABLE`\n* As a super-admin, authorize the scopes at https://admin.google.com/\n * Click `Security`\n * Click `Advanced settings`\n * Click `Manage API client access`\n * Enter the client ID and authorize it for\n `https://www.googleapis.com/auth/drive`\n\n## Downloading from OneDrive\n\nIndividual operations are designed to be idempotent and cleanly\nresumable. Because fetching metadata for large drives is a very\nexpensive operation (both in volume of API calls and time) it's done\nas a separate step, and the download operations use this cached\nmetadata file instead of the live API.\n\n### Fetch metadata\n\n```\nodm user ezekielh show\nodm user ezekielh list-drives\nodm user ezekielh list-items > ezekielh.json\nodm user ezekielh list-items --incremental ezekielh.json > ezekielh-$(date +%f).json\n```\n\n### Download items\n\nDownloaded files are verified as they're saved, but you can also re-check the\nstate of previously downloaded files as a separate operation, or clean up\nan existing destination folder by deleting extraneous files.\n\n```\nodm list ezekielh.json list-filenames\nodm list ezekielh.json download-estimate\nodm list ezekielh.json download --filetree /var/tmp/ezekielh\nodm list ezekielh.json verify --filetree /var/tmp/ezekielh\nodm list ezekielh.json clean-filetree --filetree /var/tmp/ezekielh\n```\n\n### Upload items\n\n```\nodm list ezekielh.json upload --filetree /var/tmp/ezekielh --upload-user flowerysong\nodm list ezekielh.json upload --filetree /var/tmp/ezekielh --upload-user flowerysong --upload-path 'other users/ezekielh'\nodm filetree /var/tmp/ezekielh upload --upload-user flowerysong --upload-path 'other users/ezekielh'\n```\n\n### Convert OneNote notebooks\n\nOneNote has a rudimentary API that allows some but not all note data to be\nextracted and converted to HTML documents.\n\n```\nodm user ezekielh list-notebooks > ezekielh-onenote.json\nodm list ezekielh-onenote.json convert-notebooks --filetree '/var/tmp/ezekielh/Exported from OneNote'\n```\n\n## Uploading to Google Drive\n\n```\ngdm filetree /var/tmp/ezekielh upload --upload-user ezekielh --upload-path \"Magically Delicious\"\ngdm filetree /var/tmp/ezekielh verify --upload-user ezekielh --upload-path \"Magically Delicious\"\n```\n\n## Known Limitations\n\n* The modification time of individual files is preserved wherever possible, but\n no attempt is made to preserve the mtime of folders.\n\n* It is possible for additional owners to be added to files in a user's OneDrive\n by poking deep into the bowels of the SharePoint web interface; this is not\n possible via the OneDrive API and no attempt is made to preserve these\n permissions.\n\n* Shared links are not recreated during upload; this functionality *is* exposed\n via the API, but it's not clear that it would be useful behaviour.\n\n* Microsoft does not like it when you upload large amounts of data,\n and may arbitrarily ban your client for a period of time even though ODM does\n incremental backoff for failed requests and respects Retry-After headers.\n We have seen this manifest as spurious `401 Unauthorized` and `503 Service\n Unavailable` responses.\n\n* Files uploaded to OneDrive will show up as last modified by `SharePoint App`;\n it is impossible to preserve the original modification information or have it\n display a less generic name.\n\n* OneDrive is normally provisioned on a JIT basis when the user accesses the\n service, rather than at the time of account creation / license assignment.\n ODM cannot upload to OneDrive unless the drive is provisioned. SharePoint\n Online [provides an API endpoint](https://docs.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn790354%28v=office.15%29#createpersonalsiteenqueuebulk-method)\n for requesting the asynchronous bulk creation of drives. ODM does not\n currently have the ability to call this API, but PowerShell tools exist to\n request bulk creation using SharePoint Admin credentials. Once requests were\n put into the black box we observed widely varying provisioning rates ranging\n between ~80/hour and ~240/hour.\n\n* OneDrive filenames are not case sensitive. ODM does not currently make any\n attempt to handle case discrepancies or filename collision.\n\n* OneDrive filenames can be up to 400 characters in length, while most Unix\n filesystems only allow 255 bytes (which could be as few as 63 UTF-8\n characters.) If ODM encounters a filename or path component that is more than\n 255 bytes it chunks the excess characters into leading directory components.\n Metadata-based uploads (`odm list upload`) will upload as the original name,\n but filetree-based uploads (`odm filetree` or `gdm filetree`) will not.\n\n* OneNote files can be downloaded via the OneDrive API, but they do not have an\n associated hash and do not reliably report the actual download size via the\n API so no verification is possible.\n\n* Due to a limitation in the OneNote API any notebook upload will create a\n `Notebooks` directory in the root of the drive, even if the final destination\n is not underneath that location.\n\n* OneDrive will sometimes return an incorrect file hash when listing files.\n Once the file has been downloaded, the API will often, but not always, start\n returning the correct hash.\n\n* OneNote is fragile and breaks easily. It sometimes fails to render sections\n uploaded via the API even though they are bit-exact copies of what was\n originally downloaded. We do not know why.\n\n* Files detected as malware by OneDrive's scanning cannot be downloaded via\n the API.\n\n* Microsoft use a non-standard fingerprinting method for files in OneDrive for\n Business. ODM includes an incredibly slow pure Python implementation of this\n algorithm so that file verification works out of the box, but if you are\n dealing with any significant amount of data fingerprint calculation can be\n sped up quite a bit by installing\n [libqxh](https://github.com/flowerysong/quickxorhash).\n\n## Further Information On OneNote Exports\n\n* Most of ODM's magic is like the wardrobe to Narnia. OneNote exports are more\n akin to the Lament Configuration.\n\n* The OneNote API does not return any content for certain types of page\n elements, so mathematical expressions (and possibly some other node types)\n will be lost in conversion.\n\n* The OneNote API is heavily throttled.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/UMCollab/ODM", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "odm", "package_url": "https://pypi.org/project/odm/", "platform": "", "project_url": "https://pypi.org/project/odm/", "project_urls": { "Homepage": "https://github.com/UMCollab/ODM" }, "release_url": "https://pypi.org/project/odm/2.0.0/", "requires_dist": null, "requires_python": "", "summary": "Storage Magic", "version": "2.0.0" }, "last_serial": 5852280, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "9dfefc411a056a8fad0c798e4467968c", "sha256": "8ae6c73da87c9747d5dafdf810142ed29907a1316b7275386be09d588c6ac9df" }, "downloads": -1, "filename": "odm-1.0.0.tar.gz", "has_sig": false, "md5_digest": "9dfefc411a056a8fad0c798e4467968c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21292, "upload_time": "2018-08-15T20:49:47", "url": "https://files.pythonhosted.org/packages/fa/12/467372579ed2c6b24a1ffce70a932e6d54a39bfa97663cbd8fc22181a33c/odm-1.0.0.tar.gz" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "d664784d84d6636d6d5113b935b1a709", "sha256": "39ba171c3554a8bbe6c00a37a7fa04acf887f34e308cef3234232c5ec32b0e17" }, "downloads": -1, "filename": "odm-2.0.0.tar.gz", "has_sig": false, "md5_digest": "d664784d84d6636d6d5113b935b1a709", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34833, "upload_time": "2019-09-18T20:01:52", "url": "https://files.pythonhosted.org/packages/f0/dc/48b0ee27db94826e6657a5512c4bf638eb1b41cd89396016c660e6b29d55/odm-2.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d664784d84d6636d6d5113b935b1a709", "sha256": "39ba171c3554a8bbe6c00a37a7fa04acf887f34e308cef3234232c5ec32b0e17" }, "downloads": -1, "filename": "odm-2.0.0.tar.gz", "has_sig": false, "md5_digest": "d664784d84d6636d6d5113b935b1a709", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34833, "upload_time": "2019-09-18T20:01:52", "url": "https://files.pythonhosted.org/packages/f0/dc/48b0ee27db94826e6657a5512c4bf638eb1b41cd89396016c660e6b29d55/odm-2.0.0.tar.gz" } ] }