{ "info": { "author": "Dirk Petersen", "author_email": "dp@nowhere.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "Intended Audience :: System Administrators", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Unix Shell", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: System :: Systems Administration", "Topic :: Utilities" ], "description": "slurm-toys (aka awesome-slurm)\n==============================\n\nA collection of slurm command line tools and wrappers mostly found on github\n\nwhy slurm-toys\n--------------\n\nThe purpose of slurm-toys is to package useful SLURM helper tools written in Python 3 or Shell and\npublish them into a single package on PyPI\n\ncurrently integrated toys\n-------------------------\n\nslurm-limiter\n~~~~~~~~~~~~~\n\nHPC clusters are optimized to maximize utilization for batch jobs. FairShare helps to ensure that\nall users get an appropriate amount of resources over time. However FairShare can only influence\njobs that have not started yet. If a cluster is used 100% by \"large\" users, \"small\" users become\nunhappy because they may not be able to get a single node ad hoc. Currently the only solution to\nthis problem appears to be setting hard account limits. Unfortunatelty these limits are often set\ntoo high when a cluster is busy and too low when it is idle. slurm-limiter addresses this problem by\ndynamically adjusting the limits based on overal partition/ queue load.\n\nIf you want a responsive HPC cluster this should take no longer than 5 sec:\n\n::\n\n ~$ time srun hostname \n srun: job 61004624 queued and waiting for resources\n srun: job 61004624 has been allocated resources\n gizmof171\n\n real 0m1.668s\n user 0m0.044s\n sys 0m0.012s\n\nexample use in a cron job, running every 20 min:\n\n::\n\n */20 * * * * root ( ml Python/3.6.4-foss-2016b-fh1; /app/bin/slurm-limiter -p campus \\ \n --error-email=sysadmin\\@institute.org --minaccountlimit=50 --maxaccountlimit=350 \\ \n --slaaccountlimit=300 --changestep=50 --maxpercentuse=90 \\\n --minidlenodes=5 ) >>/var/tmp/slurm-limiter.log 2>&1\n\nexample output to syslog:\n\n::\n\n ~$ grep slurm-limiter: /var/log/syslog\n Apr 15 09:40:03 gizmo-ctld slurm-limiter: INFO:slurm-limiter.85: Cores: running=689, pending=3299, total=1180, Usage=58 %, Limits: 350 / 370, Nodes: idle=101\n Apr 15 10:00:03 gizmo-ctld slurm-limiter: INFO:slurm-limiter.85: Cores: running=689, pending=3274, total=1180, Usage=58 %, Limits: 350 / 370, Nodes: idle=101\n Apr 15 10:20:03 gizmo-ctld slurm-limiter: INFO:slurm-limiter.85: Cores: running=680, pending=3241, total=1180, Usage=57 %, Limits: 350 / 370, Nodes: idle=102\n Apr 15 10:40:03 gizmo-ctld slurm-limiter: INFO:slurm-limiter.85: Cores: running=680, pending=3219, total=1180, Usage=57 %, Limits: 350 / 370, Nodes: idle=102\n\noutput of slurm-limiter --help\n\n::\n\n ~$ slurm-limiter --help\n usage: slurm-limiter [-h] [--debug] [--error-email ERROREMAIL]\n [--cluster CLUSTER] [--partition PARTITION]\n [--feature FEATURE] [--qos QOS]\n [--maxaccountlimit MAXLIMIT] [--minaccountlimit MINLIMIT]\n [--slaaccountlimit SLALIMIT]\n [--userlimitoffset USERLIMITOFFSET]\n [--changestep CHANGESTEP] [--minpending MINPENDING]\n [--maxpercentuse MAXPERCENTUSE]\n [--minidlenodes MINIDLENODES]\n\n slurm-limiter checks the current util of a slurm cluster and adjusts the\n account and user limits dynamically within certain range\n\n optional arguments:\n -h, --help show this help message and exit\n --debug, -d verbose output for all commands\n --error-email ERROREMAIL, -e ERROREMAIL\n send errors to this email address.\n --cluster CLUSTER, -M CLUSTER\n name of the slurm cluster, (default: current cluster)\n --partition PARTITION, -p PARTITION\n partition of the slurm cluster (default: entire\n cluster)\n --feature FEATURE, -f FEATURE\n filter for only this slurm feature\n --qos QOS, -q QOS slurm QOS to use for changing account limits (default:\n public)\n --maxaccountlimit MAXLIMIT, -x MAXLIMIT\n maximum account limit, never go above this (default:\n 300)\n --minaccountlimit MINLIMIT, -n MINLIMIT\n minimum account limit, never go below this (default:\n 100)\n --slaaccountlimit SLALIMIT, -t SLALIMIT\n min SLA limit that has been committed to customers,\n notify via email if breached (default: 150)\n --userlimitoffset USERLIMITOFFSET, -o USERLIMITOFFSET\n offset of userlimit from account limit, set a negative\n number for a userlimit lower than account limit\n (default: 20)\n --changestep CHANGESTEP, -s CHANGESTEP\n increase or decrease the limit by this # of cores\n (default: 10)\n --minpending MINPENDING, -i MINPENDING\n minimum number of jobs that have to be pending to take\n action (default: 50)\n --maxpercentuse MAXPERCENTUSE, -u MAXPERCENTUSE\n maximum allowed % usage in this cluster or partition\n Throttle QOS down by --changestep if exceeded.\n (default: 90)\n --minidlenodes MINIDLENODES, -w MINIDLENODES\n critical minimum number of idle nodes. Throttle QOS\n down to --minaccountlimit if exceeded. (default: 5)\n\nfuture toys\n-----------\n\nin the future we can integrate other tools, predominantly stuff found on github\n\nhttps://github.com/search?l=Python&p=1&q=slurm+&type=Repositories\n\nhttps://github.com/search?l=Shell&q=slurm+&type=Repositories\n\nnew tool\n~~~~~~~~\n", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/FredHutch/slurm-toys/tarball/1.0.1", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/FredHutch/slurm-toys", "keywords": "slurm", "license": "", "maintainer": "", "maintainer_email": "", "name": "slurm-toys", "package_url": "https://pypi.org/project/slurm-toys/", "platform": "", "project_url": "https://pypi.org/project/slurm-toys/", "project_urls": { "Download": "https://github.com/FredHutch/slurm-toys/tarball/1.0.1", "Homepage": "https://github.com/FredHutch/slurm-toys" }, "release_url": "https://pypi.org/project/slurm-toys/1.0.1/", "requires_dist": null, "requires_python": "", "summary": "helper tools for the SLURM HPC workload manager used at Fred Hutch and elsewhere", "version": "1.0.1" }, "last_serial": 3767190, "releases": { "1.0.1": [ { "comment_text": "", "digests": { "md5": "0ba58b998d52d9cc0cc70327f2d6f09a", "sha256": "36edcc7a65ae4a69ff52000e996288790e4cb1e6fa6479c3f7d5a856d78aa383" }, "downloads": -1, "filename": "slurm-toys-1.0.1.tar.gz", "has_sig": false, "md5_digest": "0ba58b998d52d9cc0cc70327f2d6f09a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13877, "upload_time": "2018-04-15T23:00:05", "url": "https://files.pythonhosted.org/packages/e7/39/9975c57de9645e20a9d37416ce165dabc9f6f937d03cd78deea449cf0d75/slurm-toys-1.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0ba58b998d52d9cc0cc70327f2d6f09a", "sha256": "36edcc7a65ae4a69ff52000e996288790e4cb1e6fa6479c3f7d5a856d78aa383" }, "downloads": -1, "filename": "slurm-toys-1.0.1.tar.gz", "has_sig": false, "md5_digest": "0ba58b998d52d9cc0cc70327f2d6f09a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13877, "upload_time": "2018-04-15T23:00:05", "url": "https://files.pythonhosted.org/packages/e7/39/9975c57de9645e20a9d37416ce165dabc9f6f937d03cd78deea449cf0d75/slurm-toys-1.0.1.tar.gz" } ] }