{ "info": { "author": "haoxintong", "author_email": "haoxintongpku@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Gluon Audio Toolkit\n===================\n\nGluon Audio is a toolkit providing deep-learning-based audio recognition\nalgorithms. The project is still under development, and originally only a\nChinese introduction was provided.\n\nGluonAR Introduction:\n---------------------\n\nGluonAR is based on MXNet-Gluon; if you are new to it, please check out the\n`dmlc 60-minute crash course `__.\n\nAlthough the name is GluonAR, for now and the foreseeable future the project\nonly covers Text-Independent Speaker Recognition.\n\nImplemented features:\n\n- Audio data loading with ``av`` (a pythonic binding of ffmpeg) and\n  ``librosa``.\n- Modules support ``Hybridize()``. The forward pass does not rely on pysound,\n  librosa or scipy, which is more efficient and enables fast training and\n  end-to-end deployment, including:\n\n  - A short-time Fourier transform block (``STFTBlock``) and a z-score block\n    based on ``nd.contrib.fft``; compared with preprocessing in numpy/scipy\n    and then loading to the GPU, training efficiency improves by about 12%.\n  - ``MelSpectrogram``, ``DCT1D``, ``MFCC``, ``PowerToDB``.\n  - The ``SincBlock`` proposed in `1808.00158 `__.\n\n- Gluon-style loading of the VOX datasets.\n- Speaker Verification analogous to face verification.\n- An example of training voiceprint features from spectrograms; 1:1\n  verification accuracy on VOX1: 0.941152+-0.004926.\n\nExample:\n\n.. 
code:: python\n\n    import numpy as np\n    import mxnet as mx\n    import librosa as rosa\n    from gluonar.utils.viz import view_spec\n    from gluonar.nn.basic_blocks import STFTBlock\n\n    # load 35840 samples of a 16 kHz audio clip\n    data = rosa.load(r\"resources/speaker_recognition/speaker0_0.m4a\", sr=16000)[0][:35840]\n    nd_data = mx.nd.array([data], ctx=mx.gpu())\n\n    stft = STFTBlock(35840, hop_length=160, win_length=400)\n    stft.initialize(ctx=mx.gpu())\n\n    # STFT block forward\n    ret = stft(nd_data).asnumpy()[0][0]\n    spec = np.transpose(ret, (1, 0)) ** 2\n    view_spec(spec)\n\n    # STFT in librosa\n    spec = rosa.stft(data, hop_length=160, win_length=400, window=\"hamming\")\n    spec = np.abs(spec) ** 2\n    view_spec(spec)\n\nOutput: spectrograms produced by ``STFTBlock`` and by librosa's STFT\n(comparison images omitted).\n\nFor more examples, see ``examples/``.\n\nRequirements\n------------\n\nmxnet-1.5.0+, gluonfr, av, librosa, ...\n\nThe audio libraries were chosen mainly for data loading speed: during\ntraining, decoding audio takes considerably more time than decoding images.\nIn practice, loading a short AAC-encoded audio file from disk with librosa\ntakes about 8x as long as with pyav.\n\n- librosa\n  ``pip install librosa``\n- ffmpeg\n\n  ::\n\n      # download the ffmpeg source and enter its root directory\n      ./configure --extra-cflags=-fPIC --enable-shared\n      make -j\n      sudo make install\n\n- pyav (requires ffmpeg to be installed first)\n  ``pip install av``\n- gluonfr\n  ``pip install git+https://github.com/THUFutureLab/gluon-face.git@master``\n\nDatasets\n--------\n\nTIMIT\n~~~~~\n\nThe DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)\nTraining and Test Data. 
Before using this dataset, please follow the\ninstructions at this `link `__.\n\nA copy of it was uploaded to `Google Drive `__\nby @philipperemy `here `__.\n\nVoxCeleb\n~~~~~~~~\n\nVoxCeleb is an audio-visual dataset consisting of short clips of human\nspeech, extracted from interview videos uploaded to YouTube.\n\nFor more information, check out this\n`page `__.\n\nPretrained Models\n-----------------\n\nSpeaker Recognition\n~~~~~~~~~~~~~~~~~~~\n\nResNet18 training with VoxCeleb\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDownload: `Baidu `__,\n`Google Drive `__\n\nI followed the ideas in the **VoxCeleb2** paper\n`1806.05622 `__ to train this model; the\ndifferences between the two are:\n\n+-----------------+--------------------------------+--------------------+\n|                 | Res18 in this repo             | Res34 in paper     |\n+=================+================================+====================+\n| Trained on      | VoxCeleb2                      | VoxCeleb2          |\n+-----------------+--------------------------------+--------------------+\n| Input spec size | 224x224                        | 512x300            |\n+-----------------+--------------------------------+--------------------+\n| Eval on         | Random 9500+ pair samples from | Original VoxCeleb1 |\n|                 | VoxCeleb1 train and test sets  | test set           |\n+-----------------+--------------------------------+--------------------+\n| Metric          | Accuracy: 0.932656+-0.005187   | EER: 0.0504        |\n+-----------------+--------------------------------+--------------------+\n| Framework       | MXNet Gluon                    | Matconvnet         |\n+-----------------+--------------------------------+--------------------+\n| ROC             |                                | -                  |\n+-----------------+--------------------------------+--------------------+\n\nTODO\n----\n\nThe toolchain for training speaker recognition models with mxnet gluon will\nbe completed gradually; this is expected to take quite a long time.\n\nDocs\n----\n\nGluonAR documentation is not available yet.\n\nAuthors\n-------\n\n{ `haoxintong `__ }\n\nDiscussion\n----------\n\nFor any suggestions, please open an issue.\n\nContributing\n------------\n\nThe final goal of this project is to provide an easy-to-use\ndeep-learning-based audio algorithm library 
like\n`pytorch-kaldi `__.\n\nContributions are welcome.\n\nReferences\n----------\n\n1. MXNet Documentation and Tutorials\n   https://zh.diveintodeeplearning.org/\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/haoxintong/gluon-audio", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "gluonar", "package_url": "https://pypi.org/project/gluonar/", "platform": "", "project_url": "https://pypi.org/project/gluonar/", "project_urls": { "Homepage": "https://github.com/haoxintong/gluon-audio" }, "release_url": "https://pypi.org/project/gluonar/0.1.0/", "requires_dist": null, "requires_python": "", "summary": "Gluon Audio Toolkit", "version": "0.1.0" }, "last_serial": 5385310, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "d8dd0a44fa65e46068a6d97ae497713d", "sha256": "0e38622857699094e231ecc9a680d1e20175d98ec63cfb3e63b02158a812a194" }, "downloads": -1, "filename": "gluonar-0.1.0.tar.gz", "has_sig": false, "md5_digest": "d8dd0a44fa65e46068a6d97ae497713d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15000, "upload_time": "2019-06-11T08:15:00", "url": "https://files.pythonhosted.org/packages/e2/a1/33e1a0221c11e85382979dfd6d227f768c97ac25facc72efafa08328b089/gluonar-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d8dd0a44fa65e46068a6d97ae497713d", "sha256": "0e38622857699094e231ecc9a680d1e20175d98ec63cfb3e63b02158a812a194" }, "downloads": -1, "filename": "gluonar-0.1.0.tar.gz", "has_sig": false, "md5_digest": "d8dd0a44fa65e46068a6d97ae497713d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15000, "upload_time": "2019-06-11T08:15:00", "url": "https://files.pythonhosted.org/packages/e2/a1/33e1a0221c11e85382979dfd6d227f768c97ac25facc72efafa08328b089/gluonar-0.1.0.tar.gz" } ] }