{ "info": { "author": "Alan Liddell", "author_email": "alan@vidriotech.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Hybrid ground-truth generation for spike-sorting\n\n## Installation\n\nThe best way to get started is to [install Anaconda or Miniconda](https://conda.io/docs/user-guide/install/index.html).\nOnce you've done that, fire up your favorite terminal emulator (PowerShell or CMD on Windows, but we recommend CMD; iTerm2 or Terminal on Mac;\nlots of choices if you're on Linux, but you knew that) and navigate to the directory containing this README file (it\nalso contains `environment.yml`).\n\nOn UNIX variants, type:\n\n```bash\n$ conda env create -n hybridfactory\nSolving environment: done\nDownloading and Extracting Packages\n...\nPreparing transaction: done\nVerifying transaction: done\nExecuting transaction: done\n#\n# To activate this environment, use:\n# > source activate hybridfactory\n#\n# To deactivate an active environment, use:\n# > source deactivate\n#\n\n$ source activate hybridfactory\n```\n\nOn Windows:\n\n```shell\n$ conda env create -n hybridfactory\nSolving environment: done\nDownloading and Extracting Packages\n...\nPreparing transaction: done\nVerifying transaction: done\nExecuting transaction: done\n#\n# To activate this environment, use:\n# > activate hybridfactory\n#\n# To deactivate an active environment, use:\n# > deactivate\n#\n# * for power-users using bash, you must source\n#\n\n$ activate hybridfactory\n```\n\nand you should be good to go. Remember that `[source] activate hybridfactory` every time you open up a new shell.\n\n## Usage\n\nThis tool is primarily a command-line utility.\nProvided you have a [parameter file](#parameter-file), you can invoke it like so:\n\n```shell\n(hybridfactory) $ hybridfactory generate /path/to/params.py\n```\n\nRight now, `generate` is the only command available, allowing you to generate hybrid data from a pre-existing raw\ndata set and output from a spike-sorting tool, e.g., [KiloSort](https://github.com/cortex-lab/KiloSort) or\n[JRCLUST](https://github.com/JaneliaSciComp/JRCLUST).\nThis is probably what you want to do.\n\nAfter your hybrid data has been generated, we have some [validation tools](#validation-tools) you can use to look at\nyour hybrid output, but this is not as convenient as a command-line tool (yet).\n\n### A word about bugs\n\nThis software is under active development.\nAlthough we strive for accuracy and consistency, there's a good chance you'll run into some bugs.\nIf you run into an exception which crashes the program, you should see a helpful message with my email address and a\ntraceback.\nIf you find something a little more subtle, please post an issue on the\n[issue page](https://gitlab.com/vidriotech/spiegel/hybridfactory/issues).\n\n## Parameter file\n\nRather than pass a bunch of flags and arguments to `hybridfactory`, we have collected all the parameters in a\nparameter file, `params.py`.\nWe briefly explain each option below.\nSee [params_example.py](https://gitlab.com/vidriotech/spiegel/hybridfactory/blob/master/params_example.py) for an example.\n\n### Required parameters\n\n- `data_directory`: Directory containing output from your spike sorter, e.g., `rez.mat` or `*.npy` for KiloSort;\n or `*_jrc.mat` and `*_spk(raw|wav|fet).jrc` for JRCLUST.\n- `raw_source_file`: Path to file containing raw source data (currently only [SpikeGL[X]](https://github.com/billkarsh/SpikeGLX/)-formatted data is supported).\n This can also be a [glob](https://en.wikipedia.org/wiki/Glob_%28programming%29) if you have multiple data files.\n- `data_type`: Type of raw data, as a [NumPy data type](https://docs.scipy.org/doc/numpy/user/basics.types.html).\n (I have only seen `int16`.)\n- `sample_rate`: Sample rate of the source data, in Hz.\n- `ground_truth_units`: Cluster labels (1-based indexing) of ground-truth units from your spike sorter's output.\n- `start_time`: Start time (0-based) of recording in data file (in sample units).\n Nonnegative integer if `raw_source_file` is a single file, iterable of nonnegative integers if you have a globbed\n `raw_source_file`.\n If you have SpikeGL meta files, you can use `hybridfactory.io.spikegl.get_start_times` to get these automagically.\n\n### Probe configuration\n\n- `probe_type`: Probe layout.\n This is pretty open-ended so it is up to you to construct.\n If you have a Neuropixels Phase 3A probe with the standard reference channels,\n you have it easy.\n Just put `neuropixels3a()` for this value.\n Otherwise, you'll need to construct the following NumPy arrays to describe\n your probe:\n - `channel_map`: a 1-d array of `n` ints describing which row in the data to\n look for which channel (0-based).\n - `connected`: a 1-d array of `n` bools, with entry `k` being `True` if and\n only if channel `k` was used in the sorting.\n - `channel_positions`: an $`n \\times 2`$ array of floats, with row `k`\n holding the x and y coordinates of channel `k`.\n - `name` (optional): a string giving the model name of your probe.\n This is just decorative for now.\n\n With these parameters, you can pass them to\n [`hybridfactory.probes.custom_probe`](https://gitlab.com/vidriotech/spiegel/hybridfactory/blob/master/hybridfactory/probes/probe.py#L275) like so:\n\n```python\n# if your probe has a name\nprobe = hybridfactory.probes.custom_probe(channel_map, connected, channel_positions, name)\n\n# alternatively, if you don't want to specify a name\nprobe = hybridfactory.probes.custom_probe(channel_map, connected, channel_positions)\n```\n\nBe sure to `import hybridfactory.probes` in your `params.py` (see the [example `params.py`]((https://gitlab.com/vidriotech/spiegel/hybridfactory/blob/master/params_example.py)) to get a feel for this).\n\n### Optional parameters\n\n- `session_name`: String giving an identifying name to your hybrid run.\n Default is an MD5 hash computed from the current timestamp.\n- `random_seed`: Nonnegative integer in the range $`[0, 2^{31})`$.\n Because this algorithm is randomized, setting a random seed allows for reproducible output.\n The default is itself randomly generated, but will be output in a `hfparams_[session_name].py` on successful completion.\n- `output_directory`: Path to directory where you want to output the hybrid data.\n (This includes raw data files and annotations.)\n Defaults to \"`data_directory`/hybrid_output\".\n- `output_type`: Type of output from your spike sorter.\n One of \"phy\" (for `*.npy`), \"kilosort\" (for `rez.mat`), or \"jrc\" (for `*_jrc.mat` and `*_spk(raw|wav|fet).jrc`).\n `hybridfactory` will try to infer it from files in `data_directory` if not\n specified.\n- `num_singular_values`: Number of singular values to use in the construction of artificial events.\n Default is 6.\n- `channel_shift`: Number of channels to shift artificial events up or down from their source.\n Default depends on the probe used.\n- `synthetic_rate`: Firing rate, in Hz, for hybrid units.\n This should be either an empty list (if you want to use the implicit firing rate of your ground-truth units) or an iterable of artificial rates.\n In the latter case, you must specify a firing rate for each ground-truth unit.\n Default is the implicit firing rate of each unit.\n- `time_jitter`: Scale factor for (normally-distributed) random time shift, in sample units.\n Default is 100.\n- `amplitude_scale_min`: Minimum factor for (uniformly-distributed) random amplitude scaling, in percentage units.\n Default is 1.\n- `amplitude_scale_max`: Maximum factor for (uniformly-distributed) random amplitude scaling, in percentage units.\n Default is 1.\n- `samples_before`: Number of samples to take before an event timestep for artificial event construction.\n Default is 40.\n- `samples_after`: Number of samples to take after an event timestep for artificial event construction.\n Default is 40.\n- `copy`: Whether or not to copy the source file to the target.\n You usually want to do this, but if the file is large and you know where your data has been perturbed, you could use\n [`HybridDataSet.reset`](https://gitlab.com/vidriotech/spiegel/hybridfactory/blob/master/hybridfactory/data/dataset.py#L485) instead.\n Default is False.\n\n## Validation tools\n\nFor KiloSort output, we compare (shifted) templates associated with the artificial events to templates from the sorting\nof the hybrid data.\nThis will probably be meaningless unless you use the same master file to sort the hybrid data that you used to sort the\ndata from which we derived our artificial events.\nWe [compare](https://gitlab.com/vidriotech/spiegel/hybridfactory/blob/master/hybridfactory/validate/comparison.py#L99) in one of\ntwo ways: by computing Pearson correlation coefficients of the flattened templates (in which case, higher is better), or by computing the\nFrobenius norm of the difference of the two templates (lower is better here).\nWhen we find the best matches in a 2 ms interval around each true firing, we can generate a\n[confusion matrix](https://gitlab.com/vidriotech/spiegel/hybridfactory/blob/master/hybridfactory/validate/comparison.py#L283)\nto see how we did.\n\nThis functionality is not in `generate.py`, but should be used in a Jupyter notebook (for now).\nAdding a demo notebook is a TODO.\n\nAdding more validation tools is another TODO.\nSuggestions for tools you'd want to see are\n[always welcome](https://gitlab.com/vidriotech/spiegel/hybridfactory/issues).\n\n## Output\n\nIf successful, `generate.py` will output several files in `output_directory`:\n- Raw data files.\n The filenames of your source data file will be reused, prepending `.GT` before\n the file extension.\n For example, if your source file is called `data.bin`, the target file will be\n named `data.GT.bin` and will live in `output_directory`.\n- Dataset save files.\n These include:\n - `metadata-[session_name].csv`: a table of filenames, start times, and\n sample rates of the files in your hybrid dataset (start times and sample\n rates should match those of your source files).\n - `annotations-[session_name].csv`: a table of (real and synthetic) cluster\n IDs, timesteps, and templates (Kilosort only) or assigned channels (JRCLUST\n only).\n - `artificial_units-[session_name].csv`: a table of new cluster IDs, true\n units, timesteps, and templates (Kilosort only) or assigned channels (JRCLUST\n only) for your artificial units.\n - `probe-[session_name].npz`: a NumPy-formatted archive of data describing\n your probe. (See [Probe configuration](#probe-configuration) for a\n description of these data.)\n - `dtype-[session_name].npy`: a NumPy-formatted archive containing the sample\n rate of your dataset in the same format as your raw dataset.\n- `firings_true.npy`.\n This is a $`3 \\times K`$ array of `uint64`, where $`K`$ is the number of events generated.\n - Row 0 is the channel on which the event is centered, zero-based.\n - Row 1 is the timestamp of the event in sample units, zero-based.\n - Row 2 is the unit/cluster ID from the original data set for the event.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://gitlab.com/vidriotech/spiegel/hybridfactory", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "hybridfactory", "package_url": "https://pypi.org/project/hybridfactory/", "platform": "", "project_url": "https://pypi.org/project/hybridfactory/", "project_urls": { "Homepage": "https://gitlab.com/vidriotech/spiegel/hybridfactory" }, "release_url": "https://pypi.org/project/hybridfactory/0.1.0b1/", "requires_dist": null, "requires_python": "", "summary": "A utility for generating hybrid ephys data", "version": "0.1.0b1" }, "last_serial": 4378862, "releases": { "0.1.0b0": [ { "comment_text": "", "digests": { "md5": "7232517ba544dba8dbb9d81f6257da77", "sha256": "1dd87881a85165381c005326bbc30e1ed04115fbe359d6bd8234e9ceb70afcde" }, "downloads": -1, "filename": "hybridfactory-0.1.0b0-py3-none-any.whl", "has_sig": false, "md5_digest": "7232517ba544dba8dbb9d81f6257da77", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 35256, "upload_time": "2018-10-09T15:42:26", "url": "https://files.pythonhosted.org/packages/78/a8/7f11ed4b732e95f66d732bfe538b137ffd1287f0c8600cb3c80218382252/hybridfactory-0.1.0b0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5c36fcdf0efd3f2f6563113ec01c2d72", "sha256": "d9489226b8cd277a2b3a597306c8f911bb46f2e9aed84b2e0f36f79e8809a06c" }, "downloads": -1, "filename": "hybridfactory-0.1.0b0.tar.gz", "has_sig": false, "md5_digest": "5c36fcdf0efd3f2f6563113ec01c2d72", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43053, "upload_time": "2018-10-09T15:42:27", "url": "https://files.pythonhosted.org/packages/63/ae/68a4bfcda721d33910d04fc0fff1e7a1118b19d30065b8a2624d38976547/hybridfactory-0.1.0b0.tar.gz" } ], "0.1.0b1": [ { "comment_text": "", "digests": { "md5": "db39bc4915f912da4e85220c08c8fbe4", "sha256": "c43bb4e2caa28286f58985fb15319e5010fd183ff259e76fc662d9846167560d" }, "downloads": -1, "filename": "hybridfactory-0.1.0b1-py3-none-any.whl", "has_sig": false, "md5_digest": "db39bc4915f912da4e85220c08c8fbe4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 36076, "upload_time": "2018-10-15T20:26:13", "url": "https://files.pythonhosted.org/packages/9e/b6/555b1da47764f070bc5086eff52b990624428c17d52fd3af295b45b260e3/hybridfactory-0.1.0b1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9ff2c8877e870e24772ac91144812c6f", "sha256": "0973df03851720e82270ae417933a224ff3333a6cb138b4238f2a57b371dd96d" }, "downloads": -1, "filename": "hybridfactory-0.1.0b1.tar.gz", "has_sig": false, "md5_digest": "9ff2c8877e870e24772ac91144812c6f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43191, "upload_time": "2018-10-15T20:26:14", "url": "https://files.pythonhosted.org/packages/22/b6/ddaeef45f680793ce2f7133c112f5cd1e800db90c5eb81a1fd1308167f99/hybridfactory-0.1.0b1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "db39bc4915f912da4e85220c08c8fbe4", "sha256": "c43bb4e2caa28286f58985fb15319e5010fd183ff259e76fc662d9846167560d" }, "downloads": -1, "filename": "hybridfactory-0.1.0b1-py3-none-any.whl", "has_sig": false, "md5_digest": "db39bc4915f912da4e85220c08c8fbe4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 36076, "upload_time": "2018-10-15T20:26:13", "url": "https://files.pythonhosted.org/packages/9e/b6/555b1da47764f070bc5086eff52b990624428c17d52fd3af295b45b260e3/hybridfactory-0.1.0b1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9ff2c8877e870e24772ac91144812c6f", "sha256": "0973df03851720e82270ae417933a224ff3333a6cb138b4238f2a57b371dd96d" }, "downloads": -1, "filename": "hybridfactory-0.1.0b1.tar.gz", "has_sig": false, "md5_digest": "9ff2c8877e870e24772ac91144812c6f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43191, "upload_time": "2018-10-15T20:26:14", "url": "https://files.pythonhosted.org/packages/22/b6/ddaeef45f680793ce2f7133c112f5cd1e800db90c5eb81a1fd1308167f99/hybridfactory-0.1.0b1.tar.gz" } ] }