{ "info": { "author": "Sumner Magruder", "author_email": "sumner.magruder@zmnh.uni-hamburg.de", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6" ], "description": "# FIO\n\nFIO is short for Feature I/O.\n\nFIO is a concept stemming TensorFlow's low-level API. More specifically,\nFIO is a concept of how to join the methods dealing with the protocol buffers\nthat write data to the TF.Record format and from the TF.Record format.\n\n\nFIO works by defining your features __once__ in a \"schema\"\n\n\ne.g. for \"normal\" features:\n```\n'my-features': {'length': 'fixed', 'dtype': tf.string, 'shape': []}\n\n```\nand we are done. Make this a `tf.train.Example` or `tf.train.SequenceExample`,\nwrite to a record, and read from it.\n\nFor rank 2 features:\n```\n'seq': {\n 'length': 'fixed',\n 'dtype': tf.int64,\n 'shape': [3, 3],\n 'encode': 'channels',\n 'channel_names': ['Channel 1', 'Channel 2', 'Channel 3'],\n 'data_format': 'channels_last'\n }\n```\n\nand we are done again. I can now encode this as a `tf.train.Example` (with the\noptional, but here provided channel names) or `tf.train.SequenceExample` without the names.\n\nSee the Demo notebook\n\n# Useful resources\n\n## S.O.\n[How to decompose tensors for TF.Record](https://stackoverflow.com/questions/52035692/tensorflow-v1-10-store-images-as-byte-strings-or-per-channel)\n[How to recover TF Record data](https://stackoverflow.com/questions/52064866/tensorflow-1-10-tfrecorddataset-recovering-tfrecords)\n## Colab\n[Playing with Rank 2 Tensors and TF Records](https://colab.research.google.com/drive/1M10tbHih5eJ8LiApJSKKpNM79IconYJX)\n[Recovering TF Records](https://colab.research.google.com/drive/1HUGoXfgxp0A_0eSdaCzutOkFvnYZ-egv)\n\n# Motivation\n\nDisclaimer: I am not familiar with the intricacies of TF, Protocol Buffers, and the like. So my motivation here may actually not make sense to the experts who actually designed this for TF. From my uninformed view, I find parts of this interface cumbersome and lacking documentation and FIO is as much as an over-engineered solution to my specific problem (getting some data in and out of TF Records) as it is an exploration of what works regarding TF Records.\n\n## \"Duplicate code\"\nConsider the exceptionally trivial data of writing and reading:\n```\nmy_feature = 'hi'\n```\nfrom a TF.Record.\n\n\n### Example\nTo get this feature into a record we must first wrap this data as follows:\n\n```\nexample = tf.train.Example(\n features=tf.train.Features(\n feature={\n 'my-feature': tf.train.Feature(\n bytes_list=tf.train.BytesList(\n value=[my_feature.encode()] # wrap as list\n ) # end byteslist\n ) # end feature\n }\n ) # end features\n) # end example\n```\nwhich returns:\n\n\n```\nfeatures {\n feature {\n key: \"my-feature\"\n value {\n bytes_list {\n value: \"hi\"\n }\n }\n }\n}\n```\n\nand we can write this to a record with:\n\n```\nwith tf.python_io.TFRecordWriter('my_record.tfrecord') as writer:\n writer.write(example.SerializeToString())\n```\n\nand we can retrieve it from this file with:\n\n```\nDATASET_FILENAMES = tf.placeholder(tf.string, shape=[None])\ndataset = tf.data.TFRecordDataset(DATASET_FILENAMES).map(lambda r: parse_fn(r)).repeat().batch(1)\niterator = dataset.make_initializable_iterator()\nnext_element = iterator.get_next()\n\nfor _ in range(1): # epochs\n sess.run(iterator.initializer, feed_dict={DATASET_FILENAMES: ['my_record.tfrecord']})\n for _ in range(1): # batch\n recovered = sess.run(next_element)\n```\n\n\nNow it is very important to note that:\n1. the above code is a bit more sophisticated than the base example needed (given the iterator), which I will not go into detail here about,\n\n2. the above code will not run because `parse_fn` is not defined\n\nSince we are using a TF Example, I will demonstrate how we can simply read the example to get the data back:\n\n\n```\ndef parse_fn(record):\n features = {\n 'my-feature': tf.FixedLenFeature([], dtype=tf.string)\n }\n parsed = tf.parse_single_example(record, features)\n # other things can be done if needed\n return parsed\n```\n\nnow running the above code yields:\n\n\n```\n{'my-feature': array([b'hi'], dtype=object)}\n```\n\nSo if we want to get back to `{'my-feature': 'hi'}` we still need to unwrap the list and decode the string:\n\n```\nrecovered['my-feature'] = recovered['my-feature'][0].decode()\n```\n\nNote: this can not be done in the parsing function (at least as I have done here). \n\n### What I would like to solve\nThe point of showing all of this, is that if so much care goes into converting our data for TF.Records, why must I then also write similar code to extract it out?\n\nA goal of FIO is to define a singular schema which gets data both into a TF Record and can recover it (as we put it in).\n\n## De/re-composing tensors:\nI will touch on the difference between `tf.train.Example` and `tf.train.SequenceExample` in the next section. Regardless of which you choose to use, any tensor of rank 2 or greater must be decomposed.\n\nFor simplicity consider the sequence:\n\n```\nseq = [\n # ch1, ch2, channel_3\n [ 1, 1, 1], # element 1\n [ 2, 2, 2], # element 2\n [ 3, 3, 3], # element 3\n [ 4, 5, 6] # element 4\n]\n```\n\nBoth `tf.train.Example` and `tf.train.SequenceExample` require `seq` to be decomposed by channel:\n\neither as:\n\n```\n# for tf.train.Example\ntf.train.Features(\n feature={\n 'channel 1': tf.train.Feature(int64_list=tf.train.Int64List(value=seq[0])),\n 'channel 2': tf.train.Feature(int64_list=tf.train.Int64List(value=seq[1])),\n 'channel 3': tf.train.Feature(int64_list=tf.train.Int64List(value=seq[2]))\n }\n)\n```\n\nor as:\n\n```\ntf.train.FeatureLists(feature_list=\n tf.train.FeatureList(\n feature=[\n tf.train.Feature(int64_list=tf.train.Int64List(value=seq[i]))\n for i in range(number_of_channels(seq))\n ]\n )\n)\n```\n\nThe difference here being that for `SequenceExample`, the channels are unnamed features. For many channels, this is advantageous as one needs not worry about reassembling the sequence from all the channels. However, if one has only a few channels (e.g. rgb), then one could - prior to feeding into the model - rearrange the channels, or if there are multiple-inputs, this may be of use.\n\nEither-way, FIO aims to handle this part, both for decomposing tensors and recomposing them (in the case of the `Example`).\n\n## Singular interface\nAs mentioned above, a core distinction between `Example` and `SequenceExample` is whether or not each feature is named. However, there is one other core distinction: `SequenceExample` is a tuple. Rather, I should say, `SequenceExample` allows one to store \"context\" (metadata), where the context are features that adhere to all the restrictions of those for an `Example`.\n\nThe example given in the docs is that the context might be the same across sequences, but the large sequences may vary. Thus it saves space.\n\nI admit, while that sound like a structure, I do not understand how to utilize that in practice. A `SequenceExample` _requires_ a context (although it can be just an empty `dict`). Thus if one is storing examples individually, they would most likely store the context in each record. Furthermore, given my current understanding, only sequence features can be stored in the `feature_lists` part of `SequenceExample`.\n\nFrom my uninformed perspective, it might have been nicer to just maintain a single `Example` interface, and then using something like (the imaginary) `tf.train.SequenceFeature` allow everything to be stored together. Better yet, `tf.train.TensorFeature` might solve a lot of issues.\n\n\nAnyway, FIO aims to be a singular interface for handling `Example` and `SequenceExample`, making it easy to decompose tensors to (un)named features and export / import correspondingly.\n\n## Decomposing techniques\n\nThere are many ways to decompose a Tensor into its channels. I will only consider some of the possibilities for the aforementioned `seq`:\n\n- separate each channel of `seq` and store as corresponding numeric type, name and store in `Example`\n- separate each channel of `seq`, and store as bytes after converting via `numpy.ndarray.tostring()`, name and store in `Example`\n- separate each channel of `seq`, and convert each element to bytes, and then store as a `BytesList`, name and store in `Example`\n- store altogether by dumping `seq` to bytes via `numpy.ndarray.tostring()`, name and store in `Example`\n- separate each channel of `seq` and store as corresponding numeric type, leave unnamed and store in `SequenceExample`\n- separate each channel of `seq`, and store as bytes after converting via `numpy.ndarray.tostring()`, leave unnamed and store in `SequenceExample`\n- separate each channel of `seq`, and convert each element to bytes, and then store as a `BytesList`, leave unnamed and store in `SequenceExample`\n- store altogether by dumping `seq` to bytes via `numpy.ndarray.tostring()` and wrap as a single element bytes list.\n\nWhich one of these is best? No idea.\nMaybe TF encoded `Float` and `Int64` `Features` to bytes behind the scene.\n\nI will note that sometimes when trying to recover the data, if encoded as bytes, then the tensor is flattened. e.g. `[[1,2],[3,4]]` is restored as `[1,2,3,4]`.\n\nAnyway, FIO aims to allow for tensors to be encoded and decoded via some of these methods, as I honestly do not know which is best (in terms of file size, read efficiency, etc).\n\nAgain, I think TF could do this better via a `tf.train.TensorFeature`\n\np.s. if anyone knows why how to encoded features are under `tf.train` (e.g. `tf.train.Feature(int64_list=tf.train.Int64List(value=value))`) and how to decoded features are just under `tf` (e.g. `tf.FixedLenFeature`) I would like to know. This seems odd to me...\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://gitlab.com/SumNeuron/fio", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "fio", "package_url": "https://pypi.org/project/fio/", "platform": "", "project_url": "https://pypi.org/project/fio/", "project_urls": { "Homepage": "https://gitlab.com/SumNeuron/fio" }, "release_url": "https://pypi.org/project/fio/0.0.6/", "requires_dist": null, "requires_python": "", "summary": "Feature I/O for TensorFlow", "version": "0.0.6" }, "last_serial": 4328046, "releases": { "0.0.0": [ { "comment_text": "", "digests": { "md5": "c7bbdbc85b6dccab0f9bfbdf3c6d0003", "sha256": "34459e47f1a0c2d9748ea82faa14aa649f588f2c525e8068466b35d951271a33" }, "downloads": -1, "filename": "fio-0.0.0-py2-none-any.whl", "has_sig": false, "md5_digest": "c7bbdbc85b6dccab0f9bfbdf3c6d0003", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 10120, "upload_time": "2018-09-09T12:05:01", "url": "https://files.pythonhosted.org/packages/51/2e/9fa2eb796615cf91b694363bef50866b181f5c36043dd5f8d6bead949a90/fio-0.0.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1ebfe2ca0d3b8636609927e5ba09e79f", "sha256": "bcb3f497ef1e7ed4c8995df5a427bd99309dc2ef3c10bd744a126593c251ad38" }, "downloads": -1, "filename": "fio-0.0.0.tar.gz", "has_sig": false, "md5_digest": "1ebfe2ca0d3b8636609927e5ba09e79f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5940, "upload_time": "2018-09-09T12:05:03", "url": "https://files.pythonhosted.org/packages/29/a5/f67e712f90a3e7f7125e3608d0d50ef735ba651c231190cbc7e997db3c09/fio-0.0.0.tar.gz" } ], "0.0.1": [ { "comment_text": "", "digests": { "md5": "38fa738459844002bb321abf0f045d66", "sha256": "5203d824cf6f8c392c4f3bb4bcdce534abba54159cbc10404b626761472a1dc4" }, "downloads": -1, "filename": "fio-0.0.1-py2-none-any.whl", "has_sig": false, "md5_digest": "38fa738459844002bb321abf0f045d66", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 17577, "upload_time": "2018-09-09T13:23:30", "url": "https://files.pythonhosted.org/packages/63/34/d83d90cb73b0b2110960f2d6f21f290bac93496174a0671d1653251c00b6/fio-0.0.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "72948688fc8f7e8a074b587e5d43b6de", "sha256": "3bdafbbb7877e027e7e49035c7d204790e89db4f5dc6e53d0f13133d5a609fcd" }, "downloads": -1, "filename": "fio-0.0.1.tar.gz", "has_sig": false, "md5_digest": "72948688fc8f7e8a074b587e5d43b6de", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13925, "upload_time": "2018-09-09T13:23:33", "url": "https://files.pythonhosted.org/packages/3d/66/b717d46b686d3b3e5ac66365244aa2a5fa8bf0d6a40afc0400aa868a317d/fio-0.0.1.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "e981b5bf634677821e21e0af5976f9ee", "sha256": "97f4861dbd2fe02349163c1dfcb9f4853be8aacf226edb2a0f041f54fc0c1158" }, "downloads": -1, "filename": "fio-0.0.3-py2-none-any.whl", "has_sig": false, "md5_digest": "e981b5bf634677821e21e0af5976f9ee", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 17707, "upload_time": "2018-09-09T14:14:21", "url": "https://files.pythonhosted.org/packages/b2/fa/9a2c9b8c53fef2548d55e1f835939131ce4098707fd05b468edfaea1ae6b/fio-0.0.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "83f53141e9ca4ba20b854cf492c1b949", "sha256": "75c3ca8fb57bbd3d4d8ae7c824599b3b571d92a1f983baddd06eee3e53a3aac3" }, "downloads": -1, "filename": "fio-0.0.3.tar.gz", "has_sig": false, "md5_digest": "83f53141e9ca4ba20b854cf492c1b949", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13820, "upload_time": "2018-09-09T14:14:25", "url": "https://files.pythonhosted.org/packages/3a/82/78e907c0f983d6259ee14c8bac4f354951c976923b6666fefa748c61eaf7/fio-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "d3f814b8305bfa7aae0c979f75d27ec9", "sha256": "07e141a35c7dd193a2cf70e175bd7c67c271cf16cc0d2c31662b0f812f2ad70f" }, "downloads": -1, "filename": "fio-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "d3f814b8305bfa7aae0c979f75d27ec9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13157, "upload_time": "2018-09-24T12:23:27", "url": "https://files.pythonhosted.org/packages/bc/99/54af99563d36f976c7d0511f7bf3316510b5a65942ad5e6460c72ac7396d/fio-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ddf6a3c7ba8a1b4f5fb7cee99c45b8f5", "sha256": "c4211a6e6c4299e41bb47aee59948c48dedaa0f1cff8d426e6b614923a577cb6" }, "downloads": -1, "filename": "fio-0.0.4.tar.gz", "has_sig": false, "md5_digest": "ddf6a3c7ba8a1b4f5fb7cee99c45b8f5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13798, "upload_time": "2018-09-24T12:23:31", "url": "https://files.pythonhosted.org/packages/7e/06/5fda58252054440f6eec0b0327f9f8c0cab8becc7c0be4338518520de7fb/fio-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "1144c9c7ead2c312abee11fe1ec7c813", "sha256": "6682638defffe922d86a9f8e3d607b06847ac9fd00eec0054288fa2b9e8fd645" }, "downloads": -1, "filename": "fio-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "1144c9c7ead2c312abee11fe1ec7c813", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13188, "upload_time": "2018-09-24T13:19:04", "url": "https://files.pythonhosted.org/packages/47/42/53828a4f592bb904e1a633a8ad66b104238dbfdb466363522f21c6bb5e3e/fio-0.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f687c66c9819c8ac4b5863c397f0286f", "sha256": "f09e2e28c39d568967aa1f82fae6f6fb3b6c9d7940e5379395165be77e295ff9" }, "downloads": -1, "filename": "fio-0.0.5.tar.gz", "has_sig": false, "md5_digest": "f687c66c9819c8ac4b5863c397f0286f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13832, "upload_time": "2018-09-24T13:19:08", "url": "https://files.pythonhosted.org/packages/94/7f/f84bb415d37c493142176b370249733f7d08a24ca2253f9b3d50b4b60aa5/fio-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "f3418962048a8cfc0c5fe2028ca6d0d4", "sha256": "8ec7c94a0a719f406991cf551680d0cceb82bab25664598e82f5dff1232675d2" }, "downloads": -1, "filename": "fio-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "f3418962048a8cfc0c5fe2028ca6d0d4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13387, "upload_time": "2018-10-01T11:43:50", "url": "https://files.pythonhosted.org/packages/b8/c9/af34a5c30e4feb5256cc5b4451280d724d4d6bcf51a8309923365752ceb7/fio-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "13a35e7ae1a2b392f7c1f475bd75047d", "sha256": "7deed7b1d9f8f4f32ffd82cf521137caef4c43385867e7a91de5dbff41a8e17b" }, "downloads": -1, "filename": "fio-0.0.6.tar.gz", "has_sig": false, "md5_digest": "13a35e7ae1a2b392f7c1f475bd75047d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14023, "upload_time": "2018-10-01T11:43:55", "url": "https://files.pythonhosted.org/packages/7a/80/6f689ec0239d9379f4157c665a3c04cf73394dad0588357d4654a0262e04/fio-0.0.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f3418962048a8cfc0c5fe2028ca6d0d4", "sha256": "8ec7c94a0a719f406991cf551680d0cceb82bab25664598e82f5dff1232675d2" }, "downloads": -1, "filename": "fio-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "f3418962048a8cfc0c5fe2028ca6d0d4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13387, "upload_time": "2018-10-01T11:43:50", "url": "https://files.pythonhosted.org/packages/b8/c9/af34a5c30e4feb5256cc5b4451280d724d4d6bcf51a8309923365752ceb7/fio-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "13a35e7ae1a2b392f7c1f475bd75047d", "sha256": "7deed7b1d9f8f4f32ffd82cf521137caef4c43385867e7a91de5dbff41a8e17b" }, "downloads": -1, "filename": "fio-0.0.6.tar.gz", "has_sig": false, "md5_digest": "13a35e7ae1a2b392f7c1f475bd75047d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14023, "upload_time": "2018-10-01T11:43:55", "url": "https://files.pythonhosted.org/packages/7a/80/6f689ec0239d9379f4157c665a3c04cf73394dad0588357d4654a0262e04/fio-0.0.6.tar.gz" } ] }