{ "info": { "author": "scott huang", "author_email": "scotthuang1989@163.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Build Tools" ], "description": "# image2tfrecords\n--------\n\nConvert images into tfrecords to comply with the tensorflow best practice: [tensorflow doc link](https://tensorflow.google.cn/performance/performance_guide#input_pipeline_optimization).\n\n# Supported platform\n\n* OS\n * ubuntu\n\n\n* Python\n * python3\n\n\n* tensorflow\n * 1.4\n\n# Installation\n\n`pip install image2tfrecords`\n\n# Features\n\n* Stratified split between train/validation/test: each split has the same percentage of each class.\n\n* Tensorflow Dataset API support: provides a class that reads tfrecords files and returns a Dataset, so developers can easily build tensorflow programs with images.\n\n\n# Tutorial\n\nThis simple tutorial will walk you through creating cifar10 tfrecords for the kaggle competition. You can check `example_cifar10.py` for the full code.\n\n### Download cifar10 data.\n\n* Download train.7z and trainLabels.csv from kaggle: https://www.kaggle.com/c/cifar-10/data\n* Put them into /tmp/cifar10_data\n* Extract the data with the command: `7z x train.7z`\n* The directory /tmp/cifar10_data should look like this:\n ```\n \u251c\u2500\u2500 train\n \u251c\u2500\u2500 train.7z\n \u2514\u2500\u2500 trainLabels.csv\n ```\n \"train\" contains all cifar10 images.\n\n### Convert label file (a file listing all images and their corresponding labels)\n\nThis module requires a label file with the following format:\n\n```\nimage2class_file: csv file which contains classname for every image file\n Format:\n filename, class\n 1.jpg , cat\n 2.jpg , dog\n .. , ..\n you must provide a valid header for csv file. 
Headers will be used in\n tfrecord to represent dataset-specific information.\n```\nFirst, we need to convert the cifar10 label file to this format:\n\n```\nimport pandas as pd\n\nfrom image2tfrecords.image2tfrecords import Image2TFRecords\n\nCIFAR10_DATA_DIR = \"/tmp/cifar10_data/train\"\nCIFAR10_LABELS = \"/tmp/cifar10_data/trainLabels.csv\"\n\n# Convert label file to required format\n# There is 1 step:\n# 1. add .png extension to filename.\n\nlabel_csv = pd.read_csv(CIFAR10_LABELS)\n# convert id column to str\nlabel_csv = label_csv.astype({\"id\": str}, copy=False)\nlabel_csv[\"id\"] = label_csv[\"id\"]+\".png\"\n\nmodified_label_file = \"/tmp/cifar10_data/train_labels_ext.csv\"\nlabel_csv.to_csv(modified_label_file, index=False)\n```\nAfter this step, the new label file is `/tmp/cifar10_data/train_labels_ext.csv`.\n\n### Create tfrecords\n\nNow that we have what we need, pass the image directory and label file path to **Image2TFRecords** and create the tfrecords:\n```\nimg2tf = Image2TFRecords(\n CIFAR10_DATA_DIR,\n modified_label_file,\n val_size=0.3,\n test_size=0.1,\n dataset_name=\"cifar10\"\n )\nimg2tf.create_tfrecords(output_dir=\"/tmp/cifar10_data/tfrecords\")\n```\nAfter running this, the tfrecords files will be in `/tmp/cifar10_data/tfrecords`, which will look like this:\n\n```\n-rw-rw-r-- 1 scott scott 169 Dec 21 10:57 cifar10_summary.json\n-rw-rw-r-- 1 scott scott 2.3M Dec 21 10:57 cifar10_test_00001-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 2.3M Dec 21 10:57 cifar10_test_00002-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 2.4M Dec 21 10:57 cifar10_test_00003-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 2.5M Dec 21 10:57 cifar10_test_00004-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 2.3M Dec 21 10:57 cifar10_test_00005-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 14M Dec 21 10:56 cifar10_train_00001-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 14M Dec 21 10:56 cifar10_train_00002-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 15M Dec 21 10:56 
cifar10_train_00003-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 15M Dec 21 10:56 cifar10_train_00004-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 14M Dec 21 10:57 cifar10_train_00005-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 6.8M Dec 21 10:57 cifar10_validation_00001-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 6.9M Dec 21 10:57 cifar10_validation_00002-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 7.1M Dec 21 10:57 cifar10_validation_00003-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 7.4M Dec 21 10:57 cifar10_validation_00004-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 6.9M Dec 21 10:57 cifar10_validation_00005-of-00005.tfrecord\n-rw-rw-r-- 1 scott scott 95 Dec 21 10:57 labels.csv\n\n```\n\n### Train a model with these tfrecords files.\n\nCheck `example_cifar10.py` for the full code.\n\nThe key step is creating an `ImageDataSet`:\n\n```\nimage_dataset = ImageDataSet(\"/tmp/cifar10_data/tfrecords\", 'cifar10')\nnum_class = len(image_dataset.labels_df)\nval_split = image_dataset.get_split(\"validation\")\nval_images, _, val_labels = batch_and_process(val_split, num_class)\n```\nThis gives you a tensorflow DataSet. 
The most common practice is to pass it to `DatasetDataProvider`, then get the data from this `DatasetDataProvider`.\n\n```\ndata_provider = slim.dataset_data_provider.DatasetDataProvider(\n data_set,\n common_queue_capacity=64,\n common_queue_min=32,\n num_epochs=1, shuffle=False)\n\nimage_raw, label, _ = data_provider.get(['image', 'label', 'filename'])\n```\n\n\n\n# API Introduction\n\nThis should be generated from comments automatically.", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/scotthuang1989/image2tfrecords", "keywords": "tensorflow tfrecord", "license": "GPL-3.0", "maintainer": "", "maintainer_email": "", "name": "image2tfrecords", "package_url": "https://pypi.org/project/image2tfrecords/", "platform": "", "project_url": "https://pypi.org/project/image2tfrecords/", "project_urls": { "Homepage": "https://github.com/scotthuang1989/image2tfrecords" }, "release_url": "https://pypi.org/project/image2tfrecords/0.0.4/", "requires_dist": null, "requires_python": "", "summary": "A lib for creating tensorflow tfrecords", "version": "0.0.4" }, "last_serial": 3440692, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "c58422f0c7c7bb77ded19afcbb5bc9e5", "sha256": "f0abcba12ec86739ef90d810a4fbe248b6294ea034ac84493c02c30cd6f66c32" }, "downloads": -1, "filename": "image2tfrecords-0.0.2.tar.gz", "has_sig": false, "md5_digest": "c58422f0c7c7bb77ded19afcbb5bc9e5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20505, "upload_time": "2017-12-20T08:57:07", "url": "https://files.pythonhosted.org/packages/13/12/4c242e37c2187edda2c24b8c6841f89f187d2db7ed2ead68d0894911ca85/image2tfrecords-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "44f16ea15d66686cec7d1576d3ed58cc", "sha256": "865e9c32ffe01831b61578b7f5992570d6f5d45cc12ddf30c92e4338276741c1" }, "downloads": -1, "filename": 
"image2tfrecords-0.0.3.tar.gz", "has_sig": false, "md5_digest": "44f16ea15d66686cec7d1576d3ed58cc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24750, "upload_time": "2017-12-24T14:11:49", "url": "https://files.pythonhosted.org/packages/10/d8/c60f617cebb4dcd35f0ae219443ac9c9d2a2ff96f9e303e2b371bc47c2f0/image2tfrecords-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "6f4e89c4785a652095f5e7f103f79a7b", "sha256": "3dfedd07d8d24b64f32de48f7fca92fd51a7ff661738f6949a76d9c861918629" }, "downloads": -1, "filename": "image2tfrecords-0.0.4.tar.gz", "has_sig": false, "md5_digest": "6f4e89c4785a652095f5e7f103f79a7b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24811, "upload_time": "2017-12-24T16:16:58", "url": "https://files.pythonhosted.org/packages/4b/f9/46986d949a6199648e85d785bff1e306859e9e85525e5085d3fb23c229b5/image2tfrecords-0.0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6f4e89c4785a652095f5e7f103f79a7b", "sha256": "3dfedd07d8d24b64f32de48f7fca92fd51a7ff661738f6949a76d9c861918629" }, "downloads": -1, "filename": "image2tfrecords-0.0.4.tar.gz", "has_sig": false, "md5_digest": "6f4e89c4785a652095f5e7f103f79a7b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24811, "upload_time": "2017-12-24T16:16:58", "url": "https://files.pythonhosted.org/packages/4b/f9/46986d949a6199648e85d785bff1e306859e9e85525e5085d3fb23c229b5/image2tfrecords-0.0.4.tar.gz" } ] }