{ "info": { "author": "Seria", "author_email": "zzqsummerai@yeah.net", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Nebulae Brochure\n\n**A novel and simple framework built on current mainstream frameworks and other image processing libraries. Almost every module can be deployed independently.**\n\n------\n\n## Installation\n\n```sh\npip install nebulae\n```\n\nThe latest version supports PyTorch 1.6 and TensorFlow 2.3.\n\n\n------\n\n## Modules Overview\n\nFuel: easily manage and read the datasets you need, anytime\n\nToolkit: includes many utilities that provide better support for nebulae\n\n------\n\n## Fuel\n\n**FuelGenerator(file_dir, file_list, dtype, is_seq)**\n\nBuild a FuelGenerator to store data space-efficiently.\n\n- config: [dict] A dictionary containing all parameters. This argument and the other arguments are mutually exclusive.\n\n- file_dir: [str] Where your raw data is.\n\n- file_list: [str] A csv file in which all raw data file names and labels are listed.\n\n- dtype: [list of str] A list of data types of all columns but the first one in *file_list*. Valid data types are 'uint8', 'uint16', 'uint32', 'int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'str'. Additionally, if you add a 'v' as the initial character, e.g. 'vuint8', the data of each row in this column is allowed to be saved in variable length.\n\n- is_seq: [bool] Whether the data is a sequence, e.g. video frames. Defaults to False.\n\nAn example of file_list.csv is as follows. 'image' and 'label' are the key names of data and labels respectively. Note that the image name is a path relative to *file_dir*.\n\n| image | label |\n| ----------- | ----- |\n| img_1.jpg | 2 |\n| img_2.jpg | 0 |\n| ... | ... 
|\n| img_100.jpg | 5 |\n\nIf *is_seq* is True, the csv file is supposed to look like the example below (when the separator character is ',' and the quoting character is '\"'):\n\n| image | label |\n| ------------------------------------------------------------ | ----- |\n| \"vid_1_frame_1.jpg,vid_1_frame_2.jpg,...,vid_1_frame_24.jpg\" | 2 |\n| \"vid_2_frame_1.jpg,vid_2_frame_2.jpg,...,vid_2_frame_15.jpg\" | 0 |\n| ... | ... |\n| \"vid_100_frame_1.jpg,vid_100_frame_2.jpg,...,vid_100_frame_39.jpg\" | 5 |\n\n\n\n**FuelGenerator.generate(dst_path, height, width, channel=3, encode='JPEG', shards=1, keep_exif=True)**\n\n- dst_path: [str] An hdf5/npz file where you want to save the data.\n- height: [int] range between (0, +\u221e). The height of the image data.\n- width: [int] range between (0, +\u221e). The width of the image data.\n- channel: [int] The number of channels of the image data. Defaults to 3.\n- encode: [str] The codec with which image data is encoded. Valid encoders are 'JPEG' and 'PNG'; 'PNG' is lossless. Defaults to 'JPEG'.\n- shards: [int] The number of files to split the data into. Defaults to 1.\n- keep_exif: [bool] Whether to keep the EXIF information of photos. 
Defaults to True.\n\n```python\nimport nebulae as neb\n# create a data generator\nfg = neb.fuel.Generator(file_dir='/home/file_dir',\n file_list='file_list.csv',\n dtype=['vuint8', 'int8'])\n# generate compressed data file\nfg.generate(dst_path='/home/data/fuel.hdf5', \n channel=3,\n height=224,\n width=224)\n```\n\n\n\n**FuelGenerator.modify(config=None)**\n\nYou can edit the properties again to generate other files.\n\n```python\nfg.modify(height=200, width=200)\n```\n\nPassing a dictionary of the changed parameters is equivalent.\n\n```python\nconfig = {'height': 200, 'width': 200}\nfg.modify(config=config)\n```\n\n\n\n**Tank(data_path, data_specf, batch_size, shuffle, in_same_size, fetch_fn, prep_fn, collate_fn)**\n\nBuild a Fuel Tank that allows you to deposit datasets.\n\n- data_path: [str] The full path of your data file. It must be an hdf5/npz file.\n- data_specf: [dict] A dictionary containing key-dtype pairs.\n- batch_size: [int] The size of a mini-batch.\n- shuffle: [bool] Whether to shuffle data samples every epoch. Defaults to True.\n- in_same_size: [bool] Whether to ensure the last batch has as many samples as the other batches. Defaults to True.\n- fetch_fn: [func] The function which fetches a single datum from the dataset.\n- prep_fn: [func] The function which preprocesses a fetched datum. Defaults to None.\n- collate_fn: [func] The function which concatenates data as a mini-batch. 
Defaults to None.\n\nE.g.\n\n```python\nimport numpy as np\nfrom nebulae.fuel import depot\n\n# define data-reading functions\ndef fetcher(data, idx):\n ret = {}\n ret['image'] = data['image'][idx]\n ret['label'] = data['label'][idx].astype('int64')\n return ret\n\ndef prep(data):\n # convert to channel-first format\n data['image'] = np.transpose(data['image'], (2, 0, 1)).astype('float32')\n return data\n\n# create a data depot\ntk = depot.Tank(\"/home/dataset.hdf5\",\n {'image': 'vuint8', 'label': 'int64'},\n batch_size=128, shuffle=True, \n fetch_fn=fetcher, prep_fn=prep)\n```\n\n\n\n**Tank.next()** \n\nReturns a batch of data, labels and other information.\n\n\n\n**Tank.MPE**\n\nAttribute: the number of iterations within an epoch for this dataset.\n\n\n\n**len(Tank)**\n\nReturns the number of data samples in this dataset.\n\n\n\n**Comburant()**\n\nComburant is a container that packs up all preprocessing methods.\n\n
| Data Source | Augmentation | Usage |\n| --- | --- | --- |\n| Image | flip | flip matrix vertically or horizontally |\n| | crop | crop matrix randomly with a given area and aspect ratio |\n| | rotate | rotate matrix randomly within a given range |\n| | brighten | adjust brightness given an increment/decrement factor |\n| | contrast | adjust contrast given an expansion/shrinkage factor |\n| | saturate | adjust saturation given an expansion/shrinkage factor |\n| Sequence | sampling | positive int, denoted as theta: sample an image every theta frames |\n
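Since *fetch_fn* and *prep_fn* are plain Python callables, their contract can be checked without nebulae installed. The snippet below is a standalone sketch of the *prep* function from the Tank example above, run on a dummy NumPy array that stands in for a decoded 224x224 RGB image (the array contents are placeholders, not real data):

```python
import numpy as np

def prep(data):
    # convert the HWC uint8 image to channel-first (CHW) float32
    data['image'] = np.transpose(data['image'], (2, 0, 1)).astype('float32')
    return data

# dummy datum shaped like a decoded 224x224 RGB image with its label
datum = {'image': np.zeros((224, 224, 3), dtype='uint8'), 'label': np.int64(2)}
out = prep(datum)
print(out['image'].shape, out['image'].dtype)  # (3, 224, 224) float32
```

This is exactly the transformation a channel-first framework such as PyTorch expects each datum to undergo before batches are collated.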