{ "info": { "author": "syy", "author_email": "1121225022@qq.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Environment :: Console", "Intended Audience :: Science/Research", "Natural Language :: Chinese (Simplified)", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3 :: Only", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Image Recognition" ], "description": "# Features\nFeed data from a data source into a machine learning / deep learning model for training:\nwhether the data lives on disk (mysql, mongodb, hive, ...), in memory (pandas, dask, spark, koalas, ...) or in GPU memory (rapids),\nwhatever its size,\nand whether it is structured or unstructured.\n\nThe library separates data from models, so no data manipulation appears inside a model. The general form of this separation (algorithm separated from data structure) looks like:\n\n```python\nclass Model(object):\n ...\n def train(self,dataset):\n \"\"\" A Model depends on a dataset; in other words, an algorithm depends on a data structure. \"\"\"\n pass\n\n\nmodel = Model()\nmodel.train(dataset) # all \n```\n\n\nA concrete example of separating data from the model:\n\n```python\nimport pandas as pd \nfrom dataset import DataSet\n\npdf = pd.DataFrame(...) 
# the data source\n\nds = DataSet(pdf) # the dataset\n\ntrainDataSet,testDataSet = ds.split_dataset(frac) # split into training and test sets\n\nfor epoch in range(epochs):\n for outputDict in trainDataSet: # trainDataSet is iterable\n # one batch\n x = outputDict[\"columnsx\"]\n y = outputDict[\"columnsy\"]\n feed_dict = {\n columnsx:x,\n columnsy:y\n }\n sess.run([train_op,loss_op],feed_dict=feed_dict)\n```\n\n\n\n\n# How it works\n\n## Datasets\n1. Samples in a dataset are random, with no intrinsic order.\n2. One sample more or one sample less makes no difference.\n3. Samples are independent and identically distributed (i.i.d.).\n4. Subsets of a dataset are still datasets; even a single sample is a dataset.\n5. Every sample in a dataset must have a unique identifier (the index, idxs).\n6. Data cleaning and data analysis (describe/mean/max) do not belong here, because mean/max are themselves algorithms.\n7. There is no need to look at the data as a whole; reading it batch by batch is enough. Looking at the whole is a data-cleaning concern.\n8. Many operations (merge/append/split, ...) can work on the index idxs alone, without touching the actual data.\n9. 
All data manipulation (cleaning, etc.) should be done with other packages, not with dataset; only once the final DataFrame is formed is it converted to a DataSet.\n\n\n\n\n## Indexes\n1. idxs uniquely identify samples within a dataset; they can be changed freely as long as they distinguish individuals.\n2. Index data type: in principle anything hashable can serve as an index, but the indexes within one dataset should stay as consistent as possible.\n3. Index uniqueness: sample indexes idxs must not repeat, although sample values may.\n4. Index mapping: since the raw data is read-only, changing an index merely creates a one-to-one mapping over the raw data's index.\n5. Prefer numeric indexes. Because of index mapping, you should not pay much attention to index values when using a dataset; being able to locate the data by index is all that matters.\n6. 
Index independence: no attribute/field in the dataset should depend on idxs (no attribute value = f(idxs)). For example, image files are sometimes named after idxs; use a unique quantity such as a timestamp instead, i.e. change fileName = f(id) to fileName = f(time).\n7. The essence of an index: it must let you retrieve data by idxs.\n\n\n\n## The data itself\n* Data in the sources (text, sql, pandas) is read-only. Never write to it!\n\n\n## Batch iteration\n1. Sample count: in principle each batch holds <= batchSize samples; if needed, the batch size can be forced to equal batchSize exactly.\n2. Batch operations, part 1: all transformations/operations happen on the batch data, never \"transform first, then split into batches\"!\n3. Batch operations, part 2: because of the rule above, append, merge, fillna, dropna, select_columns and nearly every other data operation, as well as more general ones such as convert_to_numpy (essentially an apply), all run on batch data.\n4. Batch output type: the output is a dict, {\"columnname1\": batch data, \"columnname2\": batch data}.\n5. Batch output property 1: batch output may contain only numeric values.\n6. 
Batch output property 2: batch output may contain only numeric values, and no nan.\n7. Any nan in a batch output must be handled on the batch data, by dropping or replacing it.\n8. Batch speed: real-time delivery is not required (disk or memory makes no difference).\n9. Batch availability: dataset iteration must be highly reliable; it must not fail in the middle of training.\n\n\n## Numeric conversion (convert fun)\n1. Only numeric data may enter a model f(x), never strings (str/json), byte arrays, trees, or graphs. A field's value may be an arbitrarily complex structure (a tree, a graph, ...); such structures must be converted to numeric data.\n2. Form of a conversion function: the conversion function convert_fun can take any form. For example, if the input is an image, the function may be a neural network whose output is a feature vector, or a feature map, extracted from the image.\n\n3. Input of a conversion function: a conversion function f should take exactly one input, b = f(a). Any other parameters should be recorded in the static config and referenced directly inside the function body.\n4. 
Output of a conversion function: a conversion function f should produce exactly one output, of type numpy.ndarray.\n\n5. A convert function must not compute global quantities such as a mean. For example, it must not reference the column's average.\n6. Inverse functions: wherever possible, provide the inverse of each conversion function. For a one-hot text encoding array = one_hot(str), also provide its inverse str = one_hot_inv(array).\n7. A conversion function turns the data of one column from a non-numeric type into its final numeric type, e.g. turning image paths into image data:\n\n```python\ndef convert_fun(image_path:str) -> np.ndarray:\n \n a = imread(image_path)\n\n b = imresize(a)\n\n c = other(b)\n\n tensor = resnet_fc7(c) # a resnet network extracts a feature vector from the image\n\n return tensor\n\n# then associate this conversion function (convert_fun) with the dataset's image_path column:\nb = {\n \"image_path\" : convert_fun,\n}\n\nds.convertFunDict = b\n```\n\n## config\n1. Other dataset hyperparameters live in config, as global variables.\n2. Any extra parameters are most likely parameters of a conversion function f.\n3. 
Some parameters may be used both in the model and in the dataset.\n4. A dataset itself has only one parameter, batchSize.\n\n\n\n# Installation\n## Platforms\nWindows and Linux.\n\n## Dependencies\nOptional dependencies:\n* spark : prefer it for big-data workloads!\n* koalas : prefer it for big-data workloads!\n* dask : prefer it for medium-sized workloads!\n\n## pip install commands \n\n### only core\nYou can also install only the dataset library; this does not pull in the dask/spark extras. Data sources such as spark, koalas, dask and django won't work until you also install pyspark, koalas, dask or django, respectively. This is common for downstream library maintainers:\n\npip install dataset # install only the core parts of dataset\n\n\n### big data\npip install dataset[koalas]\npip install dataset[pyspark]\npip install dataset[dask]\npip install dataset[bigdata]\n\n\n\n### orm\npip install dataset[django]\n\n### gpu \nTODO pip install dataset\n\n\n\n# Examples\n* easy_demo\n```python\n\"\"\" Basic DataSet features:\ncreation\ninspection: show, idxs, nSamples\nsimple iteration\ntrain/test splitting\niter --> tf, keras, pytorch, sklearn\niter_epochs\nstrict mode\n\n\"\"\"\n\nimport pandas as pd \nimport numpy as np \nfrom dataset import DataSet\n\n\n\n\n# ----------------------------------------------------\n# creation\n\ndf = pd.DataFrame(np.random.uniform(size=(20,3)),columns=list(\"abc\"))\ndf.reset_index(inplace=True) # --> index a b c\n\nds = DataSet(df,batchSize=7)\n\n\n\n\n# ----------------------------------------------------\n# inspection\n\nds.sampleNum # dataset size\n# >>> \n# 20\n\nidxs = ds.get_idxs() # 
the id of every sample in the dataset\n# >>> \n# [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]\n\nds.show() # print one batch of numeric data, plus the shape of each variable\n# >>> \n# a b c index\n# 0 0.62919 0.0851551 0.225027 0\n# 1 0.848418 0.708781 0.440574 1\n# 2 0.521759 0.620585 0.932888 2\n# 3 0.727612 0.920076 0.154726 3\n# 4 0.90353 0.243451 0.881314 4\n# 5 0.102216 0.80945 0.200341 5\n# 6 0.750237 0.332719 0.618396 6\n# shape (7,) (7,) (7,) (7,)\n\n\n\n\n\n# ----------------------------------------------------\n# iteration\n\nfor outputDict in ds:\n print(outputDict.keys())\n # >>> \n # dict_keys(['c', 'index', 'a', 'b'])\n index = outputDict[\"index\"] # extract one variable's numeric batch data, to feed the model\n # >>> \n # array([0, 1, 2, 3, 4, 5, 6])\n # array([ 7, 8, 9, 10, 11, 12, 13])\n # array([14, 15, 16, 17, 18, 19])\n\nfor outputDict in ds.iter():\n # iterating with iter() is identical to iterating the dataset directly\n print(outputDict.keys())\n index = outputDict[\"index\"]\n\n \nfor outputDict in ds.iter_epochs(3):\n # iter() walks the whole dataset once\n # iter_epochs(3) walks the whole dataset 3 times\n print(outputDict.keys())\n index = outputDict[\"index\"]\n\n\n\n\n# ----------------------------------------------------\n# strict iteration mode, isStrictBatchSize\n# \n# Notes:\n# In non-strict mode the number of samples per batch is not fixed; it is only guaranteed to be <= batchSize.\n# Strict mode forces every batch to hold exactly batchSize samples.\n#\n# 
Using strict mode is simple: just set the attribute isStrictBatchSize=True.\n# \n\nds = DataSet(...)\nds.isStrictBatchSize = True\n\n# or\nds = DataSet(...,isStrictBatchSize=True)\n\nfor outputDict in ds.iter_epochs(3):\n # iter() walks the whole dataset once\n # iter_epochs(3) walks the whole dataset 3 times\n print(outputDict.keys())\n index = outputDict[\"index\"]\n\n\n\n\n# ----------------------------------------------------\n# splitting into training and test sets\n# \n# Notes:\n# 1. Splitting deep-copies the original dataset's idxs,\n# but the underlying data is unchanged; effectively the dataset now carries several idxs.\n#\n# 2. Validation set options:\n# isHasValidate:bool=True whether to produce a validation set\n# strategyValidate:int=1 how the validation set is generated (default=1): 1 means the validation set lies inside the training set and takes part in training; 2 means it lies outside the training set and does not take part in training\n# if a validation set is requested:\n# if it does not take part in training:\n# the training set is split again by 1-frac into a training set and a validation set\n# if it takes part in training:\n# a 1-frac portion of the training set is picked as the validation set\n# \n\nfrac=0.8 # fraction used for training\ntrainDataSet, validDataSet, testDataSet = ds.split_dataset(frac)\n\n\n\n\n# ----------------------------------------------------\n# training\n\n# 
tensorflow, per-batch version\nfor epoch in range(epochs):\n for outputDict in ds:\n # one batch\n x = outputDict[\"columnsx\"]\n y = outputDict[\"columnsy\"]\n feed_dict = {\n columnsx:x,\n columnsy:y\n }\n sess.run([train_op,loss_op],feed_dict=feed_dict)\n ds.shuffle_idxs() # shuffle the idxs after every epoch\n\n\n# tensorflow, epoch version\nfor outputDict in ds.iter_epochs(12):\n # iter_epochs shuffles the idxs automatically after every epoch\n x = outputDict[\"columnsx\"]\n y = outputDict[\"columnsy\"]\n feed_dict = {\n columnsx:x,\n columnsy:y\n }\n sess.run([train_op,loss_op],feed_dict=feed_dict)\n\n# pytorch, per-batch version\nfor epoch in range(epochs):\n for outputDict in ds:\n # one batch\n x = outputDict[\"columnsx\"]\n y = outputDict[\"columnsy\"] \n \n optimizer.zero_grad()\n yhat = model(x)\n loss = criterion(yhat,y)\n loss.backward()\n optimizer.step()\n ds.shuffle_idxs() # shuffle the idxs after every epoch\n\n\n# pytorch, epoch version:\nfor outputDict in ds.iter_epochs(12):\n x = outputDict[\"columnsx\"]\n y = outputDict[\"columnsy\"]\n\n optimizer.zero_grad()\n yhat = model(x)\n loss = criterion(yhat,y)\n loss.backward()\n optimizer.step()\n\n\n# keras, per-batch version\nfor epoch in range(epochs):\n for outputDict in ds:\n # one batch\n x = outputDict[\"columnsx\"]\n y = outputDict[\"columnsy\"] \n model.train_on_batch(x,y)\n ds.shuffle_idxs() # shuffle the idxs after every epoch\n\n# keras, epoch version\nmodel.fit_generator(ds.iter_epochs(12))\n```\n\n* For more examples, see ```example```\n\n\n# Supported in-memory data sources\n* PandasDataManager : takes over pandas data, read-only\n* DaskDataManager : takes over dask data, read-only\n* SparkDataManager : takes over spark 
data, read-only\n* KoalasDataManager : takes over koalas data, read-only\n* DjangoOrmDataManager : takes over django model data, read-only\n* SqlDataManager : takes over mysql data, read-only\n\n* User-defined managers welcome!\n* RapidsDataManager : TODO : take over Rapids \n* RayDataManager : TODO : take over Ray\n* DparkDataManager : TODO : take over Dpark\n* MarsDataManager : TODO : take over Mars\n\nExample:\n```python\nimport dataset\nfrom dataset import DataSet\n\nimport pandas as pd \nfrom dataset import PandasDataManager\n\npdf = pd.DataFrame(...)\npdm = PandasDataManager(pdf)\npds = DataSet(pdm)\n\n```\n```python\nfrom dataset import DataSet\nfrom dataset import DjangoOrmDataManager\n\nfrom django.db import models\n\nclass Person(models.Model):\n pass\n\ninputClass = Person\normdm = DjangoOrmDataManager(inputClass,idColumnsName=\"id\")\n\normds = DataSet(ormdm)\n```\n\n\n\n# Supported disk data sources / IO\n* csv\n* excel\n* json\n* txt\n* hdf5\n* parquet\n* orc\n* hive\n* mongodb\n* elasticsearch\n* solr\n\nExample:\n```python\nimport dataset\nfrom dataset import DataSet\n\nds = dataset.read_csv(path_str)\n```\n\n\n# Features\n## New samples\n## Dataset splitting\n\n\n# Contributions\n\n## convert\nIf you have a good numeric conversion function, please send it my way. Many thanks!\n\nDesign rules for convert functions:\n1. Take as few input parameters as possible, ideally one.\n2. Produce exactly one output, of type np.ndarray.\n3. 
Provide an inverse function wherever possible.\n\n\n\n## io\nIf you have IO for other file formats, please send it my way. Many thanks!\n\n## datamanager\nIf you have implemented a datamanager class for another data source, please send it my way. Many thanks!\n\nThe datamanager template looks like this (the ? marks are placeholders for the backing library's types):\n```python\n\"\"\" datamanager template \"\"\"\n\nimport numpy as np \nimport pandas as pd \nfrom typing import Optional\n\nfrom datamanager import DataManager\nfrom datamanager import PYInterIterDataManager\n\nfrom idxsutils import _drop_idxs_nan\n\n\nclass ADataManager(DataManager):\n def __init__(self, df:?.DataFrame, idColumnsName:Optional[str]=None):\n self._dataFrame = df\n\n if idColumnsName is None:\n self._isIndex = True\n else:\n self._isIndex = False\n\n self._idColumnsName = idColumnsName\n\n @property\n def idColumnsName(self):\n return self._idColumnsName\n \n\n def get_idxs(self) -> np.ndarray:\n if self._isIndex:\n seriesIdxs = self._dataFrame.index # --> cudf.Index\n else:\n seriesIdxs = self._dataFrame[self._idColumnsName] # --> cudf.Series\n\n arrayIdxs = seriesIdxs.?.values\n arrayIdxs = _drop_idxs_nan(arrayIdxs) # -> np.ndarray(int64)\n\n return arrayIdxs\n\n\n def get_data_by_idxs(self,idxs:np.ndarray) -> PYInterIterDataManager:\n \n # NOTE: do Index and Series both provide isin? 
\n # NOTE: what input types does isin accept?\n \n if self._isIndex:\n _tableGetData = self._dataFrame[self._dataFrame.index.isin(idxs)] # --> ?.DataFrame\n else:\n _tableGetData = self._dataFrame[self._dataFrame[self._idColumnsName].isin(idxs)] \n\n tableGetData = _tableGetData.to_pandas() ?\n \n interIterDataManager = PYInterIterDataManager(self._idColumnsName,tableGetData)\n \n return interIterDataManager\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/songyanyi/DataSet", "keywords": "dataset,data,machine learning,deep learning", "license": "BSD", "maintainer": "syy", "maintainer_email": "1121225022@qq.com", "name": "dataset-xy", "package_url": "https://pypi.org/project/dataset-xy/", "platform": "", "project_url": "https://pypi.org/project/dataset-xy/", "project_urls": { "Homepage": "https://github.com/songyanyi/DataSet" }, "release_url": "https://pypi.org/project/dataset-xy/0.1.4/", "requires_dist": null, "requires_python": ">=3.5", "summary": "Feed data from data sources into machine learning / deep learning models for training", "version": "0.1.4" }, "last_serial": 5813471, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "17dbc5682160dca748adcf60dee1f6e6", "sha256": "dd7632b7dbfbc894da7142674b7310286e2f5d1b95974627dc44c907628983a8" }, "downloads": -1, "filename": "dataset-xy-0.1.1.tar.gz", "has_sig": false, "md5_digest": "17dbc5682160dca748adcf60dee1f6e6", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 68606, "upload_time": "2019-09-10T04:10:54", "url": "https://files.pythonhosted.org/packages/69/f6/96aba09058efc19c395e2a3bbcaa77064fdf6e11f953f0b4059d9e7c0238/dataset-xy-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "49cb32b4df2c16ca3b0b37e8d7e0f85c", "sha256": 
"e3465ccd6497ff4281134d69a396f93fa6db07176aaa1d93d365c8fdbf274559" }, "downloads": -1, "filename": "dataset-xy-0.1.2.tar.gz", "has_sig": false, "md5_digest": "49cb32b4df2c16ca3b0b37e8d7e0f85c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 68578, "upload_time": "2019-09-10T10:11:01", "url": "https://files.pythonhosted.org/packages/18/b0/43fd5bbb700f0437a28f924630557ac397eac82858d87e5f180325f84374/dataset-xy-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "283ece195f912ee6c82e13c36e119260", "sha256": "fcf2199a219fa04bb783f4fe4b3ae6f8a3f83191e478a455b5c8d8fe3de4c5f0" }, "downloads": -1, "filename": "dataset-xy-0.1.3.tar.gz", "has_sig": false, "md5_digest": "283ece195f912ee6c82e13c36e119260", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 68642, "upload_time": "2019-09-11T07:41:06", "url": "https://files.pythonhosted.org/packages/93/44/0886c363f0cfdba9d29d59a35aca70223c80f0ba2d455e459f6c95182e96/dataset-xy-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "b910bf3e9a7f940369d966946f97307e", "sha256": "80a4cf94598a8d138b5ff710ebf85e45fc74938c22ed0b3be00cd41393191f46" }, "downloads": -1, "filename": "dataset-xy-0.1.4.tar.gz", "has_sig": false, "md5_digest": "b910bf3e9a7f940369d966946f97307e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 68766, "upload_time": "2019-09-11T07:48:32", "url": "https://files.pythonhosted.org/packages/fd/35/8a764da407d24814b76beba525a9b8b59a60d5d268d52ae47f3e749c5ec8/dataset-xy-0.1.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b910bf3e9a7f940369d966946f97307e", "sha256": "80a4cf94598a8d138b5ff710ebf85e45fc74938c22ed0b3be00cd41393191f46" }, "downloads": -1, "filename": "dataset-xy-0.1.4.tar.gz", "has_sig": false, "md5_digest": "b910bf3e9a7f940369d966946f97307e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 
68766, "upload_time": "2019-09-11T07:48:32", "url": "https://files.pythonhosted.org/packages/fd/35/8a764da407d24814b76beba525a9b8b59a60d5d268d52ae47f3e749c5ec8/dataset-xy-0.1.4.tar.gz" } ] }