{ "info": { "author": "Nicholas Teague", "author_email": "pitg888@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Automunge\n# \nAutomunge is a tool to prepare tabular data for machine learning. A user\nhas options between automated inference of column properties for application\nof appropriate feature engineering methods, or may also assign to distinct \ncolumns custom feature engineering transformations, custom sets (e.g. \"family \ntrees\") of feature engineering transformations, or custom infill methods. The \nfeature engineering transformation functions may be accessed from the internal \nlibrary of transformation categories (a.k.a. a \"feature store\"), or may also \nbe user defined with minimal requirements of simple data structures for \nincorporation into the platform. The tool includes options for automated \nfeature importance evaluation, automated derivation of infill predictions\nusing machine learning models trained on the set in a fully generalized and\nautomated fashion, automated preparation for oversampling for class imbalance, \nautomated dimensionality reductions such as based on feature importance or \nprincipal component analysis, automated evaluation of data property drift \nbetween training data and subsequent data, and perhaps most importantly the \nsimplest means for consistent processing of additional data with just a single \nfunction call. In short, we make machine learning easy.\n\nThe automunge(.) function takes as input structured training data intended \nto train a machine learning model with any corresponding labels if available \nincluded in the set, and also if available consistently formatted test data \nthat can then be used to generate predictions from that trained model. 
When \nfed pandas dataframes or numpy arrays for these sets the function returns a \nseries of transformed numpy arrays or pandas dataframes (per selection) which \nare numerically encoded and suitable for the direct application of machine \nlearning algorithms. A user has an option between default feature engineering \nbased on inferred properties of the data with feature transformations such as \nz score normalization, standard deviation bins for numerical sets, box-cox \npower law transform for all positive numerical sets, one-hot encoding for \ncategorical sets, time series aggregation to sin and cos transforms (with bins\nfor business hours, weekdays, and holidays), and more (full documentation \nbelow); assigning specific column feature engineering methods using a built-in \nlibrary of feature engineering transformations; or alternatively the passing \nof user-defined custom transformation functions incorporating simple data \nstructures such as to allow custom methods to each column while still making \nuse of all of the built-in features of the tool (such as ML infill, feature \nimportance, dimensionality reduction, and most importantly the simplest way \nfor the consistent processing of subsequently available data using just a \nsingle function call of the postmunge(.) function). Missing data points in the \nsets can also be addressed by either assigning distinct methods \nto each column or alternatively by the automated \"ML infill\" method which \npredicts infill using machine learning models trained on the rest of the set \nin a fully generalized and automated fashion. automunge(.) returns a python \ndictionary which can be used as an input along with a subsequent test data \nset to the function postmunge(.) for consistent processing of test data \nwhich wasn't available for the initial address.\n\nIn addition to its use for feature engineering transformations, automunge(.) 
\nalso can serve an evaluatory purpose by way of a feature importance evaluation \nthrough the derivation of a series of metrics which provide an indication for \nthe importance of original and derived features towards the accuracy of a \npredictive model.\n\nIf elected, a user can also use the tool to perform a dimensionality reduction \nvia principal component analysis (a type of entity embedding via unsupervised \nlearning) of the data sets with the automunge(.) function or consistently for \nsubsequently available data with the postmunge(.) function.\n\nAutoMunge is now available for free pip install for your open source\npython data-wrangling\n\n```\npip install Automunge\n```\n\n```\n#or to upgrade (we currently roll out upgrades pretty frequently)\npip install Automunge --upgrade\n```\n\nOnce installed, run this in a local session to initialize:\n\n```\nfrom Automunge import Automunger\nam = Automunger.AutoMunge()\n```\n\nWhere e.g. for train/test set processing run:\n\n```\ntrain, trainID, labels, \\\nvalidation1, validationID1, validationlabels1, \\\nvalidation2, validationID2, validationlabels2, \\\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_train, finalcolumns_test, \\\nfeatureimportance, postprocess_dict \\\n= am.automunge(df_train, df_test, etc)\n```\n\nor for subsequent consistent processing of test data, using the\ndictionary returned from original application of automunge(.), run:\n\n```\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_test \\\n= am.postmunge(postprocess_dict, df_test)\n```\n\nI find it helpful to pass these functions with the full range of arguments\nincluded for reference, thus a user may simply copy and paste this form.\n\n```\n#for automunge(.) 
function on original train and test data\n\ntrain, trainID, labels, \\\nvalidation1, validationID1, validationlabels1, \\\nvalidation2, validationID2, validationlabels2, \\\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_train, finalcolumns_test, \\\nfeatureimportance, postprocess_dict = \\\nam.automunge(df_train, df_test = False, labels_column = False, trainID_column = False, \\\n testID_column = False, valpercent1=0.0, valpercent2 = 0.0, floatprecision = 32, \\\n shuffletrain = False, TrainLabelFreqLevel = False, powertransform = False, \\\n binstransform = False, MLinfill = False, infilliterate=1, randomseed = 42, \\\n numbercategoryheuristic = 15, pandasoutput = True, NArw_marker = True, \\\n featureselection = False, featurepct = 1.0, featuremetric = .02, \\\n featuremethod = 'default', PCAn_components = None, PCAexcl = [], \\\n ML_cmnd = {'MLinfill_type':'default', \\\n 'MLinfill_cmnd':{'RandomForestClassifier':{}, 'RandomForestRegressor':{}}, \\\n 'PCA_type':'default', \\\n 'PCA_cmnd':{}}, \\\n assigncat = {'mnmx':[], 'mnm2':[], 'mnm3':[], 'mnm4':[], 'mnm5':[], 'mnm6':[], \\\n\t\t 'nmbr':[], 'nbr2':[], 'nbr3':[], 'MADn':[], 'MAD2':[], 'MAD3':[], \\\n\t\t 'dxdt':[], 'd2dt':[], 'd3dt':[], 'dxd2':[], 'd2d2':[], 'd3d2':[], \\\n\t\t 'nmdx':[], 'nmd2':[], 'nmd3':[], 'mmdx':[], 'mmd2':[], 'mmd3':[], \\\n\t\t 'bins':[], 'bint':[], \\\n\t\t 'bxcx':[], 'bxc2':[], 'bxc3':[], 'bxc4':[], \\\n\t\t 'log0':[], 'log1':[], 'pwrs':[], \\\n\t\t 'bnry':[], 'text':[], 'txt2':[], 'txt3':[], '1010':[], 'or10':[], \\\n\t\t 'ordl':[], 'ord2':[], 'ord3':[], 'ord4':[], 'om10':[], 'mmor':[], \\\n 'splt':[], 'spl2':[], 'spl3':[], 'spl4':[], 'spl5':[], \\\n 'ors2':[], 'ors5':[], 'ors6':[], \\\n\t\t 'date':[], 'dat2':[], 'dat6':[], 'wkdy':[], 'bshr':[], 'hldy':[], \\\n\t\t 'yea2':[], 'mnt2':[], 'mnt6':[], 'day2':[], 'day5':[], \\\n\t\t 'hrs2':[], 'hrs4':[], 'min2':[], 'min4':[], 'scn2':[], \\\n\t\t 'excl':[], 'exc2':[], 'exc3':[], 'null':[], 'eval':[]}, \\\n assigninfill = 
{'stdrdinfill':[], 'MLinfill':[], 'zeroinfill':[], 'oneinfill':[], \\\n 'adjinfill':[], 'meaninfill':[], 'medianinfill':[], 'modeinfill':[]}, \\\n transformdict = {}, processdict = {}, evalcat = False, \\\n printstatus = True)\n```\n\nPlease remember to save the automunge(.) returned object postprocess_dict \nsuch as using pickle library, which can then be later passed to the postmunge(.) \nfunction to consistently process subsequently available data.\n\n```\n#Sample pickle code:\n\n#sample code to download postprocess_dict dictionary returned from automunge(.)\nimport pickle\nwith open('filename.pickle', 'wb') as handle:\n pickle.dump(postprocess_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)\n\n#to upload for later use in postmunge(.) in another notebook\nimport pickle\nwith open('filename.pickle', 'rb') as handle:\n postprocess_dict = pickle.load(handle)\n\n```\nWe can then apply the postprocess_dict saved from a prior application of automunge\nfor consistent processing of additional data.\n```\n#for postmunge(.) function on additional available train or test data\n#using the postprocess_dict object returned from original automunge(.) application\n\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_test = \\\nam.postmunge(postprocess_dict, df_test, testID_column = False, \\\n labelscolumn = False, pandasoutput=True, printstatus = True, \\\n TrainLabelFreqLevel = False, featureeval = False, driftreport = False)\n```\n\n\nThe functions depend on pandas dataframe formatted train and test data\nor numpy arrays with consistent order of columns between train and test data. 
\nThe functions return numpy arrays or pandas dataframes numerically encoded \nand normalized such as to make them suitable for direct application to a \nmachine learning model in the framework of a user's choice, including sets for \nthe various activities of a generic machine learning project such as training, \nhyperparameter tuning validation (validation1), final validation (validation2), \nor data intended for use in generation of predictions from the trained model \n(test set). The functions also return a few other sets such as labels, column \nheaders, ID sets, and etc if elected - a full list of returned arrays is below.\n\nWhen left to automation, the function works by inferring a category of \ndata based on properties of each column to select the type of processing \nfunction to apply, for example whether a column is a numerical, categorical,\nbinary, or time-series set. Alternately, a user can pass column header IDs to \nassign specific processing functions to distinct columns - which processing functions\nmay be pulled from the internal library of transformations or alternately user\ndefined. Normalization parameters from the initial automunge application are\nsaved to a returned dictionary for subsequent consistent processing of test data\nthat wasn't available at initial address with the postmunge(.) function. \n\nThe feature engineering transformations are recorded with a series of suffixes \nappended to the column header title in the returned sets, for one example the \napplication of z-score normalization returns a column with header origname + '_nmbr'. \nAs another example, for one-hot encoded sets the set of columns are returned with\nheader origname + '_category' where category is the category from the set indicated \nby a column. Each transformation category has a unique suffix appender.\n\nIn automation, for numerical data, the functions generate a series of derived\ntransformations resulting in multiple child columns. 
For numerical data, if the\npowertransform option is selected distribution properties are evaluated for \npotential application of z-score normalization, min-max scaling, power law transform \nvia box-cox method, or mean absolute deviation scaling. Otherwise numerical data \ndefaults to z-score, with z-score normalization options for standard\ndeviation bins for values in range <-2, -2 to -1, -1 to 0, 0 to 1, 1 to 2, >2 from the\nmean. For numerical sets with all positive values the functions also optionally\ncan return a power-law transformed set using the box-cox method, along with\na corresponding set with z-score normalization applied. For time-series\ndata the model segregates the data by time-scale (year, month, day, hour, minute, \nsecond) and returns year z-score normalized, a pair of sets for combined month/day \nand combined hour / minute / second with sin and cos transformations at period of \ntime-scale, and also returns binned sets identifying business hours, weekdays, and \nUS holidays. For binary categorical data the functions return a single column with \n1/0 designation. For multimodal categorical data the functions return one-hot \nencoded sets using the naming convention origname + _ + category. (I believe this \nautomation of the one-hot encoding method to be a particularly useful feature of \nthe tool.) For all cases the functions generate a supplemental column (NArw)\nwith a boolean identifier for cells that were subject to infill due to missing or \nimproperly formatted data. (Please note that I don't consider the current methods \nof numerical set distribution evaluation highly sophisticated and have some work to \ndo here). \n\nThe functions also include a method we call 'ML infill' which if elected\npredicts infill for missing values in both the train and test sets using\nmachine learning models trained on the rest of the set in a fully\ngeneralized and automated fashion. 
The ML infill works by initially\napplying infill using traditional methods such as mean for a numerical\nset, most common value for a binary set, and a boolean identifier for\ncategorical. The functions then generate a column specific set of\ntraining data, labels, and feature sets for the derivation of infill.\nThe column's trained model is included in the outputted dictionary for\napplication of the same model in the postmunge function. Alternately, a\nuser can pass column headers to assign different infill methods to distinct \ncolumns. The method currently makes use of Scikit-Learn Random Forest models by \ndefault. Extension to more sophisticated methods, such as those employing \nautomated hyperparameter tuning, is intended for a future release.\n\nThe automunge(.) function also includes a method for feature importance \nevaluation, in which metrics are derived to measure the impact to predictive \naccuracy of original source columns as well as relative importance of \nderived columns using a permutation importance method. Permutation importance \nmethod was inspired by a fast.ai lecture and more information can be found in \nthe paper \"Beware Default Random Forest Importances\" by Terence Parr, Kerem \nTurgutlu, Christopher Csiszar, and Jeremy Howard. This method currently makes \nuse of Scikit-Learn's Random Forest predictors. I believe the metric we refer to\nas metric2 which evaluates relative importance between features derived from the \nsame source column is a unique approach.\n\nThe function also includes a method we call 'LabelFreqLevel' which\nif elected applies multiples of the feature sets associated with each\nlabel category in the returned training data so as to enable\noversampling of those labels which may be underrepresented in the\ntraining data. This method is available for categorical labels or also\nfor numerical labels when the label processing includes standard deviation\nbins. 
This method is expected to improve downstream model\naccuracy for training data with uneven distribution of labels. For more\non the class imbalance problem see \"A systematic study of the class imbalance \nproblem in convolutional neural networks\" - Buda, Maki, Mazurowski.\n\nThe function also can perform dimensionality reduction of the sets via \nprincipal component analysis (PCA). The function automatically performs a \ntransformation when the number of features is more than 50% of the number\nof observations in the train set (this is a somewhat arbitrary heuristic).\nAlternately, the user can pass a desired number of features and their \npreference of type and parameters between linear PCA, Sparse PCA, or Kernel \nPCA - all currently implemented in Scikit-Learn.\n\nThe application of the automunge and postmunge functions requires\nassigning the function's returned values to a series of named sets. We suggest using\na consistent naming convention as follows:\n\n```\ntrain, trainID, labels, \\\nvalidation1, validationID1, validationlabels1, \\\nvalidation2, validationID2, validationlabels2, \\\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_train, finalcolumns_test, \\\nfeatureimportance, postprocess_dict \\\n= am.automunge(df_train, ...)\n```\n\nThe full set of arguments available to be passed is given here, with\nexplanations provided below: \n\n```\ntrain, trainID, labels, \\\nvalidation1, validationID1, validationlabels1, \\\nvalidation2, validationID2, validationlabels2, \\\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_train, finalcolumns_test, \\\nfeatureimportance, postprocess_dict = \\\nam.automunge(df_train, df_test = False, labels_column = False, trainID_column = False, \\\n testID_column = False, valpercent1=0.0, valpercent2 = 0.0, floatprecision = 32, \\\n shuffletrain = False, TrainLabelFreqLevel = False, powertransform = False, \\\n binstransform = False, MLinfill = False, infilliterate=1, randomseed = 42, \\\n 
numbercategoryheuristic = 15, pandasoutput = True, NArw_marker = True, \\\n featureselection = False, featurepct = 1.0, featuremetric = .02, \\\n featuremethod = 'default', PCAn_components = None, PCAexcl = [], \\\n ML_cmnd = {'MLinfill_type':'default', \\\n 'MLinfill_cmnd':{'RandomForestClassifier':{}, 'RandomForestRegressor':{}}, \\\n 'PCA_type':'default', \\\n 'PCA_cmnd':{}}, \\\n assigncat = {'mnmx':[], 'mnm2':[], 'mnm3':[], 'mnm4':[], 'mnm5':[], 'mnm6':[], \\\n\t\t 'nmbr':[], 'nbr2':[], 'nbr3':[], 'MADn':[], 'MAD2':[], 'MAD3':[], \\\n\t\t 'dxdt':[], 'd2dt':[], 'd3dt':[], 'dxd2':[], 'd2d2':[], 'd3d2':[], \\\n\t\t 'nmdx':[], 'nmd2':[], 'nmd3':[], 'mmdx':[], 'mmd2':[], 'mmd3':[], \\\n\t\t 'bins':[], 'bint':[], \\\n\t\t 'bxcx':[], 'bxc2':[], 'bxc3':[], 'bxc4':[], \\\n\t\t 'log0':[], 'log1':[], 'pwrs':[], \\\n\t\t 'bnry':[], 'text':[], 'txt2':[], 'txt3':[], '1010':[], 'or10':[], \\\n\t\t 'ordl':[], 'ord2':[], 'ord3':[], 'ord4':[], 'om10':[], 'mmor':[], \\\n 'splt':[], 'spl2':[], 'spl3':[], 'spl4':[], 'spl5':[], \\\n 'ors2':[], 'ors5':[], 'ors6':[], \\\n\t\t 'date':[], 'dat2':[], 'dat6':[], 'wkdy':[], 'bshr':[], 'hldy':[], \\\n\t\t 'yea2':[], 'mnt2':[], 'mnt6':[], 'day2':[], 'day5':[], \\\n\t\t 'hrs2':[], 'hrs4':[], 'min2':[], 'min4':[], 'scn2':[], \\\n\t\t 'excl':[], 'exc2':[], 'exc3':[], 'null':[], 'eval':[]}, \\\n assigninfill = {'stdrdinfill':[], 'MLinfill':[], 'zeroinfill':[], 'oneinfill':[], \\\n 'adjinfill':[], 'meaninfill':[], 'medianinfill':[], 'modeinfill':[]}, \\\n transformdict = {}, processdict = {}, evalcat = False, \\\n printstatus = True)\n```\n\nOr for the postmunge function:\n\n```\n#for postmunge(.) function on additional or subsequently available train or test data\n#using the postprocess_dict object returned from original automunge(.) 
application\n\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_test = \\\nam.postmunge(postprocess_dict, df_test)\n```\n\nWith the full set of arguments available to be passed as:\n\n```\nam.postmunge(postprocess_dict, df_test, testID_column = False, \\\n labelscolumn = False, pandasoutput=True, printstatus = True, \\\n TrainLabelFreqLevel = False, featureeval = False, driftreport = False)\n```\n\nNote that the only required argument to the automunge function is the\ntrain set dataframe; the other arguments all have default values if\nnothing is passed. The postmunge function requires at minimum the\npostprocess_dict object (a python dictionary returned from the application of\nautomunge) and a dataframe test set consistently formatted as those sets\nthat were originally applied to automunge.\n\n...\n\nHere now are descriptions for the returned sets from automunge, which\nwill be followed by descriptions of the arguments which can be passed to\nthe function, followed by similar treatment for postmunge returned sets\nand arguments.\n\n...\n\n## automunge returned sets:\n\n* train: a numerically encoded set of data intended to be used to train a\ndownstream machine learning model in the framework of a user's choice\n\n* trainID: the set of ID values corresponding to the train set if an ID\ncolumn(s) was passed to the function. This set may be useful if the shuffle\noption was applied.\n\n* labels: a set of numerically encoded labels corresponding to the\ntrain set if a label column was passed. Note that the function\nassumes the label column is originally included in the train set. Note\nthat if the labels set is a single column a returned numpy array is \nflattened (e.g. 
[[1,2,3]] converted to [1,2,3] )\n\n* validation1: a set of training data carved out from the train set\nthat is intended for use in hyperparameter tuning of a downstream model.\n\n* validationID1: the set of ID values corresponding to the validation1\nset\n\n* validationlabels1: the set of labels corresponding to the validation1\nset\n\n* validation2: the set of training data carved out from the train set\nthat is intended for the final validation of a downstream model (this\nset should not be applied extensively for hyperparameter tuning).\n\n* validationID2: the set of ID values corresponding to the validation2\nset.\n\n* validationlabels2: the set of labels corresponding to the validation2\nset\n\n* test: the set of features, consistently encoded and normalized as the\ntraining data, that can be used to generate predictions from a\ndownstream model trained with train. Note that if no test data is\navailable during initial address this processing will take place in the\npostmunge(.) function. \n\n* testID: the set of ID values corresponding to the test set.\n\n* testlabels: a set of numerically encoded labels corresponding to the\ntest set if a label column was passed. Note that the function\nassumes the label column is originally included in the train set.\n\n* labelsencoding_dict: a dictionary that can be used to reverse encode\npredictions that were generated from a downstream model (such as to\nconvert a one-hot encoded set back to a single categorical set).\n\n* finalcolumns_train: a list of the column headers corresponding to the\ntraining data. Note that the inclusion of suffix appenders is used to\nidentify which feature engineering transformations were applied to each\ncolumn.\n\n* finalcolumns_test: a list of the column headers corresponding to the\ntest data. Note that the inclusion of suffix appenders is used to\nidentify which feature engineering transformations were applied to each\ncolumn. 
Note that this list should match the one preceding.\n\n* featureimportance: a dictionary containing summary of feature importance\nranking and metrics for each of the derived sets. Note that the metric\nvalue provides an indication of the importance of the original source\ncolumn such that larger value suggests greater importance, and the metric2 \nvalue provides an indication of the relative importance of columns derived\nfrom the original source column such that smaller metric2 value suggests \ngreater relative importance. One can print the values here such as with\nthis code:\n\n```\n#to inspect values returned in featureimportance object one could run\nfor keys,values in featureimportance.items():\n print(keys)\n print('metric = ', values['metric'])\n print('metric2 = ', values['metric2'])\n print()\n```\n\n\n* postprocess_dict: a returned python dictionary that includes\nnormalization parameters and trained machine learning models used to\ngenerate consistent processing of additional train or test data such as \nmay not have been available at initial application of automunge. It is \nrecommended that this dictionary be externally saved on each application \nused to train a downstream model so that it may be passed to postmunge(.) \nto consistently process subsequently available test data, such as \ndemonstrated with the pickle library above.\n\n...\n\n## automunge(.) 
passed arguments\n\n```\nam.automunge(df_train, df_test = False, labels_column = False, trainID_column = False, \\\n testID_column = False, valpercent1=0.0, valpercent2 = 0.0, floatprecision = 32, \\\n shuffletrain = False, TrainLabelFreqLevel = False, powertransform = False, \\\n binstransform = False, MLinfill = False, infilliterate=1, randomseed = 42, \\\n numbercategoryheuristic = 15, pandasoutput = True, NArw_marker = True, \\\n featureselection = False, featurepct = 1.0, featuremetric = .02, \\\n featuremethod = 'default', PCAn_components = None, PCAexcl = [], \\\n ML_cmnd = {'MLinfill_type':'default', \\\n 'MLinfill_cmnd':{'RandomForestClassifier':{}, 'RandomForestRegressor':{}}, \\\n 'PCA_type':'default', \\\n 'PCA_cmnd':{}}, \\\n assigncat = {'mnmx':[], 'mnm2':[], 'mnm3':[], 'mnm4':[], 'mnm5':[], 'mnm6':[], \\\n\t\t 'nmbr':[], 'nbr2':[], 'nbr3':[], 'MADn':[], 'MAD2':[], 'MAD3':[], \\\n\t\t 'dxdt':[], 'd2dt':[], 'd3dt':[], 'dxd2':[], 'd2d2':[], 'd3d2':[], \\\n\t\t 'nmdx':[], 'nmd2':[], 'nmd3':[], 'mmdx':[], 'mmd2':[], 'mmd3':[], \\\n\t\t 'bins':[], 'bint':[], \\\n\t\t 'bxcx':[], 'bxc2':[], 'bxc3':[], 'bxc4':[], \\\n\t\t 'log0':[], 'log1':[], 'pwrs':[], \\\n\t\t 'bnry':[], 'text':[], 'txt2':[], 'txt3':[], '1010':[], 'or10':[], \\\n\t\t 'ordl':[], 'ord2':[], 'ord3':[], 'ord4':[], 'om10':[], 'mmor':[], \\\n 'splt':[], 'spl2':[], 'spl3':[], 'spl4':[], 'spl5':[], \\\n 'ors2':[], 'ors5':[], 'ors6':[], \\\n\t\t 'date':[], 'dat2':[], 'dat6':[], 'wkdy':[], 'bshr':[], 'hldy':[], \\\n\t\t 'yea2':[], 'mnt2':[], 'mnt6':[], 'day2':[], 'day5':[], \\\n\t\t 'hrs2':[], 'hrs4':[], 'min2':[], 'min4':[], 'scn2':[], \\\n\t\t 'excl':[], 'exc2':[], 'exc3':[], 'null':[], 'eval':[]}, \\\n assigninfill = {'stdrdinfill':[], 'MLinfill':[], 'zeroinfill':[], 'oneinfill':[], \\\n 'adjinfill':[], 'meaninfill':[], 'medianinfill':[], 'modeinfill':[]}, \\\n transformdict = {}, processdict = {}, evalcat = False, \\\n printstatus = True)\n```\n\n* df_train: a pandas dataframe or numpy array 
containing a structured \ndataset intended for use to subsequently train a machine learning model. \nThe set at a minimum should be 'tidy' meaning a single column per feature \nand a single row per observation. If desired the set may include one or more\n\"ID\" columns (intended to be carved out and consistently shuffled or partitioned\nsuch as an index column) and one or more columns intended to be used as labels \nfor a downstream training operation. The tool supports the inclusion of \nnon-index-range column as index or multicolumn index (requires named index \ncolumns). Such index types are added to the returned \"ID\" sets which are \nconsistently shuffled and partitioned as the train and test sets. \n\n* df_test: a pandas dataframe or numpy array containing a structured \ndataset intended for use to generate predictions from a downstream machine \nlearning model trained from the automunge returned sets. The set must be \nconsistently formatted as the train set with consistent column labels and/or\norder of columns. (This set may optionally contain a labels column if one \nwas included in the train set although its inclusion is not required). If \ndesired the set may include one or more ID column(s) or column(s) intended \nfor use as labels. A user may pass False if this set is not available. The tool \nsupports the inclusion of non-index-range column as index or multicolumn index \n(requires named index columns). Such index types are added to the returned \n\"ID\" sets which are consistently shuffled and partitioned as the train and \ntest sets.\n\n* labels_column: a string of the column title for the column from the\ndf_train set intended for use as labels in training a downstream machine\nlearning model. The function defaults to False for cases where the\ntraining set does not include a label column. 
An integer column index may \nalso be passed such as if the source dataset was numpy array.\n\n* trainID_column: a string of the column title for the column from the\ndf_train set intended for use as a row identifier value (such as could\nbe sequential numbers for instance). The function defaults to False for\ncases where the training set does not include an ID column. A user can \nalso pass a list of string column titles such as to carve out multiple\ncolumns to be excluded from processing but consistently partitioned. An \ninteger column index or list of integer column indexes may also be passed \nsuch as if the source dataset was numpy array.\n\n* testID_column: a string of the column title for the column from the\ndf_test set intended for use as a row identifier value (such as could be\nsequential numbers for instance). The function defaults to False for\ncases where the test set does not include an ID column. A user can \nalso pass a list of string column titles such as to carve out multiple\ncolumns to be excluded from processing but consistently partitioned. An \ninteger column index or list of integer column indexes may also be passed \nsuch as if the source dataset was numpy array.\n\n* valpercent1: a float value between 0 and 1 which designates the percent\nof the training data which will be set aside for the first validation\nset (generally used for hyperparameter tuning of a downstream model).\nThis value defaults to 0. (Previously the default here was set at 0.20 but \nthat is a fairly arbitrary value and a user may wish to deviate for \ndifferent size sets.) Note that this value may be set to 0 if no validation \nset is needed (such as may be the case for k-fold cross-validation). 
Please see \nalso the note below for the shuffletrain parameter.\n\n* valpercent2: a float value between 0 and 1 which designates the percent\nof the training data which will be set aside for the second validation\nset (generally used for final validation of a model prior to release).\nThis value defaults to 0. (Previously the default was set at 0.10 but that \nis a fairly arbitrary value and a user may wish to deviate for different \nsize sets.)\n\n* floatprecision: an integer with acceptable values of 16/32/64 designating\nthe memory precision for returned float values. (A tradeoff between memory\nusage and floating point precision, smaller for smaller footprint.)\n\n* shuffletrain: a boolean identifier (True/False) which indicates if the\nrows in df_train will be shuffled prior to carving out the validation\nsets. Note that if this value is set to False then the validation sets\nwill be pulled from the bottom x% sequential rows of the dataframe.\n(Where x% is the sum of validation ratios.) Note that if this value is\nset to False although the validations will be pulled from sequential\nrows, the split between validation1 and validation2 sets will be\nrandomized. This value defaults to False.\n\n* TrainLabelFreqLevel: a boolean identifier (True/False) which indicates\nif the TrainLabelFreqLevel method will be applied to prepare for oversampling \ntraining data associated with underrepresented labels (aka class imbalance). \nThe method adds multiples to training data rows for those labels with lower \nfrequency resulting in an (approximately) levelized frequency. This defaults \nto False. Note that this feature may be applied to numerical label sets if \nthe processing applied to the set includes standard deviation bins.\n\n* powertransform: a boolean identifier (True/False) which indicates if an\nevaluation will be performed of distribution properties to select between\nbox-cox, z-score, min-max scaling, or mean absolute deviation scaling \nnormalization. 
Note that after application of box-cox transform child columns \nare generated for a subsequent z-score normalization as well as a set of bins\nassociated with number of standard deviations from the mean. Please note that\nI don't consider the current means of distribution property evaluation very\nsophisticated and we will continue to refine this method with further research\ngoing forward. This defaults to False.\n\n* binstransform: a boolean identifier (True/False) which indicates if the\nnumerical sets will receive bin processing such as to generate child\ncolumns with boolean identifiers for number of standard deviations from\nthe mean, with groups for values <-2, -2 to -1, -1 to 0, 0 to 1, 1 to 2, and >2. Note\nthat the bins and bint transformations are the same, only difference is\nthat the bint transform assumes the column has already been normalized\nwhile the bins transform does not. This value defaults to False.\n\n* MLinfill: a boolean identifier (True/False) which indicates if the ML\ninfill method will be applied as a default to predict infill for missing \nor improperly formatted data using machine learning models trained on the\nrest of the set. This defaults to False. Note that ML infill may alternatively\nbe assigned to distinct columns in assigninfill.\n\n* infilliterate: an integer indicating how many applications of the ML\ninfill processing are to be performed for purposes of predicting infill.\nThe assumption is that for sets with high frequency of missing values\nthat multiple applications of ML infill may improve accuracy although\nnote this is not an extensively tested hypothesis. This defaults to 1.\n\n* randomseed: a positive integer used as a seed for randomness throughout \nsuch as for data set shuffling, ML infill, and feature importance algorithms. \nThis defaults to 42, a nice round number.\n\n* numbercategoryheuristic: an integer used as a heuristic. 
When a \ncategorical set has more unique values than this heuristic, it defaults \nto categorical treatment via ordinal processing, otherwise categorical sets\ndefault to one-hot encoding. This defaults to 15.\n\n* pandasoutput: a selector for format of returned sets. Defaults to False\nfor returned Numpy arrays. If set to True returns pandas dataframes\n(note that index is not preserved in the train/validation split, an ID\ncolumn may be passed for index identification).\n\n* NArw_marker: a boolean identifier (True/False) which indicates if the\nreturned sets will include columns with markers for rows subject to \ninfill (columns with suffix 'NArw'). This value defaults to True.\n\n* featureselection: a boolean identifier telling the function whether to\nperform a feature importance evaluation. If selected automunge will\nreturn a summary of feature importance findings in the featureimportance\nreturned dictionary. This also activates the trimming of derived sets\nthat did not meet the importance threshold if [featurepct < 1.0 and \nfeaturemethod = 'pct'] or if [featuremetric > 0.0 and featuremethod = \n'metric']. Note this defaults to False because it cannot operate without\na designated label column in the train set. (Note that any user-specified\nsize of validationratios if passed are used in this method, otherwise \ndefaults to 0.33.)\n\n* featurepct: the percentage of derived sets that are kept in the output\nbased on the feature importance evaluation. Note that NArw columns are\nexcluded from the trimming for now (the inclusion of NArws in trimming\nwill likely be included in a future expansion). This item is only used if\nfeaturemethod is passed as 'pct' (the default).\n\n* featuremetric: the feature importance metric below which derived sets\nare trimmed from the output. 
Note that this item is only used if\nfeaturemethod is passed as 'metric'.\n\n* featuremethod: can be passed as either 'pct' or 'metric' to select which\nfeature importance method is used for trimming the derived sets. Or can pass\nas 'default' for ignoring the featurepct/featuremetric parameters or can \npass as 'report' to return the featureimportance results with no further\nprocessing (other returned sets are empty).\n\n* PCAn_components: a user can pass an integer to define the number of PCA\nderived features for purposes of dimensionality reduction, such integer to \nbe less than the otherwise returned number of sets. Function will default \nto kernel PCA for all non-negative sets or otherwise Sparse PCA. Also if\nthis value is passed as a float <1.0 then linear PCA will be applied such \nthat the returned number of sets is the minimum number that can reproduce\nthat percent of the variance. Note this can also be passed in conjunction \nwith assigned PCA type or parameters in the ML_cmnd object.\n\n* PCAexcl: a list of column headers for columns that are to be excluded from\nany application of PCA.\n\n* ML_cmnd: \n\n```\nML_cmnd = {'MLinfill_type':'default', \\\n 'MLinfill_cmnd':{'RandomForestClassifier':{}, 'RandomForestRegressor':{}}, \\\n 'PCA_type':'default', \\\n 'PCA_cmnd':{}}, \\\n```\nThe ML_cmnd allows a user to pass parameters to the predictive algorithms\nused for ML infill / feature importance evaluation or PCA. 
Currently the only\noption for 'MLinfill_type' is 'default' which uses Scikit-learn's Random \nForest implementation, the intent is to add other options in a future extension.\nFor example, a user wishing to pass a custom parameter of max_depth to the \nRandom Forest algorithms could pass:\n```\nML_cmnd = {'MLinfill_type':'default', \\\n 'MLinfill_cmnd':{'RandomForestClassifier':{'max_depth':4}, \\\n 'RandomForestRegressor':{'max_depth':4}}, \\\n 'PCA_type':'default', \\\n 'PCA_cmnd':{}}, \\\n\n#(note that currently unable to pass RF parameters to criterion and n_jobs)\n```\nA user can also assign specific methods for PCA transforms. Current PCA_types\nsupported include 'PCA', 'SparsePCA', and 'KernelPCA', all via Scikit-Learn.\nNote that the n_components are passed separately with the PCAn_components \nargument noted above. A user can also pass parameters to the PCA functions\nthrough the PCA_cmnd, for example one could pass a kernel type for KernelPCA\nas:\n```\nML_cmnd = {'MLinfill_type':'default', \\\n 'MLinfill_cmnd':{'RandomForestClassifier':{}, \\\n 'RandomForestRegressor':{}}, \\\n 'PCA_type':'KernelPCA', \\\n 'PCA_cmnd':{'kernel':'sigmoid'}}, \\\n\n#Also note that SparsePCA currently doesn't have available\n#n_jobs or normalize_components, and similarly KernelPCA \n#doesn't have available n_jobs.\n```\nNote that the PCA is currently defaulted to active for cases where the \ntrain set number of features is >0.50 times the number of rows. A user can \nchange this ratio by passing 'PCA_cmnd':{'col_row_ratio':0.22} for \ninstance. Also a user can simply turn off default PCA transforms by \npassing 'PCA_cmnd':{'PCA_type':'off'}. 
A user can also exclude returned\nboolean (0/1) columns from any PCA application by passing \n'PCA_cmnd':{'bool_PCA_excl':True}\nor exclude returned boolean and ordinal columns from PCA application by passing\n'PCA_cmnd':{'bool_ordl_PCAexcl':True}\nsuch as could potentially result in memory savings.\n\n\n* assigncat:\n\n```\n#Here are the current transformation options built into our library, which\n#we are continuing to build out. A user may also define their own.\n\n assigncat = {'mnmx':[], 'mnm2':[], 'mnm3':[], 'mnm4':[], 'mnm5':[], 'mnm6':[], \\\n\t\t 'nmbr':[], 'nbr2':[], 'nbr3':[], 'MADn':[], 'MAD2':[], 'MAD3':[], \\\n\t\t 'dxdt':[], 'd2dt':[], 'd3dt':[], 'dxd2':[], 'd2d2':[], 'd3d2':[], \\\n\t\t 'nmdx':[], 'nmd2':[], 'nmd3':[], 'mmdx':[], 'mmd2':[], 'mmd3':[], \\\n\t\t 'bins':[], 'bint':[], \\\n\t\t 'bxcx':[], 'bxc2':[], 'bxc3':[], 'bxc4':[], \\\n\t\t 'log0':[], 'log1':[], 'pwrs':[], \\\n\t\t 'bnry':[], 'text':[], 'txt2':[], 'txt3':[], '1010':[], 'or10':[], \\\n\t\t 'ordl':[], 'ord2':[], 'ord3':[], 'ord4':[], 'om10':[], 'mmor':[], \\\n 'splt':[], 'spl2':[], 'spl3':[], 'spl4':[], 'spl5':[], \\\n 'ors2':[], 'ors5':[], 'ors6':[], \\\n\t\t 'date':[], 'dat2':[], 'dat6':[], 'wkdy':[], 'bshr':[], 'hldy':[], \\\n\t\t 'yea2':[], 'mnt2':[], 'mnt6':[], 'day2':[], 'day5':[], \\\n\t\t 'hrs2':[], 'hrs4':[], 'min2':[], 'min4':[], 'scn2':[], \\\n\t\t 'excl':[], 'exc2':[], 'exc3':[], 'null':[], 'eval':[]}, \\\n``` \n\nDescriptions of these transformations are provided in the document below (in the section\ntitled \"Library of Transformations\").\n\nA user may add column header identifier strings to each of these lists to assign \na distinct specific processing approach to any column (including labels). Note \nthat this processing category will serve as the \"root\" of the tree of transforms \nas defined in the transformdict. Note that additional categories may be passed if \ndefined in the passed transformdict and processdict. 
For example, a user who wants \nto process only the numerical columns 'nmbrcolumn1' and \n'nmbrcolumn2' with z-score normalization, instead of the full range of numerical \nderivations applied when the binstransform parameter is active, could pass:\n```\nassigncat = {'nbr2':['nmbrcolumn1', 'nmbrcolumn2']}\n```\n\n* assigninfill:\n```\n#Here are the current infill options built into our library, which\n#we are continuing to build out.\nassigninfill = {'stdrdinfill':[], 'MLinfill':[], 'zeroinfill':[], 'oneinfill':[], \\\n 'adjinfill':[], 'meaninfill':[], 'medianinfill':[], 'modeinfill':[]}, \\\n```\nA user may add column identifier strings to each of these lists to \ndesignate the column-specific infill approach for missing or\nimproperly formatted values. Note that this infill category defaults to\nMLinfill if nothing is assigned and the MLinfill argument to automunge is\nset to True. stdrdinfill means: mean for numeric sets, most common for \nbinary, and new column boolean for categorical. zeroinfill means inserting \nthe integer 0 to missing cells. oneinfill means inserting the integer 1.\nadjinfill means passing the value from the preceding row to missing cells. \nmeaninfill means inserting the mean derived from the train set to numeric \ncolumns. medianinfill means inserting the median derived from the train \nset to numeric columns. (Note currently boolean columns derived from \nnumeric are not supported for mean/median and for those cases default to \nthe infill from stdrdinfill.) 
modeinfill means inserting the most common\nvalue for a set; note that modeinfill supports one-hot encoded sets.\n\n* transformdict: allows a user to pass a custom tree of transformations.\nNote that a user may define their own (traditionally 4 character) string \"root\"\nidentifiers for a series of processing steps using the categories of processing \nalready defined in our library and then assign columns in assigncat, or for \ncustom processing functions this method should be combined with processdict \nwhich is only slightly more complex. For example, a user wishing to define a \nnew set of transformations for numerical series 'newt' that combines NArows, \nmin-max, box-cox, z-score, and standard deviation bins could do so by passing a \ntransformdict as:\n```\ntransformdict = {'newt' : {'parents' : ['bxc4'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnmx'], \\\n 'cousins' : ['NArw'], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}}\n\n#Where since bxc4 is passed as a parent, this will result in pulling\n#offspring keys from the bxc4 family tree, which has a nbr2 key as a coworker.\n\n#from automunge library:\n transform_dict.update({'bxc4' : {'parents' : ['bxcx'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : ['NArw'], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['nbr2'], \\\n 'friends' : []}})\n\n#note that 'nbr2' is passed as a coworker primitive meaning no downstream \n#primitives would be accessed from the nbr2 family tree. If we wanted nbr2 to\n#incorporate any offspring from the nbr2 tree we could instead assign as children\n#or niecesnephews.\n\n```\nBasically here 'newt' is the key and when passed to one of the family primitives\nthe corresponding process function is applied, and if it is passed to a family\nprimitive with downstream offspring then those offspring keys are pulled from\nthat key's family tree. 
For example, here mnmx is passed as an auntsuncles which\nmeans the mnmx processing function is applied with no downstream offspring. The\nbxc4 key is passed as a parent which means the bxc4 transform is applied coupled\nwith any downstream transforms from the bxc4 key family tree, which we also show.\nNote the family primitives tree of transformations can be summarized as:\n```\n'parents' : upstream / first generation / replaces column / with offspring\n'siblings': upstream / first generation / supplements column / with offspring\n'auntsuncles' : upstream / first generation / replaces column / no offspring\n'cousins' : upstream / first generation / supplements column / no offspring\n'children' : downstream parents / offspring generations / replaces column / with offspring\n'niecesnephews' : downstream siblings / offspring generations / supplements column / with offspring\n'coworkers' : downstream auntsuncles / offspring generations / replaces column / no offspring\n'friends' : downstream cousins / offspring generations / supplements column / no offspring\n```\nNote that when we define a new transform such as 'newt' above, we also need \nto define a corresponding processdict entry for the new category, which we \ndemonstrate here:\n\n\n* processdict: allows a user to define their own processing functions \ncorresponding to new transformdict keys. We'll describe the entries here:\n```\n#for example \nprocessdict = {'newt' : {'dualprocess' : None, \\\n\t\t\t 'singleprocess' : None, \\\n\t\t\t 'postprocess' : None, \\\n \t 'NArowtype' : 'numeric', \\\n \t\t 'MLinfilltype' : 'numeric', \\\n \t\t 'labelctgy' : 'mnmx'}}\n\n#A user should pass either a pair of processing functions to both \n#dualprocess and postprocess, or alternatively just a single processing\n#function to singleprocess, and pass None to those not used.\n#For now, if just using the category as a root key and not as a family primitive, \n#can simply pass None to all the processing slots. 
We'll demonstrate their \n#composition and data structures for custom processing functions later in this \n#document.\n\n#dualprocess: for passing a processing function in which normalization \n# parameters are derived from properties of the training set\n# and jointly process the train set and if available test set\n\n#singleprocess: for passing a processing function in which no normalization\n# parameters are needed from the train set to process the\n# test set, such that train and test sets are processed separately\n\n#postprocess: for passing a processing function in which normalization \n# parameters originally derived from the train set are applied\n# to separately process a test set\n\n#NArowtype: can be entries of either 'numeric', 'positivenumeric', 'justNaN', \n#or 'exclude' where\n#\t\t\t'numeric' refers to columns where non-numeric entries are subject\n#\t\t\t\t\t to infill\n# 'positivenumeric' refers to columns where entries <= 0 are subject\n# to infill\n#\t\t\t'justNaN' refers to columns where only NaN entries are subject\n#\t\t\t to infill\n#\t\t\t'exclude' refers to columns where no infill will be performed\n\n#MLinfilltype: can be entries of 'numeric', 'singlct', 'multirt',\n# 'multisp', 'exclude', or 'label' where\n#\t\t\t 'numeric' refers to columns where predictive algorithms treat\n#\t\t\t as a regression for numeric sets\n#\t\t\t 'singlct' refers to columns where category gives a single column\n#\t\t\t where predictive algorithms treat as a classification target\n#\t\t\t 'multirt' refers to category returning multiple columns where \n#\t\t\t predictive algorithms treat as a multi modal classifier\n#\t\t\t 'exclude' refers to categories excluded from predictive address\n#\t\t\t 'multisp' tbh I think this is just a duplicate of multirt, a\n#\t\t\t future update may strike this one\n#\t\t\t 'label' refers to categories specifically intended for label\n#\t\t\t processing\n\n```\n\n* evalcat: modularizes the automated evaluation of column 
properties for assignment \nof root transformation categories, allowing a user to pass custom functions for this \npurpose. Passed functions should follow format:\n\n```\ndef evalcat(df, column, numbercategoryheuristic, powertransform):\n \"\"\"\n #user defined function that takes as input a dataframe df and column id string column\n #evaluates the contents of cells and classifies the column for root category of \n #transformation (e.g. comparable to categories otherwise assigned in assigncat)\n #returns category id as a string\n \"\"\"\n ...\n return category\n```\nAnd could then be passed to automunge function call such as:\n```\nevalcat = evalcat\n```\nI recommend using the evalcategory function defined in the master file as a starting point. \n(Minus the 'self' parameter since defining external to class.) Note that the \nparameters numbercategoryheuristic and powertransform are passed as user parameters \nin automunge call and only used in evalcategory function, so if a user wants to \nrepurpose them totally they can do so. (They default to 15, False.) Note evalcat defaults \nto False to use built-in evalcategory function. Note evalcat will only be applied to \ncolumns not assigned in assigncat. (Note that columns assigned to 'eval' in assigncat\nwill be passed to this function for evaluation with powertransform = True.)\n\n* printstatus: user can pass True/False indicating whether the function will print \nstatus of processing during operation. Defaults to True.\n\nOk well we'll demonstrate further below how to build custom processing functions,\nfor now this just gives you sufficient tools to build sets of processing using\nthe built in sets in the library.\n\n...\n\n# postmunge\n\nThe postmunge(.) function is intended to consistently process subsequently available\nand consistently formatted test data with just a single function call. 
It requires \npassing the postprocess_dict object returned from the original application of automunge \nand that the passed test data have the same column header labeling as the original \ntrain set (or for Numpy arrays consistent order of columns).\n\n```\n\n#for postmunge(.) function on subsequently available test data\n#using the postprocess_dict object returned from original automunge(.) application\n\n#Remember to initialize automunge\nfrom Automunge import Automunger\nam = Automunger.AutoMunge()\n\n\n#Then we can run postmunge function as:\n\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_test = \\\nam.postmunge(postprocess_dict, df_test, testID_column = False, \\\n labelscolumn = False, pandasoutput=True, printstatus = True, \\\n TrainLabelFreqLevel = False, featureeval = False, driftreport = False)\n```\n\n\n\n## postmunge(.) returned sets:\nHere now are descriptions for the returned sets from postmunge, which\nwill be followed by descriptions of the arguments which can be passed to\nthe function. \n\n* test: the set of features, consistently encoded and normalized as the\ntraining data, that can be used to generate predictions from a model\ntrained with the np_train set from automunge.\n\n* testID: the set of ID values corresponding to the test set.\n\n* testlabels: a set of numerically encoded labels corresponding to the\ntest set if a label column was passed. Note that the function\nassumes the label column is originally included in the train set. Note\nthat if the labels set is a single column a returned numpy array is \nflattened (e.g. [[1,2,3]] converted to [1,2,3]).\n\n* labelsencoding_dict: this is the same labelsencoding_dict returned from\nautomunge; it's used in case one wants to reverse encode predicted labels.\n\n* finalcolumns_test: a list of the column headers corresponding to the\ntest data. 
Note that the inclusion of suffix appenders is used to\nidentify which feature engineering transformations were applied to each\ncolumn. Note that this list should match the one from automunge.\n\n...\n\n\n## postmunge(.) passed arguments\n\n```\n\n#for postmunge(.) function on subsequently available test data\n#using the postprocess_dict object returned from original automunge(.) application\n\n#Remember to initialize automunge\nfrom Automunge import Automunger\nam = Automunger.AutoMunge()\n\n\n#Then we can run postmunge function as:\n\ntest, testID, testlabels, \\\nlabelsencoding_dict, finalcolumns_test = \\\nam.postmunge(postprocess_dict, df_test, testID_column = False, \\\n labelscolumn = False, pandasoutput=True, printstatus = True, \\\n TrainLabelFreqLevel = False, featureeval = False, driftreport = False)\n```\n\n* postprocess_dict: this is the dictionary returned from the initial\napplication of automunge which includes normalization parameters to\nfacilitate processing of test data consistent with the original processing\nof the train set. This requires a user to remember to download the\ndictionary at the original application of automunge, otherwise if this\ndictionary is not available a user can feed this subsequent test data to\nautomunge along with the original train data exactly as was used in\nthe original automunge call.\n\n* df_test: a pandas dataframe or numpy array containing a structured \ndataset intended for use to generate predictions from a machine learning \nmodel trained from the automunge returned sets. The set must be consistently \nformatted as the train set with consistent order of columns and if labels are\nincluded consistent labels. If desired the set may include an ID column. The \ntool supports the inclusion of non-index-range column as index or multicolumn \nindex (requires named index columns). 
Such index types are added to the \nreturned \"ID\" sets which are consistently shuffled and partitioned as the \ntrain and test sets.\n\n* testID_column: a string of the column title for the column from the\ndf_test set intended for use as a row identifier value (such as could be\nsequential numbers for instance). The function defaults to False for\ncases where the training set does not include an ID column. A user can \nalso pass a list of string column titles such as to carve out multiple\ncolumns to be excluded from processing but consistently partitioned. An \ninteger column index or list of integer column indexes may also be passed \nsuch as if the source dataset was a numpy array.\n\n* labelscolumn: the default of False indicates that a labels column is not \nincluded in the test set passed to postmunge. A user can either pass\nTrue or the string ID of the labels column, noting that it is a requirement\nthat the labels column header string must be consistent with that from\nthe original train set. An integer column index may also be passed such\nas if the source dataset was a numpy array. A user should take care to set \nthis parameter if they are passing data with labels.\n\n* pandasoutput: a selector for format of returned sets. Defaults to False\nfor returned Numpy arrays. If set to True returns pandas dataframes\n(note that index is not preserved, an ID column may be passed for index\nidentification).\n\n* printstatus: user can pass True/False indicating whether the function \nwill print status of processing during operation. Defaults to True.\n\n* TrainLabelFreqLevel: a boolean identifier (True/False) which indicates\nif the TrainLabelFreqLevel method will be applied to oversample test\ndata associated with underrepresented labels. The method adds multiples\nto test data rows for those labels with lower frequency resulting in\nan (approximately) levelized frequency. This defaults to False. 
Note that\nthis feature may be applied to numerical label sets if the processing\napplied to the set in automunge included standard deviation bins. Note \nthis requires the inclusion of a designated labels column.\n\n* featureeval: a boolean identifier (True/False) to activate a feature\nimportance evaluation, comparable to one performed in automunge but based\non the test set passed to postmunge. Currently the results report is not\nreturned as an object, the results are printed in the output (for backward\ncompatibility).\n\n* driftreport: a boolean identifier (True/False) to activate a drift report \nevaluation, in which the normalization parameters are recalculated for the \ncolumns of the test data passed to postmunge for comparison to the original \nnormalization parameters derived from the corresponding columns of the \nautomunge train data set. Currently the results report is not returned as \nan object, the results are printed in the output (for backward compatibility).\n\n...\n\n## Default Transformations\n\nWhen root categories of transformations are not assigned for a given column in\nassigncat, automunge performs an evaluation of data properties to infer \nappropriate means of feature engineering and numerical encoding. The default\ncategories of transformations are as follows:\n- nmbr: for numerical data, columns are treated with z-score normalization. If \nthe binstransform parameter was activated this will be supplemented by a collection\nof bins indicating number of standard deviations from the mean.\n- text: for categorical data, columns are subject to one-hot encoding. 
If the \nnumber of unique entries in the column exceeds the parameter 'numbercategoryheuristic'\n(which defaults to 15), the encoding will instead be by ord3 which is an ordinal\n(integer) encoding sorted by most common value.\n- bnry: for categorical data of <=2 unique values excluding infill (eg NaN), the \ncolumn is encoded to 0/1.\n- dat6: for time-series data, a set of derivations are performed returning\n'year', 'mdsn', 'mdcs', 'hmss', 'hmsc', 'bshr', 'wkdy', 'hldy' (these are defined \nin the next section)\n- null: for columns with a single entry, the column is deleted\n\n- PCA: if the number of features exceeds 0.5 times the number of rows (an arbitrary heuristic)\na default PCA transform is applied, defaulting to kernel PCA if all positive or sparse PCA otherwise,\nusing the Scikit-learn library. Note that this heuristic ratio can be changed or PCA turned off\nin the ML_cmnd.\n\n- powertransform: if the powertransform parameter is activated, a statistical evaluation\nwill be performed on numerical sets to distinguish between columns to be subject to\nbxcx, nmbr, or mnmx. Please note that we intend to further refine the specifics of this\nprocess in future implementations. \n\n- floatprecision: parameter indicates the precision of floats in returned sets (16/32/64)\nsuch as for memory considerations.\n\nIn all cases, if the parameter NArw_marker is activated returned sets will be\nsupplemented with a NArw column indicating rows that were subject to infill. Each \ntransformation category has a default infill approach detailed below.\n\n...\n\n## Library of Transformations\n\nAutomunge has a built in library of transformations that can be passed for\nspecific columns with assigncat. (A column if left unassigned will defer to\nthe automated default methods to evaluate properties of the data to infer \nappropriate methods of numerical encoding.) 
For example, a user can pass a \nmin-max scaling method to a specific column 'col1' with: \n```\nassigncat = {'mnmx':['col1']}\n```\nWhen a user assigns a column to a specific category, that category is treated\nas the root category for the tree of transformations. Each key has an \nassociated transformation function, and that transformation function is only\napplied if the root key is also found in the tree of family primitives. The\ntree of family primitives, as introduced earlier, applies first the keys found \nin upstream primitives i.e. parents/siblings/auntsuncles/cousins. If a transform \nis applied for a primitive that includes downstream offspring, such as parents/\nsiblings, then the family tree for that key with offspring is inspected to determine\ndownstream offspring categories, for example if we have a parents key of 'mnmx',\nthen any children/niecesnephews/coworkers/friends in the 'mnmx' family tree will\nbe applied as parents/siblings/auntsuncles/cousins, respectively. Note that the\ndesignation for supplements/replaces refers purely to the question of whether the\ncolumn to which the transform is being applied is kept in place or removed. Please\nnote that it is a quirk of the function that no original column can be left in \nplace without the application of some transformation such as to allow the building\nof the appropriate data structures, thus at least one replacement primitive must\nalways be included. If a user does wish to leave a column in place unaltered, they \ncan simply assign that column to the 'excl' root category.\n\nNow we'll start here by listing again the family tree primitives for those root \ncategories built into the automunge library. After that we'll give a quick \nnarrative for each of the associated transformation functions. 
First here again\nare the family tree primitives.\n\n```\n'parents' : \nupstream / first generation / replaces column / with offspring\n\n'siblings': \nupstream / first generation / supplements column / with offspring\n\n'auntsuncles' : \nupstream / first generation / replaces column / no offspring\n\n'cousins' : \nupstream / first generation / supplements column / no offspring\n\n'children' : \ndownstream parents / offspring generations / replaces column / with offspring\n\n'niecesnephews' : \ndownstream siblings / offspring generations / supplements column / with offspring\n\n'coworkers' : \ndownstream auntsuncles / offspring generations / replaces column / no offspring\n\n'friends' : \ndownstream cousins / offspring generations / supplements column / no offspring\n```\n\nHere is a quick description of the transformation functions associated \nwith each key which can be assigned to a primitive (and not just used as \na root key). We're continuing to build out this library of transformations.\n\nNote the design philosophy is that any transform can be applied to any type \nof data and if the data is not suited (such as applying a numeric transform\nto a categorical set) the transform will just return all zeros. Note the \ndefault infill refers to the infill applied under 'stdrdinfill'. 
Note the\ndefault NArowtype refers to the categories of data that won't be subject to \ninfill.\n\n* nmbr/nbr2/nbr3: z-score normalization\n - default infill: mean\n - default NArowtype: numeric\n* dxdt: rate of change (row value minus value in preceding row)\n - default infill: adjacent cells\n - default NArowtype: numeric\n* dxd2: denoised rate of change (average of last two rows minus average\nof preceding two rows)\n - default infill: adjacent cells\n - default NArowtype: numeric\n* MADn/MAD2: mean absolute deviation normalization, subtract set mean\n - default infill: mean\n - default NArowtype: numeric\n* MAD3: mean absolute deviation normalization, subtract set maximum\n - default infill: mean\n - default NArowtype: numeric\n* mnmx/mnm2/mnm5: vanilla min-max scaling\n - default infill: mean\n - default NArowtype: numeric\n* mnm3/mnm4: min-max scaling with outliers capped at 0.01 and 0.99 quantiles\n - default infill: mean\n - default NArowtype: numeric\n* mnm6: min-max scaling with test floor set capped at min of train set (ensures\ntest set returned values >= 0, such as might be useful for kernel PCA for instance)\n - default infill: mean\n - default NArowtype: numeric\n* bnry: converts sets with two values to boolean identifiers. Defaults to assigning\n1 to most common value and 0 to second most common, unless 1 or 0 is already included\nin most common of the set then defaults to maintaining those designations. If applied \nto set with >2 entries applies infill to those entries beyond two most common. \n - default infill: most common value\n - default NArowtype: justNaN\n* text: converts categorical sets to one-hot encoded set of boolean identifiers\n - default infill: all entries zero\n - default NArowtype: justNaN\n*Please note I recommend caution on using splt/spl2/spl5/spl6 transforms on categorical*\n*sets that may include scientific units for instance, as prefixes will not be noted*\n*for overlaps, e.g. 
this wouldn't distinguish between kilometer and meter for instance.*\n* splt: searches categorical sets for overlaps between strings and returns new boolean column\nfor identified overlap categories. Note this treats numeric values as strings eg 1.3 = '1.3'.\nNote that priority is given to overlaps of higher length, and by default overlap searches\nstart at 20 character length and go down to 5 character length.\n - default infill: none\n - default NArowtype: justNaN\n* spl2: similar to splt, but instead of creating new column identifier it replaces categorical \nentries with the abbreviated string overlap\n - default infill: none\n - default NArowtype: justNaN\n* spl5: similar to spl2, but those entries without identified string overlap are set to 0,\n(used in ors5 in conjunction with ord3)\n - default infill: none\n - default NArowtype: justNaN\n* spl6: similar to spl5, but with a splt performed downstream for identification of overlaps\nwithin the overlaps\n - default infill: none\n - default NArowtype: justNaN\n* ordl/ord2: converts categorical sets to ordinally encoded set of integer identifiers\n - default infill: plug value 'zzzinfill'\n - default NArowtype: justNaN\n* ord3/ord4: converts categorical sets to ordinally encoded set of integer identifiers\nsorted by frequency of category occurrence\n - default infill: plug value 'zzzinfill'\n - default NArowtype: justNaN\n* 1010: converts categorical sets of >2 unique values to binary encoding (more memory \nefficient than one-hot encoding)\n - default infill: plug value 'zzzinfill'\n - default NArowtype: justNaN\n* bxcx/bxc2/bxc3/bxc4: performs Box-Cox power law transformation. Applies infill to values \n<= 0. Note we currently have a test for overflow in returned results and if found set to 0.\n - default infill: mean\n - default NArowtype: positivenumeric\n* log0/log1: performs logarithmic transform (base 10). 
Applies infill to values <= 0.\n - default infill: mean\n - default NArowtype: positivenumeric\n* pwrs: bins groupings by powers of 10\n - default infill: mean\n - default NArowtype: positivenumeric\n* bins: for numerical sets, outputs a set of 6 columns indicating where a\nvalue fell with respect to number of standard deviations from the mean of the\nset (i.e. <-2, -2 to -1, -1 to 0, 0 to 1, 1 to 2, >2)\n - default infill: mean\n - default NArowtype: numeric\n* bint: comparable to bins but assumes data has already been z-score normalized\n - default infill: mean\n - default NArowtype: numeric\n* date/dat2: for datetime formatted data, segregates data by time scale to multiple\ncolumns (year/month/day/hour/minute/second) and then performs z-score normalization\n - default infill: mean\n - default NArowtype: justNaN\n* wkdy: boolean identifier indicating whether a datetime object is a weekday\n - default infill: none\n - default NArowtype: justNaN\n* bshr: boolean identifier indicating whether a datetime object falls within business\nhours (9-5, time zone unaware)\n - default infill: none\n - default NArowtype: justNaN\n* hldy: boolean identifier indicating whether a datetime object is a US Federal\nholiday\n - default infill: none\n - default NArowtype: justNaN\n* year/mnth/days/hour/mint/scnd: segregated by time scale and z-score normalization\n - default infill: mean\n - default NArowtype: justNaN\n* mnsn/mncs/dysn/dycs/hrsn/hrcs/misn/mics/scsn/sccs: segregated by time scale and \ndual columns with sin and cos transformations for time scale period\n - default infill: mean\n - default NArowtype: justNaN\n* mdsn/mdcs: similar sin/cos treatment, but for combined month/day\n - default infill: mean\n - default NArowtype: justNaN\n* hmss/hmsc: similar sin/cos treatment, but for combined hour/minute/second\n - default infill: mean\n - default NArowtype: justNaN\n* dat6: default transformation set for time series data, returns:\n'year', 'mdsn', 'mdcs', 'hmss', 'hmsc', 'bshr', 'wkdy', 
'hldy'\n - default infill: mean\n - default NArowtype: justNaN\n* null: deletes source column\n - default infill: none\n - default NArowtype: exclude\n* excl: passes source column un-altered\n - default infill: none\n - default NArowtype: exclude\n* exc2: passes source column unaltered other than force to numeric, mode infill applied\n - default infill: mode\n - default NArowtype: numeric\n* eval: performs distribution property evaluation consistent with the automunge\n'powertransform' parameter activated to designated column\n - default infill: based on evaluation\n - default NArowtype: based on evaluation\n* NArw: produces a column of boolean identifiers for rows in the source\ncolumn with missing or improperly formatted values. Note that when NArw\nis assigned in a family tree it bases NArowtype on the root category, \nwhen NArw is passed as the root category it bases NArowtype on default.\n - default infill: not applicable\n - default NArowtype: justNaN\n* NAr2: produces a column of boolean identifiers for rows in the source\ncolumn with missing or improperly formatted values.\n - default infill: not applicable\n - default NArowtype: numeric\n* NAr3: produces a column of boolean identifiers for rows in the source\ncolumn with missing or improperly formatted values.\n - default infill: not applicable\n - default NArowtype: positivenumeric\n\n\n\nAnd here are the series of family trees currently built into the internal library.\n\n```\n transform_dict.update({'nmbr' : {'parents' : ['nmbr'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : [bint]}})\n\n transform_dict.update({'dxdt' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dxdt'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'d2dt' : {'parents' : ['d2dt'], \\\n 'siblings': [], \\\n 
'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : ['dxdt']}})\n\n transform_dict.update({'d3dt' : {'parents' : ['d3dt'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : ['d2dt'], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dxd2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dxd2'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'d2d2' : {'parents' : ['d2d2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : ['dxd2']}})\n\n transform_dict.update({'d3d2' : {'parents' : ['d3d2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : ['d2d2'], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'nmdx' : {'parents' : ['nmdx'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : ['dxdt']}})\n\n transform_dict.update({'nmd2' : {'parents' : ['nmd2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : ['d2dt'], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'nmd3' : {'parents' : ['nmd3'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : ['d3dt'], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mmdx' : {'parents' : ['mnmx'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : ['dxdt']}})\n\n 
transform_dict.update({'mmd2' : {'parents' : ['mmd2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : ['d2dt'], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mmd3' : {'parents' : ['mmd3'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : ['d3dt'], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'bnry' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['bnry'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'text' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['text'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'txt2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['text'], \\\n 'cousins' : [NArw, 'splt'], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'txt3' : {'parents' : ['txt3'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['text'], \\\n 'friends' : []}})\n\n transform_dict.update({'splt' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['splt'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'spl2' : {'parents' : ['spl2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['ordl'], \\\n 'friends' : []}})\n\n transform_dict.update({'spl3' : {'parents' : ['spl2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 
'niecesnephews' : [], \\\n 'coworkers' : ['ord3'], \\\n 'friends' : []}})\n\n transform_dict.update({'spl4' : {'parents' : ['spl4'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : ['spl3'], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'spl5' : {'parents' : ['spl5'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['ord3'], \\\n 'friends' : []}})\n\n transform_dict.update({'spl6' : {'parents' : ['spl6'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : ['splt'], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : ['ord3']}})\n\n transform_dict.update({'ors5' : {'parents' : ['spl5'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['ord3'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'ors6' : {'parents' : ['spl6'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['ord3'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'ordl' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['ordl'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'ord2' : {'parents' : ['ord2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['mnmx'], \\\n 'friends' : []}})\n\n transform_dict.update({'ord3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['ord3'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'ord4' : {'parents' : ['ord4'], \\\n 'siblings': [], 
\\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['mnmx'], \\\n 'friends' : []}})\n\n transform_dict.update({'ors2' : {'parents' : ['spl3'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['ord3'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'or10' : {'parents' : ['ord4'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['1010'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['mnmx'], \\\n 'friends' : []}})\n\n transform_dict.update({'om10' : {'parents' : ['ord4'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['1010', 'mnmx'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['mnmx'], \\\n 'friends' : []}})\n\n transform_dict.update({'mmor' : {'parents' : ['ord4'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnmx'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'1010' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['1010'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'null' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['null'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'NArw' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['NArw'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'NAr2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['NArw'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n 
transform_dict.update({'NAr3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['NArw'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'nbr2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['nmbr'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'nbr3' : {'parents' : ['nbr3'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : ['bint']}})\n\n transform_dict.update({'MADn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['MADn'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'MAD2' : {'parents' : ['MAD2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : ['nmbr'], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'MAD3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['MAD3'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnmx' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnmx'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnm2' : {'parents' : ['nmbr'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnmx'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnm3' : {'parents' : ['nmbr'], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnm3'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 
'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnm4' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnm3'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnm5' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnmx'], \\\n 'cousins' : ['nmbr', NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnm6' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnm6'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnm7' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnmx', 'bins'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'date' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['year', 'mnth', 'days', 'hour', 'mint', 'scnd'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dat2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['bshr', 'wkdy', 'hldy'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dat3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['year', 'mnsn', 'mncs', 'dysn', 'dycs', 'hrsn', 'hrcs', 'misn', 'mics', 'scsn', 'sccs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dat4' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['year', 'mdsn', 'mdcs', 'hmss', 'hmsc'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], 
\\\n 'friends' : []}})\n\n transform_dict.update({'dat5' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['year', 'mdsn', 'mdcs', 'dysn', 'dycs', 'hmss', 'hmsc'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dat6' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['year', 'mdsn', 'mdcs', 'hmss', 'hmsc', 'bshr', 'wkdy', 'hldy'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'year' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['year'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'yea2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['year', 'mdsn', 'mdcs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnth' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnth'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnt2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnsn', 'mncs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnt3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnsn', 'mncs', 'dysn', 'dycs', 'hrsn', 'hrcs', 'misn', 'mics', 'scsn', 'sccs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnt4' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mdsn', 'mdcs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 
'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnt5' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mdsn', 'mdcs', 'hmss', 'hmsc'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnt6' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mdsn', 'mdcs', 'dysn', 'dycs', 'hmss', 'hmsc'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mnsn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnsn'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mncs' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mncs'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mdsn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mdsn'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mdcs' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mdcs'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'days' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['days'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'day2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dysn', 'dycs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'day3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : 
['dysn', 'dycs', 'hrsn', 'hrcs', 'misn', 'mics', 'scsn', 'sccs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'day4' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dhms', 'dhmc'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'day5' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dhms', 'dhmc', 'hmss', 'hmsc'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dysn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dysn'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dycs' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dycs'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dhms' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dhms'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'dhmc' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['dhmc'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hour' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hour'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hrs2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hrsn', 'hrcs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 
'friends' : []}})\n\n transform_dict.update({'hrs3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hrsn', 'hrcs', 'misn', 'mics', 'scsn', 'sccs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hrs4' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hmss', 'hmsc'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hrsn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hrsn'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hrcs' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hrcs'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hmss' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hmss'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hmsc' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hmsc'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mint' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mint'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'min2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['misn', 'mics'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'min3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['misn', 'mics', 'scsn', 'sccs'], \\\n 
'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'min4' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mssn', 'mscs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'misn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['misn'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mics' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mics'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mssn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mssn'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'mscs' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mscs'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'scnd' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['scnd'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'scn2' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['scsn', 'sccs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'scsn' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['scsn'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'sccs' : {'parents' : [], \\\n 'siblings': [], \\\n 
'auntsuncles' : ['sccs'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'bxcx' : {'parents' : ['bxcx'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : ['nmbr'], \\\n 'friends' : []}})\n\n transform_dict.update({'bxc2' : {'parents' : ['bxc2'], \\\n 'siblings': ['nmbr'], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : ['nmbr'], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'bxc3' : {'parents' : ['bxc3'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : ['nmbr'], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'bxc4' : {'parents' : ['bxc4'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [NArw], \\\n 'children' : ['nbr2'], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'pwrs' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['pwrs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'log0' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['log0'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'log1' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['log0', 'pwrs'], \\\n 'cousins' : [NArw], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'wkdy' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['wkdy'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n 
transform_dict.update({'bshr' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['bshr'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'hldy' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['hldy'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'bins' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['bins'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'bint' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['bint'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'excl' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['excl'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'exc2' : {'parents' : ['exc2'], \\\n 'siblings': [], \\\n 'auntsuncles' : [], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}})\n\n transform_dict.update({'exc3' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['exc2'], \\\n 'cousins' : [], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : ['bins']}})\n```\n\n\n...\n\n## Custom Transformation Functions\n\nOk final item on the agenda, we're going to demonstrate methods to create custom\ntransformation functions, such that a user may customize the feature engineering\nwhile building on all of the extremely useful built in features of automunge such\nas infill methods including ML infill, feature importance, dimensionality reduction,\npreparation for class imbalance oversampling, and perhaps most 
importantly the \nsimplest possible way for consistent processing of additional data with just a single \nfunction call. The transformation functions will need to be channeled through pandas \nand incorporate a handful of simple data structures, which we'll demonstrate below.\n\nLet's say we want to recreate the mnm3 category which caps outliers at 0.01 and 0.99\nquantiles, but instead make it the 0.001 and 0.999 quantiles. We'll call this \ncategory mnm8. So in order to pass a custom transformation function, first we'll need \nto define a new root category transformdict and a corresponding processdict.\n\n```\n#Let's create a really simple family tree for the new root category mnm8 which\n#simply creates a column identifying any rows subject to infill (NArw), performs \n#a z-score normalization, and separately performs a version of the new transform\n#mnm8 which we'll define below.\n\ntransformdict = {'mnm8' : {'parents' : [], \\\n 'siblings': [], \\\n 'auntsuncles' : ['mnm8', 'nmbr'], \\\n 'cousins' : ['NArw'], \\\n 'children' : [], \\\n 'niecesnephews' : [], \\\n 'coworkers' : [], \\\n 'friends' : []}}\n\n#Note that since this mnm8 requires passing normalization parameters derived\n#from the train set to process the test set, we'll need to create two separate \n#transformation functions, the first a \"dualprocess\" function that processes\n#both the train and if available a test set simultaneously, and the second\n#a \"postprocess\" that only processes the test set on its own.\n\n#So what's being demonstrated here is that we're passing the functions under\n#dualprocess and postprocess that we'll define below.\n\nprocessdict = {'mnm8' : {'dualprocess' : process_mnm8_class, \\\n 'singleprocess' : None, \\\n 'postprocess' : postprocess_mnm8_class, \\\n 'NArowtype' : 'numeric', \\\n 'MLinfilltype' : 'numeric', \\\n 'labelctgy' : 'mnm8'}}\n\n#Now we have to define the custom processing functions which we are passing through\n#the processdict to 
automunge.\n\n#Here we'll define a \"dualprocess\" function intended to process both a train and\n#test set simultaneously. We'll also need to create a separate \"postprocess\"\n#function intended to just process the test set.\n\n#pandas is required for the pd.to_numeric calls below\nimport pandas as pd\n\n#define the function\ndef process_mnm8_class(mdf_train, mdf_test, column, category, \\\n postprocess_dict):\n #where\n #mdf_train is the train data set (pandas dataframe)\n #mdf_test is the consistently formatted test dataset (if no test data \n #set is passed to automunge a small dummy set will be passed in its place)\n #column is the string identifying the column header\n #category is the (traditionally 4 character) string category identifier, here it \n #will be 'mnm8', postprocess_dict is an object we pass to share data between \n #functions and later returned from automunge.\n\n #create the new column, using the category key as a suffix identifier\n\n #copy source column into new column\n mdf_train[column + '_mnm8'] = mdf_train[column].copy()\n mdf_test[column + '_mnm8'] = mdf_test[column].copy()\n\n\n #perform an initial infill method, here we use mean as a plug, automunge\n #will separately perform an infill method per user specifications elsewhere\n #convert all values to either numeric or NaN\n mdf_train[column + '_mnm8'] = pd.to_numeric(mdf_train[column + '_mnm8'], errors='coerce')\n mdf_test[column + '_mnm8'] = pd.to_numeric(mdf_test[column + '_mnm8'], errors='coerce')\n\n #if we want to collect any statistics for the driftreport we could do so prior\n #to transformations and save them in the normalization dictionary below with the\n #other normalization parameters, e.g.\n min = mdf_train[column + '_mnm8'].min()\n max = mdf_train[column + '_mnm8'].max()\n\n #Now we do the specifics of the processing function, here we're demonstrating\n #the min-max scaling method capping values at 0.001 and 0.999 quantiles\n #in some cases we would address infill first, here to preserve the quantile evaluation\n #we'll do the quantile evaluation first\n\n #get 
high quantile of training column for min-max scaling\n quantilemax = mdf_train[column + '_mnm8'].quantile(.999)\n\n #outlier scenario for when data wasn't numeric (nan != nan)\n if quantilemax != quantilemax:\n     quantilemax = 0\n\n #get low quantile of training column for min-max scaling\n quantilemin = mdf_train[column + '_mnm8'].quantile(.001)\n\n #same outlier scenario for the low quantile\n if quantilemin != quantilemin:\n     quantilemin = 0\n\n #replace values > quantilemax with quantilemax for both train and test data\n mdf_train.loc[mdf_train[column + '_mnm8'] > quantilemax, (column + '_mnm8')] \\\n = quantilemax\n mdf_test.loc[mdf_test[column + '_mnm8'] > quantilemax, (column + '_mnm8')] \\\n = quantilemax\n\n #replace values < quantilemin with quantilemin for both train and test data\n mdf_train.loc[mdf_train[column + '_mnm8'] < quantilemin, (column + '_mnm8')] \\\n = quantilemin\n mdf_test.loc[mdf_test[column + '_mnm8'] < quantilemin, (column + '_mnm8')] \\\n = quantilemin\n\n\n #note the infill method is now completed after the quantile evaluation / replacement\n #get mean of training data for infill\n mean = mdf_train[column + '_mnm8'].mean()\n\n if mean != mean:\n     mean = 0\n\n #replace missing data with training set mean\n mdf_train[column + '_mnm8'] = mdf_train[column + '_mnm8'].fillna(mean)\n mdf_test[column + '_mnm8'] = mdf_test[column + '_mnm8'].fillna(mean)\n\n #this is to avoid outlier div by zero when max = min\n maxminusmin = quantilemax - quantilemin\n if maxminusmin == 0:\n     maxminusmin = 1\n\n #perform min-max scaling to train and test sets using values derived from train\n mdf_train[column + '_mnm8'] = (mdf_train[column + '_mnm8'] - quantilemin) / \\\n (maxminusmin)\n mdf_test[column + '_mnm8'] = (mdf_test[column + '_mnm8'] - quantilemin) / \\\n (maxminusmin)\n\n\n #ok here's where we populate the data structures\n\n #create list of columns (here it will only be one column returned)\n nmbrcolumns = [column + '_mnm8']\n\n #The normalization dictionary is how we pass values between the 
\"dualprocess\"\n #function and the \"postprocess\" function. This is also where we save any metrics\n #we want to track such as to track drift in the postmunge driftreport.\n\n #Here we populate the normalization dictionary with any values derived from\n #the train set that we'll need to process the test set.\n nmbrnormalization_dict = {column + '_mnm8' : {'quantilemin' : quantilemin, \\\n 'quantilemax' : quantilemax, \\\n 'mean' : mean, \\\n 'minimum' : min, \\\n 'maximum' : max}}\n\n #the column_dict_list is returned from the function call and supports the \n #automunge methods. We populate it as follows:\n\n #initialize\n column_dict_list = []\n\n #where we're storing the following\n #{'category' : 'mnm8', \\ -> identifier of the category of transform applied\n # 'origcategory' : category, \\ -> category of original column in train set, passed in function call\n # 'normalization_dict' : nmbrnormalization_dict, \\ -> normalization parameters of train set\n # 'origcolumn' : column, \\ -> ID of original column in train set\n # 'columnslist' : nmbrcolumns, \\ -> a list of columns created in this transform, \n # later fleshed out to include all columns derived from same source column\n # 'categorylist' : nmbrcolumns, \\ -> a list of columns created in this transform\n # 'infillmodel' : False, \\ -> populated elsewhere, for now enter False\n # 'infillcomplete' : False, \\ -> populated elsewhere, for now enter False\n # 'deletecolumn' : False}} -> populated elsewhere, for now enter False\n\n #for column in nmbrcolumns\n for nc in nmbrcolumns:\n\n     if nc[-5:] == '_mnm8':\n\n         column_dict = { nc : {'category' : 'mnm8', \\\n         'origcategory' : category, \\\n         'normalization_dict' : nmbrnormalization_dict, \\\n         'origcolumn' : column, \\\n         'columnslist' : nmbrcolumns, \\\n         'categorylist' : nmbrcolumns, \\\n         'infillmodel' : False, \\\n         'infillcomplete' : False, \\\n         'deletecolumn' : False}}\n\n         column_dict_list.append(column_dict.copy())\n\n\n\n return mdf_train, mdf_test, 
column_dict_list\n\n #where mdf_train and mdf_test now have the new column incorporated\n #and column_dict_list carries the data structures supporting the operation \n #of automunge. (If the original column was intended for replacement it \n #will be stricken elsewhere)\n\n\n#and then since this is a method that passes values between the train\n#and test sets, we'll need to define a corresponding \"postprocess\" function\n#intended for use on just the test set\n\ndef postprocess_mnm8_class(mdf_test, column, postprocess_dict, columnkey):\n #where mdf_test is a dataframe of the test set\n #column is the string of the column header\n #postprocess_dict is how we carry packets of data between the \n #functions in automunge and postmunge\n #columnkey is a key used to access entries in postprocess_dict if needed\n #(columnkey is only valid for initial root categories, if you want to use the function\n #as a downstream category we have to recreate a columnkey such as follows for normkey)\n\n #retrieve normalization parameters from postprocess_dict\n normkey = column + '_mnm8'\n\n mean = \\\n postprocess_dict['column_dict'][normkey]['normalization_dict'][normkey]['mean']\n\n quantilemin = \\\n postprocess_dict['column_dict'][normkey]['normalization_dict'][normkey]['quantilemin']\n\n quantilemax = \\\n postprocess_dict['column_dict'][normkey]['normalization_dict'][normkey]['quantilemax']\n\n #copy original column for implementation\n mdf_test[column + '_mnm8'] = mdf_test[column].copy()\n\n\n #convert all values to either numeric or NaN\n mdf_test[column + '_mnm8'] = pd.to_numeric(mdf_test[column + '_mnm8'], errors='coerce')\n\n #cap values at the train set quantiles, consistent with the dualprocess function\n mdf_test.loc[mdf_test[column + '_mnm8'] > quantilemax, (column + '_mnm8')] \\\n = quantilemax\n mdf_test.loc[mdf_test[column + '_mnm8'] < quantilemin, (column + '_mnm8')] \\\n = quantilemin\n\n #replace missing data with training set mean\n mdf_test[column + '_mnm8'] = mdf_test[column + '_mnm8'].fillna(mean)\n\n #this is to avoid outlier div by zero when max = min\n maxminusmin = quantilemax - quantilemin\n if maxminusmin == 0:\n     maxminusmin = 1\n\n #perform min-max scaling to test set using values 
from train\n mdf_test[column + '_mnm8'] = (mdf_test[column + '_mnm8'] - quantilemin) / \\\n (maxminusmin)\n\n\n return mdf_test\n\n#Voila\n\n#One more demonstration, note that if we didn't need to pass any properties\n#between the train and test set, we could have just processed one at a time,\n#and in that case we wouldn't need to define separate functions for \n#dualprocess and postprocess, we could just define what we call a singleprocess \n#function incorporating similar data structures but with only a single dataframe \n#passed\n\n#Such as:\ndef process_mnm8_class(df, column, category, postprocess_dict):\n\n #etc\n\n return df, column_dict_list\n\n#For a full demonstration check out my essay \n#\"Automunge 1.79: An Open Source Platform for Feature Engineering\"\n\n\n```\n\nAnd there you have it, you now have all you need to wrangle data on the \nAutomunge platform. Feedback is welcome.\n\n\n...\n\nAs a citation, please note that the Automunge package makes use of \nthe Pandas, Scikit-learn, and NumPy libraries.\n\nWes McKinney. 
Data Structures for Statistical Computing in Python,\nProceedings of the 9th Python in Science Conference, 51-56 (2010)\n[publisher\nlink](http://conference.scipy.org/proceedings/scipy2010/mckinney.html)\n\nFabian Pedregosa, Ga\u00ebl Varoquaux, Alexandre Gramfort, Vincent Michel,\nBertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer,\nRon Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David\nCournapeau, Matthieu Brucher, Matthieu Perrot, \u00c9douard Duchesnay.\nScikit-learn: Machine Learning in Python, Journal of Machine Learning\nResearch, 12, 2825-2830 (2011) [publisher\nlink](http://jmlr.org/papers/v12/pedregosa11a.html)\n\nSorry I don't know paper to cite, but Numpy website at:\nhttps://www.numpy.org/\n\n...\n\nHave fun munging!\n\n...\n\nYou can read more about the tool through the blog posts documenting the\ndevelopment on medium [here](https://medium.com/automunge) or for more\nwriting I recently completed my first collection of essays titled \"From\nthe Diaries of John Henry\" which is also available on Medium\n[turingsquared.com](https://turingsquared.com).\n\nThe Automunge website is helpfully located at URL\n[automunge.com](https://automunge.com).\n\n...\n\nPatent Pending, application 16552857\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Automunge/AutoMunge", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "Automunge", "package_url": "https://pypi.org/project/Automunge/", "platform": "", "project_url": "https://pypi.org/project/Automunge/", "project_urls": { "Homepage": "https://github.com/Automunge/AutoMunge" }, "release_url": "https://pypi.org/project/Automunge/2.70/", "requires_dist": [ "numpy", "pandas", "scikit-learn", "scipy" ], "requires_python": "", "summary": "A tool for automated data wrangling", "version": "2.70" }, "last_serial": 
5993453, "releases": { "2.55": [ { "comment_text": "", "digests": { "md5": "6ccea8d80d34b5d26c5c4ff1d7264f7d", "sha256": "23f0cc5139622ad0cd23592d6c93b3f6b69fc9f40b0a96c18a9a6d536cd5d061" }, "downloads": -1, "filename": "Automunge-2.55-py3-none-any.whl", "has_sig": false, "md5_digest": "6ccea8d80d34b5d26c5c4ff1d7264f7d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 120566, "upload_time": "2019-09-22T01:06:54", "url": "https://files.pythonhosted.org/packages/81/12/483c92790370f47f9da3073b68f963b5515f9e02eea3aba9be0aff103f7d/Automunge-2.55-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "58907cfe9c3d4dbec175e633dd1cdad2", "sha256": "dd50c1ecbd9c2721c09517672eaafb164fb898dea34a42cc108e20ebecb2bc75" }, "downloads": -1, "filename": "Automunge-2.55.tar.gz", "has_sig": false, "md5_digest": "58907cfe9c3d4dbec175e633dd1cdad2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 144519, "upload_time": "2019-09-22T01:06:57", "url": "https://files.pythonhosted.org/packages/82/eb/7e322a25e33e59a770d33d28c56a8a1b7a730265df2e47ce715543f517d5/Automunge-2.55.tar.gz" } ], "2.56": [ { "comment_text": "", "digests": { "md5": "981a09496ad01766388dcab11fae9582", "sha256": "a7439e4fee8122d2da2ce4d673a18c7186cf340d69b86a99510cf222e636ae87" }, "downloads": -1, "filename": "Automunge-2.56-py3-none-any.whl", "has_sig": false, "md5_digest": "981a09496ad01766388dcab11fae9582", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 208920, "upload_time": "2019-09-22T01:33:25", "url": "https://files.pythonhosted.org/packages/f8/d2/6539917655882b90ac312878ac574a9909e079d18f73bf04c31619a80e96/Automunge-2.56-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e4e80de421fb9e513fab02e5a812ce3c", "sha256": "213ddd911127c836224424bc81036c1de6ffcfd44599d4d8796cd99a2f94afcd" }, "downloads": -1, "filename": "Automunge-2.56.tar.gz", "has_sig": false, "md5_digest": 
"e4e80de421fb9e513fab02e5a812ce3c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 231289, "upload_time": "2019-09-22T01:33:28", "url": "https://files.pythonhosted.org/packages/6a/98/41b72a7bab97cc09249254326cbed05b192f0cfde2fdb3b1723e08a32478/Automunge-2.56.tar.gz" } ], "2.57": [ { "comment_text": "", "digests": { "md5": "7e3480e31705e188e6c086de134ca78b", "sha256": "39a956862c8e624ace388c6f88d7f865be9239283d95058380694a5d30a978c7" }, "downloads": -1, "filename": "Automunge-2.57-py3-none-any.whl", "has_sig": false, "md5_digest": "7e3480e31705e188e6c086de134ca78b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 208924, "upload_time": "2019-09-22T01:53:24", "url": "https://files.pythonhosted.org/packages/b9/ed/95565bd802a425f6b7a2e99c7e46230d0cdda953d40a61e4819ac06ec538/Automunge-2.57-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a9876ed9401009161fc3c3a3a68b3732", "sha256": "b857bbe8a69717ea75d85d165036634d632cd6032bab63c144e71db154f96d8c" }, "downloads": -1, "filename": "Automunge-2.57.tar.gz", "has_sig": false, "md5_digest": "a9876ed9401009161fc3c3a3a68b3732", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 231289, "upload_time": "2019-09-22T01:53:26", "url": "https://files.pythonhosted.org/packages/3b/a6/7c3bdec231114492dac2a7b2480cf47e557d9bcb4e7ce1cd0a99296031b7/Automunge-2.57.tar.gz" } ], "2.58": [ { "comment_text": "", "digests": { "md5": "695d469113fa01663f8f996cae611bc3", "sha256": "b609c04447922717ba760507755e3262417c0c39da571b599a33855da8fb61e1" }, "downloads": -1, "filename": "Automunge-2.58-py3-none-any.whl", "has_sig": false, "md5_digest": "695d469113fa01663f8f996cae611bc3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 208932, "upload_time": "2019-09-22T02:29:10", "url": 
"https://files.pythonhosted.org/packages/60/99/d126fc4e3adf4ca3554acce00b94a4a5e49e4e7da4bda56cd9ec81c608f1/Automunge-2.58-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1f659c8a0bcea79c7d5d3f4d964bbd60", "sha256": "6a09a42a7d43af596effcd9cf42b93d858cb25f55b48b241780af7dbeb20b0c3" }, "downloads": -1, "filename": "Automunge-2.58.tar.gz", "has_sig": false, "md5_digest": "1f659c8a0bcea79c7d5d3f4d964bbd60", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 231294, "upload_time": "2019-09-22T02:29:12", "url": "https://files.pythonhosted.org/packages/b7/ae/13e3d50d1b6ecc04d2e2ac61389a2b43a3a1becf94c236d6e36a0a40610c/Automunge-2.58.tar.gz" } ], "2.59": [ { "comment_text": "", "digests": { "md5": "5ed65f2be5d98e2b1e1fe852a508e749", "sha256": "95d5d24159561a54cce2b96722e38ff3f26b5be976222d5161047b2cb14ae30b" }, "downloads": -1, "filename": "Automunge-2.59-py3-none-any.whl", "has_sig": false, "md5_digest": "5ed65f2be5d98e2b1e1fe852a508e749", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 210170, "upload_time": "2019-09-23T02:47:00", "url": "https://files.pythonhosted.org/packages/8b/4d/d4ba69253ed53360e3630aae20510ad42998c262f44202116245874f22d0/Automunge-2.59-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b5a89739ac6d0ac4d1f968292a0f1d6e", "sha256": "563ac1a2d0bd3626c53f555cdc4f8271ecbaa2ee4272a3f1368478d245f613a4" }, "downloads": -1, "filename": "Automunge-2.59.tar.gz", "has_sig": false, "md5_digest": "b5a89739ac6d0ac4d1f968292a0f1d6e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 232976, "upload_time": "2019-09-23T02:47:03", "url": "https://files.pythonhosted.org/packages/5d/f7/af35bf82c7c55290754af16cfcf939796c416066270f4fd94acf7d1b2313/Automunge-2.59.tar.gz" } ], "2.60": [ { "comment_text": "", "digests": { "md5": "e86af004cf12145144a360302cf022a0", "sha256": 
"5c33d42f7241f719bf40f88eba29090a9f40a4f4910dcd41d929e878b43d61c3" }, "downloads": -1, "filename": "Automunge-2.60-py3-none-any.whl", "has_sig": false, "md5_digest": "e86af004cf12145144a360302cf022a0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 121757, "upload_time": "2019-09-26T01:04:49", "url": "https://files.pythonhosted.org/packages/c3/85/5478f363e08a304c471ccf7917a2f3ddab7a784a9d60af737bc579425744/Automunge-2.60-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4072ab14577927dcb6a824b0c62bb286", "sha256": "5b5bfe3b514b8680e65375f2cc90e8d80f052a2546a8231c92bc7aea2989949a" }, "downloads": -1, "filename": "Automunge-2.60.tar.gz", "has_sig": false, "md5_digest": "4072ab14577927dcb6a824b0c62bb286", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 146084, "upload_time": "2019-09-26T01:04:52", "url": "https://files.pythonhosted.org/packages/51/5a/740b77a18dc481f8d1fcb5b0b69e68e8e296b0fb6801693364f2ca5c073d/Automunge-2.60.tar.gz" } ], "2.61": [ { "comment_text": "", "digests": { "md5": "162909d678027f5ceda6f5cc2c082f56", "sha256": "347ca8aad3ded4421ab89892b962d31df6aba05bc628e2bc62485062a01b2a28" }, "downloads": -1, "filename": "Automunge-2.61-py3-none-any.whl", "has_sig": false, "md5_digest": "162909d678027f5ceda6f5cc2c082f56", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 121471, "upload_time": "2019-09-28T16:35:09", "url": "https://files.pythonhosted.org/packages/0b/f2/c452f6706156eb7aa20c6d3e956e29fc2db75f29251bbbd248773a693632/Automunge-2.61-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "31f0d1e3b0cc96ac98389c9b5dcb4b5f", "sha256": "acbc01bfd8d9a61e6366d9f8874266f66f2078fe1e5a685718e5c44e13f31f1c" }, "downloads": -1, "filename": "Automunge-2.61.tar.gz", "has_sig": false, "md5_digest": "31f0d1e3b0cc96ac98389c9b5dcb4b5f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 145802, 
"upload_time": "2019-09-28T16:35:12", "url": "https://files.pythonhosted.org/packages/ab/81/20ed322f99e5b1c5cf29f52ca15180c45b82d9ba22fcacd3021c69fc3780/Automunge-2.61.tar.gz" } ], "2.62": [ { "comment_text": "", "digests": { "md5": "b1565f344db9b3aec396554b7a448bb6", "sha256": "86b5f10ebc720c1d8b0cc4f1fd35a80e6bda2950e6a5ee1fbd302532a7c1dab7" }, "downloads": -1, "filename": "Automunge-2.62-py3-none-any.whl", "has_sig": false, "md5_digest": "b1565f344db9b3aec396554b7a448bb6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 122996, "upload_time": "2019-10-03T05:05:20", "url": "https://files.pythonhosted.org/packages/02/5d/820d673a048edaaf01e554ab482d2f86eaccaa8bed7e2300c7c96697eaee/Automunge-2.62-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "be8d39b1dc03114269842635654a88df", "sha256": "97300e6ad50a2d045fba441e0cffcce69eb8756d189f7ef2d98386c76cb4eed8" }, "downloads": -1, "filename": "Automunge-2.62.tar.gz", "has_sig": false, "md5_digest": "be8d39b1dc03114269842635654a88df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 147716, "upload_time": "2019-10-03T05:05:49", "url": "https://files.pythonhosted.org/packages/40/48/05b64ef08816a487fb9d20de7771ef79153230ee8a3f107b8bb880842c6e/Automunge-2.62.tar.gz" } ], "2.63": [ { "comment_text": "", "digests": { "md5": "ef883a0da972905412d54d0e188a9700", "sha256": "5f1d6b51e940a4229882d0ac42324a7bfacfa9a5f592fc1da4c1ab34252fae8e" }, "downloads": -1, "filename": "Automunge-2.63-py3-none-any.whl", "has_sig": false, "md5_digest": "ef883a0da972905412d54d0e188a9700", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 123324, "upload_time": "2019-10-03T20:01:59", "url": "https://files.pythonhosted.org/packages/a9/1c/2ebdb41398063492386f40411d84ccc932c29fb0b6e6fef80e405f66a1f7/Automunge-2.63-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "311fb19bb783688ba3ad4020186b9c6c", "sha256": 
"df902ef9a97d146bae85445b8d529300f26c73428da8354e871b36cf84a9c49c" }, "downloads": -1, "filename": "Automunge-2.63.tar.gz", "has_sig": false, "md5_digest": "311fb19bb783688ba3ad4020186b9c6c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 148018, "upload_time": "2019-10-03T20:02:13", "url": "https://files.pythonhosted.org/packages/2c/f0/3c8b998e58a7e12f21c2b05c1d8665a5ebd1459c1b8f8fd5111be0c51e10/Automunge-2.63.tar.gz" } ], "2.64": [ { "comment_text": "", "digests": { "md5": "0cb4686a0d5f3ba10d041c8b2cc2c03a", "sha256": "16aa31121fda65c6706390857e611b98a4aa80ca90dc4db882c6543468e447d8" }, "downloads": -1, "filename": "Automunge-2.64-py3-none-any.whl", "has_sig": false, "md5_digest": "0cb4686a0d5f3ba10d041c8b2cc2c03a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 125436, "upload_time": "2019-10-05T08:04:34", "url": "https://files.pythonhosted.org/packages/cb/51/ca5534623c87fde97b0b4706e732b90b2074a77bd722fe132545c34c58de/Automunge-2.64-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f2bf797db59fde6eeea7371658c437eb", "sha256": "b66f7eb1fe68747e390e315621109593d252b8302450e6102d76ed526be3c118" }, "downloads": -1, "filename": "Automunge-2.64.tar.gz", "has_sig": false, "md5_digest": "f2bf797db59fde6eeea7371658c437eb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 150290, "upload_time": "2019-10-05T08:04:36", "url": "https://files.pythonhosted.org/packages/a0/ef/d09ff253240dfbc253199c90ab659ffc2fc3997de3c7742fc4ba8166fbf8/Automunge-2.64.tar.gz" } ], "2.65": [ { "comment_text": "", "digests": { "md5": "603fad20b27595e2cefefdedc93a6291", "sha256": "30a001c2347993e644d9130741e2c51b4abaafca7aea6e9143b642d10495a4d8" }, "downloads": -1, "filename": "Automunge-2.65-py3-none-any.whl", "has_sig": false, "md5_digest": "603fad20b27595e2cefefdedc93a6291", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 
126810, "upload_time": "2019-10-05T17:32:36", "url": "https://files.pythonhosted.org/packages/46/f3/a29cadf32128fb17a2197fe5f3913f7ca28d0fc28b83860f0c2f8786c817/Automunge-2.65-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "85c08c09e98ccaf0d05c5b4f106f363a", "sha256": "dcebdde6094cf4afc1fa00f40e3cbc41714501180f4269581e0b08bb7cb17d47" }, "downloads": -1, "filename": "Automunge-2.65.tar.gz", "has_sig": false, "md5_digest": "85c08c09e98ccaf0d05c5b4f106f363a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 151918, "upload_time": "2019-10-05T17:32:38", "url": "https://files.pythonhosted.org/packages/5a/5a/0e3d218d3571b76c17454a90666a5bf4f3431f13dc100854bf07938ebc82/Automunge-2.65.tar.gz" } ], "2.66": [ { "comment_text": "", "digests": { "md5": "ff922b48ee3ca6caa46a2e1968f62a3d", "sha256": "c74216e76b1195781c314506bbda92003fbc28a2a18c84d8b12663041697d7e4" }, "downloads": -1, "filename": "Automunge-2.66-py3-none-any.whl", "has_sig": false, "md5_digest": "ff922b48ee3ca6caa46a2e1968f62a3d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 127090, "upload_time": "2019-10-05T20:45:50", "url": "https://files.pythonhosted.org/packages/d7/11/0cad09d26bf5c066901df6d98a2bea2d2d0729b73541c191ca2aafcef58f/Automunge-2.66-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "441b5143003a38c2c1a01e7df35b187b", "sha256": "287a2c04339b57d9ee1f7a8b8b13c7eb03795d46800a4d8fc2004e8ea354fedb" }, "downloads": -1, "filename": "Automunge-2.66.tar.gz", "has_sig": false, "md5_digest": "441b5143003a38c2c1a01e7df35b187b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 152678, "upload_time": "2019-10-05T20:45:53", "url": "https://files.pythonhosted.org/packages/d6/b8/9f16d2541b8eda1ab65337d778743bcbfd712bdf92e6cd868a6b53ed6864/Automunge-2.66.tar.gz" } ], "2.67": [ { "comment_text": "", "digests": { "md5": "0e1f163744f52e73d0bc1f18cd854926", "sha256": 
"f418c5406de95325e000f20e3e4419a85ce4011fe39bc1d18aef1b4397099759" }, "downloads": -1, "filename": "Automunge-2.67-py3-none-any.whl", "has_sig": false, "md5_digest": "0e1f163744f52e73d0bc1f18cd854926", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 127959, "upload_time": "2019-10-09T23:51:36", "url": "https://files.pythonhosted.org/packages/dc/00/565c0d88ddf066695a4ad2fc8333dea281b598d844cd518a760aebe0d393/Automunge-2.67-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1049a9f5cb1b085916d993518b6dec75", "sha256": "fcc28fa0144a0f16bd558513100c6e25e8b116b380d1db30626c37d3a5ba1e66" }, "downloads": -1, "filename": "Automunge-2.67.tar.gz", "has_sig": false, "md5_digest": "1049a9f5cb1b085916d993518b6dec75", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 154139, "upload_time": "2019-10-09T23:51:41", "url": "https://files.pythonhosted.org/packages/42/56/2192514193426db8a5cc73cda0dbd0f849d24cbaabacf0413b5ef19b3aac/Automunge-2.67.tar.gz" } ], "2.68": [ { "comment_text": "", "digests": { "md5": "66dab5e6f944854afb477499eaafe26a", "sha256": "06b28ab409f01d4adae1a526c0d431cfd9d6db5bf43b228dcfe113fcabb96084" }, "downloads": -1, "filename": "Automunge-2.68-py3-none-any.whl", "has_sig": false, "md5_digest": "66dab5e6f944854afb477499eaafe26a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 128706, "upload_time": "2019-10-11T21:20:49", "url": "https://files.pythonhosted.org/packages/0b/b2/c776e9fbc58d7d15aba16527c86d6d8c4b34f2bf7f0c02f9091436fe3a7a/Automunge-2.68-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "17e0d2cf9b4b814afbfa66992d38cc20", "sha256": "89faab686db29a777b3b204af764d51005a5fee7c3668fa4f25818904055318e" }, "downloads": -1, "filename": "Automunge-2.68.tar.gz", "has_sig": false, "md5_digest": "17e0d2cf9b4b814afbfa66992d38cc20", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 155576, 
"upload_time": "2019-10-11T21:20:52", "url": "https://files.pythonhosted.org/packages/d6/06/dff4267f85c853877dd9b8ab28d224ae786fcabc6f537d43a9a883d23850/Automunge-2.68.tar.gz" } ], "2.69": [ { "comment_text": "", "digests": { "md5": "5c4589a63ef47f7b6d95546e94261f2d", "sha256": "36023fe6dbee6c623c996c303f38b3ede0a423c0ecb02bde40c9ea04e0e979d5" }, "downloads": -1, "filename": "Automunge-2.69-py3-none-any.whl", "has_sig": false, "md5_digest": "5c4589a63ef47f7b6d95546e94261f2d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 128976, "upload_time": "2019-10-12T22:30:19", "url": "https://files.pythonhosted.org/packages/9f/1f/c4ed310eab42d12b9c6bc7a3a74115d20b76d54d9cdb286be531b9819a39/Automunge-2.69-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f80f86052c561d0b9d23e11e95fe3dcf", "sha256": "e58c97ba7e889c3722ca35c393800204a846f1b813568f67d6feaa9c6ad09045" }, "downloads": -1, "filename": "Automunge-2.69.tar.gz", "has_sig": false, "md5_digest": "f80f86052c561d0b9d23e11e95fe3dcf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 156045, "upload_time": "2019-10-12T22:30:24", "url": "https://files.pythonhosted.org/packages/27/8f/751f55aeeaa340dfc9836c067c12bf5f1b32109955940d578b64c85518fa/Automunge-2.69.tar.gz" } ], "2.70": [ { "comment_text": "", "digests": { "md5": "b1febeee3b66e9f45e38b64b0562f6f7", "sha256": "94713fd92fd02a78b05a95a7baea605a8c3c8a0758477f1a3d34a7905361b1ec" }, "downloads": -1, "filename": "Automunge-2.70-py3-none-any.whl", "has_sig": false, "md5_digest": "b1febeee3b66e9f45e38b64b0562f6f7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 133648, "upload_time": "2019-10-18T03:07:09", "url": "https://files.pythonhosted.org/packages/52/b8/4e0bad35915bb7e79700dd63b2b8efe90f67d66e117609cd50ab502867d0/Automunge-2.70-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "904c187ea7fd5e5b9fd987aabe404170", "sha256": 
"faa5744eadb874bfe81fe33ef60f19809bb2a300c01349917749d32ab9725c8c" }, "downloads": -1, "filename": "Automunge-2.70.tar.gz", "has_sig": false, "md5_digest": "904c187ea7fd5e5b9fd987aabe404170", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 165480, "upload_time": "2019-10-18T03:07:14", "url": "https://files.pythonhosted.org/packages/30/98/878613cd7c9c99241b120aae017258604e8b911b514abce6d893165d7889/Automunge-2.70.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b1febeee3b66e9f45e38b64b0562f6f7", "sha256": "94713fd92fd02a78b05a95a7baea605a8c3c8a0758477f1a3d34a7905361b1ec" }, "downloads": -1, "filename": "Automunge-2.70-py3-none-any.whl", "has_sig": false, "md5_digest": "b1febeee3b66e9f45e38b64b0562f6f7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 133648, "upload_time": "2019-10-18T03:07:09", "url": "https://files.pythonhosted.org/packages/52/b8/4e0bad35915bb7e79700dd63b2b8efe90f67d66e117609cd50ab502867d0/Automunge-2.70-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "904c187ea7fd5e5b9fd987aabe404170", "sha256": "faa5744eadb874bfe81fe33ef60f19809bb2a300c01349917749d32ab9725c8c" }, "downloads": -1, "filename": "Automunge-2.70.tar.gz", "has_sig": false, "md5_digest": "904c187ea7fd5e5b9fd987aabe404170", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 165480, "upload_time": "2019-10-18T03:07:14", "url": "https://files.pythonhosted.org/packages/30/98/878613cd7c9c99241b120aae017258604e8b911b514abce6d893165d7889/Automunge-2.70.tar.gz" } ] }