{
    "info": {
        "author": "Ezequiel Erbaro",
        "author_email": "eerbaro@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 2 - Pre-Alpha",
            "Intended Audience :: Developers",
            "License :: OSI Approved :: MIT License",
            "Natural Language :: English",
            "Programming Language :: Python :: 2",
            "Programming Language :: Python :: 2.6",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.3",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5"
        ],
        "description": "Kodiak\n======\n\n.. image:: https://img.shields.io/github/license/mashape/apistatus.svg\n\n.. image:: https://readthedocs.org/projects/kodiak/badge/?version=latest\n    :target: http://kodiak.readthedocs.io/en/latest/?badge=latest\n\n\nOverview\n--------\n\n**Kodiak** enhances your feature engineering workflow extracting common\npatterns so you can create more features faster.\n\nEx: You have the ``writers`` dataframe, where ``born`` is a datetime\n\n+-----------------------+--------------+\n| name                  | born         |\n+=======================+==============+\n| Miguel de Cervantes   | 09-29-1547   |\n+-----------------------+--------------+\n| William Shakespeare   | 04-23-1617   |\n+-----------------------+--------------+\n\nand you want to extract from the ``born`` column: ``day``, ``month`` and\n``year`` and create 3 new columns\n\n+-----------------------+--------------+---------------+-------------+--------------+\n| name                  | born         | born\\_month   | born\\_day   | born\\_year   |\n+=======================+==============+===============+=============+==============+\n| Miguel de Cervantes   | 09-29-1547   | 9             | 29          | 1547         |\n+-----------------------+--------------+---------------+-------------+--------------+\n| William Shakespeare   | 04-23-1617   | 4             | 23          | 1617         |\n+-----------------------+--------------+---------------+-------------+--------------+\n\nThe simplest thing you could do in Pandas is:\n\n.. code:: python\n\n    writers[\"born_month\"] = writers.born.map(lambda x: x.month)\n    writers[\"born_day\"]   = writers.born.map(lambda x: x.day)\n    writers[\"born_year\"]  = writers.born.map(lambda x: x.year)\n\nWith Kodiak you could get the same result in one line:\n\n.. code:: python\n\n    writers.gencol(\"born_{month,day,year}\", \"born\", lambda x, y: getattr(x, y))\n    # or more succinctly\n    writers.gencol(\"born_{.month,.day,.year}\", \"born\")\n\nBut, how does it work? Kodiak uses ``\"born_{month,day,year}\"`` as a\ntemplate for the columns: ``born_month``, ``born_day`` and ``born_year``\nand passes ``month``,\\ ``day`` and ``year`` as arguments to a provided\nfunction that also has the current 'born' as an an argument, so you're\nbasically doing:\n\n.. code:: python\n\n    for y in ['month', 'day', 'year']:\n        writers[\"born_{}\".format(y)] = writers.born.map(lambda x: getattr(x, y))\n\nKodiak does a lot of other things to boost your workflow, for that, see\nthe *Basic Usage* and *Advanced Usage* sections\n\nInstallation\n------------\n\nTo install Kodiak, ``cd`` to the Kodiak folder and run the install\ncommand:\n\n.. code:: sh\n\n    sudo python setup.py install\n\nYou can also install Kodiak from PyPI:\n\n.. code:: sh\n\n    sudo pip install kodiak\n\nBasic Usage\n-----------\n\n**Kodiak** main object is ``KodiakDataFrame`` an extension of\n``pandas.Dataframe`` that provides the instance method ``colgen`` to\ncreate one or more columns. You create ``KodiakDataFrames`` the same way\nyou create a ``pandas.DataFrame``\n\n``colgen`` signature is:\n``colgen(newcols, col, colbuilder=None, enum=False, config=None)``\n\nnewcols\n  has a double function, it works as a specification of the columns you\n  want to create, and it also contains the values passed to ``colbuilder``\n\n  .. code:: python\n\n      # If you want to create the columns `first_name`, `last_name`\n      # and pass `first` and `last` as arguments to `colbuilder` you write\n      >>> newcols = \"{first,last}_name\"\n\n      # More complex patterns allowed\n      >>> groups = \"col_{a,b}_{c,d}\"\n\n      # Will create the columns: `col_a_c`, `col_b_d`\n      # The way `a,b` and `c,d` is combined can be configured\n\ncol\n  is the `KodiakDataframe` column from where you'll extract information\n  to create your new column/s\n\ncolbuilder\n  is a function or lambda used to extract info from ``col``\n  and create the columns specified in ``newcols`` with the\n  corresponding ``col`` instance and the ``newcols`` values.\n  The signature of `colbuilder` is `colbuilder(x, y)` or\n  `colbuilder(i, x, y)` `x` is an instance of the column passed\n  in `col` and `y` is an argument extracted from `newcols`. The\n  extra argument `i` is an index of the arguments.\n\nconfig\n  tweak Kodiak inner workings with your own config, see the\n  dedicated section for more info\n\nAdvanced Usage\n--------------\n\nIn this section we're going to describe the main components and concepts\nthat are essential to Kodiak\n\nTemplating\n~~~~~~~~~~\n\nThe template language is minimal but has some extensions to help you:\n\nRanges\n^^^^^^\n\nThe range notation is ``start:end:step``. Reverse ranges are permitted\nsetting ``end`` bigger than ``start``. ``step`` default value is ``1``, and\n``start`` is ``0``, finally if ``end`` is absent, it'll be setted to ``0`` and\nyou'll have a reversed range. Ranges are inclusive.\n\n.. code:: python\n\n    simple_range = \"col_{1:3}\" # -> col_1, col_2, col_3\n    step_range = \"col_{:3:2}\" # -> col_0, col_3\n    inverse_range = \"col_{3:1}\" # -> col_3,col_2,col_1\n    no_end = \"2:\" # -> col_2,col_1,col_0\n\nKey-Value\n^^^^^^^^^\n\nIf you want the column name and argument passed to the ``colbuilder`` to\ndiffer you can use key-values.\n\n.. code:: python\n\n    dataframe.gencol(\"{short=very_long_name}_col\", \"col\", alambda)\n    # In this case the column name will be ``short_col`` but you'll pass\n    # ``very_long_name`` to ``alambda``\n\n    # key-value notation can be extended to more arguments:\n    dataframe.gencol(\"{k1=v1,k2=v2,k3=v3}_col\", \"col\", alambda)\n\n.. WARNING::\n  values are always interpreted as *strings* so in:\n  ``col_{k=1:5}`` the value ``1:5`` is interpreted as ``\"1:5\"`` and not as\n  a range, the same for ``col_{k=[1,2,3]}`` and any other object, also if\n  you pass a number it will also be interpreted as string so you will need\n  to convert it if you intend to use it as an ``int``.\n\nTransforms\n~~~~~~~~~~\n\nUnder the hood when you pass ``newcols`` to ``gencol``, Kodiak builds an\n``OrderedDict`` where it's keys are column names and it's values are\ntuples of ``Match`` objects -even if it's just one Match it's wrapped\ninside a tuple-\n\n.. code:: python\n\n    newcol = \"{first,last}_name\"\n    # will build\n    args_dict = {'first_name': (Match('first'),), 'last_name': (Match('last'),)}\n\n``Transforms`` are a way to pre-process the values and change them\nenriching the ``Match`` object with a payload as you will see in the\n``Default colbuilder`` section.\n\nSo, if the values are ``Match`` objects, how is that when you write your\n``colbuilder`` you deal with ``strings``? Kodiak understands that if the\n``Match`` object doesn't have a payload it's better to pass strings\narguments to ``colbuilder``, this behaviour can be controlled.\n\nWhat's the use of ``Match`` objects and their ``payload``? What're some\nexamples of ``Transforms``? The next section will answer this questions\n\nDefault colbuilder\n~~~~~~~~~~~~~~~~~~\n\nAs you can see in the ``colgen`` signature, ``colbuilder`` default\nargument is ``None``, in special cases Kodiak can infer the\n``colbuilder`` method, let's revisit the opening example.\n\n.. code:: python\n\n    writers.gencol(\"born_{.month,.day,.year}\", \"born\")\n\nThe ``colbuilder`` in this case is inferred from the hint you gave\nKodiak in the template: ``.month``, prefixing ``month`` with a ``.``\nindicates that you're referring to an attribute of ``born``, so\ninternally Kodiak builds a ``colbuider`` that extracts the ``month``\nfrom a ``born`` instance. Another way of omitting the ``colbuilder`` is\nwhen you have an instance method:\n\n.. code:: python\n\n    # Notice the `!` after weekday\n    writers.gencol(\"born_{weekday!}\", \"born\")\n\n.. WARNING::\n  This hint only works for methods with no arguments, passing\n  a method with one or more arguments will raise an error\n\nHow is that Kodiak infers the ``colbuilder``? When the ``newcols`` are\nprocessed they go through a pipeline of ``Transforms``, one of them:\n``PropertyTransform`` detects that ``.month`` refers to an attribute and\nenriches de ``Match`` object hinting in the payload the corresponding\n``colbuilder``, that's why you don't need to pass the ``colbuilder``\nargument. But what happens if you give a ``colbuilder``? In this case,\nas the ``Match`` object has a ``payload`` instead of working with plain\nstrings you will work with tuples of ``Match`` objects\n\n.. Note::\n  Kodiak will raise an exception when it can't figure out a\n  default colbuilder\n\nEnumerations\n~~~~~~~~~~~~\n\nSometimes you care about the position of the arguments not the exact\nvalue, in that case you can use the ``enum`` param or the implicit\n``enum`` with a function or lambda of arity 3, the first argument will\nbe the index starting at 0.\n\n.. code:: python\n\n    writers.gencol(\"{first=0,last=1}_name\", \"name\", lambda x,y: x.split(\" \")[int(y)])\n\n    # Another way with enum=True\n    writers.gencol(\"{first,last}_name\", \"name\", lambda i,x,y: x.split(\" \")[i], enum=True)\n\n    # Without enum=True but with a colbuilder with arity 3\n    writers.gencol(\"{first,last}_name\", \"name\", lambda i,x,y: x.split(\" \")[i])\n\nConfiguration\n-------------\n\nAlmost everything is configurable in Kodiak, you could have a per-method\nconfiguration or system-wide config.\n\nThe ``Config`` object has the following customizable params:\n\nparser\n  Kodiak by default uses the ``ArgsParser`` class to parse ``newcols``\n\nmatch\\_transform\n  data passed to the ``colbuilder`` could be\n  transformed first, by default we use the ``default_transform`` pipeline,\n  you could replace it with an array of ``Transforms`` objects.\n\nnew\\_col\\_combiner\n  in the newcols template if you have\n  ``\"col_{a,b}_{c,d}\"``, this results in the columns: ``\"col_a_c\"`` and\n  ``\"col_b,d\"``, how the different groups ``['a','b']`` and ``['c', 'd']``\n  are combined is controlled with this param, currently we use the ``zip``\n  function, and you could replace it with a function with similar\n  signature.\n\nunpack\n  Boolean Default True, when ``newcols`` is simple, ``foo_{a,b}``\n  instead of ``foo_{.a,b!}`` instead of passing to ``colbuilder``\n  tuples of ``Match`` objects we just pass individual items,\n  ``a``, ``b``, so it's easier to build a ``colbuilder`` without\n  having to unwrap the ``Match`` tuples\n\ncol\\_pair\\_combiner\n  Once you have the arguments that you're going to\n  pass to the ``colbuilder`` you can combine them in different ways, currently\n  we use ``product`` from itertools, ie: the cartesian product between an\n  element, ex: ``event``, and the other n-columns, creating the following\n  tuples:\n\n  .. code:: python\n\n      [('event', 'day') , ('event', 'month'), ('event', 'year')]\n\nYou can replace this method with another with the same signature as ``product``\n\nConfig can be accessed, modified and restored with:\n\n.. code:: python\n\n    >> import config\n    >> from config import cfg\n    >> config.options\n\n    # Global change on config\n\n    >> config.options[\"unpack\"] = False\n    >> config.options[\"col_pair_combiner\"] = zip\n\n    # Restoring one or more fields of the configuration\n    >> config.restore_default_config(\"col_pair_combiner\")\n\n    # Restoring all the options\n    >> config.restore_default_config()\n\n    # With `base_config` or it's alias `cfg` you can create modified versions\n    # of the default config\n\n    >> dataframe.gencol(\"col_{a!,b!}\",\"col\", func, config=cfg(unpack=False))\n\n\n=======\nHistory\n=======\n\n0.1.0 (2017-09-19)\n------------------\n\n* First release on PyPI.",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/alejandrodumas/kodiak",
        "keywords": "kodiak,pandas",
        "license": "MIT license",
        "maintainer": "",
        "maintainer_email": "",
        "name": "kodiak",
        "package_url": "https://pypi.org/project/kodiak/",
        "platform": "",
        "project_url": "https://pypi.org/project/kodiak/",
        "project_urls": {
            "Homepage": "https://github.com/alejandrodumas/kodiak"
        },
        "release_url": "https://pypi.org/project/kodiak/0.1.0/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "A library to enhance your feature engineering workflow",
        "version": "0.1.0"
    },
    "last_serial": 3186039,
    "releases": {
        "0.1.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "c7ee2cc0e966cc62d68915d240560150",
                    "sha256": "e1046ea84a3582788b3bf621d0243c338ca07a45dda09e77ee2627a2ed74776a"
                },
                "downloads": -1,
                "filename": "kodiak-0.1.0-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "c7ee2cc0e966cc62d68915d240560150",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 17863,
                "upload_time": "2017-09-19T17:03:52",
                "url": "https://files.pythonhosted.org/packages/34/ac/6479dc471ecf95a5ff083148be08246ddc679ef0f9ca6cedb90b34d543e2/kodiak-0.1.0-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "506605bf0ef4ea2c2bc7e7e213e66df2",
                    "sha256": "d66200850e4c84edb5142d907f77828461a4c933eebdfdd4aac4e1135a92889d"
                },
                "downloads": -1,
                "filename": "kodiak-0.1.0.tar.gz",
                "has_sig": false,
                "md5_digest": "506605bf0ef4ea2c2bc7e7e213e66df2",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 22747,
                "upload_time": "2017-09-19T16:56:05",
                "url": "https://files.pythonhosted.org/packages/1c/dc/4a3bf903323df70a2e02c190e839902de66a6b75c3c519240497bfcb9196/kodiak-0.1.0.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "c7ee2cc0e966cc62d68915d240560150",
                "sha256": "e1046ea84a3582788b3bf621d0243c338ca07a45dda09e77ee2627a2ed74776a"
            },
            "downloads": -1,
            "filename": "kodiak-0.1.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c7ee2cc0e966cc62d68915d240560150",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 17863,
            "upload_time": "2017-09-19T17:03:52",
            "url": "https://files.pythonhosted.org/packages/34/ac/6479dc471ecf95a5ff083148be08246ddc679ef0f9ca6cedb90b34d543e2/kodiak-0.1.0-py2.py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "506605bf0ef4ea2c2bc7e7e213e66df2",
                "sha256": "d66200850e4c84edb5142d907f77828461a4c933eebdfdd4aac4e1135a92889d"
            },
            "downloads": -1,
            "filename": "kodiak-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "506605bf0ef4ea2c2bc7e7e213e66df2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 22747,
            "upload_time": "2017-09-19T16:56:05",
            "url": "https://files.pythonhosted.org/packages/1c/dc/4a3bf903323df70a2e02c190e839902de66a6b75c3c519240497bfcb9196/kodiak-0.1.0.tar.gz"
        }
    ]
}