{ "info": { "author": "Alex Tremblay", "author_email": "alex.tremblay@utoronto.ca", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Programming Language :: Python" ], "description": "# auto-class\n\n>> This project is in early alpha. Until it is complete, this document should be treated as more of a design document than an actual README\n\n**auto-class** (not to be confused with the excellent but unrelated [python-autoclass] library) is a tool that allows you to \nautomatically generate a set of nested dataclasses with built-in (de)serializers from a given YAML manifest ~~or set of API responses (JSON objects)~~\n\n> Generating dataclasses from a set of objects is not yet implemented, but it's the first thing on the [TODO](#todo) list\n\n[python-autoclass]:https://smarie.github.io/python-autoclass/\n\n## Description\n\nThis tool allows you, for a given data structure, to generate python source code for a set of nested dataclasses which \ncan load and dump (deserialize and serialize) that data structure. \nThe data structure you supply can either be a YAML manifest \ndescribing the nested data you want to work with (manual mode),\n~~or it can be a set of objects (a JSON string array of objects, or a python literal list of dictionaries) representing \nall possible variations of your \ndata structure (auto mode)~~ (Not Yet Implemented)\n\nManual mode allows you to define data classes with serializers from a data structure of your choice. 
Example:\n```yaml\nMyDataClass: \n a_string: ''\n a_number: 0 \n a_bool: False\n a_list:\n - a string\n - 1\n A Key With Spaces: ''\n default_value: sweet!\n optional_field_without_type: \n optional_field_with_type: # t:str Optional\n allow_missing: '' # AllowMissing\n optional_and_allow_missing: # t:str Optional AllowMissing\n a_union: 0 # t:str,int\n sub_class:\n subkey: ''\n another_subkey: ''\n sub_sub_class:\n so_nested: True\n much_wow: True\n a_class_list:\n - a_subkey: 1\n one_more: hello\n```\nBecomes:\n```python\nfrom typing import List, Any, Union, Optional, ClassVar, Type\nfrom dataclasses import field\nfrom marshmallow import Schema\nfrom marshmallow_dataclass import dataclass\n\n@dataclass\nclass MyDataClass:\n a_string: str = field(default_factory=str)\n a_number: int = field(default_factory=int)\n a_bool: bool = field(default_factory=bool)\n a_list: List[str,int] = field(default_factory=list)\n a_key_with_spaces: str = field(default_factory=str, metadata=dict(data_key='A Key With Spaces'))\n default_value: str = field(default='sweet!')\n optional_field_without_type: Any = field(default=None)\n optional_field_with_type: Optional[str] = field(default=None)\n allow_missing: str = field(default_factory=str, metadata=dict(missing=str, default=str, required=False))\n optional_and_allow_missing: Optional[str] = field(default=None, metadata=dict(missing=None, default=None, required=False))\n a_union: Union[str,int] = field(default_factory=str)\n\n @dataclass\n class SubClass:\n subkey: str = field(default_factory=str)\n another_subkey: str = field(default_factory=str)\n\n @dataclass\n class SubSubClass:\n so_nested: bool = field(default_factory=bool)\n much_wow: bool = field(default_factory=bool)\n Schema: ClassVar[Type[Schema]] = Schema\n\n sub_sub_class: SubSubClass = field(default_factory=SubSubClass)\n Schema: ClassVar[Type[Schema]] = Schema\n\n sub_class: SubClass = field(default_factory=SubClass)\n\n @dataclass\n class AClassList:\n a_subkey: int = 
field(default_factory=int)\n one_more: str = field(default='hello')\n Schema: ClassVar[Type[Schema]] = Schema\n\n a_class_list: List[AClassList] = field(default_factory=list)\n Schema: ClassVar[Type[Schema]] = Schema\n\n# And can be used to transform this:\n\ninput_dict = {\n 'a_string': 'hello',\n 'a_number': 24,\n 'a_bool': True,\n 'a_list': ['hello', 1, 2, 'world'],\n 'A Key With Spaces': 'nice!',\n 'default_value': None,\n 'optional_field_without_type': None,\n 'optional_field_with_type': None,\n # allow_missing key is missing\n # optional_and_allow_missing is also missing\n 'a_union': 0,\n 'sub_class': {\n 'subkey': 'hello',\n 'another_subkey': 'world',\n 'sub_sub_class': {\n 'so_nested': True, \n 'much_wow': True\n }\n },\n 'a_class_list': [\n {'a_subkey': 1, 'one_more': 'hello'}\n ]\n}\n\n# into this:\n\noutput_object = MyDataClass.Schema().load(input_dict)\nassert output_object == MyDataClass(\n a_string='hello',\n a_number=24,\n a_bool=True,\n a_list=['hello', 1, 2, 'world'],\n a_key_with_spaces='nice!',\n default_value='sweet!',\n optional_field_without_type=None,\n optional_field_with_type=None,\n allow_missing='',\n optional_and_allow_missing=None,\n a_union=0,\n sub_class=MyDataClass.SubClass(\n subkey='hello', \n another_subkey='world',\n sub_sub_class=MyDataClass.SubClass.SubSubClass(\n so_nested=True,\n much_wow=True\n )\n ),\n a_class_list=[\n MyDataClass.AClassList(a_subkey=1, one_more='hello')\n ]\n)\n\n# And accessing nested data goes from this:\n\ninput_dict['sub_class']['sub_sub_class']['so_nested'] == True\n\n# To this:\n\noutput_object.sub_class.sub_sub_class.so_nested == True\n```\n\nThis gives you type checking capability, IDE autocomplete, easy data validation, and a whole bunch of other features!\n\n\n## Installation\n\n\\# TODO: deploy to PyPI\n```bash\n$ pip3 install auto-class\n```\n\n\n## Usage\n\nThis tool was primarily designed to be a command-line tool, but can also be used directly in Python.\n\n### Command Line Usage\n\n\\# TODO: flesh 
this out\n```bash\n$ auto-class generate --from yaml --in-clipboard\n$ auto-class generate --from yaml --in-file file.yaml\n$ auto-class generate --from python --in-clipboard\n$ auto-class generate --from python --in-file file.py # Haven't quite figured out how this one's gonna work\n```\n\n### Python Usage\n\n\\# TODO: flesh this out\n```python\nfrom auto_class import generate\n\ngenerated_source_code = generate.from_yaml('''\nMyDataClass:\n test_field: ''\n Another Test Field: 2\n''')\n```\n\n\n## YAML Manifest Spec\n\nThe YAML manifest is a YAML document with the following rules:\n - All Dataclasses to be generated should be expressed as YAML mappings (aka dictionaries) attached to CapitalCased \n top-level variables, like so:\n ```yaml\n MyDataClass:\n field: blah\n other_field: blah\n MyOtherDataClass:\n field: blah\n other_field: blah\n ```\n All top-level variables starting with lower-case letters are reserved for future use (like global settings to \n modify dataclass generation, for example).\n - All fields in each top-level dataclass declaration must be one of three types: \n - **Scalars** (int, float, str, null, bool)\n - **Sequences** (list, [hash table](#hash-table)) \n - **Objects** (dictionaries)\n - All field values are optional. If a field has a value, the type of that value will be used to define the type of the \n dataclass definition for that field. If the value is \"truthy\", it will be used as the default value for the \n dataclass definition for that field.\n - All fields can contain YAML comments. Those comments can contain any of the following tokens:\n - `Optional`: (only valid for **Scalar** and **Object** fields) Marks the field's type annotation as Optional and sets the default value to None, if a default value has not been explicitly set\n\n - `HashTable`: (only valid for **Object** fields) Instructs the generator to create a [hash table](#hash-table) definition from this object, instead of generating a dataclass definition. 
\n This field will effectively become a special kind of **Sequence** type, and so only tokens which are valid for **Sequence** fields can be used in combination with this token. \n\n - `AllowMissing`: (valid on all field types) Configures the marshmallow serializer for this field to create a default value when this field is missing from an input dict / json payload\n\n - `t:{*type}`: (only valid for **Scalar** fields) Overrides the field's derived type. `*type` is a comma-separated list of one or more types (ex. `t:str` or `t:int,bool`). \n The types must only be separated by commas, and cannot be separated by spaces. \n When used on **Scalar** fields, the types listed in this token will replace the derived type of the field(ex `field_name: '' # t:str,int` will become `field_name: Union[str,int]`). \n Although it's possible to override a derived type with a completely different type (Ex. `field: '' # t:int`), it's ill advised and should be avoided. \n It will make your manifest unintuitive and difficult to read. \n Also, please note, this comment token is incompatible with the `f:{field}` comment token shown below. \n These two tokens cannot be used in the same field of your schema definition.\n\n - `f:{mm_field_name}`: (only valid for **Scalar** fields. [see note](#custom-field-name)) Overrides the Marshmallow [Field] to use for serialization / deserialization of the field. \n Can be an existing Marshmallow field, or a [Custom Field].\n For example: `field_name: '' # f:Url` will generate a field definition of `field_name: str = field(default_factory=str, metadata={'marshmallow_field': Url})`. \n Note: You will be responsible for either defining a custom field called `Url` or importing the `Url` field from the `marshmallow.fields` module into the generated dataclass module.\n\n - `n:{class_name}`: (Only valid for **Object** fields) Override the derived class name of a nested dataclass. 
\n Generally, the class name is derived from the name of the field that the nested object is attached to (ie: `field_name: {}` becomes `field_name: FieldName = field(default_factory=FieldName)`). \n With this token, you can specify the name of the generated data class (ex: `field_name: {} # n:MyNamedClass` becomes `field_name: MyNamedClass = field(default_factory=MyNamedClass)`)\n\n[Field]:https://marshmallow.readthedocs.io/en/3.0/api_reference.html#module-marshmallow.fields\n[Custom Field]:https://marshmallow.readthedocs.io/en/3.0/custom_fields.html\n\n\n## A Complete Example:\n\n```yaml\n\npreamble: |\n from marshmallow.fields import Email\n\nDataClass:\n# ^^^^^^^ \n# All top-level keys will be used as names for dataclass definitions, \n# and must contain YAML maps (aka dictionaries).\n\n a_field: ''\n# ^^ \n# this value is a string, so the generator will define \n# this as a str type, and tell marshmallow to validate \n# it as a Str() field.\n\n a_default_value: a default value\n# ^^^^^^^^^^^^^^^ \n# this string is not an empty string (''), \n# so it will be treated as a default value\n\n a_number: 0\n# ^ \n# this value is a number, so the generator will define \n# this as an int type, and tell marshmallow to validate\n# it as a Int() field. Since this number is falsy, \n# it will not be treated as a default value\n\n a_list:\n - string # Lists can contain any combination of types and will generate appropriate definitions\n - 2 # Lists should not contain more than one instance of a given type\n\n an_optional_value: \n# ^ a blank value will translate into type Any with default value None\n\n a_union: 0 # t:str,int\n# ^^^^^^^^^\n# There is no way in YAML to define a value that \n# has more than one type, so we introduce metadata \n# into our manifest in the form of a YAML comment token. 
\n# The 't:' token stands for type, and can be used to override\n# the type information which this library derives from the \n# YAML value itself\n\n\n A Key With Spaces: ''\n# ^^^^^^^^^^^^^^^^^\n# This key is not a valid python attribute name. \n# The dataclass field generated from this key will use a normalized \n# version of this key as its field name, and will configure \n# marshmallow field options to treat this key name as its \n# data key (to serialize from and deserialize to this key name)\n\n optional_field_with_type: '' # Optional\n# ^^^^^^^^\n# This flag will mark the type definition of \n# the generated dataclass field as Optional[]\n# and set its default value to None \n\n implicit_optional: # t:str\n# ^^^^^^^^\n# Any field that has a value of None (ie. a blank field) \n# with an explicit type override (t:str) will be implicitly\n# treated as if it had an `Optional` comment token. \n\n an_optional_union: # t:str,int\n# ^^^^^^^^^^^ \n# As with the `a_union` and `implicit_optional` examples\n# above, this will create a field definition that is\n# Optional[Union[str,int]] and has a default value of None\n\n optional_field_with_default: 'hello' # Optional\n# ^^^^^^^ ^^^^^^^^\n# This field will be marked Optional[] as above, \n# but its default value will not be set to None,\n# because it already has a default value.\n\n allow_missing: '' # AllowMissing\n# ^^^^^^^^^^^^\n# With this flag, the dataclass generator will configure the \n# marshmallow serializer/deserializer(Field) for this field \n# to set the value of this field to an empty string if this\n# field is missing from the input dictionary / json payload\n\n optional_and_allow_missing: '' # Optional AllowMissing\n# ^^^^^^^^ ^^^^^^^^^^^^\n# Same as above, but if the field is missing from \n# the input dictionary / json payload, it will be \n# set to None instead of an empty string (because \n# of the `Optional` comment token)\n\n custom_field: email@address.com # f:Email\n# ^^^^^^^\n# This tells the 
generator not to use the default marshmallow \n# field for a given type (in this case String for a str) \n# and instead use this specific marshmallow field. \n# Note: If you do this, you will be responsible for importing \n# an Email field into the generated dataclass module.\n\n sub_class:\n# ^^^^^^^^^\n# This field is a YAML map (equivalent to a python dictionary). \n# The generator will create a nested dataclass definition named \n# after this key (sub_class) and the type of this field will be \n# set to that data class. Since not all key names are guaranteed\n# to be valid python class names, the generated dataclass name \n# will be normalized (all spaces, dashes, and underscores will be\n# removed, and all letters following spaces, dashes, and underscores \n# will be capitalized. The first letter will also be capitalized).\n# In this case, a nested dataclass will be generated and named `SubClass`, \n# and the generated field definition for this `sub_class` field will\n# be `sub_class: SubClass = field(default_factory=SubClass)`\n subkey: ''\n another_subkey: ''\n sub_sub_class:\n so_nested: True\n much_wow: True\n\n optional_sub_class: # Optional\n subkey: ''\n another_subkey: ''\n\n implicit_hashtable:\n 1: 1 # The keys in this object are not and cannot be made to be valid python dataclass\n 2: 2 # field names, and so it is safe to assume that this particular object is not meant \n 3: 3 # to be a dataclass instance. auto-class will pick up on this and automatically mark\n 4: 4 # this object as a `HashTable`. 
Instead of generating a dataclass definition\n # for this object, auto-class will treat `implicit_hashtable` as an ordinary field with \n # a type of `Dict[int,int]`\n\n explicit_hashtable: # HashTable\n# ^^^^^^^^^\n# An object can also be explicitly marked as a `HashTable`, in \n# which case a dataclass definition will not be created for it.\n# This particular example will be treated as a regular field\n# with a type of `Dict[str,Union[int,str,bool]]`\n a_subkey: 1\n one_more: hello\n and_another: True\n\n a_class_list: # n:NamedClass\n# ^^^^^^^^^^^^\n# This object would have generated a dataclass called `AClassList`, but with\n# the inclusion of the `n:{}` token, the generated dataclass will instead be \n# called `NamedClass` \n - a_subkey: 1\n one_more: hello\n```\nWill generate the following python code:\n```python\nfrom typing import List, Any, Union, Optional, Dict, ClassVar, Type\nfrom dataclasses import field\nfrom marshmallow import Schema\nfrom marshmallow_dataclass import dataclass\n\nfrom marshmallow.fields import Email\n\n@dataclass\nclass DataClass:\n a_field: str = field(default_factory=str)\n a_default_value: str = 'a default value'\n a_number: int = field(default_factory=int)\n a_list: List[str,int] = field(default_factory=list)\n an_optional_value: Any = None\n a_union: Union[str,int] = field(default_factory=str)\n a_key_with_spaces: str = field(default_factory=str, metadata=dict(data_key='A Key With Spaces'))\n optional_field_with_type: Optional[str] = None\n implicit_optional: Optional[str] = None\n an_optional_union: Optional[Union[str,int]] = None\n optional_field_with_default: str = 'hello'\n allow_missing: str = field(default_factory=str, metadata=dict(default=str, missing=str, required=False))\n optional_and_allow_missing: Optional[str] = field(default=None, metadata=dict(default=None, missing=None, required=False))\n custom_field: str = field(default='email@address.com', metadata=dict(marshmallow_field=Email))\n\n @dataclass\n 
class SubClass:\n subkey: str = field(default_factory=str)\n another_subkey: str = field(default_factory=str)\n\n @dataclass\n class SubSubClass:\n so_nested: bool = True\n much_wow: bool = True\n Schema: ClassVar[Type[Schema]] = Schema\n\n sub_sub_class: SubSubClass = field(default_factory=SubSubClass)\n Schema: ClassVar[Type[Schema]] = Schema\n\n sub_class: SubClass = field(default_factory=SubClass)\n\n @dataclass\n class OptionalSubClass:\n subkey: str = field(default_factory=str)\n another_subkey: str = field(default_factory=str)\n Schema: ClassVar[Type[Schema]] = Schema\n\n optional_sub_class: Optional[OptionalSubClass] = None\n implicit_hashtable: Dict[int,int] = field(default_factory=dict)\n explicit_hashtable: Dict[str,Union[int,str,bool]] = field(default_factory=dict)\n\n @dataclass\n class NamedClass:\n a_subkey: int = 1\n one_more: str = 'hello'\n Schema: ClassVar[Type[Schema]] = Schema\n\n a_class_list: List[NamedClass] = field(default_factory=list)\n Schema: ClassVar[Type[Schema]] = Schema\n```\n\nAs a side note, for those interested in the nuts and bolts, what happens behind the scenes is that the input manifest \nis converted into an intermediate representation, which is then used to generate the actual python code for the \ndataclass definitions. 
The above example produces the following intermediate representation:\n```python\nfrom auto_class.intermediate_representation import DataClass, Member, Type, Sequence, HashTable\n\nDataClass('DataClass', [\n Member('a_field', [Type('str')]),\n Member('a_default_value', [Type('str')], default='a default value'),\n Member('a_number', [Type('int')]),\n Member('a_list', [Sequence('list', types=[\n Type('str'),\n Type('int')\n ])]),\n Member('an_optional_value', [Type('None')]),\n Member('a_union', [\n Type('str'),\n Type('int')\n ]),\n Member('A Key With Spaces', [Type('str')]),\n Member('optional_field_with_type', [Type('str'), Type('None')]),\n Member('implicit_optional', [Type('str'), Type('None')]),\n Member('an_optional_union', [Type('str'), Type('int'), Type('None')]),\n Member('optional_field_with_default', [Type('str')], default='hello'),\n Member('allow_missing', [Type('str')], optional=True),\n Member('optional_and_allow_missing', [Type('str'), Type('None')], optional=True),\n Member('custom_field', [Type('str')], default='email@address.com', custom_field='Email'),\n Member('sub_class', [DataClass('SubClass', [\n Member('subkey', [Type('str')]),\n Member('another_subkey', [Type('str')]),\n Member('sub_sub_class', [DataClass('SubSubClass', [\n Member('so_nested', [Type('bool')], default=True),\n Member('much_wow', [Type('bool')], default=True),\n ])]),\n ])]),\n Member('optional_sub_class', [Type('None'), DataClass('OptionalSubClass', [\n Member('subkey', [Type('str')]),\n Member('another_subkey', [Type('str')]),\n ])]),\n Member('implicit_hashtable', [HashTable(key=Type('int'), values=[Type('int')])]),\n Member('explicit_hashtable', [HashTable(key=Type('str'), values=[\n Type('int'), Type('str'), Type('bool')\n ])]),\n Member('a_class_list', [Sequence('list', [\n DataClass('NamedClass', [\n Member('a_subkey', [Type('int')], default=1),\n Member('one_more', [Type('str')], default='hello')\n ])\n ])])\n])\n```\n\n\n## Limitations\nCurrently, the following 
limitations apply:\n - Cannot support shared dataclasses:\n See [shared_dataclasses](./docs/implementation_details/shared_dataclasses.md) for more info\n - Cannot support `HashTable`s whose keys are `Union`s of multiple types: \n You can have HashTables whose keys are strings, you can have HashTables whose keys are integers, but you can't \n have HashTables whose keys might be strings *or* integers. \n - Cannot support a `Union` of multiple `HashTable`s:\n You can have a dataclass member whose type can be a string or a `HashTable` of strings, you can have a dataclass\n member whose type can be an integer or a `HashTable` of integers, but you cannot have a dataclass member whose\n type can be a `HashTable` of strings or a `HashTable` of integers\n\n## TODO\nHere is a list of features that I am either actively implementing in another branch, or hope to add to the library in the future:\n - Generate dataclass definitions from a set of objects, including the following sub-features:\n - Auto-identify fields that have different types in different objects within the set, and make those fields Unions\n - Auto-identify fields that are `null` in some of the objects in the set, and make those fields Optional\n - Auto-identify fields that are missing from some of the objects in the set and add the `AllowMissing` comment token \n to those objects\n - Recursively reduce types from sub-objects (ie if a field in the object set is a nested object, merge all instances \n of the nested sub-objects into a single set of objects and apply all the rules listed above to that set of \n sub-objects, recursively)\n - Generate a YAML manifest of the identified data structure, and present that manifest to the user for tweaking prior \n to generating dataclass definitions\n - Add support for Tuples and Sets (possible by adding a comment token to a list?)\n - Add the ability to define methods in the schema attached to the generated dataclasses (maybe define these methods as \n fields on the 
dictionary with a special comment token?) (maybe also add token syntax to support decorators, to enable \n [extending marshmallow schemas](https://marshmallow.readthedocs.io/en/stable/extending.html#schemavalidation))\n - Add options to the manifest to change defaults, for example: \n - Instead of all fields being required by default and having an `Optional` comment token, make all fields optional \n and have a `Required` comment token\n - Instead of causing validation errors when input dictionaries have missing keys and having an `AllowMissing` \n comment token to override that, make all fields allow missing keys and have a `DontAllowMissing` comment token\n\n\n## Inspiration\nTime and time again I've found myself following the same basic patterns, and I suspect you probably have too:\n\n### Pattern 1: Organic Data Structures\n\nYou're happily typing along, building some awesome library or tool, and you create a data structure to represent some state or \nimportant part of your project. It starts out as a simple dictionary, whose keys are strings, or numbers, or bools, or \nlists containing strings, or numbers, or bools, and everything is fine.\n\nYour project's complexity grows, and so too does the complexity of your data structure. You start grouping related \nvalues in your dictionary into nested dictionaries, turning those lists of strings into lists of dictionaries to store \nrelated data. Your data structure is a bit tricky to fully keep in your head, so you write it down somewhere and start \ngetting in the habit of checking that example every time you access or change a key, but still, mostly, everything is \nfine.\n\nYour project's complexity grows some more. Some of your nested dictionaries and lists of dictionaries start to ALSO \ncontain nested dictionaries and lists of dictionaries. 
You start running into problems:\n - \"Wait, is this subfield a dictionary in a list, or a list in a dictionary?\"\n - \"Wait, how did I spell this key name?\"\n - \"Wait, is this subfield supposed to be a string, or a number?\"\n\nYou get to the point where you think to yourself \"Ok, ok, this isn't working anymore. My data is too large to be an \nundocumented pile of dictionaries and lists. I need to implement structure!\"\n\nSo you start implementing structure. You think to yourself \"Well, I'll just make a set of classes to define my data \nstructure. That's the best-practices way to go!\"... except that class definitions are incredibly verbose and tedious \nto write, especially if you've got a nested data structure. Not only do you need to write out each field name 3-4 times \n(class body, \\__init\\__ signature, \\__init\\__ body x2), you also have to manually convert each field value that's a dict\nor a list of dicts into class instances or lists of class instances. It gets to be a real pain.\n\nSo you think \"OK, I'll just make my classes dataclasses! Then the dataclass library can auto-generate \\__init\\__ \nmethods for me. Perfect!\"... Well, yes and no. True, dataclasses conveniently auto-generate \\__init\\__ methods for you,\nbut those init methods don't handle converting nested dictionaries / lists of dictionaries into nested dataclass \ninstances / lists of dataclass instances. You could achieve what you want by futzing around with dataclass InitVars and \n\\__post_init\\__ methods, but you'll probably find yourself fighting against the very thing you thought would save you \ntime. 
You'll also find that omitting default values for fields, in order to mark them as required (as, for example, a \nrudimentary form of data validation) will cause you more headaches than it solves.\n\nAlso, if your python dictionary data structure has keys with spaces or dashes in them, turning them into dataclasses \nbecomes an almost instant non-starter.\n\nThe excellent marshmallow library has the ability to not only validate any data structure you want, but also has the \nability to seamlessly transform data at the global level and on a per-node level (meaning you can give it a dictionary \nof dictionaries and have it transform the whole thing, or apply per-sub-dictionary transformations).\n\n\"Perfect!\" you say to yourself, \"I can use marshmallow to validate that my data is correct and transform each \nsubdict into a dataclass instance, automatically attached to its parent dictionary / dataclass instance! (and rename \nkeys with spaces / dashes in them into valid python field names)\"... Well, as it turns out, yes! You can! \nMarshmallow and dataclasses fit so incredibly well together. 
Dataclasses are great if you have a very rigidly defined \nstructure where none of the fields are either missing or `None`, and marshmallow (with its ability to intelligently \nhandle missing and null keys) is great at filling in that all-fields-should-be-required part of dataclasses \nmentioned above.\n\nThe only problem is that marshmallow can only do all of these wonderful things if you write custom marshmallow schemas \nto represent each of your nested dataclasses, like so:\n\n```python\nfrom marshmallow import Schema, fields, post_load, pre_dump\nfrom dataclasses import dataclass, asdict\n\n@dataclass\nclass NestedData:\n some_data: str\n more: bool\n\nclass NestedDataSchema(Schema):\n some_data = fields.Str(missing='')\n more = fields.Bool(missing=False)\n\n # marshmallow 3 passes extra keyword arguments (many, partial) to hooks\n @post_load\n def post(self, data, **kwargs):\n return NestedData(**data)\n\n @pre_dump\n def pre(self, instance, **kwargs):\n return asdict(instance)\n\n@dataclass\nclass ParentData:\n a_field: str\n another_field: int\n extra_data: NestedData\n\nclass ParentDataSchema(Schema):\n a_field = fields.Str(missing='')\n another_field = fields.Int(missing=0)\n # the nested dataclass needs its own schema wired in explicitly\n extra_data = fields.Nested(NestedDataSchema)\n\n @post_load\n def post(self, data, **kwargs):\n return ParentData(**data)\n\n @pre_dump\n def pre(self, instance, **kwargs):\n return asdict(instance)\n```\n\nSo for each field in each dictionary / data class, you'd have to write each field name twice, and each field type twice,\nin two different ways! (`str` vs `fields.Str`, etc)\n\nIck!\n\nThat's not fun. That's a lot of typing, a lot of verbosity. That's a lot of opportunities to create annoying bugs by \naccidentally having mismatches or typos between dataclasses and their associated schemas.\n\nSo maybe at this point you're thinking \"Gosh, I wonder if there's a library that can autogenerate marshmallow schemas \nfrom dataclasses, or dataclasses from marshmallow schemas...\" And there is! There are, in fact, several of them. All \nwith different pros and cons. 
Maybe you dive deep into them all and find that some of them are too cumbersome to use and \nothers involve too much python meta-magic (a python class that replaces itself with an auto-generated dataclass version \nof itself... what?) and decide (as I did) that marshmallow_dataclass strikes the best balance between hassle and magic \nand you think \"Perfect! My quest for a data structure solution is finally complete! I now have something that can define \nan arbitrarily nested set of dataclasses and can optionally support input validation, key name transformation, and a \nwhole bunch of other features! Yes!\"\n\nWell, yes, but the end result is still kind of verbose. To represent a data structure like:\n```python\n{\n 'a_string': 'hello',\n 'a_number': 24,\n 'a_bool': True,\n 'a_list': ['hello', 1, 2, 'world'],\n 'A Key With Spaces': 'nice!',\n 'default_value': None,\n 'default_value_of_none': None,\n 'default_none_with_explicit_type': None,\n 'optional_field': None,\n 'a_union': 0,\n 'sub_class': {\n 'subkey': 'hello',\n 'another_subkey': 'world',\n 'sub_sub_class': {\n 'so_nested': True, \n 'much_wow': True\n }\n },\n 'a_class_list': [\n {'a_subkey': 1, 'one_more': 'hello'}\n ]\n}\n```\nYou would need to write this:\n```python\nfrom typing import List, Any, Union, Optional, ClassVar, Type\nfrom dataclasses import field\nfrom marshmallow import Schema\nfrom marshmallow_dataclass import dataclass\n\n@dataclass\nclass MyDataClass:\n a_string: str = field(default_factory=str)\n a_number: int = field(default_factory=int)\n a_bool: bool = field(default_factory=bool)\n a_list: List[str,int] = field(default_factory=list)\n a_key_with_spaces: str = field(default_factory=str, metadata=dict(data_key='A Key With Spaces'))\n default_value: str = field(default='sweet!')\n default_value_of_none: Any = field(default=None)\n default_none_with_explicit_type: str = field(default=None)\n optional_field: Optional[str] = field(default=None)\n allow_missing: str = 
field(default_factory=str, metadata=dict(missing=str, default=str, required=False))\n a_union: Union[str,int] = field(default_factory=str)\n\n @dataclass\n class SubClass:\n subkey: str = field(default_factory=str)\n another_subkey: str = field(default_factory=str)\n\n\n @dataclass\n class SubSubClass:\n so_nested: bool = field(default_factory=bool)\n much_wow: bool = field(default_factory=bool)\n Schema: ClassVar[Type[Schema]] = Schema\n\n sub_sub_class: SubSubClass = field(default_factory=SubSubClass)\n Schema: ClassVar[Type[Schema]] = Schema\n\n sub_class: SubClass = field(default_factory=SubClass)\n\n @dataclass\n class AClassList:\n a_subkey: int = field(default_factory=int)\n one_more: str = field(default='hello')\n Schema: ClassVar[Type[Schema]] = Schema\n\n a_class_list: List[AClassList] = field(default_factory=list)\n Schema: ClassVar[Type[Schema]] = Schema\n```\nIt's definitely better, but still verbose. We're no longer repetitively writing out field names again and again, but\nwe're still repetitively writing out types and a lot of boilerplate, which is not very fun.\n\nWouldn't it be great if we could just declare what we want our data structure to look like in as simple a way as \npossible and have the classes and schemas and various minutiae taken care of for us?\n\nThat was the driving force behind this project.\n\nThe idea is to have a super simple and minimal YAML manifest, containing all the information needed to generate \nfunctional marshmallow_dataclass code. See the [Manifest Specification](#yaml-manifest-spec) and \n[A Complete Example](#a-complete-example) for details.\n\n### Pattern 2: Undocumented APIs\n\nYou're working on a piece of software that makes use of an API endpoint that returns a JSON object. 
This JSON object \nis deeply nested, and not very well documented (or not documented at all).\n\nYou grab an example API response object from the endpoint, and store it somewhere for reference, so you can look at \nkey names and value types and whatnot.\n\nYou're typing along, working on your library, and you start to run into some of the same problems identified in Pattern 1:\n - \"Wait, is this subfield a dictionary in a list, or a list in a dictionary?\"\n - \"Wait, how did I spell this key name?\"\n - \"Wait, is this subfield supposed to be a string, or a number?\"\n\nYou reference your stored API response object to correct these issues, but find it a hassle.\n\nMaybe you also start to run into issues where sometimes the response from the API doesn't match the response object \nyou're using as reference:\n - \"Oh, sometimes when this field is blank it's an empty string, but sometimes it's a `null`/`None`...\"\n - \"Huh, sometimes this field is a `float`, and sometimes it's a `str` of a float. Well that's just perfect...\"\n - \"Oh man, sometimes this nested dict is a null value, and I'm getting key errors trying to look up its values...\"\n\nMaybe you also start thinking you need to document this API in the form of a class.\n\nMaybe you start with a dataclass, but then realize that writing out dataclass rules to handle all the variety this \nAPI endpoint throws at you is too much of a PITA (making everything a field with defaults, setting up default \nfactories for values that need mutable defaults, figuring out how to handle nested data, etc.).\n\nMaybe you also discover marshmallow and go down the same rabbit hole of discovery as I did in Pattern 1 above.\n\nThe problem is, you've got this API that you call and get a response from, and every time you think you know the shape \nof that response, the API surprises you. You make a bunch of calls to this API, and get a bunch of responses that all \nhave a given field, and that field is always a string... 
until it isn't. Until you get a response where that field is \na null value, or just doesn't exist at all.\n\nSo what do you do? You update your dataclass definition / schema. You didn't know that this field was optional, but \nnow you do. You won't be surprised again... Or so you think. Then it happens again, with a different field. Maybe this \ntime it's a nested dictionary that sometimes does and sometimes doesn't exist. Updating your dataclass / schema is \nmore complicated, but still doable. But then it happens again, and again, and you start getting fed up. You wish you \ncould just feed a list of every possible API response object into a tool and have that tool figure out what is and \nisn't optional, which fields are sometimes a string and sometimes an integer, and all that stuff...\n\nWell, now you can!\n\n> Actually, you can't. This feature hasn't been implemented yet. Sorry!\n\n\n## Notes\n\n#### Custom Field Name\n`f:{mm_field_name}` is technically valid for all field types except [hash table](#hash-table) and union types, but is \ncurrently only supported for **Scalar** fields.\n\n#### Hash Table\nA hash table is a kind of dictionary, like a dataclass, but where a dataclass is a dictionary with a fixed set of keys \nthat are all strings, a hash table is an arbitrarily-sized set of keyed values where both the keys and the values can \nbe any type. Another common name for a hash table is a lookup table.\n\n#### Union Types\nA quick note about unions: in Python a union is a value type that can be one of a number of types. A `Union[str,int]` is a value that could be either a string or an integer. Marshmallow_dataclass (upon which this library is built) also supports serializing and deserializing union types through the `marshmallow_union` library. 
The way this works is it maps each type in the union to a marshmallow field and tries the value to be serialized / deserialized against each of those marshmallow fields until one succeeds, or they all fail and you get a validation error\n\n#### PyScaffold\nThis project has been set up using PyScaffold 3.1. For details and usage\ninformation on PyScaffold see https://pyscaffold.org/.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://pyscaffold.org/", "keywords": "", "license": "mit", "maintainer": "", "maintainer_email": "", "name": "auto-class", "package_url": "https://pypi.org/project/auto-class/", "platform": "any", "project_url": "https://pypi.org/project/auto-class/", "project_urls": { "Homepage": "https://pyscaffold.org/" }, "release_url": "https://pypi.org/project/auto-class/0.1.post1/", "requires_dist": [ "ordered-set", "ruamel.yaml", "jinja2", "prompt-toolkit", "pyperclip", "recommonmark ; extra == 'docs'", "sphinx ; extra == 'docs'", "pytest ; extra == 'testing'", "pytest-cov ; extra == 'testing'", "pytest-pycharm ; extra == 'testing'", "marshmallow-dataclass ; extra == 'testing'" ], "requires_python": ">=3.6", "summary": "generate a set of nested data classes (with marshmallow-dataclass powered (de)serializers) for a given YAML manifest", "version": "0.1.post1" }, "last_serial": 5793475, "releases": { "0.1.post1": [ { "comment_text": "", "digests": { "md5": "2121c3a55b837628b776b25eb36a1bd6", "sha256": "29df8de5c1bba6fd06e048f9e35ca4cb954883c5a77a4c9e62eb2000410ccc17" }, "downloads": -1, "filename": "auto_class-0.1.post1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2121c3a55b837628b776b25eb36a1bd6", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6", "size": 25427, "upload_time": "2019-09-06T18:38:29", "url": 
"https://files.pythonhosted.org/packages/f0/91/09cef254e39426fb24590ba95e71e78d6544bbcc95cbf859105642cb4229/auto_class-0.1.post1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "023e306acdad4841040a93731a8ac885", "sha256": "50fea0b10f9fe0d7e6d35d9c8198859de66d27c33fd7abd476198e737a71b503" }, "downloads": -1, "filename": "auto-class-0.1.post1.tar.gz", "has_sig": false, "md5_digest": "023e306acdad4841040a93731a8ac885", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 58693, "upload_time": "2019-09-06T18:38:31", "url": "https://files.pythonhosted.org/packages/c2/c1/974014b78fdc93ddeab86bbe1bf8422e0ecacfb1f4b6c20906defdbe68b2/auto-class-0.1.post1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2121c3a55b837628b776b25eb36a1bd6", "sha256": "29df8de5c1bba6fd06e048f9e35ca4cb954883c5a77a4c9e62eb2000410ccc17" }, "downloads": -1, "filename": "auto_class-0.1.post1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2121c3a55b837628b776b25eb36a1bd6", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6", "size": 25427, "upload_time": "2019-09-06T18:38:29", "url": "https://files.pythonhosted.org/packages/f0/91/09cef254e39426fb24590ba95e71e78d6544bbcc95cbf859105642cb4229/auto_class-0.1.post1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "023e306acdad4841040a93731a8ac885", "sha256": "50fea0b10f9fe0d7e6d35d9c8198859de66d27c33fd7abd476198e737a71b503" }, "downloads": -1, "filename": "auto-class-0.1.post1.tar.gz", "has_sig": false, "md5_digest": "023e306acdad4841040a93731a8ac885", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 58693, "upload_time": "2019-09-06T18:38:31", "url": "https://files.pythonhosted.org/packages/c2/c1/974014b78fdc93ddeab86bbe1bf8422e0ecacfb1f4b6c20906defdbe68b2/auto-class-0.1.post1.tar.gz" } ] }