{ "info": { "author": "Remi Cadene", "author_email": "remi.cadene@icloud.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.7", "Topic :: Software Development :: Build Tools" ], "description": "# MUREL: Multimodal Relational Reasoning for Visual Question Answering\n\nThe **MuRel network** is a Machine Learning model trained end-to-end to answer questions about images. First, it extracts a graph representation of the scene in which each node is an object or region. Second, it fuses the question representation multiple times with a MuRel cell to progressively refine visual and question interactions. Finally, it answers the question via an implicit attention mechanism and a bilinear model. Interestingly, the MuRel network doesn't include an explicit attention mechanism, which is usually at the core of state-of-the-art models. Its rich vector representation of the scene can even be leveraged to visualize the reasoning process at each step.\n\n