Improving Image Representations of Words through Region-specific Loss Minimisation

How can we make text in generated images seem more real?