Controlling the uncontrollable: Strategies to process User-Generated Content in a Multilingual Context

Johann Roturier (Symantec Research Labs)
Digital Security

Date: -
Location: Eurecom

This presentation focuses on some of the strategies that can be used to process user-generated content in a multilingual context. Specifically, it investigates how useful automatic machine translation (MT) metrics are for analyzing the impact of source reformulations on the quality of machine-translated user-generated content. A novel framework is presented that aims to rely on automatic metrics, rather than human judgments, to quickly identify rewriting rules that improve or degrade the quality of MT output. We find that this approach allows us to quickly identify rules that overlap between two language pairs (English-French and English-German), as well as specific cases where the rules' precision could be improved. A new approach, based on lattice inputs, is also briefly described as a way to address some of the shortcomings of the first approach.
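As a rough illustration of the metric-based idea described in the abstract, the sketch below scores the MT output of an original and a rewritten source sentence against a reference translation, then labels each rewriting rule by its average score change. It is not the framework presented in the talk: the unigram F1 score is a toy stand-in for a real automatic MT metric, and the names `overlap_f1` and `classify_rules`, the rule identifiers, and the example sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def overlap_f1(hypothesis, reference):
    """Toy stand-in for an automatic MT metric: unigram F1 against one reference."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Clipped unigram matches between hypothesis and reference.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    return 2 * precision * recall / (precision + recall)


def classify_rules(records, threshold=0.0):
    """Label each rule by the mean metric change it causes.

    records: iterable of (rule_id, mt_of_original, mt_of_rewritten, reference).
    A rule whose mean delta stays within +/- threshold is labelled "neutral".
    """
    deltas = defaultdict(list)
    for rule_id, mt_original, mt_rewritten, reference in records:
        delta = overlap_f1(mt_rewritten, reference) - overlap_f1(mt_original, reference)
        deltas[rule_id].append(delta)

    verdicts = {}
    for rule_id, values in deltas.items():
        mean_delta = sum(values) / len(values)
        if mean_delta > threshold:
            verdicts[rule_id] = "improves"
        elif mean_delta < -threshold:
            verdicts[rule_id] = "degrades"
        else:
            verdicts[rule_id] = "neutral"
    return verdicts


# Invented example data, for illustration only.
records = [
    ("expand_contraction", "le chat pas manger", "le chat ne mange pas", "le chat ne mange pas"),
    ("drop_pronoun", "il a installé le logiciel", "a installé logiciel", "il a installé le logiciel"),
]
verdicts = classify_rules(records)
```

Running the same classification separately for each language pair and intersecting the sets of rules labelled "improves" would be one simple way to look for rules that overlap between pairs, again under the assumptions stated above.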