The AAAI-21 Workshop on Knowledge Discovery from Unstructured Data in Financial Services

Bio

Johannes Hoffart is a Senior Research Scientist in the Goldman Sachs R&D AI group, working on natural language understanding and knowledge bases in the financial domain. He received his PhD from the Max Planck Institute for Informatics in the area of artificial intelligence (AI), specifically knowledge base construction, entity linking, and entity discovery. During his PhD he spent time at the Google Research group in Zurich. He is one of the creators of the YAGO knowledge base and the AIDA entity linking system, which are described in two of the most cited publications in their respective fields. He has published numerous papers and articles at renowned conferences and in journals in the areas of Artificial Intelligence, Natural Language Processing, and the Semantic Web. He won the AI Journal prominent paper award as well as the Web Conference best demo award for work on YAGO.

After his PhD and before joining Goldman Sachs R&D, Johannes Hoffart co-founded Ambiverse and became its CEO, spinning off his research work and developing cutting-edge AI solutions for financial, automotive, media, and tech companies. During his computer science studies he co-founded a software company that developed Punakea, a widely used tag-based file management application for macOS.

Keynote: Rich Text, Lean Knowledge Bases: Knowledge Extraction from Financial Documents

Knowledge extraction in the financial domain needs to deal with a large variety of documents. Each of these documents contains critical, often legally binding information. One could almost claim that the financial industry does not actually run on numbers, but on documents! The language in the financial domain comes with extra dimensions that knowledge extraction methods need to cope with (and should benefit from): long documents with hundreds of pages, deeply nested structure with many visual cues, dense intra- and inter-document links, and frequent edits and updates. The first step of extracting knowledge is often the identification of legal entities and concepts. This step depends crucially on the type of document and the kind of knowledge to be extracted, and it requires highly flexible entity linking methods. The presentation will show initial results of knowledge extraction from finance-related documents and discuss open research questions, highlighting the extra dimensions of financial document types and the shortcomings of many current entity linking methods.