17/11/2023
The ATILF research unit (Computer Processing and Analysis of the French Language) is offering a postdoctoral position in natural language processing (NLP).
Topic: Discovery of multiword expressions, their meaning and their linguistic properties in texts using large language models
Location: ATILF, Nancy, France
Starting date: from February 2024
Duration: 12 months (with a possible extension of one more year)
Supervisors: Mathieu Constant (Univ. Lorraine, France) and Agata Savary (Univ. Paris-Saclay, France)
Salary: depends on post-PhD experience and the applicable salary grids, from 3070 (7 years of experience) before tax
Application deadline: 5th December 2023
Subject. The term "multiword expression" refers to a combination of multiple lexical items that displays irregular composition, possibly at different linguistic levels (morphology, syntax, semantics, ...). Multiword expressions cover a large variety of phenomena, such as idioms (run around in circles), support verb constructions (take a walk), nominal compounds (dry run), and complex function units (in spite of). They have been the subject of extensive research in the NLP community over the last 50 years.
The goal of this postdoctoral position is to investigate new methods for discovering multiword expressions, their meaning, and their linguistic properties in texts, in order to enrich an induced semantic lexicon with new multiword entries, definitions, argument structure, and other properties. The emergence of large language models (LLMs) opens promising new perspectives for multiword expressions, not only regarding their semantic compositionality but also their linguistic characterization. The methods will be tested primarily on French, but other languages may also be considered.
Context. The position is part of the SELEXINI project (https://selexini.lis-lab.fr, 2022-2026) funded by the French National Research Agency (ANR). The goal of the SELEXINI project is to develop next-generation lexicon induction methods for natural language processing. The induced lexicons will not only cluster word usages according to their senses, but also contain multiword expressions, argument structure, generated definitions, etc., combining the power of large pre-trained language models and existing lexical resources to address the lack of interpretability and diversity in current language technology. The hired researcher will be fully integrated into the project team.
Requirements. Applicants should hold a PhD in computer science, applied mathematics, natural language processing, or computational linguistics. Applications from PhD students planning to defend by December 31, 2023 are also welcome.
The hired postdoctoral researcher should have the following skills:
expertise in deep learning for NLP and notably large language models
excellent programming skills
good linguistic skills
good knowledge of French would be a plus
team spirit
Application. Applicants should submit a cover letter, a CV including their publications, a list of references, and a transcript of Master's grades via the official website: https://emploi.cnrs.fr/Offres/CDD/UMR7118-SABMAR-017/Default.aspx?Lang=EN. Applications must be submitted no later than December 5, 2023.
SELEXINI - SEmantic LEXicon INduction for Interpretability and diversity in text processing. A research project funded by ANR - Agence Nationale de la Recherche. Keywords: word sense induction, lexicon induction, clustering, language models, lexicon, interpretability, evaluation, robustness, multiwor...