Open position at JetBrains

Paid Internship - Duplicates detection

Work schedule: Internship
Address: Na Hřebenech II 785/9, 147 00 Praha 4-Podolí, Česko

The project team is researching approaches to detecting explicit and potential duplicates in natural-language text (starting with English) written in the IDE.

Duplicates detection

Our IntelliJ plugin streamlines writing technical documentation for software applications inside the IDE. It supports the ‘single source’ concept: a chunk of content is written once and then reused in multiple help articles or documentation outputs by including it by its ID.
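
To make the single-source idea concrete, here is a minimal Java sketch of how include-by-ID expansion could work. The <include id="..."/> markup, the ChunkLibrary class, and its expand method are hypothetical and chosen only for illustration. They are not the plugin's actual syntax or API.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical chunk library: maps a chunk ID to its reusable text.
final class ChunkLibrary {
    // Hypothetical include markup, e.g. <include id="save-settings"/>
    private static final Pattern INCLUDE = Pattern.compile("<include id=\"([\\w-]+)\"/>");

    private final Map<String, String> chunks;

    ChunkLibrary(Map<String, String> chunks) {
        this.chunks = chunks;
    }

    // Replaces every include reference in an article with the referenced chunk's text.
    String expand(String article) {
        Matcher m = INCLUDE.matcher(article);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String id = m.group(1);
            String text = chunks.get(id);
            if (text == null) {
                throw new IllegalArgumentException("Unknown chunk ID: " + id);
            }
            // quoteReplacement keeps any '$' or '\' in the chunk text literal
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because every article points at the same chunk, editing that chunk once updates every output that includes it. Text that was duplicated by hand instead of included gets none of that benefit, which is the gap the inspection is meant to close.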

The proposed inspection should detect identical pieces of content and, more importantly, non-explicit duplicates (passages that are not word-for-word identical), so that they can be extracted to a library and reused. Such an inspection should help:

  • maintain consistency throughout sources
  • avoid making multiple updates when the UI changes
  • reduce the review and editing effort
  • reduce localization costs
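
Detecting identical content is the simpler half of the problem: normalize each chunk and group chunks whose normalized text matches, which takes a single pass over the chunks instead of pairwise comparison. The sketch below assumes that chunks are already split out (for example, as paragraphs) and that normalization means trimming, collapsing whitespace, and lowercasing. Both are assumptions; a real inspection would probably also strip markup. It finds only explicit duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ExactDuplicates {
    // Assumed normalization: trim, collapse runs of whitespace, lowercase.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Groups chunk IDs whose normalized text is identical.
    // Returns only groups with two or more members, in first-seen order.
    static List<List<String>> find(Map<String, String> chunksById) {
        Map<String, List<String>> byText = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : chunksById.entrySet()) {
            byText.computeIfAbsent(normalize(e.getValue()), k -> new ArrayList<>()).add(e.getKey());
        }
        List<List<String>> groups = new ArrayList<>();
        for (List<String> ids : byText.values()) {
            if (ids.size() > 1) {
                groups.add(ids);
            }
        }
        return groups;
    }
}
```

With this normalization, "Click  Save to apply." and "click save to apply." fall into the same group, while a reworded sentence such as "Click Save to apply the changes." does not. Catching those needs a similarity measure.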

Comparing every chunk of text with every other chunk and flagging duplicates by the percentage of matching content cannot be done in the IDE at runtime on a large code base, because the number of comparisons grows quadratically with the number of chunks (n chunks give n(n−1)/2 pairs). We therefore expect you to research, try, and test different approaches, which may include ML, Elasticsearch, trigram search, the Apache Lucene engine, or any other technique you can apply.
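
Trigram search, one of the approaches listed above, can be sketched as an in-memory inverted index. Each chunk is indexed by its character trigrams. For a query, only chunks sharing at least one trigram are touched, and the shared-trigram counts give the Jaccard similarity directly. The class name, the use of Jaccard similarity, and the thresholds in the example are assumptions for illustration. A production version would more likely delegate indexing to Apache Lucene or Elasticsearch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TrigramIndex {
    // trigram -> IDs of chunks containing it (posting lists)
    private final Map<String, List<String>> postings = new HashMap<>();
    // chunk ID -> number of distinct trigrams in that chunk
    private final Map<String, Integer> sizes = new HashMap<>();

    // Distinct character trigrams of trimmed, whitespace-collapsed, lowercased text.
    static Set<String> trigrams(String text) {
        String s = text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Assumes each chunk ID is added only once.
    void add(String id, String text) {
        Set<String> grams = trigrams(text);
        sizes.put(id, grams.size());
        for (String g : grams) {
            postings.computeIfAbsent(g, k -> new ArrayList<>()).add(id);
        }
    }

    // IDs of indexed chunks whose trigram Jaccard similarity with the query
    // is >= threshold, sorted by ID. |A ∩ B| is counted from the posting lists;
    // |A ∪ B| = |A| + |B| - |A ∩ B|.
    List<String> similar(String query, double threshold) {
        Set<String> q = trigrams(query);
        Map<String, Integer> shared = new HashMap<>();
        for (String g : q) {
            for (String id : postings.getOrDefault(g, List.of())) {
                shared.merge(id, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            int inter = e.getValue();
            int union = q.size() + sizes.get(e.getKey()) - inter;
            if ((double) inter / union >= threshold) {
                result.add(e.getKey());
            }
        }
        result.sort(null); // natural (alphabetical) order for deterministic output
        return result;
    }
}
```

A query never scans chunks that share no trigram with it. Very common trigrams, such as those spanning "the", still produce long posting lists, though. That is one reason to evaluate a dedicated engine, as the description suggests.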

Prerequisites:

  • Java/Kotlin knowledge
  • Basic knowledge of natural language processing
  • English (pre-intermediate and above)

Your tasks:

  • Create an inspection that runs either in the IDE or, in headless mode, from an external web interface, and reports potential duplicates
  • Create an intention action in the IDE that suggests extracting such duplicates into reusable chunks
  • Ideally, create an inspection that analyzes duplicates in the background and, as you type, suggests replacing content with an existing chunk
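
The text transformation behind the proposed intention action can be sketched apart from the IntelliJ Platform API. The duplicated text is registered as a new library chunk, and each occurrence in the affected articles is replaced with an include reference. This minimal sketch assumes a hypothetical <include id="..."/> markup and uses illustrative names (ChunkExtractor, extractChunk). It handles only literal occurrences. A real intention would more likely work on the PSI tree than on raw strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ChunkExtractor {
    // Hypothetical include markup, e.g. <include id="save-settings"/>
    static String includeTag(String id) {
        return "<include id=\"" + id + "\"/>";
    }

    // Stores `duplicate` in `library` under `id` and replaces every literal
    // occurrence of it in each article with an include reference.
    // Returns the rewritten articles, keyed like the input.
    static Map<String, String> extractChunk(Map<String, String> articles,
                                            Map<String, String> library,
                                            String id, String duplicate) {
        if (library.containsKey(id)) {
            throw new IllegalArgumentException("Chunk ID already in use: " + id);
        }
        library.put(id, duplicate);
        Map<String, String> rewritten = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : articles.entrySet()) {
            // String.replace performs a literal (non-regex) replacement of all occurrences
            rewritten.put(e.getKey(), e.getValue().replace(duplicate, includeTag(id)));
        }
        return rewritten;
    }
}
```

For non-explicit duplicates, the author would first need to choose which wording becomes the shared chunk. Only then can the occurrences be replaced this way.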

Razmik Seysyan, Senior Software Developer

It is like your own home away from home. You can come to the gym to work out, you will meet friends, or you can bring your children, who love it here, with you.

