Important Dates

* All deadlines are at 11:59 pm UTC-12.

Trial Data Ready: Jul 15 (Fri), 2022
Training Data Ready: Sep 30 (Fri), 2022
Evaluation Start: Jan 10 (Tue), 2023
Evaluation End: Jan 31 (Tue), 2023
System Description Paper Submission Due: Feb 1 (Wed), 2023
Notification to Authors: Mar 1 (Wed), 2023
Camera-ready Due: Apr 1 (Sat), 2023
Workshop: TBD
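UTC-12 is the westernmost time zone (often called "Anywhere on Earth"), so a deadline there falls later in UTC than the calendar date suggests. The sketch below, assuming only Python's standard `datetime` module, converts a deadline date into the corresponding UTC instant; the helper name `deadline_in_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The zone in which all deadlines are calculated: UTC-12.
UTC_MINUS_12 = timezone(timedelta(hours=-12))


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the 11:59 pm UTC-12 deadline on the given date, expressed in UTC."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=UTC_MINUS_12)
    return local_deadline.astimezone(timezone.utc)


if __name__ == "__main__":
    # System Description Paper Submission Due: Feb 1 (Wed), 2023
    print(deadline_in_utc(2023, 2, 1).isoformat())  # 2023-02-02T11:59:00+00:00
```

Calling `.astimezone()` with no argument on the returned value converts it to your machine's local time zone instead.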

1. How to Participate

2. Training Data Format

A small set of trial data in English is available for download.

We will follow the CoNLL format for the datasets. Here is a sample from the trial data.


In a data file, samples are separated by blank lines. Each data instance is tokenized, and each line contains a single token (first column) with its label in the last (4th) column. The second and third columns (_) are ignored. Entities are labeled using the BIO scheme: a token tagged O is not part of any entity, B-X marks the first token of an entity of type X, and I-X marks each subsequent token inside a multi-token entity of type X. In the given example, the input text is:

the original ferrari daytona replica driven by don johnson in miami vice
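Reading this format takes only a few lines of code. The following is a minimal Python sketch, assuming whitespace-separated columns; the function name `read_conll` is hypothetical, and the entity spans and types in `SAMPLE` (X, Y, Z) are placeholders rather than the task's actual annotation or label set. Joining the tokens of a sample with spaces recovers the input text.

```python
from typing import List, Tuple

# Hypothetical sample in the 4-column layout described above. The entity
# spans and types X, Y and Z are placeholders, not the real annotation.
SAMPLE = """\
the _ _ O
original _ _ O
ferrari _ _ B-X
daytona _ _ I-X
replica _ _ O
driven _ _ O
by _ _ O
don _ _ B-Y
johnson _ _ I-Y
in _ _ O
miami _ _ B-Z
vice _ _ I-Z
"""


def read_conll(text: str) -> List[Tuple[List[str], List[str]]]:
    """Split CoNLL-style text into a list of (tokens, labels) samples.

    Samples are separated by blank lines. On each token line the token is
    in the first column and its label in the last (4th) column; the "_"
    columns in between are ignored.
    """
    samples: List[Tuple[List[str], List[str]]] = []
    tokens: List[str] = []
    labels: List[str] = []
    for line in text.splitlines():
        cols = line.split()  # assumes whitespace-separated columns
        if not cols:
            # A blank line ends the current sample.
            if tokens:
                samples.append((tokens, labels))
                tokens, labels = [], []
            continue
        tokens.append(cols[0])
        labels.append(cols[-1])
    if tokens:  # the last sample may not be followed by a blank line
        samples.append((tokens, labels))
    return samples


if __name__ == "__main__":
    for tokens, labels in read_conll(SAMPLE):
        print(" ".join(tokens))
        print(labels)
```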

The following image shows the annotated entities.
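The BIO rules above determine how tagged tokens group into entities. Below is a minimal Python sketch of that grouping; `bio_to_spans` is a hypothetical helper, and the tags in the usage example (types X, Y, Z) are placeholders, not the annotation shown in the image.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, entity_text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, label in zip(tokens, labels):
        if label.startswith("I-") and label[2:] == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []
        if label.startswith("B-"):
            # First token of a new entity of the given type.
            current_type, current_tokens = label[2:], [token]
        # "O" tokens and stray I- tags (no matching open entity) are skipped.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


if __name__ == "__main__":
    tokens = "the original ferrari daytona replica driven by don johnson in miami vice".split()
    # Placeholder tags for illustration only.
    labels = ["O", "O", "B-X", "I-X", "O", "O", "O", "B-Y", "I-Y", "O", "B-Z", "I-Z"]
    print(bio_to_spans(tokens, labels))
    # [('X', 'ferrari daytona'), ('Y', 'don johnson'), ('Z', 'miami vice')]
```

This sketch is strict: an I-X tag that does not follow a B-X or I-X token is ignored. Some evaluation scripts are more lenient and treat such a tag as the start of a new entity.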

Here are some examples from other languages.

6. Some Resources for Beginners in NLP