Jose Reinaldo
Source
Ground truth provided by TCDFT (Tribunal de Contas do Distrito Federal e Territórios)
Preprocessing done by:
- José Reinaldo da C. S. A. V. da S. Neto
- Leonardo Maffei
Dataset Information
- Dataset of retirement-related named entity annotations in CoNLL format.
- The train/validation/test split follows the chronological order of the DODFs (Diário Oficial do Distrito Federal), so that entities from the same DODF document never end up in different sets.
- Size:
  - Train set: 3860 sentences
  - Validation set: 828 sentences
  - Test set: 827 sentences
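A document-level chronological split like the one described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the dataset: the `Sentence` record, its `doc_id` and `doc_date` fields, and the `chronological_split` function are hypothetical, and the default fractions are only inferred from the reported sizes (3860, 828 and 827 of 5515 sentences, roughly 70/15/15). The key property is that whole documents, not individual sentences, are assigned to a set.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sentence:
    doc_id: str        # hypothetical: identifier of the source DODF issue
    doc_date: date     # hypothetical: publication date of that issue
    tokens: list[str]
    tags: list[str]


def chronological_split(sentences, train_frac=0.70, valid_frac=0.15):
    """Greedy split by whole documents, oldest documents first.

    All sentences of one document land in the same set, so entities from a
    single DODF never straddle two sets. Set sizes are approximate because
    a document is never cut in half.
    """
    # Group sentences by (date, document) so sorting the keys gives time order.
    docs = {}
    for s in sentences:
        docs.setdefault((s.doc_date, s.doc_id), []).append(s)

    total = len(sentences)
    train, valid, test = [], [], []
    for key in sorted(docs):
        group = docs[key]
        # Fill train first, then validation, then put the rest in test.
        if len(train) < train_frac * total:
            train.extend(group)
        elif len(valid) < valid_frac * total:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because the cut points fall on document boundaries, the resulting sets can be slightly larger or smaller than the target fractions; keeping documents intact takes priority over exact proportions.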
Named Entity Information
The number of named entities per class in each set (train/valid/test) is presented below:
| Entity type | Train set | Validation set | Test set |
|---|---|---|---|
| ATO | 3859 | 828 | 827 |
| NOME_ATO | 3860 | 828 | 827 |
| COD_MATRICULA_ATO | 3852 | 827 | 827 |
| CARGO | 3850 | 828 | 827 |
| CLASSE | 1911 | 493 | 342 |
| PADRAO | 3471 | 739 | 749 |
| FUND_LEGAL | 3860 | 827 | 827 |
| PROCESSO | 3575 | 824 | 814 |
| QUADRO | 3535 | 747 | 753 |
| EMPRESA_ATO | 1716 | 473 | 315 |
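Per-class counts like those in the table can be computed by reading the CoNLL files and counting entity mentions. The sketch below assumes a common CoNLL layout (one token per line, the tag in the last whitespace-separated column, blank lines between sentences) and B-/I-/O prefixed tags; the exact column layout and tagging scheme are not stated here, so treat both as assumptions. The helper names `read_conll` and `count_entities` are hypothetical. A new mention is counted on a `B-` tag or on an `I-` tag whose class differs from the previous tag, which works for both the IOB1 and IOB2 variants.

```python
from collections import Counter


def read_conll(lines):
    """Yield sentences as lists of (token, tag) pairs.

    Assumes one token per line, the tag in the last column, and blank lines
    (or -DOCSTART- lines) separating sentences.
    """
    sentence = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split()
        sentence.append((cols[0], cols[-1]))
    if sentence:
        yield sentence


def count_entities(sentences):
    """Count entity mentions per class from B-/I-/O tags."""
    counts = Counter()
    for sentence in sentences:
        prev = "O"
        for _, tag in sentence:
            # A mention starts at B-X, or at I-X not continuing an X mention.
            starts = tag.startswith("B-") or (
                tag.startswith("I-") and prev[2:] != tag[2:]
            )
            if starts:
                counts[tag[2:]] += 1
            prev = tag
    return counts
```

Usage, with a hypothetical file name: `with open("train.conll", encoding="utf-8") as f: print(count_entities(read_conll(f)))`. Counting sentences with `sum(1 for _ in read_conll(f))` gives the per-set sizes in the same way.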