Analysis of institutional authors

Martínez Plumed, Fernando (Corresponding Author)
Ferri Ramírez, César (Author)

March 11, 2025
Proceedings Paper

Automatic PDF Document Classification with Machine Learning

Published in: Lecture Notes in Computer Science, vol. 15346, pp. 447-459, 2025-01-01. DOI: 10.1007/978-3-031-77731-8_40

Authors:

Luna, SL; Garigliotti, D; Plumed, FM; Ramírez, CF

Affiliations

Univ Bergen, Bergen, Norway - Author
Univ Politecn Valencia, Valencia, Spain - Author

Abstract

Universitat Politècnica de València (UPV) faces challenges in managing its Alfresco document repository, which contains 600,000 PDF files, of which only 100,000 are correctly categorised. Manual classification is laborious and error-prone, hindering information retrieval and advanced search capabilities. This project presents an automated pipeline that integrates optical character recognition (OCR) and machine learning to efficiently classify documents. Our approach distinguishes between scanned and digital documents, accurately extracts text, and categorises it into 51 predefined categories using models such as BERT and RF. By improving document organisation and accessibility, this work optimises UPV's document management and paves the way for advanced search technologies and real-time classification systems.

Keywords

Adversarial machine learning; Alfresco repository; Contrastive learning; Document classification; Document repositories; Error prones; Federated learning; Machine learning; Manual classification; OCR; Optical character recognition; PDF document; PDF files; Search capabilities

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in Lecture Notes in Computer Science. According to the agency WoS (JCR), this venue has progressed and achieved good impact in recent years, becoming a reference in its field. In 2025, the year of publication, it was in position 70/78 in the category Computer Science, Artificial Intelligence, and is listed as Q1 (first quartile).


Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2026-03-24:

  • Academic use, as evidenced by the Altmetric indicator counting additions to the personal bibliographic manager Mendeley, totals 9.
  • Use of this contribution in bookmarks, code forks, additions to favorite lists for recurrent reading, and general views suggests that readers are using the publication as a basis for their current work. This may be a notable indicator of future, more formal academic citations. The "Capture" indicator supports this, with a total of 9 (PlumX).

For dissemination aimed at more general audiences, other more global scores can also be observed, such as:

  • The Total Score from Altmetric: 3.

Leadership analysis of institutional authors

This work has been carried out with international collaboration, specifically with researchers from: Norway.

There is a significant leadership presence, as some of the institution's authors appear as first or last author: First Author (Llacer Luna, Socrates) and Last Author (Ferri Ramírez, César).

The corresponding author is Martínez Plumed, Fernando.


Awards linked to the item

This work was funded by the Norwegian Research Council grant 329745 Machine Teaching for Explainable AI, CIPROM/2022/6 (FASSLOW) funded by Generalitat Valenciana, the EC H2020-EU grant agreement No. 952215 (TAILOR), and Spanish grant PID2021-122830OB-C42 (SFERA) funded by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe". The authors thank the Cátedra de Inteligencia Artificial aplicada a la Administración Pública of Universitat Politècnica de València (UPV).