POSW-Vote: A precision-oriented weighted voting framework for robust information extraction from domain-specific reports

Authors

  • Hoang Van Toan Institute of Information Technology and Electronics, Academy of Military Science and Technology
  • Dang Duc Thinh Institute of Information Technology and Electronics, Academy of Military Science and Technology
  • Phung Nhu Hai Institute of Information Technology and Electronics, Academy of Military Science and Technology

DOI:

https://doi.org/10.54939/1859-1043.j.mst.CSCE9.2025.123-134

Keywords:

Information extraction; Large language models; Ensemble voting; Semantic similarity; Schema-based extraction.

Abstract

Information extraction (IE) from unstructured or semi-structured reports remains challenging in specialized domains such as military situation reporting, where the text is narrative, irregular, and context-dependent. Traditional rule-based or named-entity-recognition (NER) methods often lack the coverage and adaptability these settings require. Although large language models (LLMs) have shown strong potential for schema-based extraction, their outputs vary across runs and models, which limits consistency and precision. This paper proposes POSW-Vote (Precision-Oriented Similarity-Weighted Voting), a semantic voting algorithm that consolidates multiple LLM outputs into a single, stable, structured representation. For each schema-defined field, the method combines similarity-based clustering, reliability weighting, and superstring-aware selection to identify the most complete and contextually correct value. Extensive experiments on real-world, expert-annotated Vietnamese military reports show that POSW-Vote consistently improves Precision and F1-score over single-run and intra-model baselines while remaining robust across heterogeneous models. The results indicate that the framework improves the stability and reliability of LLM-based extraction without retraining, offering a scalable, model-agnostic solution for high-stakes domains such as defense intelligence and situational monitoring.

Published

2025-12-31

How to Cite

Hoang Van Toan, Dang Duc Thinh, and Phung Nhu Hai, “POSW-Vote: A precision-oriented weighted voting framework for robust information extraction from domain-specific reports”, JMST’s CSCE, no. CSCE9, pp. 123–134, Dec. 2025.

Section

Articles