Welcome to CSITY 2025

11th International Conference on Computer Science, Engineering and Information Technology (CSITY 2025)

October 25 ~ 26, 2025, Vienna, Austria



Accepted Papers
Detecting Hate Speech Against People with Disabilities in Social Media Comments Using RAG-Enhanced LLMs, Fine-Tuning, and Prompt Engineering

Davide AVESANI, Ammar KHEIRBEK, ISEP - Institut Supérieur d’Électronique de Paris, 10 rue de Vanves, 92130 Issy-les-Moulineaux, France

ABSTRACT

Social media is now deeply integrated into people’s daily lives, enabling rapid information exchange and global connectivity. Unfortunately, harmful content, including hate speech and bias against vulnerable groups such as people with disabilities, can spread easily across all communities. While social media platforms employ a mix of automated systems and human experts for content moderation, significant challenges remain in detecting nuanced hate speech, particularly when it is expressed through indirect or coded language. This paper proposes a novel approach to these challenges: HEROL (Hate-speech Evaluation via RAG and Optimized LLM), a unified model that combines RAG-enhanced Large Language Models with prompt engineering and fine-tuning. Experimental results, obtained through a structured evaluation methodology on annotated social media datasets, show that HEROL improves accuracy by up to 10% over baseline models, highlighting its effectiveness at identifying subtle and indirect forms of hate speech and its potential to contribute to safer, more inclusive online environments.
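As a rough illustration of the retrieve-then-prompt pattern the abstract describes (this is not HEROL’s implementation; the labeled corpus, TF-IDF retrieval, and prompt wording below are placeholder assumptions), a minimal Python sketch might look like this:

# Minimal sketch of retrieval-augmented classification (not HEROL's code):
# retrieve annotated comments similar to the input, then prompt an LLM
# to classify with those labeled examples as grounded context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical annotated corpus; a real system would use a vector store.
CORPUS = [
    ("People like that shouldn't be allowed in public.", "hateful"),
    ("The new library finally has a wheelchair ramp.", "not hateful"),
    ("They only hired him to fill a quota.", "hateful"),
]

vectorizer = TfidfVectorizer().fit([text for text, _ in CORPUS])
corpus_vecs = vectorizer.transform([text for text, _ in CORPUS])

def build_prompt(comment: str, k: int = 2) -> str:
    """Embed the k most similar labeled examples into a classification prompt."""
    sims = cosine_similarity(vectorizer.transform([comment]), corpus_vecs)[0]
    examples = "\n".join(
        f'Comment: "{CORPUS[i][0]}" -> {CORPUS[i][1]}'
        for i in sims.argsort()[::-1][:k]
    )
    return (
        "You are a content moderator. Classify the comment as 'hateful' or "
        "'not hateful' toward people with disabilities, watching for indirect "
        "or coded language.\n"
        f"Labeled examples:\n{examples}\n"
        f'Comment: "{comment}" ->'
    )

# The resulting prompt would be sent to a (possibly fine-tuned) LLM.
print(build_prompt("Honestly, people like them are just a burden."))

A production system would replace the TF-IDF lookup with retrieval over a knowledge graph or embedding index, and route the prompt to the fine-tuned model the paper evaluates.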

Keywords

Social Media, Hate Speech Detection, Disability, Natural Language Processing, Large Language Models, Prompt Engineering, Fine-Tuning, Retrieval-Augmented Generation, Knowledge Graph


Enterprise Large Language Model Evaluation Benchmark

Liya Wang, David Yi, Damien Jose, John Passarelli, James Gao, Jordan Leventis, and Kang Li, Atlassian, USA

ABSTRACT

Large Language Models (LLMs) enhance productivity through AI tools, yet existing benchmarks like Massive Multitask Language Understanding (MMLU) inadequately assess enterprise-specific task complexities. We propose a 14-task framework grounded in Bloom’s Taxonomy to holistically evaluate LLM capabilities in enterprise contexts. To address the challenges of noisy data and costly annotation, we develop a scalable pipeline combining LLM-as-a-Labeler, LLM-as-a-Judge, and corrective retrieval-augmented generation (CRAG), curating a robust 9,700-sample benchmark. Evaluation of six leading models shows that open-source contenders like DeepSeek R1 rival proprietary models on reasoning tasks but lag in judgment-based scenarios, likely due to overthinking. Our benchmark reveals critical enterprise performance gaps and offers actionable insights for model optimization. This work provides enterprises with a blueprint for tailored evaluations and advances practical LLM deployment.
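To make the LLM-as-a-Judge step concrete (the framework’s actual prompts, rubric, and scoring scale are not given in the abstract; the 1-5 scale and JSON reply format below are assumptions), a minimal sketch:

# Illustrative sketch of an LLM-as-a-Judge grading step; the benchmark's
# real rubric and scale are assumed, not taken from the paper.
import json

def judge_prompt(task: str, reference: str, answer: str) -> str:
    """Ask a judge LLM to grade a candidate answer against a reference."""
    return (
        "You are an impartial judge. Grade the candidate answer against the "
        "reference for the given enterprise task.\n"
        f"Task: {task}\nReference: {reference}\nCandidate: {answer}\n"
        'Reply with JSON: {"score": 1-5, "rationale": "<one sentence>"}'
    )

def parse_verdict(raw: str) -> dict:
    """Parse the judge's JSON reply; fall back to the lowest score on failure."""
    try:
        verdict = json.loads(raw)
        return {"score": int(verdict["score"]),
                "rationale": verdict.get("rationale", "")}
    except (ValueError, KeyError, TypeError):
        return {"score": 1, "rationale": "unparseable judge output"}

# Example judge reply:
print(parse_verdict('{"score": 4, "rationale": "Accurate but incomplete."}'))

Falling back to the lowest score on parsing failure, rather than crashing, is a common defensive choice when judge outputs are free-form text.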

Keywords

Large Language Models (LLMs), Evaluation Benchmark, Bloom’s Taxonomy, LLM-as-a-Labeler, LLM-as-a-Judge, corrective retrieval-augmented generation (CRAG).


Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score

Philipp Seitz, Jan Schmitt, and Andreas Schiffler, Institute of Digital Engineering, Technical University of Applied Sciences Würzburg-Schweinfurt, Germany

ABSTRACT

For a large set of predictions from several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. However, this average can deviate from the actual ground truth in certain parameter regions. A method is presented to determine a representative value ỹ_BS from such a set of predictions and to evaluate it with an associated quality criterion β_BS, called the Bagging Score (BS), using nonlinear regression with Neural Networks (NN). The BS reflects the confidence of the obtained ensemble prediction and also allows the construction of a prediction estimation function δ(β) that specifies deviations more precisely than the variance of the bagged predictors themselves.
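One plausible reading of the kernel density estimation step (the paper’s exact definitions of ỹ_BS and β_BS may differ; taking the KDE mode as the representative value and a normalized peak height as the confidence proxy below is an assumption) can be sketched as:

# Sketch of a KDE-based representative value for an ensemble of predictions;
# definitions are assumptions, not the paper's exact formulas.
import numpy as np
from scipy.stats import gaussian_kde

def bagged_representative(predictions: np.ndarray) -> tuple[float, float]:
    """Return (y_bs, beta_bs): the KDE mode of the ensemble predictions and
    a grid-normalized peak height as a rough confidence proxy."""
    kde = gaussian_kde(predictions)
    grid = np.linspace(predictions.min(), predictions.max(), 512)
    density = kde(grid)
    y_bs = grid[density.argmax()]            # representative value at the mode
    beta_bs = density.max() / density.sum()  # how concentrated the ensemble is
    return float(y_bs), float(beta_bs)

# Ensemble of 10 differently trained models: a tight cluster plus one outlier.
preds = np.array([2.01, 1.98, 2.05, 2.02, 1.97, 2.03, 2.00, 1.99, 2.04, 3.2])
print(bagged_representative(preds))  # mode stays near 2.0
print(preds.mean())                  # plain mean is pulled toward the outlier

Unlike the plain mean, the KDE mode is insensitive to the single outlying predictor, which is exactly the failure mode of default averaging that the abstract motivates the method with.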

Keywords

Machine Learning, Neural Network, Bagging Predictors, Bagging Score, Nonlinear Regression, Deviation Estimation.