LLM-Guided Cross-Platform Optimization of Cloud Analytics Workloads

Srihari Babu Godleti

doi:10.22399/ijcesen.5181

Keywords:

Amazon EMR, Apache Spark, Large Language Model, Kubernetes, Snowflake

Abstract

Large-scale data analytics in the cloud inevitably involves trade-offs among latency, throughput, scalability, elasticity, and cost. Today’s platforms model these trade-offs in very different ways: Amazon EMR builds on managed Hadoop ecosystems, Spark on Kubernetes relies on container-native distributed execution, and Snowflake offers a fully managed data warehousing model. Although prior benchmarks, often based on TPC-DS, TPC-H, or microbenchmarks, have studied these systems, they typically evaluate each platform in isolation and rely on static configurations, manual tuning, or simplified cost assumptions. As a result, it remains unclear how these platforms compare under realistic, evolving cloud workloads, or how their performance and cost can be jointly optimized in dynamic environments. To bridge this gap, we introduce LLM-TradeOpt, a Large Language Model (LLM)-guided optimization framework that adaptively reasons about workload characteristics, system configurations, and execution traces across heterogeneous analytics platforms. In our evaluation on CloudSuite v4.0 analytics workloads, LLM-TradeOpt consistently improves performance and efficiency, achieving up to 18.7% lower latency, 22.4% higher throughput, and 15.3% cost savings compared to strong baselines on Amazon EMR, Apache Spark on Kubernetes, and Snowflake.
