





Hi, I am
Pengrui Zeng
Traveler and
Data Scientist
Data is Language
World is Canvas
I was born in China, spent 7 years in Australia completing high school and my undergraduate studies, and later earned a Master's degree in the United States of America.
I have crossed 5 continents and flown over 250,000 kilometers. In the rhythm of departures and arrivals, I have come to see the character of cities, the warmth of cultures, and gradually found my own way of relating to the world. To me, travel is more than movement. It is a way of understanding life, connecting with others, and reshaping myself.




My Journey
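「Chengdu • The Warmth of Everyday Life」
「Adelaide • Sea Breeze and Open Country」
「Brisbane • Gold Coast」
「Pittsburgh • Lights of the City of Bridges」
「Los Angeles • La La Land」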
Chengdu
2000.12~2016.08
Adelaide
Brisbane
2019.02~2023.12
Pittsburgh
Los Angeles






































AIMING FOR THE STARS BEYOND.








Sep 2023 - Dec 2024
Focus: Data Science, Machine Learning, Artificial Intelligence.

Feb 2019 - Dec 2022
Focus: Data Structures and Algorithms, Database Management, Statistics, Mathematics.

Intel
AI Data Scientist
May 2025 - Present
AI / ML Empowerment
• Capacity Planning Reengineering: Leveraged multi-source data and AI/ML orchestration models to redesign capacity planning workflows, improving planning logic, responsiveness, and scalability across scenarios.
• Strategic Validation: Designed and executed A/B tests to compare alternative planning strategies, measuring uplift via key KPIs (e.g., forecast accuracy, utilization variance, service-level risk).
• Decision Support Systems: Built Power BI dashboards and internal websites that translate model outputs into clear insights and trade-offs, supporting more confident strategic decisions by executive leadership.
ML-Driven Forecasting
• Multi-Model Framework: Developed a forecasting framework integrating SARIMAX, Prophet, and GBR to enable long-term global capacity and demand planning.
• Advanced Segmentation: Applied Gaussian Mixture Models (GMM) for regional segmentation, combined with advanced feature engineering.
• Impact: Deployed models through an end-to-end CI/CD ML pipeline with auto-retraining capabilities, boosting forecast accuracy by 20%.
Scalable Data Architecture
• Distributed ETL Systems: Designed and built scalable ETL pipelines using Python, Azure Data Factory, and Databricks.
• Big Data Processing: Leveraged Spark for TB-scale data processing and Airflow for automated workflow orchestration.
• Digital Transformation: Drove the standardization and digital transformation of Intel’s global supply chain data infrastructure.

MSA Safety
Data Analyst Intern
May 2024 - Dec. 2024
Global Procurement Analytics
• Large-Scale Analysis: Utilized Python and SQL to analyze 10M+ rows of global procurement data to drive strategic sourcing decisions.
• Supplier Segmentation: Applied K-Means clustering for supplier segmentation and conducted A/B testing to validate strategies.
• Impact: Achieved a 10% cost reduction and a 15% reduction in procurement cycle time through data-driven optimization.
Machine Learning Optimization
• Model Fine-Tuning: Fine-tuned Random Forest models on large-scale datasets to improve supplier selection accuracy.
• Decision Support: Enhanced sourcing decision quality by integrating predictive analytics into the supplier evaluation process.
BI Architecture and ETL
• Power BI Optimization: Restructured Power BI schemas and optimized complex DAX queries to improve dashboard performance.
• Pipeline Engineering: Developed streamlined ETL pipelines for automated data flow and monitoring, significantly accelerating refresh speeds and operational efficiency.

LVMH
Data Scientist Intern
Feb. 2023 - Jul. 2023
User Segmentation and Marketing Strategy
• High-Precision Targeting: Analyzed 60M+ rows of user data, applying ML models (LR, GBDT, KNN, K-Means) and A/B testing to optimize multi-channel acquisition strategies.
• Engagement Growth: Boosted ad targeting precision and user engagement by 15%.
• Campaign Analytics: Evaluated 9 major ad campaigns using multivariate and logistic regression, increasing operational efficiency by 34% while reducing costs by 15%.
ML-Powered Forecasting (Dior)
• Inventory Optimization: Deployed advanced cloud-based SARIMAX models serving Dior’s operations across China, powering inventory management for 395 stores.
• Accuracy Improvement: Achieved a 10.5% boost in forecasting accuracy during high-volume promotional periods.
Model Optimization and Performance
• Algorithm Tuning: Enhanced model performance via cross-validation (AUC, KS metrics), grid search, and feature engineering.
• Conversion Uplift: Achieved a 28% increase in click-through rate (CTR) and an 8% increase in conversion rate (CVR) through iterative model refinement.

Signify
Data Expert Intern
Dec. 2021 - Sep. 2022
Process Automation
• Efficiency Boost: Created 30+ Python automation scripts to streamline manual supply chain processes.
• Time Savings: Saved 60+ work hours monthly, freeing up resources for strategic tasks.
• Digital Transformation: Advanced the department’s digital transformation through efficient data management practices and automated workflows.
Supply Chain Analytics
• Anomaly Detection: Designed material consumption models and applied data mining on logistics and warehousing data to detect anomalies.
• Inventory Optimization: Addressed inventory imbalances through predictive analysis, streamlining supply chain operations.
• Impact: Raised customer satisfaction by 5% by ensuring smoother logistics and inventory availability.
CMU x Netflix: Large Scale Personalized Movie Recommendations
Aug 2024 – Dec 2024
API and Real-Time Processing
• High-Throughput API: Implemented a real-time recommendation API using Kafka and Flask, capable of handling 50M+ user interactions.
• Stream Processing: Designed the system to process user events in real time to update recommendations dynamically.
Scalable Database Architecture
• Big Data Storage: Engineered a robust MySQL database storing 30M+ records (~112 GB).
• Performance Optimization: Applied multithreading for efficient batch processing and log parsing.
• Database Tuning: Optimized storage performance through advanced partitioning and indexing techniques.
MLOps and CI/CD
• Automated Pipelines: Integrated Jenkins and GitHub Actions to establish a robust CI/CD pipeline.
• Monitoring: Deployed Prometheus and Grafana to monitor system performance and detect data drift in real time.
CMU x TCS: Multimodal Emotion Understanding and Calibration with LLMs
Jan 2024 – May 2024
Transformer-Based NLP
• Semantic Understanding: Built an advanced emotion classification system designed for deep semantic understanding and multi-class text classification.
• Benchmarking: Benchmarked performance against traditional models like Word2Vec and TF-IDF, ensuring superior accuracy with modern architecture.
LLM Fine-Tuning
• Optimization: Optimized DistilBERT with advanced training strategies to balance performance and computational efficiency.
• Model Comparison: Conducted rigorous performance comparisons with BERT models to validate the efficacy of fine-tuning strategies.
MLOps Deployment
• Inference API: Deployed a lightweight, high-performance inference API using Flask and RESTful services.
• Scalability: Designed the deployment architecture to handle real-time requests efficiently.
CMU x General Electric: Recognition and Fault Detection System Project
Sep 2023 – Dec 2023
Deep Learning and Computer Vision
• Real-Time Inference: Utilized YOLOv5 with multi-scale training techniques to achieve robust object detection.
• Performance Tuning: Accelerated inference speed to 45 ms/frame using TensorRT, enabling real-time fault detection capabilities.
System Integration and Edge AI
• Streaming Architecture: Configured an optimized model pipeline integrated with Kafka-based streaming for continuous data processing.
• Impact: Implemented edge validation mechanisms that reduced downtime by 10%, enhancing operational reliability.
BUSINESS AND COLLABORATION

