CNSV Member
IEEE Senior Member
San Jose, CA 95117
USA
SACHIN KESWANI
858-699-8271 skeswani2005@gmail.com San Jose
Summary:
Experienced System/Software engineer with expertise in AI Agents, LLM integration, software development, performance optimization, and AI/ML workloads. Proficient in CPU/GPU/NPU/Storage, embedded systems, consumer electronics, mobile apps, and wireless networks. Skilled in benchmarking, microarchitecture analysis, and optimization of software systems, with a strong background in various programming languages and technologies.
Technical and Business Skills
· Agentic LLM, C/C++, Python, Perl, Jenkins, Docker, git, Device Drivers, CUDA/OpenCL, HPC, x86/ARM, Compute/Memory/Storage, AWS/GCP/Azure, SoC Debugging, macOS/Android/iOS/Windows, CPU/GPU/NPU/ML Performance Benchmarking
· Excellent communication/project/product management, customer engagement/leadership skills, JIRA, Agile
Professional Experience:
Stealth Startup, USA (2025 – Present) Consulting (Part-Time)
- Developing an Agentic AI system specializing in UVM, Verilog, and SystemVerilog to accelerate hotspot detection and issue resolution and to reduce verification turnaround time for developers. Integrated LLM-based models into EDA workflows, enhancing design verification and code coverage analysis with AI-driven insights. [Python, Agentic AI, GPT-4/Llama 2 LLM Models]
- Worked with OpenTitan, integrating AI-driven verification techniques into semiconductor design flows.
- Expertise in Synopsys VCS and Cadence Xcelium simulators, code coverage improvement, and formal verification strategies.
Intel, USA (2021 – 2024) System and Software Performance Engineer
- Led the Intel Evo Program customer engineering effort for AI PC across Asia/Europe/US markets. [Python]
- Drove GenAI workload optimization on the NPU for Copilot and other local LLM use cases in the AI PC specification.
- Hands-on coding and debugging for performance optimization and issue resolution.
- Drove a 15% increase in revenue while simultaneously increasing market share by 7% across 3 geographies.
- Led debugging efforts on CPU/GPU/NPU benchmarking and AI inference performance tuning, specifically for matrix multiplication, libBLAS (linear algebra libraries), and FP16/FP32 floating-point operations.
- Proficient in Python for scripting, automation, and performance analysis tasks.
- Defined UX metrics such as app responsiveness and latency, which has become a key KPI for quantifying UX; helped improve responsiveness by 12% and debugged performance and battery-life optimizations.
Samsung, USA (2020 – 2021) Staff Performance Analysis Engineer
- Led data center workload characterization, analysis, performance tuning, and benchmarking.
- AI/DL/ML Benchmarks Analysis, TensorFlow, PyTorch, Storage technologies (NVMe, NVMe-oF, RDMA, GPUDirect)
- GPU configuration using XLA optimization [Python, C++]
- Benchmarks: MLPerf/Image Classification/ResNet-50: Vision, Object Detection, NLP, Autonomous Driving, Commerce, Recommendation Engine
- Cloud Gaming:
- Defined, implemented, collected and analyzed User Experience (UX) Performance Metrics
- Decompression of Game Assets: Study of components; DLSS 2 vs. DLSS 3; NVIDIA cloud gaming features
- PCIe Gen3/Gen4/Gen5 Analysis for system performance for CPU, Memory and I/O utilization. [Python, matplotlib, numpy, pandas]
- Performed extensive performance analysis of Intel vs. AMD systems and their impact on storage performance across PCIe Gen3/Gen4/Gen5, using fio/iostat/mpstat/sar metrics.
AMD, USA (2018 – 2020) Member of Technical Staff
- Implemented AES-256 cryptography for AMD processors. [C++]
- Implemented processor group support for Windows in various open-source projects: LuxCore Render, POV-Ray, VeraCrypt [C++]
- CPU/Power/Performance analysis and optimization for HPC/enterprise software via benchmarks.
- SPEC Benchmarks for various HPC workloads. Worked on and improved the Zen architecture for Ryzen/EPYC processors.
- Profiling and optimization using Intel VTune and AMD uProf for microarchitecture analysis; SIMD/vector programming using Intel intrinsics (AVX256/AVX512) [Python, C++]
- Analyzed the impact of floating-point operations on deep learning models and AI acceleration.
- Fixed bugs in Octave in SPEC CPU benchmarks for AMD platforms. For example, found an invalid instruction and replaced it with the corresponding assembly instruction using IDA Pro and hex editors.
Intel, USA (2010 – 2018) Senior Software Engineer
- Led next-gen macOS workstation/MacBook product bring-up on Intel chipsets.
- Board/Platform Bring Up, Firmware, Power and Performance Management, Continuous Integration and Improvement [CPU/GPU, macOS, Jenkins, CI/CD Pipelines]
- Ported software workarounds and integrated/merged them into newer platforms and newer OS versions. [C, macOS]
- Debugged OpenCL/Metal drivers on macOS and ported them across generations. Evaluated ML workload performance.
- Wrote programming tools to validate various system-level APIs in the Intel stack for the video pipeline. [C, Linux]
- Designed and developed test cases for the Intel stack’s video pipeline API and for customer escapes. [C, Linux]
- Brought up Intel boards with the new firmware/software stack to enable the rest of the team.
- Applied ML algorithms to predict bugs/defects in upcoming platforms.
- Automated Google TV validation and regression testing across internal releases for audio-video use cases/test cases; debugged and resolved HDMI-related issues. [Python, adb]
- Area(s) worked on: Graphics, macOS, Cable Segment, Android Performance/Application development, video and audio, TV technologies, HDMI, Multimedia framework.
Qualcomm, USA (2008 – 2009) Systems Engineer
- System analysis and performance benchmarking to analyze design changes in single/dual processor environments on various OSes: Android, Linux, BREW, Windows Mobile [C/C++, QPST, QXDM]
- Automation of performance benchmarking environment. [Perl/Python]
- Optimized software/hardware to meet multimedia and multimedia concurrency requirements.
Qualcomm, USA (2007 – 2007) Engineering Intern
- Debugged customer issues on Windows Mobile code to identify and reuse free virtual memory. Analyzed customer service requests and made recommendations to reduce engineering effort and improve team knowledge of power management between ARM9/ARM11.
ST Microelectronics Private Limited, India (2004 – 2006) Software Engineer
- Device Driver Development for Nomadik for Samsung smartphones. [C++, Windows Mobile]
- System integration of various modules in mobile handset development for Alcatel.
IIT Delhi (Indian Institute of Technology), India (2004 – 2004) Research Intern
- Design and Development of Wireless Internet Post Office Protocol for United Nations [C++, Linux]
Open-Source Contributions: https://github.com/techvintage
Publications:
- PSoC Implementation of a Newspaper Vending Machine Controller, Cypress Semiconductors, 2007
- Wireless Internet Post Office Protocol: Providing Rural Access to Text based Digital Communication using Wireless Multi-hop Mesh Networking, AMIC Bangkok, 2006
Education
- Fellow, Startup Leadership Program, CA. Highly selective, global 6-month world-class training program
- M.S. (Electrical & Computer Engineering), Stony Brook University, NY
- B.Tech (Information Technology), University School of IT, GGS Indraprastha University, New Delhi, India
Achievements & Awards
· ‘Group Recognition Award’ for contribution towards Mac Pro ‘Power On’ for Apple, Q3 2018; ‘Department Recognition Award’ in Intel VPG Graphics Software Group, Q2 2015 for Apple Metal Design Win; Q3 2013 for Apple OpenCL Design Win, enabling validation environment for set-top box for UPC France, 2012 [Intel]
· ‘Qualstar’ for contribution towards fine tuning the performance of MSM 7k chips, 2009 [Qualcomm]
· ‘Best Performing Team’ in TPA-Telecom, ‘Special Award’ on ‘ESS Day’ 2004, 2005 [ST Microelectronics]
· IEEE J.K.Pal Best Student Award 2003, Founder Chairman IEEE Student Branch at USIT, GGSIPU, New Delhi, India
· Honorable Mention in ACM International Collegiate Programming Contest, Asia Regionals, IIT Kanpur 2001
· National Level Science Talent Scholarship 1998, 1999.
