What are the state-of-the-art frameworks in the ML programming area? Something similar to what React is for web and Tailwind for CSS
Triton, ONNX, JAX, PyTorch, cublass, .....
I know they might be for different purposes, but having some idea what is for what and when to use would be helpful
these are model-level frameworks
> Triton
this is a kernel DSL
> [cublas]
this is a BLAS library built atop CUDA
> I know they might be for different purposes, but having some idea what is for what and when to use would be helpful
when people ask this question i always ask: who are you and what is your job? if you're not an ML/DL/AI person then you knowing the specifics is about as useful as me knowing the specifics of react/express/angular/tailwind/django/whatever as an ML person. this is not meant to be condescending, this is meant to allay your anxiety, ie that if you ever find yourself in the position where you have to know these things for your job, it won't be that hard to figure out (just like it isn't that hard to figure out the difference between react and express and django if you're a webdev).
let's swap roles and let's pretend i'm an ML engineer asking you how to enter CRUD. what would you tell me? my strong suspicion (if i caught you in an honest, frank moment) is you would say to me "why the fuck would you want to do that - it sucks". i have this suspicion because i actually used to do CRUD and it does suck! but here's your moment of zen: so does ML/DL/AI. it really really does suck. it's basically just as bad as webdev in terms of tedium/boredom/incidental complexity/etc. it's not fun, interesting, exciting, or whatever else you're projecting based on an outside-looking-in perspective.
now i'll acknowledge that there's one big difference: the pay is way better at the far end of the distribution - meaning if you can get to a FAANG ML team then you'll get more money than you're probably getting now (and a ton more stress too) and it's even more than the CRUD devs in FAANG. fine. but ask yourself if it's really worth learning a whole heap of new bullshit just for a chance at more money (no guarantee).
okay now a useful/practical answer: i went back to school for a PhD but i should've just dropped out with the MS. do that. even better do Georgia Tech's online MS.
I strongly recommend it if one's able. It's a bit more stable than a quickly evolving ML/DL/AI ecosystem or frontend ecosystem. The skills are more durable. It repays deep investment and knowledge.
It allows you to straddle both the distributed systems and services domain and the ML domain.
ML systems problems are extremely interesting since they require extremes of compute, storage, network, and latency, in very different parts of the model lifecycle. Its unique problem is the scarcity and cost of hardware accelerators.
I've worked eleven years in the space and have rarely had the desire to leave.
I'm currently a GPU compiler engineer in FAANG specializing in compute (not graphics). So clearly ML systems. Before that I worked at every level of the stack above, and during my PhD I worked below (RTL). I hate it and think about leaving every day (I stay because of the money and like wtf else am I gonna do lol).
Triton is a Python-like language to define ML math operations that run efficiently on hardware accelerators like GPUs or TPUs. OpenAI open sourced it. If there's a particular math operation you have a unique need for in your model, and it hasn't already been implemented by some other library, and it's important for efficiency, you'd probably write it in Triton these days. It'll be compiled to an intermediate representation, then to an efficient runtime.
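To make "define a math operation" a bit more concrete, here's a rough plain-Python emulation of the blocked, masked pattern that a simple vector-add kernel in this style usually follows. It deliberately doesn't use Triton itself (so it runs without a GPU), and `BLOCK_SIZE`, `add_program`, and `launch_add` are names I made up for illustration. In a real kernel the program instances would run in parallel on the GPU; here they're just a loop.

```python
# Plain-Python emulation of the per-block pattern of a Triton-style kernel.
# NOT Triton code: every name below is hypothetical, chosen for illustration.

BLOCK_SIZE = 4  # elements handled by one "program instance" (assumed value)


def add_program(pid, x, y, out, n):
    """One program instance: handles the block of indices owned by `pid`."""
    start = pid * BLOCK_SIZE
    for i in range(start, start + BLOCK_SIZE):
        if i < n:  # mask: skip out-of-bounds lanes in the last block
            out[i] = x[i] + y[i]


def launch_add(x, y):
    """Launch enough program instances to cover all n elements."""
    n = len(x)
    out = [0.0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    for pid in range(num_programs):  # on a GPU these would run in parallel
        add_program(pid, x, y, out, n)
    return out


result = launch_add([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0])
print(result)
```

The part worth noticing is the mask: five elements with a block size of four means the second program instance owns four indices but only one of them is valid.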
The course linked deals with "MLSys", or "ML systems". That means using GPUs and other hardware accelerators efficiently to run ML math operations on one or more computers.
95% of working ML engineers will never need to write Triton, and will be more than satisfied with PyTorch. Many more ML engineers will, nevertheless, write Triton code, because it is interesting, fun, easy, and people are impressed when you tell them you did.
Hosting PyTorch models efficiently is currently awkward, because there's no clear winner in the ecosystem. ONNX is a framework-agnostic way of representing model graphs. Other systems can interpret ONNX graphs to do inference. So sometimes, when someone wants to host a PyTorch model, they turn it into an ONNX model and run it with an efficient runtime on CPUs or GPUs.
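As a toy illustration of the "framework-agnostic graph plus a separate runtime" idea: this is not the ONNX format or any ONNX API, just a hypothetical list-of-nodes graph and a tiny interpreter in plain Python. The op names (`MatVec`, `Add`, `Relu`), the weights, and `run_graph` are all invented for the sketch.

```python
# Toy "exported model": a list of nodes, each naming an op, its inputs, and
# its output. Hypothetical format for illustration only; not real ONNX.
GRAPH = [
    {"op": "MatVec", "inputs": ["W", "x"], "output": "h"},
    {"op": "Add", "inputs": ["h", "b"], "output": "z"},
    {"op": "Relu", "inputs": ["z"], "output": "y"},
]

# Made-up parameters that would be stored alongside the graph.
WEIGHTS = {"W": [[1.0, -1.0], [0.5, 2.0]], "b": [0.0, -4.0]}


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(a * c for a, c in zip(row, v)) for row in m]


# The "runtime": a table mapping op names to implementations. A different
# runtime could supply faster implementations for the same graph.
OPS = {
    "MatVec": matvec,
    "Add": lambda a, b: [p + q for p, q in zip(a, b)],
    "Relu": lambda a: [max(0.0, p) for p in a],
}


def run_graph(graph, weights, feeds):
    """Evaluate nodes in order and return the last node's output."""
    values = {**weights, **feeds}
    for node in graph:
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = OPS[node["op"]](*args)
    return values[graph[-1]["output"]]


print(run_graph(GRAPH, WEIGHTS, {"x": [3.0, 1.0]}))  # prints [2.0, 0.0]
```

The graph is plain data with no dependency on the framework that produced it, which is the property that lets a separate runtime pick it up and run inference.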
This is incorrect. Triton has literally no path to TPU and it has always been open source because it was Philippe Tillet's PhD project (OAI simply hired Philippe).
> 95% of working ML engineers will never need to write Triton, and will be more than satisfied with PyTorch.
Maybe 95% of hobbyist ML engineers but professional ML engineers are absolutely writing Triton day-to-day (eg FB has an army of such people). Even if you're not writing Triton you're still using Triton through inductor.
> because it is interesting, fun, easy, and people are impressed when you tell them you did
Professionals write Triton not for any of the reasons you mentioned but for the same reason they wrote CUDA kernels prior: it's a path to peak performance for their specific workloads (where stock PyTorch kernels have mediocre performance).
Everything after "Pipelining GEMM with TMA" (inclusive) is specific to NVIDIA. Which is fine but the title (of the guide itself) is clearly misleading.
misleading?
> the title (of the guide itself) is clearly misleading.
...
> title: the distinguishing name of a written, printed, or filmed production
do you understand now? or do i need to also define for you the word misleading?
I spent months, months of late nights watching commits to nvfuser and shit. I wrote a SASS decompiler and instrumented everything trying to learn Blackwell.
This is the first time I've seen something so clean, just a real work of scholarship on it.
My hat is off to the authors and the contribution it represents.
If I would caution a reader about anything, it's that the 2CTA (sm_100, sm_110) patterns here are different on 1CTA in important ways, and it's not a better/worse thing: they are good for different workloads.
Really outstanding work. I proved a lot of this in Lean 4 and published, but I got lazy and stopped short of really doing the pedagogical work.
This is what you should be starting with if you want to max out 2CTA gear, it's immaculate.