@eabdullin on Hugging Face: "I’m doing a PhD in AI, which sounds impressive until you realize it mostly…"

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

posted an update about 22 hours ago

Post

1490

I’m doing a PhD in AI, which sounds impressive until you realize it mostly means I spend three years trying to make a computer say something slightly less stupid than it said yesterday.

People hear "AI researcher" and they think I’m building the future. No. I’m in a basement at 2 a.m. Googling, "CUDA error what the f**k does this mean."

And the worst part about AI research now is compute. You don’t even ask, "Is this idea good?" anymore. You ask, "Can I afford for this idea to be wrong?"

My advisor comes to me one day and says, "I think we should fine-tune our own language model."

I said, "Professor, with what money? I’m a PhD student. I have two bank accounts: checking and emotionally checking."

He goes, "Don’t worry. We have compute."

Now, in academia, "don’t worry" is never the beginning of a good sentence.

I said, "What do you mean we have compute?"

He said, "My friend knows the cluster admin. He can get us on the GPUs."

I said, "Okay… what do we have to do?"

He goes, "Nothing crazy. Just be very grateful in the acknowledgements."

I said, "How grateful?"

He said, "Maybe put him as co-author."

I said, "Co-author? Are we using the cluster, or is the cluster using us?"

Because at that point, that’s not a favor. That’s academic child support.

So I go to the server room, and the cluster admin walks up to me and goes, "So you’re the NLP student."

And in my head I’m like, "No, tonight you’re the principal investigator. You’re the provider. I’m just a little token waiting to be attended to."

Because whoever controls the GPUs controls the relationship. That’s lab romance.

He starts setting things up, and I’m trying to act casual, but I don’t understand any of the numbers he’s saying.

He’s like, "Yeah, I can probably give you four H100s for the weekend."

I’m nodding like, "Mmm. Four. Weekend. H. One hundred. Absolutely."

Inside I’m like, "Is that good? Is that prison time? Why did he say it like he was offering me organs?"

[Continue in comments...]

eabdullin

about 22 hours ago

•

edited about 22 hours ago

Then he says, "Just make sure your results are worth it."

I go, "Worth it?"

He goes, "Yeah, worth the compute."

I’m like, "I don’t think you mean compute. I think you mean my future."

Because when you’re a PhD student, every technical term also sounds financial.

Memory? I don’t have that.

Attention? I’m losing that.

Gradient descent? That’s my life trajectory.

So we start training the model. And at first I’m excited, because the loss is going down.

But then I start getting scared, because the loss is going down too nicely.

That’s how you know something’s wrong in research. If it works on the first try, you didn’t discover something. You evaluating it wrong.

My labmate comes in and says, "Bro, look at this curve. It’s beautiful."

I said, "Beautiful? We can’t afford beautiful. We’re a public university. Our plots are supposed to look like an earthquake wearing a Fitbit."

Then the model hits 92% accuracy.

I go, "Oh no."

My labmate says, "Why are you upset?"

I said, "Because 92% is corporate accuracy. University accuracy is 63% with a very passionate limitations section."

You know you’re broke when your model performing well feels suspicious.

I’m looking at the training dashboard like, "What are you doing? Who paid you to be competent?"

And that’s when I open the benchmark leaderboard.

I look at the top row.

Transformer model.

Massive pretraining.

Billions of parameters.

I go, "We’re cooked."

My labmate goes, "Why?"

I said, "Because the big labs are here."

He goes, "Which big labs?"

I said, "All of them. The ones with logos in their abstracts."

That’s how you know you’re not competing with a university anymore. Their paper says, "We trained on 14 trillion tokens." Mine says, "We thank Kevin for letting us use his desktop while he was at lunch."

Then I open their google scholar profile and see: "Attention Is All You Need."

I go, "I don’t even know that paper like that."

My labmate looks at me like I’m insane.

He goes, "You don’t know ‘Attention Is All You Need’?"

I said, "I mean, I’ve heard of it."

He goes, "That’s THE Transformer paper."

I said, "That’s them? Oh no. That’s timeless."

Then it all starts coming back.

Self-attention.

Multi-head attention.

Positional encoding.

Encoder-decoder architecture.

I’m like, "Oh my God, they have hits."

And my labmate starts humming it out loud, "Attention is all you nee-e-ed!"

I go, "Stop saying it! Stop giving them citations! Their h-index is already too high!"

Every time a PhD student says "attention" in a lab, a Google researcher gets another keynote.

And now I’m panicking because I realize we’re not just training a model. We’re entering a financial and emotional hostage situation with a machine that cannot count the letter R in strawberry.

That’s the humiliating part of working on LLMs.

You spend three weeks designing an experiment, three days debugging the tokenizer, twelve hours writing a prompt, and the model still answers your question like a guy who read half the Wikipedia page in an Uber.

I asked our model, "Summarize my thesis proposal."

It said, "Certainly! Here is a concise summary: the student appears confused but enthusiastic."

I was like, "Damn. It works."

That’s the problem. These models are wrong, but emotionally accurate.

Then comes the worst part: the ablation study.

For people who don’t know, an ablation study is where you remove parts of your system to prove they matter.

Which is also what a PhD does to your personality.

You remove sleep. Performance drops.

You remove friends. Performance somehow stays the same.

You remove funding. The system becomes unstable.

After all that Reviewer 2 says, "Needs more experiments."

Reviewer 2 always says "needs more experiments" like experiments are free.

Reviewer 2 has never seen a GPU invoice. Reviewer 2 thinks compute grows on trees. It does not. It grows in data centers cooled by the tears of graduate students.

Finally, after three days, the run finishes.

The cluster admin comes back and says, "Hey, don’t worry."

I go, "Thank God."

He says, "The storage is covered."

I said, "The storage?"

He goes, "Yeah, the storage is on us."

I said, "What about the GPUs?"

He goes, "Oh, no, those are billable."

And from across the lab my labmate whispers, "Attention is all you need."

I said, "No. Funding. Funding is all I need."

this is all generated by AI which was carefully prompted by me :)
Kudos to original standup: https://www.youtube.com/shorts/R33qcZtE8iM

In this post

eabdullin Yelaman Abdullin