Build a Natural Language Processing Transformer from Scratch

  • #1
jonjacson
TL;DR Summary
I wonder if anybody knows how to build and train one from scratch or if there is any book, video, or website explaining it.
I have read that transformers are the key behind recent success in artificial intelligence but the problem is that it is quite opaque.

I wonder if anybody knows how to build and train one from scratch or if there is any book, video, or website explaining it.

Thanks
 
  • #4
jonjacson said:
I have read that transformers are the key behind recent success in artificial intelligence but the problem is that it is quite opaque.
Then you need to understand the theory.

jonjacson said:
But I don't see a python implementation, just the theory.
You did not ask for Python code.
Google: python code for NLP transformer

There will be more answers from others.
 
  • #5
jonjacson said:
But I don't see a python implementation, just the theory.
But you didn't ask for a Python implementation, you asked about building one from scratch!

If I wanted to find a Python machine learning algorithm related to [X] I would input "Tensorflow X" into a search engine. Have you tried this?
 
  • #6
Baluncore said:
Then you need to understand the theory.
You did not ask for Python code.
Google: python code for NLP transformer

There will be more answers from others.
I see answers, but they use libraries like PyTorch or TensorFlow. I mean from scratch, in pure Python.

pbuk said:
But you didn't ask for a Python implementation, you asked about building one from scratch!

If I wanted to find a Python machine learning algorithm related to [X] I would input "Tensorflow X" into a search engine. Have you tried this?
I don't want to use libraries.
 
  • #7
jonjacson said:
I see answers, but they use libraries like PyTorch or TensorFlow. I mean from scratch, in pure Python.
Even if you don't use libraries, looking at the source code for the libraries might be a good way of learning how these things are done in Python.

If searching the web doesn't turn up any Python implementations that don't use libraries, that's probably a clue that everyone else who has tried what you are trying has found it easier to use the well-tested implementations in the libraries than to try and roll their own.
 
  • #8
PeterDonis said:
Even if you don't use libraries, looking at the source code for the libraries might be a good way of learning how these things are done in Python.

If searching the web doesn't turn up any Python implementations that don't use libraries, that's probably a clue that everyone else who has tried what you are trying has found it easier to use the well-tested implementations in the libraries than to try and roll their own.

The problem is that this looks like a magic thing. I don't know why it is "hidden" behind the bogus language: "deep learning", "encoder", "decoder", "tokenized input embedding", "multi-head self attention", "layer normalization", "feed-forward network", "residual connection", and all that stuff.

At the end, I guess, this will be a whole bunch of vectors, matrices, and operations on them.

Hopefully now you understand what I want to know.
 
  • #9
jonjacson said:
The problem is that this looks like a magic thing
That problem doesn't look to me like a "find Python code" problem. It looks to me like a "learn and understand the theory" problem, as @Baluncore has already pointed out.
 
  • #10
jonjacson said:
The problem is that this looks like a magic thing, ...
“Any sufficiently advanced technology is indistinguishable from magic”.
Arthur C. Clarke's third law.
 
  • #11
Baluncore said:
“Any sufficiently advanced technology is indistinguishable from magic”.
Arthur C. Clarke's third law.

Nice, but there is still no basic example of this anywhere.
 
  • #12
jonjacson said:
Nice, but there is still no basic example of this anywhere.
It is only magic because you do not yet understand the theory. If you were given some version of the Python code, you would still not understand the theory. It would still be magic, and a danger to the uninitiated.
 
  • #13
jonjacson said:
The problem is that this looks like a magic thing. I don't know why it is "hidden" behind the bogus language: "deep learning", "encoder", "decoder", "tokenized input embedding", "multi-head self attention", "layer normalization", "feed-forward network", "residual connection", and all that stuff.
For the same reason that quantum mechanics is hidden behind the bogus language "complex projective space", "Hermitian operators", "Hamiltonians", "eigenstates", "superpositions" and all that stuff.

At the end this is just a whole bunch of vectors, matrices and operations on them.

jonjacson said:
Hopefully now you understand what I want to know.
Yes, you want to do QM without learning the theory. Good luck.

Edit: or is this the kind of thing you are looking for: https://habr.com/en/companies/ods/articles/708672/
 
  • #14
pbuk said:
For the same reason that quantum mechanics is hidden behind the bogus language "complex projective space", "Hermitian operators", "Hamiltonians", "eigenstates", "superpositions" and all that stuff.

At the end this is just a whole bunch of vectors, matrices and operations on them.

Yes, you want to do QM without learning the theory. Good luck.

Edit: or is this the kind of thing you are looking for: https://habr.com/en/companies/ods/articles/708672/

I am not saying that theory is bad or unnecessary. What I am looking for is a numerical example.

The Schrödinger equation is fine, but once you compute the orbitals of the hydrogen atom you get a better understanding.

I don't understand why it is bad to ask for numerical examples and numbers.

Your edit was great, and it is what I was looking for. Here is the link from the end of that article:

https://jalammar.github.io/illustrated-transformer/

And something I just found:

https://e2eml.school/transformers.html

I hope this helps anybody interested in this topic.

Thanks to all for your replies.

Edit:

This may be good too:

 

1. What is Natural Language Processing (NLP)?

Natural Language Processing (NLP) is a subfield of computer science and artificial intelligence that focuses on enabling computers to understand, analyze, and generate human language. It involves combining linguistics, computer science, and machine learning techniques to process and understand large amounts of natural language data.
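Since the thread asks for library-free examples, here is a toy illustration (not from the thread, purely for intuition) of the very first processing step: splitting text into tokens and mapping each token to an integer id. The function names and the tiny corpus are made up for this sketch.

```python
# Toy whitespace tokenizer in pure Python: assigns each new word an
# integer id in order of first appearance (illustrative only; real
# tokenizers use subword schemes like BPE).
def build_vocab(texts):
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    return [vocab[w] for w in text.lower().split()]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
print(vocab)                     # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3}
print(encode("the dog", vocab))  # [0, 3]
```

In a real model, these integer ids would then index into a learned embedding matrix, turning each token into a vector.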

2. What is a Transformer in NLP?

A Transformer is a type of neural network architecture used in NLP tasks such as language translation. It utilizes attention mechanisms to process sequences of words or symbols, allowing it to handle long input sequences and capture relationships between words more easily than traditional recurrent neural networks (RNNs).
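The attention mechanism mentioned above really is "just vectors, matrices and operations on them", as the thread puts it. Here is a minimal pure-Python sketch of scaled dot-product attention, the core operation inside a Transformer; it is a single-head toy version with hand-written helpers, not a full multi-head implementation.

```python
import math

def softmax(xs):
    # subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of plain-Python vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # similarity of this query with every key, scaled by sqrt(d)
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # output = weighted average of the value vectors
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out
```

For example, with a query aligned with the first key, the first value vector dominates the weighted average; that weighting is all "attention" means here.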

3. Why would I want to build an NLP Transformer from scratch?

Building an NLP Transformer from scratch gives you complete control over the design and implementation of your model. This can be beneficial for research purposes, or when working with limited or specialized datasets that pre-trained models do not suit. It also lets you gain a deeper understanding of the inner workings of these complex models.

4. What are the steps involved in building an NLP Transformer from scratch?

The steps involved in building an NLP Transformer from scratch include data preprocessing, designing and implementing the Transformer architecture, training the model, and evaluating its performance. It also involves fine-tuning the model and optimizing its hyperparameters for better results.

5. What are the potential applications of an NLP Transformer?

An NLP Transformer can be used for a wide range of NLP tasks, such as language translation, text summarization, question answering, sentiment analysis, and natural language understanding. It has also been used in chatbots, virtual assistants, and other applications that require the understanding and generation of human language.
