PAC learning, Theoretical Comp. Sci and Mathematics

In summary: I would recommend reading "Data Science: A Statistical Perspective" by Hadley Wickham and "Machine Learning: A Probabilistic Perspective" by Michael Nielsen. These books are foundational and will help you understand the theoretical underpinnings of machine learning and big data.In summary, Dave is interested in complexity theory, big data, and machine learning. He recommends reading books on these topics in order to better understand the theoretical underpinnings.
  • #1
dkotschessaa
1,060
783
Per another thread I've decided not to pursue a PhD in mathematics, but rather do a Masters with a Thesis option. So it's time to start finding an area and zooming in.

I am interested in complexity theory, and have been watching the whole "Big Data" scene with interest (and ignorance). I googled "Complexity and Big Data" and the result seems to be something called PAC learning.

I have some undergraduate background in Theory of Computation (including Automata and complexity) and a little bit of Combinatorics and Graph Theory. I have a career background in I.T. and some programming skills, and generally a good intuition and draw towards computer science, though I've never studied it formally, save from a purely mathematical perspective.

This coming semester I will be taking (graduate courses) Algebra II, Algebraic Automata Theory, and mathematical logic and foundations. (Which all nicely tie together)

I'm a bit away from a thesis but I don't want to wait to get started researching. I'd like it to be something mathematically satisfying, yet have some connection to something that is happening right now. Machine learning is a fascinating area that is very "now."

We have some people doing theoretical comp sci in the math department, so I have my eye on a couple of people for potential advisors and are in contact with them. So I will keep that up. In the meantime, just poking around on my own. I want to come to them with an informed idea.

So, can somebody tell me about PAC learning, what the 'questions' are in this field, or any other connections between machine learning, big data, complexity, etc.? Is this a good starting point for a topic?

Apologies if my question is uninformed or vague, but I'm at the necessarily naive first step...

Regards,

Dave K
 
  • Like
Likes Medicol
Physics news on Phys.org
  • #2
I wouldn't say PAC learning is "Complexity and Big Data", I would simply say that it is an aspect of machine learning. Specifically dealing with computational learning. Thus, it's natural for you to come across this field when you combine the two words you used. I can go into this but the basic idea is that you want to find a model that learns a specific target and also forms a good hypothesis. Naturally, there exist a more mathematical way to state this, but the definition is kind of long winded and not entirely insightful. The key idea is that you need an algorithm that converges in a reasonable amount a time using using a reasonable amount of random examples that converges to a good probability for a certain hypothesis using a class of target concepts. As you go on you'll find that the hypothesis class can generate the concept class.

A common problem faced by people who use this has to do with bounds. What is a good lower bound for a sample size? Also the computational complexity of finding a decision rule. Then there are issues with overfitting and how to handle non i.i.d models.
 
  • #3
Thanks Marnemath.

If you can help me sort out the timeline and where everything fits in here...

According to the ever infallible wikipedia, "An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning." and PAC was proposed by Les Valiant in 1984. Now, I came across a paper by Michael Kearns - actually, his PhD thesis under Les Valiant, which is called "The computational complexity of machine learning."

So I perhaps what I'm interested in is what Kearns is doing, which is a a follow up on Valiant's work?

Confirming what you said, there is also a chapter of "Advances of Intelligent Data Analysis" called "Machine Learning with Cellular Automata," which is also interesting to me, though it doesn't involve complexity. I find automata interesting as well.

Overall I'm pretty overwhelmed, despite thinking I am zooming in on something particular!

-Dave K
 
  • #4
I wish I could be more help to you when it came to timelines and the most modern and to the edge research out there. The fact of the matter is that I have no intention of ever working in the academic field, can only inform you on what type of problems a certain large corporation considers important and is working on. That's about it. Nevertheless, I think instead of trying to find a field based on buzz words and what is cool sounding, you need to take a step back an ensure this field is for you.

I'm a Statistician by training. I just happened to do a masters thesis on Hidden Markov Models and other junk, which led me down the rabbit hole into machine learning which naturally pushed me towards big data. Instead of focusing on what is out there, I would focus on what the professors at your university are doing. I would then ask them how to apply your interest to their work. Big data is just a broad field with many broad fields that make it up. I wouldn't get stuck on an idea for to much. If you really want to focus on data science, then try to take classes that deal with machine learning, statistics, and algorithm designs. Once you start intertwining these fields, you'll get a better feel for what kind of work you want to do.

As for PAC, I know it because I'm well read in my field. However, I'm by no means an expert on it. However, I would encourage you to get your feet wet with something simple like learning about Naive Bayes. I wouldn't just learn the statistical theory, I would learn to implement and then test it. This exercise will teach you more about how the field works than trying to find your thesis right now.
 
  • #5
MarneMath said:
I wish I could be more help to you when it came to timelines and the most modern and to the edge research out there. The fact of the matter is that I have no intention of ever working in the academic field, can only inform you on what type of problems a certain large corporation considers important and is working on. That's about it. Nevertheless, I think instead of trying to find a field based on buzz words and what is cool sounding, you need to take a step back an ensure this field is for you.

Well, I've been trying to narrow down the field for quite some time. I have broad experience in I.T. and went back for a math degree later. So what I'm trying to do now is wind my academic studies back into something I can leverage when I eventually look for work again. So, the "complexity" part is academic interest, but getting in touch with something in Data Science is more strategy.
I'm a Statistician by training. I just happened to do a masters thesis on Hidden Markov Models and other junk, which led me down the rabbit hole into machine learning which naturally pushed me towards big data. Instead of focusing on what is out there, I would focus on what the professors at your university are doing. I would then ask them how to apply your interest to their work. Big data is just a broad field with many broad fields that make it up. I wouldn't get stuck on an idea for to much. If you really want to focus on data science, then try to take classes that deal with machine learning, statistics, and algorithm designs. Once you start intertwining these fields, you'll get a better feel for what kind of work you want to do.

Unfortunately I don't have much stats background, and our math and stats departments are bitterly and politically divided. So I will have to fill in those gaps somehow. Hopefully I will be able to take some of the machine learning and stats classes later on.

I'm not necessarily stuck on PAC.. but I am trying to make sure that when I approach a professor with an idea of what I want to research, I am going in informed.

As for PAC, I know it because I'm well read in my field. However, I'm by no means an expert on it. However, I would encourage you to get your feet wet with something simple like learning about Naive Bayes. I wouldn't just learn the statistical theory, I would learn to implement and then test it. This exercise will teach you more about how the field works than trying to find your thesis right now.

In my "spare time" I am working on the Data Science track at Coursera, which involves learning the more practical side, (learning R, and stats and such), but I haven't gotten very far (because I haven't had much spare time!) Hopefully now that I have switched to a masters and won't be studying for quals I can get back to that.

Anyway, I am just shopping for now. Thanks for your input
 

Related to PAC learning, Theoretical Comp. Sci and Mathematics

1. What is PAC learning and why is it important in theoretical computer science and mathematics?

PAC (Probably Approximately Correct) learning is a framework in machine learning that allows for efficient and accurate learning from a given set of data. It is important in theoretical computer science and mathematics because it provides a rigorous mathematical foundation for understanding the capabilities and limitations of machine learning algorithms.

2. How does the PAC learning model work?

The PAC learning model involves a learning algorithm that takes in a set of training data and outputs a hypothesis that can accurately predict new data points. The model also takes into account a confidence parameter, which determines how close the hypothesis must be to the true underlying function in order to be considered accurate.

3. What is the relationship between PAC learning and computational complexity?

PAC learning is closely related to computational complexity because it provides a theoretical framework for understanding the time and space complexity of learning algorithms. In particular, the sample complexity of a PAC learning algorithm is often used as a measure of its computational complexity.

4. How does mathematics play a role in PAC learning?

Mathematics plays a crucial role in PAC learning as it provides the formal language and tools for analyzing the performance and limitations of learning algorithms. Concepts from various branches of mathematics, such as statistics, probability, and geometry, are used to develop and analyze PAC learning algorithms.

5. What is the current state of research and applications in PAC learning?

PAC learning is a rapidly evolving field of research, with new algorithms and techniques being developed constantly. It has a wide range of applications, including natural language processing, computer vision, and robotics. However, there are still many open questions and challenges in this area, making it an exciting and active field for further study.

Similar threads

  • STEM Academic Advising
Replies
1
Views
617
  • STEM Academic Advising
Replies
6
Views
203
  • STEM Academic Advising
Replies
6
Views
1K
Replies
8
Views
1K
  • STEM Academic Advising
Replies
20
Views
2K
  • STEM Academic Advising
Replies
11
Views
691
  • STEM Academic Advising
Replies
10
Views
1K
Replies
4
Views
163
  • STEM Academic Advising
Replies
14
Views
718
Replies
3
Views
1K
Back
Top