PAC learning, Theoretical Comp. Sci and Mathematics

dkotschessaa · Dec 30, 2014

Per another thread I've decided not to pursue a PhD in mathematics, but rather do a Masters with a Thesis option. So it's time to start finding an area and zooming in.

I am interested in complexity theory, and have been watching the whole "Big Data" scene with interest (and ignorance). I googled "Complexity and Big Data" and the result seems to be something called PAC learning.

I have some undergraduate background in Theory of Computation (including Automata and complexity) and a little bit of Combinatorics and Graph Theory. I have a career background in I.T. and some programming skills, and generally a good intuition and draw towards computer science, though I've never studied it formally, save from a purely mathematical perspective.

This coming semester I will be taking (graduate courses) Algebra II, Algebraic Automata Theory, and mathematical logic and foundations. (Which all nicely tie together)

I'm a bit away from a thesis but I don't want to wait to get started researching. I'd like it to be something mathematically satisfying, yet have some connection to something that is happening right now. Machine learning is a fascinating area that is very "now."

We have some people doing theoretical comp sci in the math department, so I have my eye on a couple of people for potential advisors and are in contact with them. So I will keep that up. In the meantime, just poking around on my own. I want to come to them with an informed idea.

So, can somebody tell me about PAC learning, what the 'questions' are in this field, or any other connections between machine learning, big data, complexity, etc.? Is this a good starting point for a topic?

Apologies if my question is uninformed or vague, but I'm at the necessarily naive first step...

Regards,

Dave K

MarneMath · Dec 30, 2014

I wouldn't say PAC learning is "Complexity and Big Data", I would simply say that it is an aspect of machine learning. Specifically dealing with computational learning. Thus, it's natural for you to come across this field when you combine the two words you used. I can go into this but the basic idea is that you want to find a model that learns a specific target and also forms a good hypothesis. Naturally, there exist a more mathematical way to state this, but the definition is kind of long winded and not entirely insightful. The key idea is that you need an algorithm that converges in a reasonable amount a time using using a reasonable amount of random examples that converges to a good probability for a certain hypothesis using a class of target concepts. As you go on you'll find that the hypothesis class can generate the concept class.

A common problem faced by people who use this has to do with bounds. What is a good lower bound for a sample size? Also the computational complexity of finding a decision rule. Then there are issues with overfitting and how to handle non i.i.d models.

dkotschessaa · Dec 30, 2014

Thanks Marnemath.

If you can help me sort out the timeline and where everything fits in here...

According to the ever infallible wikipedia, "An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning." and PAC was proposed by Les Valiant in 1984. Now, I came across a paper by Michael Kearns - actually, his PhD thesis under Les Valiant, which is called "The computational complexity of machine learning."

So I perhaps what I'm interested in is what Kearns is doing, which is a a follow up on Valiant's work?

Confirming what you said, there is also a chapter of "Advances of Intelligent Data Analysis" called "Machine Learning with Cellular Automata," which is also interesting to me, though it doesn't involve complexity. I find automata interesting as well.

Overall I'm pretty overwhelmed, despite thinking I am zooming in on something particular!

-Dave K

MarneMath · Dec 30, 2014

I wish I could be more help to you when it came to timelines and the most modern and to the edge research out there. The fact of the matter is that I have no intention of ever working in the academic field, can only inform you on what type of problems a certain large corporation considers important and is working on. That's about it. Nevertheless, I think instead of trying to find a field based on buzz words and what is cool sounding, you need to take a step back an ensure this field is for you.

I'm a Statistician by training. I just happened to do a masters thesis on Hidden Markov Models and other junk, which led me down the rabbit hole into machine learning which naturally pushed me towards big data. Instead of focusing on what is out there, I would focus on what the professors at your university are doing. I would then ask them how to apply your interest to their work. Big data is just a broad field with many broad fields that make it up. I wouldn't get stuck on an idea for to much. If you really want to focus on data science, then try to take classes that deal with machine learning, statistics, and algorithm designs. Once you start intertwining these fields, you'll get a better feel for what kind of work you want to do.

As for PAC, I know it because I'm well read in my field. However, I'm by no means an expert on it. However, I would encourage you to get your feet wet with something simple like learning about Naive Bayes. I wouldn't just learn the statistical theory, I would learn to implement and then test it. This exercise will teach you more about how the field works than trying to find your thesis right now.

dkotschessaa · Dec 30, 2014

MarneMath said:

I wish I could be more help to you when it came to timelines and the most modern and to the edge research out there. The fact of the matter is that I have no intention of ever working in the academic field, can only inform you on what type of problems a certain large corporation considers important and is working on. That's about it. Nevertheless, I think instead of trying to find a field based on buzz words and what is cool sounding, you need to take a step back an ensure this field is for you.

Well, I've been trying to narrow down the field for quite some time. I have broad experience in I.T. and went back for a math degree later. So what I'm trying to do now is wind my academic studies back into something I can leverage when I eventually look for work again. So, the "complexity" part is academic interest, but getting in touch with something in Data Science is more strategy.

I'm a Statistician by training. I just happened to do a masters thesis on Hidden Markov Models and other junk, which led me down the rabbit hole into machine learning which naturally pushed me towards big data. Instead of focusing on what is out there, I would focus on what the professors at your university are doing. I would then ask them how to apply your interest to their work. Big data is just a broad field with many broad fields that make it up. I wouldn't get stuck on an idea for to much. If you really want to focus on data science, then try to take classes that deal with machine learning, statistics, and algorithm designs. Once you start intertwining these fields, you'll get a better feel for what kind of work you want to do.

Unfortunately I don't have much stats background, and our math and stats departments are bitterly and politically divided. So I will have to fill in those gaps somehow. Hopefully I will be able to take some of the machine learning and stats classes later on.

I'm not necessarily stuck on PAC.. but I am trying to make sure that when I approach a professor with an idea of what I want to research, I am going in informed.

As for PAC, I know it because I'm well read in my field. However, I'm by no means an expert on it. However, I would encourage you to get your feet wet with something simple like learning about Naive Bayes. I wouldn't just learn the statistical theory, I would learn to implement and then test it. This exercise will teach you more about how the field works than trying to find your thesis right now.

In my "spare time" I am working on the Data Science track at Coursera, which involves learning the more practical side, (learning R, and stats and such), but I haven't gotten very far (because I haven't had much spare time!) Hopefully now that I have switched to a masters and won't be studying for quals I can get back to that.

Anyway, I am just shopping for now. Thanks for your input

PAC learning, Theoretical Comp. Sci and Mathematics

Related to PAC learning, Theoretical Comp. Sci and Mathematics

1. What is PAC learning and why is it important in theoretical computer science and mathematics?

2. How does the PAC learning model work?

3. What is the relationship between PAC learning and computational complexity?

4. How does mathematics play a role in PAC learning?

5. What is the current state of research and applications in PAC learning?

Similar threads

Hot Threads

Recent Insights