Why I Can't Stand SAS (and even R)

Ackbach

This is an interesting topic! In my experience, the software used in the academic math world is often very different from what the business world uses. In school I used SAS and R almost exclusively. Most companies today don't want to pay the huge cost of a SAS license or deal with its closed-source nature. R is still used for more advanced statistical projects, I think, but Python has really taken over everywhere because it's so versatile. That's just my own view of the software changes, and it's likely US-centric.

So for this task, though, I agree with @MarkFL: if the cost of adding more languages is negligible, then it's easiest to just add all of them, or all within reason. We don't need to let something like syntax highlighting be the thing that makes a new user leave out of frustration. :p
I found learning SAS to be one of the most infuriating learning experiences of my life. Here's the contrast: I learned basic Python from this webpage in one week. At that point, I was good enough to be productive, and I had learned enough to do the code for TestScript, a LabVIEW/Python connector. I can't count the number of times I've tried to do something somewhat intuitive in Python, and it just worked! Or if it didn't work, it was very close to what did work.

In SAS, it was a completely different ballgame. I took up The Little SAS Book, pushed my way through 4 or 5 chapters of it in about six weeks, and found that my first attempt at doing anything never worked. Ever. Its syntax is the most obtuse, horrific thing I've come across in a long time. Its readability is terrible. Probably the only reason it's still around is that it can do any statistics you want, and it's supported by a company. But it's going to lose to R, because R can also do any statistics you want, and it's free. I'm not a huge fan of R's readability either, though. It's Python for me, because readability counts. And also because its syntax is so incredibly intuitive. It shortens developer time for small projects enormously.
 

Jameson

SAS is pretty much universally accepted to be a terrible language by anyone who hasn't spent a career with it. I had to become an expert-level SAS coder in my first jobs, so I've sunk an embarrassing amount of time into learning niche topics about the language that don't really generalize to other languages. However, it is still shockingly prevalent at big banks and pharma companies, so depending on the situation, having this skill could be useful.

R's syntax is pretty standard functional syntax, I think. You have base functions, variables, loops, etc. that are all used in a standard manner, so I don't think it's bad to pick up. If you start using dplyr, you can use a chaining or piping syntax that removes the need for nested parentheses like this: func1(func2((a+b)^2)), which gets really hard to follow when you get many levels into something. Instead, you can rewrite it as a very logical left-to-right flow like this: (a+b)^2 %>% func2() %>% func1(). This isn't an exact analog of Python syntax, but it is much closer to the suffix-style chaining of additional functions and operations onto the starting point that Python uses.
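
For anyone who wants to see the nested versus left-to-right contrast in Python terms, here is a rough sketch. Python has no built-in pipe operator, so the `pipe` helper below is something I made up for illustration, and `func1` and `func2` are hypothetical stand-ins for the functions named above, not anything from dplyr.

```python
from functools import reduce


def pipe(value, *funcs):
    """Feed value through funcs left to right (a hypothetical helper, loosely like %>%)."""
    return reduce(lambda acc, f: f(acc), funcs, value)


# Hypothetical stand-ins for func1 and func2; any one-argument functions would do.
def func2(x):
    return x + 1


def func1(x):
    return x * 10


a, b = 2, 3

# Nested style: has to be read from the inside out.
nested = func1(func2((a + b) ** 2))

# Piped style: reads left to right, in the order the steps happen.
piped = pipe((a + b) ** 2, func2, func1)

print(nested, piped)
```

Both forms compute the same thing; the only difference is the order in which you read the steps.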

I'm pretty sure you used a class in your blog post, @Ackbach, so you're familiar with object-oriented concepts, but this is a key difference between Python and R too. Python can work both ways, and many users will never define a class, but for those who want classes it's a much more flexible and scalable language than R for developing software and web apps. The only area where I've heard colleagues prefer R is pure statistical work. I think it's better suited for everything from basic statistical tests to advanced simulations for approximating a distribution with no closed form. Python might be closing the gap, but R seems to be more concise and powerful for pure stats work.
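
To make the "Python can work both ways" point concrete, here is a small sketch of the same calculation written once as a plain function and once as a class. The names `mean` and `RunningMean` are just ones I picked for the example.

```python
# Style 1: a plain function, no class required.
def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Style 2: a class that keeps state between calls.
class RunningMean:
    """Accumulate values one at a time and report their mean so far."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, x):
        self.count += 1
        self.total += x

    @property
    def value(self):
        # Raises ZeroDivisionError if nothing has been added yet.
        return self.total / self.count


data = [1, 2, 3, 4]

tracker = RunningMean()
for x in data:
    tracker.add(x)

print(mean(data), tracker.value)
```

For a one-off script the function is plenty; the class starts to pay off once the state has to live across many calls, which is closer to what software and web apps need.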

The limitation of all these languages is that their run-time performance is often orders of magnitude slower than C++, so they can't be used for low-latency problems. @Klaas van Aarsen or @MarkFL might know more about this, in addition to @Ackbach, but I think the idea is that the higher-level a language becomes, in the sense of not needing to explicitly define everything about the objects that your code defines and interacts with, the more efficiency you trade away.
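
This is not a Python versus C++ benchmark, but here is a rough way to see the interpreter overhead for yourself, assuming the standard CPython interpreter. It times a hand-written loop against the built-in `sum`, which CPython implements in C. The function names and the size `n` are arbitrary choices for the sketch, and the timings will vary by machine.

```python
import timeit


def python_loop_sum(n):
    """Add up 0..n-1 with an explicit loop; every step runs in the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Add up 0..n-1 with the built-in sum, whose loop runs in C under CPython."""
    return sum(range(n))


n = 100_000

# Both approaches must agree before the timings mean anything.
assert python_loop_sum(n) == builtin_sum(n)

loop_time = timeit.timeit(lambda: python_loop_sum(n), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(n), number=20)

print(f"explicit loop: {loop_time:.4f} s, built-in sum: {builtin_time:.4f} s")
```

The same logic is behind a lot of the usual advice for both Python and R: push the hot loop down into compiled code when you can.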

Some days I wish I had double majored in computer science years ago, and other days I'm glad I didn't, so I don't get stuck doing unit tests or something else that can be really limited and repetitive. The gap between most data scientists and trained computer scientists is huge, though, and I think a data scientist with coding standards on par with CS grads is a major asset.