What Other Data Distributions Can Be Used for Outlier Detection in Python?

In summary, the conversation is about using outlier detection techniques in Python and finding distributions other than a sinusoid on which to test the algorithm. The participants suggest Zipf and Poisson distributions, point to scipy.stats for testing which distribution data follows, and recommend plotting the data first. They also note that the right choice depends on the type of data and on the device being simulated: digital currents look like the derivative of a square wave, and the original poster asks whether exponential or sinc currents occur in real-world scenarios.
  • #1
ipmax
Hi folks... I am trying to use outlier detection techniques in Python. I checked my algorithm on sinusoidally distributed data, and now I need some other kind of distribution to check that the algorithm I have used works. Can you give me examples of other known distributions, like sine, Gaussian, binomial, etc., which I can use for outlier detection?

IPMAX
 
  • #2
Zipf, Poisson
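Both of these are available as samplers in NumPy. A minimal sketch, assuming NumPy's `Generator` API (`numpy.random.default_rng`); the helper name and the parameter values (`zipf_a=2.0`, `poisson_lam=4.0`) are arbitrary choices for illustration. Note that both distributions produce non-negative integers (counts), so they fit count-like data better than an analog current.

```python
import numpy as np

def sample_zipf_poisson(n=1000, zipf_a=2.0, poisson_lam=4.0, seed=0):
    """Draw n samples each from a Zipf and a Poisson distribution."""
    rng = np.random.default_rng(seed)
    zipf = rng.zipf(zipf_a, size=n)              # heavy-tailed integers >= 1; requires a > 1
    poisson = rng.poisson(poisson_lam, size=n)   # counts >= 0 with mean lam
    return zipf, poisson

zipf, poisson = sample_zipf_poisson()
print("largest Zipf value:", zipf.max(), "Poisson mean:", poisson.mean())
```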
 
  • #3
What type of data do you have/what do you expect to see?

scipy.stats has a whole bunch of distributions you can test against and a bunch of tests for trying to figure out how your data is distributed.
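As an illustration of that approach, one option is to fit a few candidate distributions with their `.fit()` method and compare each fit to the data with `scipy.stats.kstest`. A minimal sketch, assuming SciPy is installed; the candidate list, the helper name `rank_distributions`, and the synthetic normal sample are illustrative choices. Because the parameters are estimated from the same data being tested, the KS p-values are optimistic, so the statistic is best treated as a ranking score rather than a formal test.

```python
import numpy as np
from scipy import stats

def rank_distributions(data, candidates=("norm", "expon", "uniform")):
    """Fit each candidate by maximum likelihood and rank by the KS statistic."""
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                        # estimated shape/loc/scale
        ks = stats.kstest(data, name, args=params)     # compare empirical vs fitted CDF
        results.append((name, ks.statistic, ks.pvalue))
    # Smaller KS statistic means a closer match between the two CDFs
    return sorted(results, key=lambda r: r[1])

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=1000)
ranked = rank_distributions(sample)
for name, stat, p in ranked:
    print(f"{name:8s} KS={stat:.3f} p={p:.3g}")
```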
 
  • #4
I saw the scipy.stats module, but I am confused about which function would be appropriate. I am dealing with currents: what would be a good distribution for a current variable? I tried sinusoidal (that's what I could come up with) :P
 
  • #5
The way I've done it is plot my data and then see which distributions it seems to look like. If you plot yours and post the graph, it may be easier to give you suggestions. Right now, I'd guess that sinusoidal does sound about right.
 
  • #6
You misunderstood my post. My whole point is to generate a current dataset from a certain distribution, mix random outliers into it, and detect the outliers. I have tried a sinusoidal dataset and tested the detection on it. Now I need to devise some other distribution for the dataset. I only know of sinusoidal current datasets; what other data distribution could reasonably be called a current dataset?
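The generate, inject, and detect loop described in this post can be sketched as follows. This is a minimal sketch, not the poster's algorithm (the thread never shows it): the 5 A, 50 Hz sine, the 50 kHz sample rate, the noise level, the spike size, and all function names are hypothetical. The detector swaps in a simple technique, a leave-one-out rolling median with a robust (MAD-based) z-score, which is a variant of the Hampel-filter idea. It assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_sine_current(n=2000, amplitude=5.0, freq_hz=50.0, fs_hz=50_000.0,
                      noise_std=0.05, seed=0):
    """Noisy sinusoidal current (two 50 Hz cycles with the default settings)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs_hz
    return amplitude * np.sin(2 * np.pi * freq_hz * t) + rng.normal(0.0, noise_std, n)

def inject_outliers(x, n_outliers=10, magnitude=3.0, margin=10, seed=1):
    """Add +/- magnitude spikes at random positions, keeping away from the ends."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    idx = rng.choice(np.arange(margin, len(x) - margin), size=n_outliers, replace=False)
    x[idx] += rng.choice([-1.0, 1.0], size=n_outliers) * magnitude
    return x, np.sort(idx)

def detect_outliers(x, half_width=4, threshold=10.0):
    """Flag samples far from the median of their neighbours (the sample itself excluded)."""
    # Odd reflection continues a local linear trend past both ends of the signal
    padded = np.pad(x, half_width, mode="reflect", reflect_type="odd")
    windows = sliding_window_view(padded, 2 * half_width + 1)  # one window per sample
    neighbours = np.delete(windows, half_width, axis=1)        # drop the centre sample
    residual = x - np.median(neighbours, axis=1)
    # Robust z-score: 1.4826 * MAD estimates the standard deviation for Gaussian noise
    center = np.median(residual)
    mad = np.median(np.abs(residual - center))
    robust_z = (residual - center) / (1.4826 * mad)
    # threshold is tuned for this simulation, where spikes are tens of noise std tall
    return np.flatnonzero(np.abs(robust_z) > threshold)

clean = make_sine_current()
dirty, true_idx = inject_outliers(clean)
found = detect_outliers(dirty)
print("injected:", true_idx)
print("detected:", found)
```

Keeping `inject_outliers` and the detector fixed while swapping `make_sine_current` for another generator (square, exponential, sinc, and so on) is one way to test the same algorithm across several signal shapes.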
 
  • #7
ipmax said:
I only know of sinusoidal current datasets; what other data distribution could reasonably be called a current dataset?

It depends on the device (or whatever you're trying to simulate): digital currents will likely be the derivative of a square wave (i.e. a train of impulse functions), MOSFETs look sort of like http://en.wikipedia.org/wiki/Current%E2%80%93voltage_characteristic , etc. You may need outlier detection for some distributions and not others.
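The derivative-of-a-square-wave idea can be sketched by modelling a digital node as a capacitor driven by a square-wave voltage, so that i = C·dv/dt. A minimal sketch, assuming SciPy's `scipy.signal.square`; the helper name `digital_current`, the 1 kHz clock, the 3.3 V swing, and the 1 nF load are hypothetical values. The sampled result is zero except for single-sample spikes at each edge, a discrete stand-in for the impulse train.

```python
import numpy as np
from scipy import signal

def digital_current(n=1000, freq_hz=1_000.0, fs_hz=100_000.0,
                    capacitance=1e-9, v_high=3.3):
    """Hypothetical capacitive load: i = C * dv/dt for a 0..v_high square-wave voltage."""
    t = np.arange(n) / fs_hz
    v = 0.5 * v_high * (signal.square(2 * np.pi * freq_hz * t) + 1.0)  # 0 V or v_high
    # Finite-difference derivative (dt = 1 / fs): +C*V*fs on rising edges,
    # -C*V*fs on falling edges, zero elsewhere
    current = capacitance * np.diff(v, prepend=v[0]) * fs_hz
    return t, v, current

t, v, i_load = digital_current()
print("edge spikes:", np.count_nonzero(i_load), "peak current (A):", np.abs(i_load).max())
```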
 
  • #8
What about exponential and sinc currents? Is a sinc current likely in the real world?
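Both shapes are easy to generate. A minimal sketch with hypothetical component values: the exponential is the textbook discharge current of a capacitor C through a resistor R, i(t) = (V0/R)·exp(-t/RC), and the sinc uses `numpy.sinc`, which NumPy defines as the normalized sin(πx)/(πx). A sinc of the form sinc(2Bt) is the shape of an ideal low-pass filter's impulse response with cutoff B; this sketch does not settle how often such a current shows up in practice.

```python
import numpy as np

def rc_discharge_current(t, v0=5.0, r_ohm=1_000.0, c_farad=1e-6):
    """Exponential decay i(t) = (V0 / R) * exp(-t / (R C)) of a discharging capacitor."""
    return (v0 / r_ohm) * np.exp(-t / (r_ohm * c_farad))

def sinc_current(t, peak_amp=1.0, bandwidth_hz=1_000.0, t0=0.0):
    """Sinc pulse centred at t0; np.sinc(x) = sin(pi x) / (pi x), equal to 1 at x = 0."""
    return peak_amp * np.sinc(2 * bandwidth_hz * (t - t0))

t = np.linspace(0.0, 5e-3, 501)        # 0 to 5 ms in 10 us steps
i_exp = rc_discharge_current(t)        # starts at 5 mA, falls by 1/e every RC = 1 ms
i_sinc = sinc_current(t, t0=2.5e-3)    # peak at 2.5 ms, zeros every 0.5 ms from the peak
```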
 

Related to What Other Data Distributions Can Be Used for Outlier Detection in Python?

What is a data distribution?

Data distribution refers to the way in which data values are spread out or distributed across a dataset. It can provide insights into the central tendency, variability, and shape of a dataset.
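Each of those three aspects maps onto standard summary statistics. A minimal sketch, assuming NumPy and SciPy (`scipy.stats.skew`, `scipy.stats.kurtosis`); the helper name `summarize` and the particular statistics chosen are illustrative.

```python
import numpy as np
from scipy import stats

def summarize(x):
    """Central tendency, spread, and shape of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "mean": x.mean(),                      # central tendency
        "median": np.median(x),                # central tendency, robust to outliers
        "std": x.std(ddof=1),                  # sample standard deviation
        "iqr": q3 - q1,                        # interquartile range
        "skewness": stats.skew(x),             # > 0 means a longer right tail
        "excess_kurtosis": stats.kurtosis(x),  # Fisher definition: 0 for a normal
    }

s = summarize([1.0, 2.0, 2.0, 3.0, 10.0])
print(s)
```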

What are the different types of data distribution?

Commonly described shapes of a data distribution include uniform, normal (or Gaussian), skewed, and bimodal. In a uniform distribution all values in the range are equally likely; a normal distribution has a bell-shaped curve with most values falling near the mean; a skewed distribution has a longer tail on one side; and a bimodal distribution has two distinct peaks.
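The four shapes named above can be drawn with NumPy's random `Generator`. A minimal sketch; the parameters, and the choice of an exponential for "skewed" and an equal mixture of two normals for "bimodal", are illustrative picks, since many other distributions have those shapes.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

uniform = rng.uniform(0.0, 1.0, n)          # flat density on [0, 1)
normal = rng.normal(0.0, 1.0, n)            # bell curve, mean 0, std 1
right_skewed = rng.exponential(1.0, n)      # long tail to the right
# Bimodal: each sample comes from N(-3, 1) or N(+3, 1) with probability 1/2
bimodal = np.where(rng.random(n) < 0.5,
                   rng.normal(-3.0, 1.0, n),
                   rng.normal(3.0, 1.0, n))

# Crude shape check: the mixture has few values in the valley between its two modes
valley = np.mean(np.abs(bimodal) < 0.5)
peak = np.mean(np.abs(bimodal - 3.0) < 0.5)
print(f"fraction near 0: {valley:.3f}, fraction near +3: {peak:.3f}")
```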

How is data distribution represented visually?

Data distribution is often represented visually using histograms, box plots, and probability plots. These graphs can help to identify the shape and variability of the data distribution.
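The numbers behind two of these plots can be computed without drawing anything. A minimal sketch, assuming NumPy and SciPy; the synthetic sample (mean 10, standard deviation 2) is illustrative. `scipy.stats.probplot` returns the quantile pairs of a normal probability (Q-Q) plot together with a least-squares line through them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(10.0, 2.0, 500)

# Histogram: the bin counts and bin edges a histogram plot would draw
counts, edges = np.histogram(x, bins=20)

# Normal probability plot: theoretical quantiles (osm) vs ordered data (osr).
# For normal data the points lie near a straight line, so r is close to 1,
# the slope estimates the standard deviation and the intercept the mean.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
print(f"Q-Q line: slope={slope:.2f}, intercept={intercept:.2f}, r={r:.4f}")
```

Passing `plot=matplotlib.pyplot` to `probplot` (and calling `plt.hist(x, bins=20)`) draws the corresponding charts.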

What can data distribution tell us about a dataset?

Data distribution can provide important insights into the characteristics of a dataset. It can reveal the central tendency, variability, and outliers in the data, which can help to identify patterns and relationships.
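One common way to turn the spread of a distribution into an outlier rule is Tukey's fences, the same 1.5 × IQR rule a standard box plot uses for its whiskers. A minimal sketch, assuming NumPy; the helper name `iqr_outliers` and the planted values are illustrative.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Indices outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.flatnonzero((x < lower) | (x > upper))

rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, 1000)
planted = np.array([10, 500, 900])
data[planted] = [8.0, -9.0, 12.0]   # obvious outliers planted at known positions
flagged = iqr_outliers(data)
print("flagged indices:", flagged)
```

With k = 1.5, about 0.7% of points drawn from a normal distribution fall outside the fences, so a few flagged points may be ordinary data; k = 3 is a common stricter choice.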

How does data distribution impact statistical analysis?

The type of data distribution can impact the choice of statistical analysis methods. For example, if the data is normally distributed, parametric tests can be used, but if the data is skewed, non-parametric tests may be more appropriate. Understanding the data distribution is crucial for accurate and meaningful statistical analysis.
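That rule of thumb can be sketched as a two-step check. A minimal sketch, assuming SciPy; the helper name `compare_groups`, the Shapiro-Wilk pre-test, the 0.05 cutoff, and the pairing of Welch's t-test with the Mann-Whitney U test are illustrative choices. Choosing a test from a normality pre-test on the same data is a heuristic that affects overall error rates, so this shows the idea rather than a recommended procedure.

```python
import numpy as np
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Pick a two-sample test from a Shapiro-Wilk normality check of each group."""
    looks_normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if looks_normal:
        result = stats.ttest_ind(a, b, equal_var=False)            # Welch's t-test (parametric)
        return "welch_t", result.pvalue
    result = stats.mannwhitneyu(a, b, alternative="two-sided")   # rank-based (non-parametric)
    return "mann_whitney_u", result.pvalue

rng = np.random.default_rng(11)
skewed_a = rng.exponential(1.0, 200)
skewed_b = rng.exponential(1.5, 200)
method, p = compare_groups(skewed_a, skewed_b)
print(method, p)
```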
