Minimising a function related to a Bloom Filter

In summary, the conversation is about minimizing a function related to the use of a "Bloom Filter" data structure in computer science. The function describes the probability of encountering a false positive and the goal is to find the value of k that minimizes it. One approach is to differentiate the function and solve for k, but the process leads to a complicated equation that cannot be easily solved. Another approach suggested is to take the natural logarithm of both sides of the equation and use the zero product property to solve for j, which represents a simplified version of k. However, this method also leads to a complex equation that is difficult to solve. The best approach is to show that j = 2 is the only solution by finding the turning points
  • #1
delcypher
5
0
Minimising a function related to a "Bloom Filter"

Homework Statement


I'm looking at a problem that requires me to find [itex]k[/itex] that minimises the function [itex]f[/itex]. Where [itex]n,m[/itex] are to be treated as constants.

Although not needed for this problem the context is that the function [itex]f[/itex] approximately describes the probability of "finding a false positive" when using the bloom filter data structure (computer science). See http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives

Homework Equations



[itex]f(k) = (1 - e^{\frac{-kn}{m}})^k[/itex]

I already know what the answer should be. Which is the [itex]k[/itex] that minimises [itex]f[/itex] is given by
[itex]k_{\text{min}} = \frac{m}{n} \ln(2)[/itex]

What I want to find out is a good way of getting there. I found a way but it is incomplete because it requires me to know the form of the answer and to use observation to solve for a particular value. So what I'd like to know is

  • Is there a way of completing my proof given later that doesn't require me to use observation (to find [itex]j[/itex])?
  • Is there a better way of approaching the problem? I may of gone the long way round proving this.

The Attempt at a Solution


My plan is to differentiate [itex]f[/itex] with respect to [itex]k[/itex] and set to 0 and then solve for [itex]k[/itex].

Firstly I rewrite [itex]f[/itex].
[itex] f = e^{k \ln{\left(1 - e^{\frac{-kn}{m}}\right) }}[/itex]

[itex] \ln{f} = k \ln{ ( 1 - e^{\frac{-kn}{m}}) }[/itex]

and observe that minimising [itex]f[/itex] is equivalent to minimising [itex]\ln{f}[/itex]

Now via the product rules and chain rule...
[itex] \frac{d \ln{f}}{dk} =
\ln{(1 - e^{\frac{-kn}{m}} )} +
\frac{kn e^{ \frac{-kn}{m} } }{m (1 - e^{\frac{-kn}{m}}) }
=0
[/itex]

I now make the following substitution to make the equations easier to write
[itex] a = \frac{kn}{m} [/itex]

so...

[itex] \frac{d \ln{f}}{dk} =
\ln{(1 - e^{-a} )} +
\frac{a e^{-a} }{(1 - e^{-a}) }
=0
[/itex]

So now I'm solving for a. I now rewrite the second term using the following...
[itex] \frac{a e^{-a} }{(1 - e^{-a}) } =
\ln{(e^{ \frac{a e^{-a} }{(1 - e^{-a}) } } )}
[/itex]

To give...
[itex]
\ln{ \left( (1 - e^{-a} ) e^{ \frac{a e^{-a} }{(1 - e^{-a}) } } \right) } = 0
[/itex]

Now re-arranging...
[itex] (1 - e^{-a} ) e^{ \frac{a e^{-a} }{(1 - e^{-a}) } } = 1 [/itex]

Now because I know what the answer is supposed to be it hints to me that I should try a form of [itex]a[/itex] such that [itex]a = \ln{j}[/itex] where [itex]j[/itex] is to be found which would give a solution of
[itex] k_{\text{min}} = \frac{m}{n} \ln{j} [/itex]

Substituting that in and rearranging gives
[itex] ( 1 - \frac{1}{j}) j^{\frac{1}{j-1}} = 1 [/itex]

This is where I get stuck. I can't think of a way to solve the above equation for [itex]j[/itex]. By observation I can see that [itex]j = 2[/itex] solves it as expected but I don't know how to show that. I can see by plotting it (via GnuPlot) that [itex]j=2[/itex] is the only solution. I also found that "Wolfram Alpha" was able to solve the equation although it wouldn't explain to me how it did so.Any ideas?
 
Physics news on Phys.org
  • #2


I received a reply from QuarkCharmer. For some reason it isn't shown here. It was
I admittedly didn't read the whole post, just the end part where you are trying to solve for J. You can take the natural log of both sides of the equation, recalling that:
[tex]ln(x^c) = cln(x)[/tex]

Then you have something like:
[tex]\frac{1}{J-1}ln(J-1) = 0[/tex]
(since the ln of 1 is 0)

You can see that 1/(J-1) has an asymptote at 1, so you can discount it. Then, use the zero product property to solve.

I appreciate the reply but the suggestion does not work. If I take logarithms of both sides I get
[itex]
\ln{ \left( ( 1 - \frac{1}{j}) j^{\frac{1}{j-1}} \right)} = 0
[/itex]

From this it is not possible to get to
[itex]
\frac{1}{j-1} \ln{ \left( ( 1 - \frac{1}{j}) j\right)} = 0
[/itex]
(which would lead to your suggestion) because only [itex]j[/itex] is to the power of [itex]\frac{1}{j-1}[/itex] and not [itex]( 1 - \frac{1}{j}) j[/itex]

If I take the logarithm of both sides what I actually end up with is
[itex]
\ln{(1 - \frac{1}{j})} - \frac{1}{j-1} \ln{(j)} = 0
[/itex]
which is no easier to solve in my opinion.
 
  • #3


delcypher said:
I received a reply from QuarkCharmer. For some reason it isn't shown here. It wasI appreciate the reply but the suggestion does not work. If I take logarithms of both sides I get
[itex]
\ln{ \left( ( 1 - \frac{1}{j}) j^{\frac{1}{j-1}} \right)} = 0
[/itex]

From this it is not possible to get to
[itex]
\frac{1}{j-1} \ln{ \left( ( 1 - \frac{1}{j}) j\right)} = 0
[/itex]
(which would lead to your suggestion) because only [itex]j[/itex] is to the power of [itex]\frac{1}{j-1}[/itex] and not [itex]( 1 - \frac{1}{j}) j[/itex]

If I take the logarithm of both sides what I actually end up with is
[itex]
\ln{(1 - \frac{1}{j})} - \frac{1}{j-1} \ln{(j)} = 0
[/itex]
which is no easier to solve in my opinion.

Hey delcypher.

I would take QuarkCharmers advice.

The logarithm function is unique for all valid values.

Basically what happens is that (1 - 1/j)j^(1/(j-1)) = 1. Now let's get it in terms of one log term which gives us:

(2-j)/(j-1)ln(j) + ln(j-1) = 0

Now j > 1 since ln(j-1) is only defined when this is the case. Now (2-j)/(j-1) is monotonic increasing for j <= 2. Now RHS is also monotonically increasing as well for j > 1.

From the equation you can see that j = 2 is one solution, but to show it's the only solution you should show what the turning points of this function are and that using various theorems like say the mean-value theorem you can show that it is the only root if this is the case (I haven't checked, but I would basically do what I said above).
 

Related to Minimising a function related to a Bloom Filter

What is a Bloom Filter?

A Bloom Filter is a data structure used for efficient storage and retrieval of key-value pairs. It uses a bit array and a set of hash functions to quickly check if an element is present in a set or not.

Why do we need to minimise a function related to a Bloom Filter?

In order to improve the performance and efficiency of a Bloom Filter, it is important to minimise a function that represents the number of false positives or false negatives. This helps to reduce the memory usage and improve the accuracy of the Bloom Filter.

What is the function that needs to be minimised in a Bloom Filter?

The function that needs to be minimised in a Bloom Filter is the false positive rate. This is the probability that an element is incorrectly identified as present in the set even though it is not.

How can the false positive rate be minimised in a Bloom Filter?

The false positive rate in a Bloom Filter can be minimised by adjusting the size of the bit array, the number of hash functions used, and the number of elements in the set. Increasing the size of the bit array and the number of hash functions can decrease the false positive rate, while decreasing the number of elements can also help improve the accuracy of the Bloom Filter.

What are some applications of minimising a function related to a Bloom Filter?

Minimising a function related to a Bloom Filter has various applications, such as in web caching, network routers, spell checkers, and malware detection systems. It is also used in distributed systems for efficient data storage and retrieval.

Similar threads

  • Calculus and Beyond Homework Help
Replies
2
Views
647
  • Calculus and Beyond Homework Help
Replies
14
Views
3K
  • Calculus and Beyond Homework Help
Replies
14
Views
2K
  • Calculus and Beyond Homework Help
Replies
16
Views
1K
  • Calculus and Beyond Homework Help
Replies
2
Views
967
  • Calculus and Beyond Homework Help
Replies
18
Views
1K
  • Calculus and Beyond Homework Help
Replies
3
Views
523
  • Calculus and Beyond Homework Help
Replies
2
Views
873
  • Calculus and Beyond Homework Help
Replies
2
Views
1K
  • Calculus and Beyond Homework Help
Replies
2
Views
1K
Back
Top