# Find the context-free grammar,that produces a language

#### evinda

##### Well-known member
MHB Site Helper
Hello!!!
Could you explain me how I could find the context-free grammar,that produces a language?Because I am not familiar yet with context-free grammars

#### Evgeny.Makarov

##### Well-known member
MHB Math Scholar
Sorry, this forum is not a replacement for a textbook. It is possible to write some approximation to a book chapter (like Deveno is wont to do ), but why would an explanation from an amateur helper be better than that from a published author and a professional mathematician? I assume you mean that you don't yet have enough intuition about CFG and are not comfortable with them, not that you don't know the definition. Even so, you should first go through several examples in a textbook. For example, the book by Lewis and Papadimitriou has five worked out examples of grammars. After you have some familiarity, feel free to post any questions about concrete languages or problems.

#### evinda

##### Well-known member
MHB Site Helper
Sorry, this forum is not a replacement for a textbook. It is possible to write some approximation to a book chapter (like Deveno is wont to do ), but why would an explanation from an amateur helper be better than that from a published author and a professional mathematician? I assume you mean that you don't yet have enough intuition about CFG and are not comfortable with them, not that you don't know the definition. Even so, you should first go through several examples in a textbook. For example, the book by Lewis and Papadimitriou has five worked out examples of grammars. After you have some familiarity, feel free to post any questions about concrete languages or problems.
I have seen some examples from an other book I have,but I haven't understood how to find the context-free grammars. Could you explain me how I can find it,for example,for this language: $$\{w \epsilon$${a,b}$$^{*}$$: (number of a in w)=2 * (number of b in w)} ?

#### Evgeny.Makarov

##### Well-known member
MHB Math Scholar
Could you explain me how I can find it,for example,for this language: $$\{w \epsilon$${a,b}$$^{*}$$: (number of a in w)=2 * (number of b in w)} ?
By trial and error, using common sense.

First we need to cover the empty string, so there should be the rule
$S\to\epsilon$
Next, an obvious approach is to have all other rules introduce one $b$ and two $a$s. By induction on derivation, this condition guarantees that each generated word belongs to the language, but there are grammars that don't have this restriction. The challenge is to generate all strings in the language. The newly introduced $b$ may be located to the left of the newly introduced $a$'s, between them, or to the right. So we may define the following rules.
\begin{aligned} S&\to SbSaSaS\\ S&\to SaSbSaS\\ S&\to SaSaSbS \end{aligned}
By inserting $S$ between symbols we don't lose the ability to make the intermediate string (with nonterminal symbols) into an arbitrary string in the language. We don't want to commit, for example, to $baa$ with no way to insert something between $b$ and $a$.

It seems to me that these rules should cover the language, though this grammar is probably highly ambiguous, i.e., each terminal word has many derivations. In other words, inserting $S$ everywhere is probably an overkill. Anyway, the next step is proving that this grammar generates precisely the required language.

It is possible to try to provide a more economical grammar, which is less ambiguous and more suitable for parsing.

#### evinda

##### Well-known member
MHB Site Helper
By trial and error, using common sense.

First we need to cover the empty string, so there should be the rule
$S\to\epsilon$
Next, an obvious approach is to have all other rules introduce one $b$ and two $a$s. By induction on derivation, this condition guarantees that each generated word belongs to the language, but there are grammars that don't have this restriction. The challenge is to generate all strings in the language. The newly introduced $b$ may be located to the left of the newly introduced $a$'s, between them, or to the right. So we may define the following rules.
\begin{aligned} S&\to SbSaSaS\\ S&\to SaSbSaS\\ S&\to SaSaSbS \end{aligned}
By inserting $S$ between symbols we don't lose the ability to make the intermediate string (with nonterminal symbols) into an arbitrary string in the language. We don't want to commit, for example, to $baa$ with no way to insert something between $b$ and $a$.

It seems to me that these rules should cover the language, though this grammar is probably highly ambiguous, i.e., each terminal word has many derivations. In other words, inserting $S$ everywhere is probably an overkill. Anyway, the next step is proving that this grammar generates precisely the required language.

It is possible to try to provide a more economical grammar, which is less ambiguous and more suitable for parsing.
I understand.But how can I prove that this grammar generates precisely the required language?

#### Evgeny.Makarov

##### Well-known member
MHB Math Scholar
But how can I prove that this grammar generates precisely the required language?
I'll wait for your ideas and respond to them.

#### evinda

##### Well-known member
MHB Site Helper
I'll wait for your ideas and respond to them.
Do I have maybe to do the following:
-to prove that every string produced by the grammar has the property (number of a in w)=2 * (number of b in w)
-to prove that every string with this property is produced by the grammar.

If yes,could you give me a hint how to start?

#### Evgeny.Makarov

##### Well-known member
MHB Math Scholar
Do I have maybe to do the following:
-to prove that every string produced by the grammar has the property (number of a in w)=2 * (number of b in w)
-to prove that every string with this property is produced by the grammar.
Certainly.

If yes,could you give me a hint how to start?
an obvious approach is to have all other rules introduce one $b$ and two $a$s. By induction on derivation, this condition guarantees that each generated word belongs to the language
Then

The challenge is to generate all strings in the language.
Take an arbitrary string in the language. Suppose it starts with a $b$. Then somewhere in the string there are two $a$'s. (That's an understatement!) These three symbols break the string into four (possibly empty) parts. What can be said about them? That is, this direction uses induction on the string length.

#### evinda

##### Well-known member
MHB Site Helper
Certainly.

Then

Take an arbitrary string in the language. Suppose it starts with a $b$. Then somewhere in the string there are two $a$'s. (That's an understatement!) These three symbols break the string into four (possibly empty) parts. What can be said about them? That is, this direction uses induction on the string length.
For the first step,I tried this:

-to prove that every string produced by the grammar has the property (number of a in w)=2 * (number of b in w), by induction on derivation:
Claim:For every natural number n, if w is a sentential form whose derivation is of length n, then w has the property: (number of a in w)=2*(number of b in w)
Base case: The claim holds for n=0,as the only sentential form with derivation length 0 is S,which satisfies the property. The claim also holds for n=1, as the only possible sentential forms are SbSaSaS, SaSbSaS, SaSaSbS, or the empty string.
Induction Hypothesis: Let the claim be true for n<=k. So, assume that for all sentential forms of length less than k satisfy the property.
Induction Step: Consider a sentential form w whose derivation is of length k+1.
Let the sentential form z have a derivation of length k, and by induction hypothesis it satisfies the property. We can write z=xSy, where x and y are sentential forms and we replace S in the (k+1)-step to obtain w. This S is one of the strings SbSaSaS, SaSbSaS, SaSaSbS, or ε. In any case the property is satisfied.

Is this right?

#### Evgeny.Makarov

##### Well-known member
MHB Math Scholar
Base case: The claim holds for n=0,as the only sentential form with derivation length 0 is S,which satisfies the property. The claim also holds for n=1, as the only possible sentential forms are SbSaSaS, SaSbSaS, SaSaSbS, or the empty string.
Up to this point everything is fine. A couple of remarks about the following.

Induction Hypothesis: Let the claim be true for n<=k. So, assume that for all sentential forms of length less than k satisfy the property.
First you say, for $\le k$ and then for $<k$.

Induction Step:
Stating induction hypothesis is a part of the induction step. That is, you should signal the beginning of the induction step before stating the hypothesis.

Consider a sentential form w whose derivation is of length k+1.
Let the sentential form z have a derivation of length k
From the following, $z$ is not an arbitrary sentential form with a derivation of length $k$, but a direct ancestor of $w$. This should be stated from the start.

and by induction hypothesis it satisfies the property. We can write z=xSy, where x and y are sentential forms and we replace S in the (k+1)-step to obtain w. This S is one of the strings SbSaSaS, SaSbSaS, SaSaSbS, or ε.
The last sentence has to be written more carefully. $S$ is not $SbSaSaS$.

In any case the property is satisfied.
Up to now, your reasoning is very detailed, which is fine, but this last statement is comparatively unsupported. How do you use $x$, $y$ and $z$ to prove it?

#### evinda

##### Well-known member
MHB Site Helper
Up to this point everything is fine. A couple of remarks about the following.

First you say, for $\le k$ and then for $<k$.

Stating induction hypothesis is a part of the induction step. That is, you should signal the beginning of the induction step before stating the hypothesis.

From the following, $z$ is not an arbitrary sentential form with a derivation of length $k$, but a direct ancestor of $w$. This should be stated from the start.

The last sentence has to be written more carefully. $S$ is not $SbSaSaS$.

Up to now, your reasoning is very detailed, which is fine, but this last statement is comparatively unsupported. How do you use $x$, $y$ and $z$ to prove it?
x,y,z are strings that satisfy the property.Or can't I do it like that?

#### Evgeny.Makarov

##### Well-known member
MHB Math Scholar
x,y,z are strings that satisfy the property.Or can't I do it like that?
Why do $x$ and $y$ satisfy the property?

I just wanted to say that it would be nice to write in more detail why the fact that $z$ satisfies the property implies that $w$ does, too.

#### evinda

##### Well-known member
MHB Site Helper
Why do $x$ and $y$ satisfy the property?

I just wanted to say that it would be nice to write in more detail why the fact that $z$ satisfies the property implies that $w$ does, too.
Can I explain it like that:
Since S can be replaced by one of the strings SbSaSaS, SaSbSaS, SaSaSbS, ε, then we obtain w cause z has now a derivation of length k+1.
Or am I wrong?

#### Evgeny.Makarov

##### Well-known member
MHB Math Scholar
Can I explain it like that:
Since S can be replaced by one of the strings SbSaSaS, SaSbSaS, SaSaSbS, ε, then we obtain w cause z has now a derivation of length k+1.
Or am I wrong?
Unfortunately, this is not even wrong. You are saying, "Since ..., we obtain $w$". But what kind of conclusion is this? It's as if "we obtain $w$" can be true or false; it can't at this point. In the induction step we started by considering a derivation of length $k+1$ and called its conclusion $w$. We already obtained $w$ and this fact needs no justification.

Second, $z$ is a fixed string, a direct predecessor of $w$ in the derivation. You can't say, "$z$ has now a derivation of length $k+1$" because $z$ has not changed, and the derivation we fixed in the induction step has not changed. There is $z$, and by one more rule application we obtain $w$.

Finally, you have not said anything about the number of $a$'s and $b$'s.

#### evinda

##### Well-known member
MHB Site Helper
Unfortunately, this is not even wrong. You are saying, "Since ..., we obtain $w$". But what kind of conclusion is this? It's as if "we obtain $w$" can be true or false; it can't at this point. In the induction step we started by considering a derivation of length $k+1$ and called its conclusion $w$. We already obtained $w$ and this fact needs no justification.

Second, $z$ is a fixed string, a direct predecessor of $w$ in the derivation. You can't say, "$z$ has now a derivation of length $k+1$" because $z$ has not changed, and the derivation we fixed in the induction step has not changed. There is $z$, and by one more rule application we obtain $w$.

Finally, you have not said anything about the number of $a$'s and $b$'s.
I will think about it and I will tell you what I think that I could change to improve my answer..

Last edited by a moderator: