Formal Definition of Inner Join?

In summary: T1 or T2.Yes, the outer join is a subset of T1x(T2∪{dummy NULL...}) where dummies are inserted only if there is no matching record in either T1 or T2.
  • #1
WWGD
Science Advisor
Gold Member
7,019
10,579
Hi All,
Hope this is not too simple/dumb. I think I have a good idea of what an inner join of two SQL tables T1, T2 along a common field F is, but , for an exercise, I am looking for a precise definition. I am looking for a definition of this sort: (please correct if necessary or let me know if this is right)
Inner Join function of relational tables T1, T2 along a common column F:
It is a function that takes as inputs a pair T1, T2 of tables and spits out a table T3 so that:

1) F does not appear in T3.

2) T3 contains as columns : all columns other than F that are either in T1 or in T2 (or both). Each of these appears exactly once in T3.

Is this right?
Thanks.
 
  • Like
Likes Silicon Waffle
Technology news on Phys.org
  • #2
Hi WWGD:

I think you are mistaken. See:
http://www.programmerinterview.com/index.php/database-sql/inner-vs-outer-joins/ .​

Regards,
Buzz
 
  • Like
Likes WWGD
  • #3
Thanks, Buzz, that helps, but the example does not seem to explain what happens when we join along a column that has repetitions; I don't mean non-atomic values, but I mean different records have the same value along the field using for joining (which may happen if the field along which we join is not a primary or candidate key in at least one of the tables). Do you know what happens in that case?
 
  • #4
Your question is just difficult at the internal level of how such these clauses have been implemented and optimized overtime instead. I think it is Yes, there should be a third table for your display but that is the final result. The optimization I think is performed in the join process itself i.e via choices made by the engine optimizer of keys (index, foreign, clustered, etc), of column types (either same or different), and of picking one table over the other to be the inner storing buffer, usually the smaller table is preferred, as a cache container for comparison of each columns to be done between the two, etc.
 
  • #5
Silicon Waffle said:
Your question is just difficult at the internal level of how such these clauses have been implemented and optimized overtime instead. I think it is Yes, there should be a third table for your display but that is the final result. The optimization I think is performed in the join process itself i.e via choices made by the engine optimizer of keys (index, foreign, clustered, etc), of column types (either same or different), and of picking one table over the other to be the inner storing buffer, usually the smaller table is preferred, as a cache container for comparison of each columns to be done between the two, etc.
Thank you, but isn't it possible to give a generalization of the table one gets by joining two tables along a common field? This is all ultimately set theory, so there should be a specific answer in terms of the rows and columns of T1, T2, shouldn't it? I understand the table created is a virtual table, but still, it should be definable.
 
  • Like
Likes Silicon Waffle
  • #6
If you see T1 and T2 as sets of records, and you have a field ##f_1## in T1 and a field ##f_2## in T2 you want to use for the inner join, then I think
##T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T1, r_1(f_1)=r_2(f_2)\}##
T3 is just a subset of T1xT2.
 
  • Like
Likes WWGD
  • #7
Hi WWGD:

What Samy said is the key:
Samy_A said:
T3 is just a subset of T1xT2.
This is true for both inner and outer joins. The difference is in how the subsets are selected. The inner join does not use NULL as a matchable value, while the outer join does. In both cases, the subset consists of those records in the cross product that have a matching value for a specified field from each source table.

You question, "Do you know what happens in that case?" is ambiguous. If you mean, "What is the result in that case?" then the above is the answer. If you mean "What goes on internally in that case?" then Silicon's answer is a good summary.

Regards,
Buzz
 
  • Like
Likes WWGD
  • #8
Correct, I should have added the condition that the field values are not NULL (been hating NULL's for some 25 years now).

Nitpick: I'm not sure that an outer join is a subset of T1xT2. If we take the outer join from T1 to T2, then we get as result the union of the records in T1 and T2 that match on the chosen field(s), and those records in T1 that have no matching record in T2 (the "T2-fields" in the outer join are then filled with NULL's: I don't know if that is by definition or just standard DB-implementation).

I guess the outer join is a subset of T1x(T2∪{dummy NULL record}).
 
Last edited:
  • Like
Likes WWGD, Silicon Waffle and Greg Bernhardt
  • #9
Samy_A said:
Nitpick: I'm not sure that an outer join is a subset of T1xT2.
Hi Samy:

I have never used an outer join, so my memory of its details is rather vague. From the somewhat limited description in the article that I linked to in my post #2, I get the impression that the plain vanilla outer join matches a NULL to a NULL, but there are also a left-outer and a right-outer that matches a non-NULL from one of the tables to a NULL in the other table. However, I may be misinterpreting this.

Regards,
Buzz
 
  • #10
Since we have been doing this post on inner joins, we might as well extended to other types of joins. Full, outer, etc. Also, does the word 'join' alone usually imply an inner join?
 
  • #11
Buzz Bloom said:
Hi Samy:

I have never used an outer join, so my memory of its details is rather vague. From the somewhat limited description in the article that I linked to in my post #2, I get the impression that the plain vanilla outer join matches a NULL to a NULL, but there are also a left-outer and a right-outer that matches a non-NULL from one of the tables to a NULL in the other table. However, I may be misinterpreting this.

Regards,
Buzz
I have used outer joins, but as left or right outer joins. The article you linked gives examples for left outer join, right outer join and inner join, but not for what you call the plain vanilla outer join. It may well be as you describe it, I don't know.
 
  • #12
WWGD said:
Since we have been doing this post on inner joins, we might as well extended to other types of joins. Full, outer, etc. Also, does the word 'join' alone usually imply an inner join?
Well, for a left outer join it would be something like this (using the same terminology as before):
##T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T2, r_1(f_1)=r_2(f_2)\} \cup \{(r_1,NULL) | r_1 \in T1, \{r_2 \in T2, r_1(f_1)=r_2(f_2)\}=\phi\} ##

Right outer join would be the same, with the role of T1 and T2 switched. As I wrote above I don't know what a vanilla outer join should be.

In MySQL, join and inner join are the same.

I noticed a typo in my representation for an inner join. It should have been:
##T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T2, r_1(f_1)=r_2(f_2)\}## (and not ##r_2 \in T1##).
 
Last edited:
  • #13
Thanks, Samy. I think I finally understood, in a more informal way. This is the way to join T1, T2 along the field F :

We create a table T3 in which:
1) We include all columns in F, both from T1, T2, having common records,
i.e., let F_11, F_12,..,F_1n ; F_21, F_22,..., F_2m be the columns in T1, T2 respectively containing entries from F
Then F_1j , F_2k are included , and listed in separate columns if they are in the intersection ## F_1. \cap F_2. ##

2) We include all rows in which F_ij, F_2k are columns of T3, as in 1) above.
 
  • #14
Buzz Bloom said:
Hi WWGD:

What Samy said is the key:

This is true for both inner and outer joins. The difference is in how the subsets are selected. The inner join does not use NULL as a matchable value, while the outer join does. In both cases, the subset consists of those records in the cross product that have a matching value for a specified field from each source table.

You question, "Do you know what happens in that case?" is ambiguous. If you mean, "What is the result in that case?" then the above is the answer. If you mean "What goes on internally in that case?" then Silicon's answer is a good summary.

Regards,
Buzz
Yes, thanks, I got it, I realized my question did not make much sense: basically, we include all rows in which we have pairs ##(f_{1i}, f_{2j}): f_{1i}=f_{2j} ##, together with the pair itself. Of course, ##f_{1i}, f_{2j} ## are entries from the common field in tables T1, T2 respectively.

In a more direct way: table contains columns common to both fields and all rows associated with those common fields. Still kind of a mouthful; will try to simplify it. But still, internally, we have some set theory and model theory, where queries get a result if there is a model for them. A query is a wff in 1st order Predicate Calculus, and the outputs are (possible) models for the wffs.
 

Related to Formal Definition of Inner Join?

1. What is the formal definition of Inner Join?

The formal definition of Inner Join is a type of join operation that combines rows from two or more tables based on a common field or attribute, returning only the rows that have matching values in both tables.

2. How is Inner Join different from other types of joins?

Inner Join differs from other types of joins, such as Outer Join or Cross Join, in that it only returns rows that have matching values in both tables. This means that any rows in either table that do not have a match in the other table will not be included in the result set.

3. What is the syntax for writing an Inner Join in SQL?

The syntax for writing an Inner Join in SQL is as follows:
SELECT [columns]
FROM [table1]
INNER JOIN [table2] ON [table1].[common_field] = [table2].[common_field];

4. What are the benefits of using an Inner Join?

There are several benefits of using an Inner Join, including:
- It allows for combining data from multiple tables into a single result set
- It ensures that only rows with matching values in both tables are included
- It can improve performance by reducing the amount of data that needs to be processed

5. When should I use an Inner Join?

An Inner Join is typically used when you want to retrieve data from multiple tables that have a common field or attribute. It is useful for data analysis, reporting, and querying databases. It is also commonly used when creating relationships between tables in a database.

Similar threads

  • Programming and Computer Science
Replies
16
Views
3K
  • Programming and Computer Science
2
Replies
51
Views
4K
  • Programming and Computer Science
Replies
3
Views
1K
  • Programming and Computer Science
Replies
5
Views
2K
  • Electrical Engineering
Replies
18
Views
4K
  • Special and General Relativity
Replies
15
Views
1K
  • Programming and Computer Science
Replies
1
Views
2K
  • Introductory Physics Homework Help
Replies
10
Views
3K
  • Introductory Physics Homework Help
Replies
7
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
1
Views
1K
Back
Top