Tricky Segmentation Fault in C

  • Thread starter Enharmonics
  • Tags
    Fault Unix
In summary: The program crashes with a segmentation fault only when it is run without a filename. After getopt() has processed all the options, optind is the index of the first non-option argument; when there is none, argv[optind] is NULL, and strcpy(fileDirectory, argv[optind]) dereferences that null pointer. Checking argv[optind] for NULL before copying it fixes the crash.
  • #1
Enharmonics

Homework Statement


Write a C program to run on Ocelot to find the total count of words and, optionally, the longest and/or shortest words in a string input by the user or coming from a file. If there is no filename, the user should be prompted to enter the string. You must use getopt to parse the command line. The string is not input on the command line.

Usage: countwords [-l] [-s] [filename]

  • The l flag means to find the longest word in the string.
  • The s option means to find the shortest word in the string.
  • You may have both or one of the flags.
  • Output should be well formatted and easy to read.

Homework Equations


N/A

The Attempt at a Solution


My code so far:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>   // getopt(), optind

// Function Prototype for the strcatc function,
// which concatenates a char to a string

void strcatc(char* str, char c);

int main(int argc, char **argv)
{
    extern char *optarg;
    extern int optind;
    int c, err = 0;

    // The flags for longest/shortest
    // options

    int lflag = 0, sflag = 0;

    // Stores the number of words in a file/user-provided string

    int wordCount = 0;

    // Will be used to count the number of letters in each word

    int letterCount = 0;

    // Stores the size of the C-strings used

    const int STRING_SIZE = 5000;

    // TRUE/FALSE boolean value stand-ins

    const int TRUE = 1;
    const int FALSE = 0;

    // Type alias for the bool type

    typedef int bool;

    // Holds the user-provided string

    char userString[STRING_SIZE];

    // Holds the current line in the file/user-provided string.

    char currentLine[STRING_SIZE];

    // Holds the current word we're processing in the file
    // or user-provided string, which will be derived while
    // processing the current line.

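    char currentWord[STRING_SIZE];

    // strcatc() looks for the terminating '\0' before appending,
    // so currentWord must start out as an empty string

    currentWord[0] = '\0';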

    // Holds the current character in the user-provided string.

    char currentChar;

    // These hold the longest and shortest words in the string
    // (where applicable); start them out empty so printing them
    // is safe even if no word was found

    char longestWord[STRING_SIZE], shortestWord[STRING_SIZE];
    longestWord[0] = shortestWord[0] = '\0';

    static char usage[] = "Usage: %s [-l] [-s] [filename]\n";

    while ((c = getopt(argc, argv, "ls")) != -1)
        switch (c)
        {
            case 'l':
                lflag = 1;
                break;
            case 's':
                sflag = 1;
                break;
            case '?':
                err = 1;
                break;
        }

    if (err)
    {
        // Generic error message

        printf("ERROR: Invalid input.\n");

        fprintf(stderr, usage, argv[0]);
        exit(1);
    }

    // CODE FOR THE WORD COUNTER STARTS HERE

    // First, check whether the user provided a filename. We do this using
    // optind. We know we've reached the final getopt argument at this point,
    // so optind at this point represents the index position of filename (if it exists).

    char fileDirectory[STRING_SIZE];

    strcpy(fileDirectory, argv[optind]);

    // FILE pointer that will be used to attempt to open the file
    // the user provided, if indeed they did provide one

    FILE *inFile;

    // Represents the mode the file will be opened in

    char *mode = "r";

    // This will be used to iterate through each
    // line of the file, which will be stored
    // in a char array

    int i;

    // This "bool" (an alias for int defined earlier;
    // C99's <stdbool.h> is not used here) indicates whether the current
    // word being processed in the file/user-provided string
    // is the FIRST WORD

    bool firstWord;

    // Here we actually attempt to open the file

    inFile = fopen(fileDirectory, mode);

    // Check whether the file was opened successfully; If it
    // wasn't, that means the user didn't provide a file path,
    // misspelled it, the file didn't exist, etc.

    if (inFile == NULL)
    {

        // Represents the index of
        // userString, and will be used
        // to iterate through it

        int strIndex;

        // Set the firstWord "bool" variable
        // to true before we begin to process
        // the user-provided string

        firstWord = TRUE;

        // If there is no file to access, prompt the user
        // to enter a string

        printf("Please enter a string: ");

        // Scanf takes user's input from stdin

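        // %4999[^\n] reads up to STRING_SIZE - 1 characters of one line;
        // if the line is empty nothing is stored, so clear the buffer

        if (scanf("%4999[^\n]", userString) != 1)
        {
            userString[0] = '\0';
        }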

        for (strIndex = 0; strIndex < strlen(userString); strIndex++)
        {

            // If the current character isn't whitespace and
            // is an alphanumeric character, increase
            // the letterCount of the current word
            // and append the character to currentWord

            if (!isspace(userString[strIndex]) && isalnum(userString[strIndex]))
            {
                letterCount++;

                strcatc(currentWord, userString[strIndex]);
            }

                // Otherwise, we've reached the end of a word,
                // so increase wordCount

            else
            {
                wordCount++;

                // Check whether the current word is
                // the first word

                if (firstWord)
                {
                    // If it is, assume it is both
                    // the longest and shortest word
                    // (this will be changed as we iterate
                    // through the line, of course)

                    strcpy(longestWord, currentWord);
                    strcpy(shortestWord, currentWord);

                    // At this point, we set the firstWord
                    // variable to 0. Because this variable
                    // is not modified anywhere else in the
                    // loop, we will only ever enter this if
                    // branch when we are processing the first
                    // word in the string

                    firstWord = FALSE;
                }

                    // If not, check whether the currentWord is
                    // longer or shorter than the current
                    // longest and shortest words, respectively

                else if (letterCount > strlen(longestWord))
                {
                    strcpy(longestWord, currentWord);
                }

                    // Note that here I use <= rather than < when
                    // comparing letterCount to the length of
                    // shortestWord, so that a later word that ties
                    // the current shortest word (for example another
                    // one-letter word such as "I" or "A") replaces it;
                    // with a plain < the earlier word would be kept.

                else if (letterCount <= strlen(shortestWord))
                {
                    strcpy(shortestWord, currentWord);
                }

                // Now that we're done comparing the
                // letterCount of the currentWord to
                // the current longest/shortest words,
                // reset letterCount so we can start
                // counting the number of letters in the
                // NEXT word from 0

                letterCount = 0;

                // We also reset currentWord, since we are
                // moving on to the next word in the string

                strcpy(currentWord, "");
            }
        }
    }

        // Otherwise, process the file

    else
    {

        // As we did earlier, set the "bool"
        // variable to true. We do this outside
        // the while loop below so that it is only
        // ever set to true BEFORE we begin processing
        // the file (further explanation below)

        firstWord = TRUE;

        // While loop that iterates until the end
        // of the file provided

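        while (fgets(currentLine, STRING_SIZE, inFile) != NULL)
        {

            // Each pass extracts one line of text from the file.
            // Testing fgets() itself (rather than feof() beforehand)
            // stops the loop as soon as no line could be read, so the
            // last line is not processed twice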

            // The rest of this follows exactly
            // same algorithm as I did in the user-provided
            // string.

            for (i = 0; i < strlen(currentLine); i++)
            {

                if (!isspace(currentLine[i]) && isalnum(currentLine[i]))
                {
                    letterCount++;

                    strcatc(currentWord, currentLine[i]);
                }

                else
                {
                    wordCount++;

                    if (firstWord)
                    {
                        strcpy(longestWord, currentWord);
                        strcpy(shortestWord, currentWord);

                        firstWord = FALSE;
                    }

                    else if (letterCount > strlen(longestWord))
                    {
                        strcpy(longestWord, currentWord);
                    }

                    else if (letterCount <= strlen(shortestWord))
                    {
                        strcpy(shortestWord, currentWord);
                    }

                    letterCount = 0;

                    strcpy(currentWord, "");

                }
            }
        }
    }

    // If the number of words isn't zero (that is,
    // if the string/file wasn't empty), increment
    // wordCount by one

    // The reason this is necessary is that my algorithm
    // doesn't actually count the words themselves - it
    // counts the spaces BETWEEN words. Because every two
    // words are separated by a single space, that means
    // that the number of spaces in a sentence is equal
    // to (number of words - 1), hence this adjustment

    if (wordCount != 0)
    {
        wordCount++;
    }

    // Output the total number of words

    printf("The total number of words is: %d\n", wordCount);

    // If lflag is set, output
    // the longest word to the user

    if (lflag)
    {
        printf("The longest word is: %s\n", longestWord);
    }

    // If sflag is set, output the
    // shortest word to the user

    if (sflag)
    {
        printf("The shortest word is: %s\n", shortestWord);
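    }

    // Only close the file if fopen() succeeded;
    // calling fclose(NULL) is undefined behavior

    if (inFile != NULL)
    {
        fclose(inFile);
    }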

}

// Helper function used to concatenate a character to a char array ("String")

void strcatc(char *str, char c)
{
    // Iterate through the memory locations that
    // make up the char array until we reach the
    // final spot

    for (; *str; str++);

    // Add the char to the end of the already-existing
    // C-string

    *str++ = c;

    // Add the "null character" to the end of the string
    // to accommodate C-string formats (null-terminated character arrays)

    *str++ = '\0';
}
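One more thing worth knowing about strcatc(): it scans for the existing '\0' before appending, so the destination must already hold a valid string (currentWord, for instance, has to start out empty), and nothing stops it from writing past the end of the array. A bounded variant, sketched here under the assumption that the caller passes the total buffer size (the name strcatc_n is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded version of strcatc(): appends c to the
 * NUL-terminated string already in buf, where cap is the total size
 * of buf in bytes. Returns 1 on success, or 0 (leaving buf unchanged)
 * if there is no room for c plus the terminating '\0'. */
int strcatc_n(char *buf, size_t cap, char c)
{
    size_t len = strlen(buf);   /* buf must already be a valid string */

    if (len + 1 >= cap)         /* need len + 2 bytes: old text, c, '\0' */
        return 0;

    buf[len] = c;
    buf[len + 1] = '\0';
    return 1;
}
```

In the posted loops, a call such as strcatc(currentWord, userString[strIndex]) would become strcatc_n(currentWord, STRING_SIZE, userString[strIndex]), and a 0 return tells you the word was too long to store.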

So my problem this time is a weird one. When I run the program on Ocelot (I'm not totally sure what it is; it's required for the course I'm taking and basically works like a Unix command line that you access through PuTTY), it works perfectly when I provide a filename. If I type in the commands

make

./countwords -l -s infile

where infile is a file containing the sentence "This is a test", I get the correct output (the number of words is 4, the shortest "word" is the article "a", and the longest is "This", even though it's technically tied with "test").

However, when I try to run it without providing a filename, as in

./countwords -l -s

I immediately get a segmentation fault. I have no idea what's causing it. At first, I thought it might have something to do with the line

Code:
char *mode = "r";

since I'm assigning a value to the pointer without setting aside memory space for it. But even if I change it to, for example,

Code:
char mode[2] = "r";

I still get the segmentation fault.

I don't know what else could be wrong. The fact that the program works perfectly when I provide a filename and only breaks down when I fail to do so tells me that the problem lies in that area (the filename).

My hunch is that it may have something to do with these lines:

Code:
char fileDirectory[STRING_SIZE];

strcpy(fileDirectory, argv[optind]);

Specifically the second one. When I provide a filename, argv[optind] is the C-string at the end of the option list (that is, the filename in ./countwords -l -s filename).

When I don't provide a filename, it's... something else, and maybe that something isn't a string, or is otherwise incompatible with the strcpy function? That's all I can think of off the top of my head.
 
  • #2
Enharmonics said:
We know we've reached the final getopt argument at this point,
// so optind at this point represents the index position of filename (if it exists).
What does optind point to if filename does not exist?
 
  • #3
Tom.G said:
What does optind point to if filename does not exist?

I put some thought into this. Checking a few different documentation sources for the getopt() function, I eventually found something that looked interesting:

https://linux.die.net/man/3/optind

The getopt() function parses the command-line arguments. Its arguments argc and argv are the argument count and array as passed to the main() function on program invocation. An element of argv that starts with '-' (and is not exactly "-" or "--") is an option element. The characters of this element (aside from the initial '-') are option characters.

[...]

The variable optind is the index of the next element to be processed in argv. The system initializes this value to 1.

[...]

If there are no more option characters, getopt() returns -1. Then optind is the index in argv of the first argv-element that is not an option.

So if filename doesn't exist, optind would point to the first argv element that isn't an option... except my only argv elements besides filename are -l and -s (longest and shortest word options), so there is no such element.

Given the fact that the problem is a segmentation fault, I'm guessing optind then points to NULL, -1, or some other value that isn't a valid index for an array, which is what's causing the problem? Is that right?
 
  • #4
Enharmonics said:
I'm guessing optind then points to NULL, -1, or some other value that isn't a valid index for an array, which is what's causing the problem? Is that right?
I highly suspect that is the case (or, to put it another way, I'm guessing that is the most likely).
I suggest re-coding to avoid trying to access something that doesn't exist (and remember the possibility in future programs; it's very common).

There may be other errors. I haven't inspected the program in detail.

Tip: Sketching a flow chart of what you are programming is quite useful; inconsistencies tend to jump out at you. Your brain can process images in large chunks, but reads a word at a time (with some overhead for translating).
Enharmonics said:
Because every two
// words are separated by a single space,
If that is a requirement, better put it in the "Usage" message. For instance, more 'formal' English has two spaces between sentences.
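The double-space point applies because the posted code counts every non-alphanumeric character as a word boundary: two spaces, a comma followed by a space, or a trailing newline each add an extra count. One alternative, sketched below under the assumption that a word is a maximal run of alphanumeric characters (the same isalnum() test the posted code uses), is to count where each word starts instead of counting separators. The function name and signature are hypothetical:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts words in text, where a word is a maximal
 * run of alphanumeric characters, so any run of separators (spaces,
 * punctuation, a newline) ends at most one word. The first longest and
 * first shortest words are copied into longest and shortest, each a
 * buffer of cap bytes (cap must be at least 1); longer words are
 * truncated to fit. */
int count_words(const char *text, char *longest, char *shortest, size_t cap)
{
    int words = 0;
    size_t best_long = 0, best_short = 0;
    const char *p = text;

    longest[0] = shortest[0] = '\0';
    while (*p != '\0') {
        /* Skip separators until the next word starts (or the text ends). */
        while (*p != '\0' && !isalnum((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* Measure the word that starts here. */
        const char *start = p;
        while (*p != '\0' && isalnum((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        size_t copy = len < cap - 1 ? len : cap - 1;

        words++;
        if (words == 1 || len > best_long) {    /* strictly longer */
            best_long = len;
            memcpy(longest, start, copy);
            longest[copy] = '\0';
        }
        if (words == 1 || len < best_short) {   /* strictly shorter */
            best_short = len;
            memcpy(shortest, start, copy);
            shortest[copy] = '\0';
        }
    }
    return words;
}
```

With this approach no "+1" adjustment is needed, and the last word is measured even when nothing follows it. Because ties keep the earlier word, "This is a test" gives 4 words, longest "This", and shortest "a". Note that it treats an apostrophe as a separator, so "don't" counts as two words, just as the isalnum() test in the posted code does.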
 
  • #5
When getopt() indicates it parsed all options, optind is set to the index of first non-option argv.
argv[] vector is terminated by NULL element.
In this case argv[optind] will be NULL.
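To make that concrete: when no operands follow the options, optind ends up equal to argc, and the C standard guarantees that argv[argc] is a null pointer, which is exactly what strcpy() was being handed. A minimal sketch of the check, with a hypothetical helper name; resetting optind to 1 so the helper can be called more than once is accepted by common implementations such as glibc, but it is not something to rely on everywhere:

```c
#define _POSIX_C_SOURCE 200809L   /* expose getopt() under strict -std= modes */
#include <stddef.h>
#include <unistd.h>               /* getopt(), optind */

/* Hypothetical helper: parses the countwords options ("-l", "-s") and
 * returns the filename operand that follows them, or NULL if none was
 * given. Flag handling is omitted to keep the sketch short. */
const char *filename_operand(int argc, char *argv[])
{
    optind = 1;   /* restart the scan (glibc/BSD convention) */

    while (getopt(argc, argv, "ls") != -1)
        ;         /* a real program would set lflag/sflag here */

    /* With no operands left, optind == argc, and argv[argc] is a null
     * pointer (C standard, 5.1.2.2.1). Testing optind < argc avoids
     * touching argv[optind] at all in that case. */
    return (optind < argc) ? argv[optind] : NULL;
}
```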
 
  • #6
nikkkom said:
When getopt() indicates it parsed all options, optind is set to the index of first non-option argv.
argv[] vector is terminated by NULL element.
In this case argv[optind] will be NULL.

I suspected as much! I changed the line strcpy(fileDirectory, argv[optind]); to

Code:
if (argv[optind] != NULL)
{
   strcpy(fileDirectory, argv[optind]);
}

That fixed everything right up. Knowing that argv[optind] is NULL in cases like this will come in handy in the future. Thanks!
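A slightly more defensive version of that fix, sketched with hypothetical names: leave the buffer empty when no filename was given (with the fix above, fileDirectory is otherwise left uninitialized and still handed to fopen()), refuse names that would not fit instead of overflowing, and only call fclose() on a stream that was actually opened, since fclose(NULL) is undefined behavior. PATH_SIZE is an assumption chosen to match STRING_SIZE:

```c
#include <stdio.h>
#include <string.h>

#define PATH_SIZE 5000   /* assumed to match STRING_SIZE in the posted program */

/* Hypothetical helper: stores the optional filename operand in dest and
 * tries to open it for reading. Returns NULL if no filename was given,
 * if it is too long for dest, or if fopen() fails; dest stays empty in
 * the first two cases. */
FILE *open_optional_file(const char *operand, char dest[PATH_SIZE])
{
    dest[0] = '\0';                     /* empty until a usable name is stored */
    if (operand == NULL)                /* no filename on the command line */
        return NULL;
    if (strlen(operand) >= PATH_SIZE)   /* would not fit with its '\0' */
        return NULL;
    strcpy(dest, operand);              /* safe: length checked above */
    return fopen(dest, "r");            /* NULL if the file can't be opened */
}

/* Hypothetical helper: closes fp only if it is a real stream. */
void close_if_open(FILE *fp)
{
    if (fp != NULL)
        fclose(fp);
}
```

In main this could be used as inFile = open_optional_file(optind < argc ? argv[optind] : NULL, fileDirectory); with close_if_open(inFile); at the end. A NULL return still lumps "no filename" together with "could not open", which matches the posted program's fall-back-to-prompt behavior; if the distinction matters, fileDirectory[0] == '\0' means no usable name was given.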
 

1. What is a segmentation fault in C?

A segmentation fault is the error the operating system raises when a program tries to access a memory location it is not permitted to access. Common causes include dereferencing a null or uninitialized pointer (in this thread, strcpy() was handed the null pointer argv[optind]), reading or writing past the end of an array, and writing to read-only memory such as a string literal.

2. How can I debug a segmentation fault in C?

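The best way to debug a segmentation fault in C is to use a debugger such as GDB. Compile with -g so the debugger can map machine code back to source lines, run the program under GDB, and when it crashes, the bt (backtrace) command shows the exact line and the chain of calls that led to it. Tools such as Valgrind or the AddressSanitizer built into GCC and Clang (-fsanitize=address) can also report invalid memory accesses as they happen. Print statements help narrow things down too, but print to stderr (or flush stdout), because buffered output may never appear if the program crashes.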

3. Can memory leaks cause segmentation faults in C?

Not directly. A leak by itself does not make a program touch invalid memory. However, if leaks exhaust the available memory, malloc() starts returning NULL, and a program that uses that return value without checking it will dereference a null pointer and crash with a segmentation fault.
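The usual way running out of memory turns into a crash is an unchecked allocation. A minimal sketch of the defensive pattern, using a hypothetical helper name (POSIX strdup() does the same job where it is available):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies s into newly allocated memory. If malloc()
 * fails it reports the error and returns NULL instead of writing through
 * a null pointer; the caller must check the result and free() it. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);

    if (copy == NULL) {
        perror("malloc");
        return NULL;
    }
    memcpy(copy, s, n);
    return copy;
}
```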

4. How can I prevent segmentation faults in my C programs?

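To prevent segmentation faults in C, handle pointers and memory allocation carefully. Initialize pointers (to NULL if there is nothing to point at yet) and check for NULL before dereferencing them, including values you did not create yourself, such as argv[optind] or the return values of fopen() and malloc(). Keep array indices and string copies within the size of the destination buffer, and free memory exactly once when it is no longer needed.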
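A small sketch of two of those habits: checking for NULL before dereferencing, and clearing a pointer after freeing it. Both helper names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of s, treating a null pointer as the empty
 * string instead of dereferencing it (strlen(NULL) is undefined behavior). */
size_t length_or_zero(const char *s)
{
    return (s != NULL) ? strlen(s) : 0;
}

/* Hypothetical helper: frees *p and resets it to NULL, so a later mistake
 * hits an easy-to-check null pointer rather than a dangling one, and a
 * second call is harmless because free(NULL) does nothing. */
void free_and_clear(char **p)
{
    free(*p);
    *p = NULL;
}
```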

5. Can hardware issues cause segmentation faults in C?

Rarely. Faulty RAM, or an executable or shared library corrupted on disk, can make a program crash, including with a segmentation fault, and such crashes tend to be unpredictable and hard to reproduce. In practice, though, almost all segmentation faults are caused by bugs in the program itself, so rule those out first (for example with a debugger) before suspecting the hardware.
