Project logfind: fclose() error

I’ve been trying to tackle Project logfind by myself and I thought to add the line numbers and indexes as an extra feature. I used two files: log_input.c checked for matches in the files and printed them, and logfind.c processed the arguments and did the globbing.

The code ended up pretty gnarly-looking in log_input.c but when it was completed, I ran into a new error, corrupted size vs. prev_size which was unexpected. Reading up online, I found it could happen if you wrote more bytes than malloc or realloc gave you. But when I ran a backtrace using lldb, it seemed to originate from log_input.c:135 which was just a normal fclose(). Since I couldn’t find anything on fclose() outputting errors like that, I decided to turn here.

log_input.c:

#include <stdio.h>
#include <stdlib.h>
#include "dbg.h"

struct Linedet {
	int lnum;
	int *s;
	int *l;
	char *line;
	char **matches;
};

struct Matchdata {
	int s;
	int l;
	char *match;
};

char *match[] = {"random"};
int match_ln = 1;

void die (char *message)
{
	
	if (errno) {
		perror (message);
	} else {
		printf ("%s\n", message);
	}
	
	exit (1);
}

struct Matchdata *check_input (char *string, int *found)
{
	int i = 0;
	int matches = 0;
	int strln = strlen (string);
	struct Matchdata *mts = NULL;

	for (i = 0; i < strln; i++) {
		int k = 0;
		while (k < match_ln) {
			if (string[i] == match[k][0]) {
				debug ("Letter matched...");
				int j = 0;
				int mln = strlen (match[k]);
				for (j = 1; j < mln; j++) {
					if (string[i+j] == match[k][j] && j == (mln - 1)) {
						debug ("String matched...");
						struct Matchdata mat = { .s = i, .l = i+j, .match = match[k] };
						mts = realloc (mts, sizeof (struct Matchdata) * (matches + 1));
						mts[matches] = mat;
						matches++;
					} else if (string[i+j] == match[k][j]) {
						debug ("Another letter matched...");
						continue;
					} else {
						break;
					}
				}
			}
			k++;
		}
	}
	*found = matches;

	return mts;

}

struct Linedet *readline (const char *filename, int *ar_size, int *match_no)
{
	struct Linedet *array = NULL;
	
	int i = 0;
	int n = 1;
	int found;
	int matches;
	int j = 0;
	char *rc;
	char *str = malloc (256);

	debug ("Opening file...");
	FILE *file = fopen (filename, "rb");
	if (file == NULL)
		die ("File open failed");
	
	debug ("Starting while loop...");
	while (n) {
		debug ("Reading from file...");
		rc = fgets (str, 256, file);
		if (rc == NULL)
			break;
		debug ("Calling check_input...");
		struct Matchdata *irc = check_input (str, &matches);
		if (irc != NULL) {
			size_t mt_size;
			found = 1;
			debug ("Allocating memory for array...");
			array = realloc (array, sizeof (struct Linedet) * (i+1));
			if (array == NULL)
				die ("Memory error");
			debug ("Creating structure...");
			struct Linedet lndt = {.lnum = n, .line = strdup (str)};
			for (j = 0; j < found; j++) {
				mt_size += strlen (irc[j].match) + 1;
				lndt.s = realloc (lndt.s, sizeof (int) * (j+1));
				lndt.l = realloc (lndt.l, sizeof (int) * (j+1));
				lndt.matches = realloc (lndt.matches, mt_size);

				lndt.s[j] = irc[j].s;
				lndt.l[j] = irc[j].l;
				lndt.matches[j] = irc[j].match;
			}
			debug ("Assigning structure to array...");
			array[i] = lndt;
			i++;
		}
		n++;
		free (irc);
	}
	*ar_size = i;
	*match_no = matches;

	debug ("Exited while loop...");

	debug ("Closing file and freeing memory...");
	fclose (file);
	free (str);

	if (found) {
		return array;
	} else {
		goto error;
	}

error:
	return NULL;
}

void memfree (int ar_size, int match_no, struct Linedet *array)
{
	int i = 0;
	for (i = 0; i < ar_size; i++) {
		free (array[i].line);
		free (array[i].s);
		free (array[i].l);
		free (array[i].matches);
	}

	free (array);
}

int main (int argc, char *argv[])
{
	if (argc != 2)
		die ("USAGE: log_input <file>");
	int ar_size;
	int match_no;
	struct Linedet *array = readline (argv[1], &ar_size, &match_no);

	int i = 0;
	int j = 0;
	for (i = 0; i < ar_size; i++) {
		printf ("In line %d: \n", array[i].lnum);
		for (j = 0; j < match_no; j++) {
			printf ("Found '%s' at index %d:\n", array[i].matches[j], array[i].s[j]);
			printf ("Text: %s", array[i].line);
		}
	}
	
	debug ("Freeing array...");
	memfree (ar_size, match_no, array);

	return 0;
}

I’ve been working on this a few days and I was almost about to scrap it and start new because I didn’t like how it looked, but this error caught my notice and I thought I’d ask what it meant here. At the time I was unaware of strstr() so I created my own function to check for matches. I hope someone can help. Thanks in advance!

PS: I’d be grateful if you could tell me where I could improve my code.

Edit: A few more tests revealed that most random text I put in the file gave the error realloc(): invalid next size. I still would like to know if fclose() could indeed output that error, and why. The new error seems to come from log_input.c:52, the reason for which I still haven’t figured out.

1 Like

Ha! I did exactly the same thing when I did logfind. Felt a little stupid afterwards… :slight_smile:


Anyway, these two lines (52–53 in my editor) in check_input look suspicious to me:

mts = realloc (mts, sizeof (struct Matchdata) * matches);
mts[matches] = mat;

As far as I can tell, matches is still 0 when you first hit these lines, so you would “resize” mts to a size of zero and then try to write to it.

I don’t think fclose can cause that other error, but I’m by no means an expert.


As for improvements, your code is rather deeply nested. I would at least pull the string matching routine out of check_input into a stand-alone function (or use strnstr).

I also think you could (should?) use for loops instead of while loops in a few places, eg. line 42, 90. That’ll improve legibility a lot – and be safer, too. The latter, while (n)... with n++ at the end looks especially hazardous. In what way does this loop depend on n at all?

You’re already kind of using arrays dynamically. Maybe go on for now and revisit the exercise after you created the first few data structures? Memory management will be a lot easier with a simple list or dynamic array.

Clearer variable names and a few comments would go a long way to make it easier for others to understand what’s going on. Just saying… :wink:

I’m not sure if it’s a good idea to just use realloc everywhere. If you had malloc when you first initialize something, that could make things clearer for others (and maybe even for yourself).

Thanks for your reply :grinning: . I’ll try to improve my code according to what you said. At first, I did think of adding comments, but then other stuff like making the code work took precedence, and I totally forgot about it. Next time I’ll give it priority. About the variable names, I’m just terrible at creativity, so instead of cool related names, I get that kinda random names instead. I’ll try and improve, I swear :stuck_out_tongue_winking_eye:.

The deeply nested part was what made me think of scrapping it and starting over, but I’d never seen fclose() output errors like that, and I got caught up searching for possible causes. About using realloc() everywhere, I read the man page and it said it would be the same as calling malloc() if the pointer had no memory allocated to it, so I went for it. Like everything else, I’ll try to change it up next time.

Since you asked what the variable n was for, I used it for storing the line numbers of all the lines that had matches. And I’ll try using for loops more, but I must’ve thought at the time that using a for loop would add unnecessary complication to the program, but whatever. The matches variable was indeed meant to be zero at first and the line of code you pointed out actually was matches + 1 but for some reason unknown to me, I think I deleted it.

Anyway, thanks a lot for your reply. I’ll try to improve my code as a I go along. I still got a long way to go to hack on the Linux Kernel :yum: which is what I got in mind. For now.

1 Like

I know exactly what you mean. :slight_smile: It’s always easy to tell others how to improve…

Yes, but does that have any impact on how often the loop runs? That’s what I meant. Isn’t this just an infinite loop with a specific exit condition (fgets returns NULL)? If so, it’d probably be better to use the standard idiom for infinite loops (while (1)) or embed the exit condition into a for loop:

for (rc = fgets(...); rc != NULL; rc = fgets(...)) ...

The thing is, with a for-loop everybody who reads the code can get a pretty accurate idea of the looping conditions right at the top. In while loops the relevant changes tend to be scattered across the body.

Yeah… sounds like a great goal! And just keep going! (My logfind didn’t look any less gnarly at the time. :slight_smile: )

Editing the line with the matches variable you mentioned seems to make it work fine, no realloc(): invalid next size, but you never know when another error might pop up, so I’m doing some tests now. It was positively weird when I saw the backtrace pointing it to the fclose() since I hadn’t found any mention of something like that anywhere in the man pages or in Stack Overflow, which led me to ask here. Again, thanks for pointing out all that :smiley:.

If I may ask, have you ever hacked on the Kernel code? And if you have, how long did it take you to get there? I’m asking because it’s one of my main goals, one which would tell me I’m competent as a developer. As for now, I’m pretty bad. Even attempting to create a webpage using Python and Django seems almost impossible, mostly because I forgot the classes and functions, also in no small part because of my own incompetence. But I digress.

No, it does not really have any impact on how often the loop runs. I’ll fix it next time. But, if I may, is there a particular reason why the standard idiom for the infinite loop is preferred? I just remember coding practices better when I know the reasons associated with it. People sometimes like to debate things like the ambiguity of using the parentheses in sizeof operator and other similar topics. I usually prefer to take a more pragmatic approach.

I think it’s mostly clarity: while (1) makes it immediately clear that it’s an infinite loop. while (x) could run any number of times depending on what x is and what happens to it.

Hehe. No. I’m about 30 exercises ahead of you in the book. :slight_smile:

You know, I’m a conductor (or was before Corona stopped my line of work from existing), and cynics say the core competence of a conductor is appearing to be competent… Looks like I just proved I’m good at that. :stuck_out_tongue:

Don’t fret. Everyone’s bad when they’re learning stuff. It’s pretty much the definition of learning. Just keep going. But maybe set yourself some smaller goals for the way, just to make sure you don’t lose the sense of achievement.

Well, thanks for all the help. Anyway, I’ll try to see if I can catch up with you in the book :stuck_out_tongue:. After all, I’m free almost all the time. Projects like logfind are the real time-killers.

Hmm that is a lot of code to go through, so I have two suggestions:

  1. If you can get valgrind and run this under valgrind then definitely do that. Even if you have to install Linux in a virtual box it’ll save you hours of debugging this. https://valgrind.org/ and also https://www.virtualbox.org/
  2. I see that this is probably a “stream of consciousness” dive into your idea. If you can’t figure it out from valgrind then I suggest you do scrap this and take a more methodical approach using this as a guide.

The methodical approach is to take what you learned here and start breaking it into functions that do simple things you can test. For example, inside your readline you have a lot of code for building a Linedet struct, but you should create a Linedet_create() and Linedet_destroy() function set that do this for you reliably, so you can test that only that works and add a bunch of check macros to the beginning to confirm inputs. In fact, every place you are using malloc/realloc/free to craft a struct should go in a foolproof function with loads of checking.

Another thing is you aren’t using the full power of the dbg.h. You should adopt a policy of using the check macro on every return value that you aren’t using. Every function you call you should look at the man page (man strlen for example), determine it’s return values, and check for an error condition value.

Next, this right here is a huge no-no:

int strln = strlen (string);

You never do this and should stick to standard names instead of trying to save 1 char in typing. You should also never ever use strlen and instead at least use the strnlen variant that lets you set a max size on the input. But, be careful because strnlen has a weird return value and strange guarantees.

Finally, I’m not quite sure what you’re trying to do with the check_input function, but describe to me the problem you’re trying to solve and I might just have some functions you can use to solve it easier without hand coding this.

First of all, thanks for your reply. I’ll try to use strnlen more instead of strlen where I can, and try to avoid using variable names that way. I just didn’t know about it since both had different man pages.

I did run it under valgrind every time I made even a minor modification and made sure there were no errors. And I also use Linux in a VM, which is what makes me lazy to start up the VM everytime I code, I guess. I just made a few modifications to the code to use check_mem() everywhere I can and now valgrind is giving me a lot of errors telling me about bytes definitely lost and possibly lost. I’ve just been trying to fix it for more than a couple hours and failed miserably.

Before this edit, the only error in valgrind was Conditional jump or move depends on uninitialised value(s) but I can’t make sense of anything after the edit, so I think I’d rather start over.

Yes, indeed. After the aforementioned edit making things very very complicated, I do think it would be smarter to scrap the code and begin again. Definitely did learn a lot about different error messages, return codes and other useful data with my searching around for a solution. I don’t exactly know what you mean by “stream of consciousness” dive into my idea, but it probably seems like it because I did it over the span of a few days, with other stuff to occupy mind whenever I was not coding.

The check_input function was meant to return whatever word(s) it matched given in the match global variable, the starting index and ending index as an array of structs so I can use the values later to do whatever I wanted with it.

I really don’t like looking at my own code when it looks like this, so I’m going to try to start over using more functions and a little less complexity if I can. Any help on ways I could use to solve it easier with less code is very much appreciated.

Alright, starting over is a good idea. I suggest you try using the testing framework I have and code this from the “bottom up”. That means:

  1. You already know the pieces you need, but you’ve kind of intertwined them together.
  2. This time, start with implementing each piece you need, and write a test for it that makes sure it works.
  3. Each piece should be a function that take in input and outputs what you want to feed the next function.
  4. Each test should handle trying to break it with bad input.
  5. You then extensively use the check macros to make sure each function works.
  6. Then as you build each piece you can put together the final result.

Try that.

I’ve been trying what you said and it seems to work so far. But, now that I’m almost completed, I’m running into a very much new and weird error. It comes from the dbg.h: check macro, and I can’t find anything wrong with it.
Here’s the error:

cc -Wall -g     input.c   -o input
In file included from input.c:4:
input.c: In function ‘readline’:
dbg.h:28:26: error: expected ‘,’ or ‘;’ before ‘if’
   28 | #define check(A, M, ...) if(!(A)) {\
      |                          ^~
input.c:128:2: note: in expansion of macro ‘check’
  128 |  check (file != NULL, "File open failed");
      |  ^~~~~
make: *** [<builtin>: input] Error 1

And here’s my dbg.h:

#ifndef __dbg_h__
#define __dbg_h__

#include <stdio.h>
#include <errno.h>
#include <string.h>

#ifdef NDEBUG
#define debug(M, ...)
#else
#define debug(M, ...) fprintf(stderr, "[DEBUG] %s:%d: " M "\n",\
        __FILE__, __LINE__, ##__VA_ARGS__)
#endif

#define clean_errno() (errno == 0 ? "None" : strerror(errno))

#define log_err(M, ...) fprintf(stderr,\
        "[ERROR] (%s:%d: errno: %s) " M "\n", __FILE__, __LINE__,\
        clean_errno(), ##__VA_ARGS__)

#define log_warn(M, ...) fprintf(stderr,\
        "[WARN] (%s:%d: errno: %s) " M "\n",\
        __FILE__, __LINE__, clean_errno(), ##__VA_ARGS__)

#define log_info(M, ...) fprintf(stderr, "[INFO] (%s:%d) " M "\n",\
        __FILE__, __LINE__, ##__VA_ARGS__)

#define check(A, M, ...) if(!(A)) {\
    log_err(M, ##__VA_ARGS__); errno=0; goto error; }

#define sentinel(M, ...)  { log_err(M, ##__VA_ARGS__);\
    errno=0; goto error; }

#define check_mem(A) check((A), "Out of memory.")

#define check_debug(A, M, ...) if(!(A)) { debug(M, ##__VA_ARGS__);\
    errno=0; goto error; }

#endif

And here’s my new input.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "dbg.h"

#define MAX_SIZE 256

// To store data about strings found
// if and_args[i] is found, then and[i]
// in the struct is updated to 1
struct Data {
	int *and;
	int *or;
};

// What to find is given here
char *and_args[] = {"One", "Two"};
int and_size = 2;

char *or_args[] = {"Three", "Four"};
int or_size = 2;

struct Data *dat_create ()
{
	struct Data *dat = malloc (sizeof (struct Data));
	
	dat->and = malloc (sizeof (int) * and_size);
	dat->or = malloc (sizeof (int) * or_size);

	int i = 0;

	// Initializing all values as zero
	for (i = 0; i < and_size; i++) {
		dat->and[i] = 0;
	}
	
	for (i = 0; i < or_size; i++) {
		dat->or[i] = 0;
	}

	return dat;
}

int dat_destroy (struct Data *dat) {
	check (dat != NULL, "NULL pointer received, aborting");
	
	if (dat->and != NULL)
		free (dat->and);
	if (dat->or != NULL)
		free (dat->or);

	free (dat);

	return 0;
error:
	return 1;
}

// Function to set value of corresponding index
// of and_args/or_args to 1 if it is found in a 
// line
int update (int index, struct Data *dat, char x)
{
	// Checking if it is and/or argument
	// and setting values to 1 for
	// corresponding indexes
	if (x == 'a') {
		if (dat->and[index] != 1)
			dat->and[index] = 1; 
	} else if (x == 'o') {
		if (dat->or[index] != 1)
			dat->or[index] = 1;
	} else {
		sentinel ("Invalid value for x received");
	}
	
	return 0;
error:
	return 1;
}

// Looks for matches in string and passes it to
// update() to set variables in the struct Data
int find (const char *string, struct Data *dat)
{
	// For checking the return values
	int success = 0;
	char *rc = NULL;
	
	int i = 0;

	// Looping through and_args
	for (i = 0; i < and_size; i++) {
		// Checking the string
		rc = strstr (string, and_args[i]);

		if (rc != NULL)
			update (i, dat, 'a');
		
		if (!success)
			goto error;
	}
	
	// Looping through or_args
	for (i = 0; i < or_size; i++) {
		// Checking the string
		rc = strstr (string, and_args[i]);

		if (rc != NULL)
			success = update (i, dat, 'o');

		if (!success)
			goto error;
	}
	
	return 0;

error:
	return 1;
}

// Function reads lines one by one from files
// and passes it to find()
struct Data *readline (const char *filename)
{
	// Opening the file
	FILE *file = fopen (filename, "rb")
	check (file != NULL, "File open failed");

	// For checking return values
	char *rc = NULL;
	int value = 0;

	// Argument for find() and update()
	// and to return to main()
	struct Data *dat = dat_create ();

	// For storing the lines read
	char *str = malloc (MAX_SIZE);
	check_mem (str);
	
	while (1) {
		// Reading line from file
		rc = fgets (str, MAX_SIZE, file);
		if (rc != NULL) {
			value = find (str, dat);
			
			// Checking value and aborting if necessary
			if (!value)
				goto error;
		} else {
			break;
		}
	}

	return dat;

error:
	return NULL;
}

int main (int argc, char *argv[])
{
	check (argc == 2, "USAGE: ./input <file>");

	struct Data *dat = readline (argv[1]);
	if (dat == NULL)
		goto error;
	
	// To check no. of args matched
	int and = 0;
	int or = 0;
	
	int i = 0;
	for (i = 0; i < and_size; i++) {
		and += dat->and[i];
	}
	
	for (i = 0; i < and_size; i++) {
		or += dat->or[i];
	}

	if (and == and_size && or >= 1) {
		printf ("%s\n", argv[1]);
	}
        dat_destroy (dat);
	return 0;
error:
	return 1;
}

If it is happening because of an error in the code, please let me know. Thanks again!

Yes, you’re doing this most likely:

if(blah)
    check(...);

In fact, I see you do this a lot. You should always include the { } characters on if-statements you write. In C this is one of the things that causes errors in security handling because it’s ambiguous in many cases. You’ll get code like this a lot:

if(be_secure)
    fix_security();
    dont_fail();

It’s easy to miss because the dont_fail() is indented, so visually you see it as part of the if(be_secure), but it’s actually not because there’s no { } on the if. That code is actually interpreted as:

if(be_secure) {
    fix_security();
}
dont_fail();

But what the person writing probably meant was:

if(be_secure) {
   fix_security();
   dont_fail();
}

This is so common a cause of security bugs that the best option is to always add { } until you’re very good at C, and even then I only avoid them on super simple if checks.

So go to line 128 in input.c, look above any check macros you’re using, and see if you did this. Then go and fix all of your if-statements to always have {} and it should fix a lot of other things.

Thanks for the reply! Well, I just tried what you said but it still returns the same error. You did mention in the book that we shouldn’t use if statements without {}, but I ignored it, sorry for that. I still seem to be missing something incredibly obvious, and it still refuses to compile. Looking at the code has done nothing so far.

Anything else I could try right now?

Yes, go to line 128 and delete it.

It’s got nothing to do with the check macro. You actually have an error on the line above it. Any time you get an error first look on the line it mentions, then start going backwards to check each line above. Chances are the error location is just where the compiler landed after hitting the real error.

For example ; let’s say I had typed ; something and ;

Oh, damn. Thanks a lot. I never thought I’d miss something like that. I’ll finish up with the whole program in a while and post the code then. Next time I’ll be more careful about semicolons.

1 Like