Making An INI parser.

Saturday, June 29, 2019

I think writing code to parse stuff is fun. Sometimes it’s nice to get away from regex hell and make our own fun hell.

I want to note: I have no real idea what the best way to do any of this is. There’s way smarter people than me that made much better parsers than me. I’m doing this just because it’s fun. And why can’t we have a little fun once in a while?

A simple INI file

INI files are pretty simple. Let’s parse one in javascript. INIs are really just a vague concept so we can do whatever we want!

The file to parse

key=value
hello=world
;comment here
[section]
foo=bar

What we want our object to look like after parsing

{
  key: "value",
  hello: "world",
  section: {
    foo: "bar"
  }
}

We are going to go over the file character by character. Depending on what state we are in we will do different things.

In the default state let’s trim whitespace until we get to a comment, section or key.

function isWhiteSpace(at){
	if(at === "\t" || at === " "){
  	return true;
  }
}


let state = "default";
while(index < file.length){
   let at = file[index];
  if(state === "default"){
     if(at === "\r" && file[index+1] === "\n"){
    	index+=2;
    }else if(at === ";"){
    	state = "comment";
    } else if(at === "["){
    	state = "section";
    } else if(isWhiteSpace(at){
    	//we have to negate the index++ so we don't eat the first character of the key
    	index--;
    	state = "key";	
    } 
    
    index++;
  }

Parse some keys

We want to keep track of the key name. And we will keep parsing for a key name until a character = . (We could allow it to be escaped eg \= but I don’t want to.)

//...
let keyName = "";
while(/*...*/) {
// ...

    else if(state === "key"){
        if(at !== "\n" && at !== "="){
            keyName += at;
        } else if(at === "\n"){
            state = "default";
            keyName = "";
        } else {
            state = "value";
        }
        index++;
    }
}

So now we have a key. You may have notice we just ignore a line that doesn’t have an = at all. You could throw an error or make it an empty string. But I don’t want to.

Parse a value

The process for getting a value is much the same as getting a key. We just go until the end of the line. But we also want to set the key to the value when we get to the end of the line.

//...
const result = {};
//at first the section is just the root of result
let section = result;
let value = ""
while(/*...*/){
//... All that other code...

    else if(state = "value"){
        if(at !== "\n"){
    	  value += at;
        } else {
          //End of the line let's do this
          section[keyName] = value;
          keyName = "";
          state = "default";
        }
        index++;
    }
}

You’ll notice how I use section instead of result there because we also …

Parse a section

When we get to a section we need to nest. That is why we want a final result as well as a reference to the current section we are in

//...
let sectionName = "";
//while(){
//...other stuff
else if(state == "section") {
    if(at === "]" || at === "\n"){
    	result[sectionName] = {};
        section = result[sectionName];
        sectionName = "";
    	state = "default"
    } else {
    	sectionName += at;
    }
  	index++;
}
//}

So now we have sections for this. We have no end of section so it’s the current section until a new one comes along.

Parsing comments

Comments are easy we just do nothing until the new line.

else if(state === "comment") {
  	if(at === "\n"){
    	state = "default"
    }
  	index++
  }

The entire parser

Okay I broke it down to show you and so here’s the entire thing.

const file = `

key=value
hello=world
;comment here
[section]
foo=bar

`

function isWhiteSpace(at){
	if(at === "\t" || at === " "){
  	return true;
  }
}


let index = 0;
const result = {};
let section = result;
let state = "default";
let keyName = "";
let value = "";
let sectionName = "";
while(index < file.length){
	let at = file[index];
  if(state === "default"){
  	if(at === ";"){
    	state = "comment";
    } else if(at === "["){
    	state = "section";

    } else if(!isWhiteSpace(at)){
    	state = "key";
	index--;
    }
    index++;
  } else if(state === "key") {
  	if(at !== "\n" && at !== "="){
    	keyName += at;
    } else if(at === "\n"){
    	 state = "default";
       keyName = "";
    } else {
    	state = "value";
    }
    index++;
  } else if(state === "value") {
  	if(at !== "\n"){
    	value += at;
    } else {
   		//End of the line let's do this
      section[keyName] = value;
      keyName = "";
      value = "";
      state = "default";
      
    }
    index++;
  } else if(state == "section") {
  	if(at === "]" || at === "\n"){
    	result[sectionName] = {};
        section = result[sectionName];
        sectionName = "";
    	state = "default"
    } else {
    	sectionName += at;
    }
  	index++;
  } else if(state === "comment") {
  	if(at === "\n"){
    	state = "default"
    }
  	index++
  }
  
}

So parsing stuff can be fun. Try it out here. Give it some other input and see where it breaks. I bet I made a lot of stupid mistakes. But that’s part of the fun.

Let me know if you have any questions and checkout out my project: DropConfig

This post is also available on DEV.