mxis March 20, 2018 at 19:27

How I parse a C-Tree database developed 34 years ago

From the sandbox

Recently, a task flew to me to supplement the functionality of one rather old program (there is no source code for the program). In fact, it was just necessary to periodically scan the database, analyze the information and based on this make mailings. The whole difficulty turned out to be that the application works with a c-tree database written already in 1984.

Having rummaged through the manufacturer’s website for this database, I found a certain odbc driver, but I couldn’t connect it at all. Numerous googles also did not help to connect to the database and get data. Later, it was decided to contact technical support and ask the developers of this database for help, but the guys honestly admitted that 34 years had already passed, everything changed 100,500 times, they don’t have normal drivers for connecting to such old things, and I guess they don’t have those programmers alive either wrote this miracle.
Having rummaged in the database files and having studied the structure, I realized that each table in the database is saved in two files with the extension * .dat and * .idx. The idx file stores information on id, indexes, etc. for faster search of information in the database. The dat file contains the information itself, which is stored in the tablets.

It was decided to parse these files on their own and somehow get this information. Go was used as the language, as the rest of the project is written on it.

I will describe only the most interesting moments that I encountered while processing data, which mainly affect not the Go language itself, but namely calculation algorithms and the magic of mathematics.

I can immediately understand that where in the files is unrealistic, since there is no normal encoding there, the data is written to some specific bits and initially looks more like a data garbage.

I received an inspirational phrase from a client with a short description of why: “**** was a MS-DOS application running on 8086 processors and memory was scarce”.

The logic of work was as follows: I assign some new data to the program, look at what files have changed, and then try to punch out bits of data that have changed values.
During the development process, I realized the following important rules that helped me figure out faster:

the beginning and end of the file (the size is always different) are reserved for tabular data
the row length in the table always takes the same number of bytes
calculations are carried out in a 256 decimal system

Scheduled Time

The application creates a schedule with an interval of 15 minutes (for example 12:15 - 14:45). Poking at different times, I found the memory area that is responsible for the clock. I read data from a file byte by byte. For time, 2 bytes of data are used.

Each byte can contain a value from 0 to 255. When you add 15 minutes to the schedule, the number 15 is added to the first byte. Plus 15 minutes and the byte is already 30, etc.
For example, we have the following values:

[0] [245]

As soon as the value exceeds 255, 1 is added to the next byte, and the remainder is written to the current byte.

245 + 15 = 260
260 - 256 = 4

Total, we have the value in the file:

[1] [4]

Now attention! Very interesting logic. If this is a range from 0 minutes to 45 minutes per hour, then 15 minutes are added to bytes. BUT! If this is the last 15 minutes in an hour (from 45 to 60), then the number 55 is added to the bytes.

It turns out that 1 hour is always equal to the number 100. If we have 15 hours 45 minutes, then we will see such data in the file:

[6] [ 9]

And now we turn on some magic and translate the values from bytes to an integer:

6 * 256 + 9 = 1545

If you divide this number by 100, then the integer part will be equal to hours, and the fractional one will be equal to minutes:

1545/100 = 15.45

Kodjara:

data, err := ioutil.ReadFile(defaultPath + scheduleFileName)
	if err != nil {
		log.Printf("[Error] File %s not found: %v", defaultPath+scheduleFileName, err)
	}
	timeOffset1 := 98
	timeOffset2 := 99
	timeSize := 1
	//first 1613 bytes reserved for table
	//one record use 1598 bytes
	//last 1600 bytes reserved for end of table
	for position := 1613; position < (len(data) - 1600); position += 1598 {
		...
		timeInBytesPart1 := data[(position + timeOffset2):(position + timeOffset2 + timeSize)]
		timeInBytesPart2 := data[(position + timeOffset1):(position + timeOffset1 + timeSize)]
		totalBytes := (int(timeInBytesPart1[0]) * 256) + int(timeInBytesPart2[0])
		hours := totalBytes / 100
		minutes := totalBytes - hours*100
		...
	}

Scheduled Date

The logic of calculating the value from bytes for dates is the same as in time. The byte is filled up to 255, then it is reset, and the next byte is added 1, etc. Only for the date has already been allocated not two, but four bytes of memory. Apparently, the developers decided that their application can survive several million more years. It turns out that the maximum number that we can get is:

[255] [255] [255] [256]
256 * 256 * 256 * 256 + 256 * 256 * 256 + 256 * 256 + 256 = 4311810304 The

reference starting date in the application equal to December 31, 1849. I consider a specific date by adding days. I initially know that AddDate from the time package has a limit and will not be able to eat 4311810304 days, but 200 is enough for the next years).
After analyzing a decent number of files, I found three options for arranging dates in memory:

reading bytes, a specific section of memory, must be carried out from left to right
reading bytes, a specific piece of memory, it is necessary to carry out from right to left
a null byte is inserted between each “useful byte” of the data. For example, [1] [0] [101] [0] [100] [0] [28]

I still could not understand why different logic was used to arrange the data, but the principle of calculating the date is the same everywhere, the main thing is to get the data correctly.

Kodjara:

func getDate(data []uint8) time.Time {
	startDate := time.Date(1849, 12, 31, 0, 00, 00, 0, time.UTC)
	var result int
	for i := 0; i < len(data)-1; i++ {
		var sqr = 1
		for j := 0; j < i; j++ {
			sqr = sqr * 256
		}
		result = result + (int(data[i]) * sqr)
	}
	return startDate.AddDate(0, 0, result)
}

Availability Schedule

Employees have an availability schedule. Up to three time intervals can be assigned per day. For example:

8:00 - 13:00
14:00 - 16:30
17:00 - 19:00

Schedule can be set on any day of the week. I needed to generate a schedule for the next 3 months.

Here is an example scheme for storing data in a file:

I cut out each record from the file individually, then shift by a certain number of bytes depending on the day of the week and there I already process the time.

Kodjara:

type Schedule struct {
	ProviderID string                  `json:"provider_id"`
	Date       time.Time               `json:"date"`
	DayStart   string                  `json:"day_start"`
	DayEnd     string                  `json:"day_end"`
	Breaks     *ScheduleBreaks         `json:"breaks"`
}
type SheduleBreaks []*cheduleBreak
type ScheduleBreak struct {
	Start time.Time `json:"start"`
	End   time.Time `json:"end"`
}
func ScanSchedule(config Config) (schedules []Schedule, err error) {
	dataFromFile, err := ioutil.ReadFile(config.DBPath + providersFileName)
	if err != nil {
		return schedules, err
	}
	scheduleOffset := 774
	weeklyDayOffset := map[string]int{
		"Sunday":    0,
		"Monday":    12,
		"Tuesday":   24,
		"Wednesday": 36,
		"Thursday":  48,
		"Friday":    60,
		"Saturday":  72,
	}
	//first 1158 bytes reserved for table
	//one record with contact information use 1147 bytes
	//last 4494 bytes reserved for end of table
	for position := 1158; position < (len(dataFromFile) - 4494); position += 1147 {
		id := getIDFromSliceByte(dataFromFile[position:(position + idSize)])
		//if table border (id equal "255|255"), then finish parse file
		if id == "255|255" {
			break
		}
		position := position + scheduleOffset
		date := time.Now()
		//create schedule on 3 future month (90 days)
		for dayNumber := 1; dayNumber < 90; dayNumber++ {
			schedule := Schedule{}
			offset := weeklyDayOffset[date.Weekday().String()]
			from1, to1 := getScheduleTimeFromBytes((dataFromFile[(position + offset):(position + offset + 4)]), date)
			from2, to2 := getScheduleTimeFromBytes((dataFromFile[(position + offset + 1):(position + offset + 4 + 1)]), date)
			from3, to3 := getScheduleTimeFromBytes((dataFromFile[(position + offset + 2):(position + offset + 4 + 2)]), date)
			//no schedule on this day
			if from1.IsZero() {
				continue
			}
			schedule.Date = time.Date(date.Year(), date.Month(), date.Day(), 0, 0, 0, 0, time.UTC)
			schedule.DayStart = from1.Format(time.RFC3339)
			switch {
			case to3.IsZero() == false:
				schedule.DayEnd = to3.Format(time.RFC3339)
			case to2.IsZero() == false:
				schedule.DayEnd = to2.Format(time.RFC3339)
			case to1.IsZero() == false:
				schedule.DayEnd = to1.Format(time.RFC3339)
			}
			if from2.IsZero() == false {
				scheduleBreaks := ScheduleBreaks{}
				scheduleBreak := ScheduleBreak{}
				scheduleBreak.Start = to1
				scheduleBreak.End = from2
				scheduleBreaks = append(scheduleBreaks, &scheduleBreak)
				if from3.IsZero() == false {
					scheduleBreak.Start = to2
					scheduleBreak.End = from3
					scheduleBreaks = append(scheduleBreaks, &scheduleBreak)
				}
				schedule.Breaks = &scheduleBreaks
			}
			date = date.AddDate(0, 0, 1)
			schedules = append(schedules, &schedule)
		}
	}
	return schedules, err
}
//getScheduleTimeFromBytes calculate bytes in time range
func getScheduleTimeFromBytes(data []uint8, date time.Time) (from, to time.Time) {
	totalTimeFrom := int(data[0])
	totalTimeTo := int(data[3])
	//no schedule
	if totalTimeFrom == 0 && totalTimeTo == 0 {
		return from, to
	}
	hoursFrom := totalTimeFrom / 4
	hoursTo := totalTimeTo / 4
	minutesFrom := (totalTimeFrom*25 - hoursFrom*100) * 6 / 10
	minutesTo := (totalTimeTo*25 - hoursTo*100) * 6 / 10
	from = time.Date(date.Year(), date.Month(), date.Day(), hoursFrom, minutesFrom, 0, 0, time.UTC)
	to = time.Date(date.Year(), date.Month(), date.Day(), hoursTo, minutesTo, 0, 0, time.UTC)
	return from, to
}

In general, I just wanted to share my knowledge about which calculation algorithms used before.

Tags:

How I parse a C-Tree database developed 34 years ago

Scheduled Time

Scheduled Date

Availability Schedule

Also popular now: