t0pep0 June 21, 2016 at 17:38

[Go] [JS] Once again about handling MARC formats

Greetings. I have already written two articles on Geektimes about MARC formats.

Today's article covers the technical details: I cleaned up the code of my solution, removed the magic from it, and generally tidied it up.

Under the cut: the friendship of Go and JS, and hatred of MARC formats.

So, let's start with the "core" package for working with MARC formats. The package is written in Go and has 63% test coverage.
https://github.com/t0pep0/marc21

The "head" of the whole package is the MarcRecord structure:

type MarcRecord struct {
	Leader         *Leader
	directory      []*directory
	VariableFields []*VariableField
}

Only two functions work with it:

func ReadRecord(r io.Reader) (record *MarcRecord, err error)
func (mr *MarcRecord) Write(w io.Writer) (err error)

Frankly, I see no point in dwelling on them. The only thing worth noting is that ReadRecord returns err == io.EOF when it reaches the end of the Reader.

Moving on, we are interested in the Leader and VariableField structures, and also in why VariableFields is a slice rather than a hash map. The reason is that, contrary to every standard and to common sense, a record may contain two fields with the same tag but different content. Looking ahead, the same is true for SubField.

type Leader struct {
	length               int
	Status               byte
	Type                 byte
	BibLevel             byte
	ControlType          byte
	CharacterEncoding    byte
	IndicatorCount       byte
	SubfieldCodeCount    byte
	baseAddress          int
	EncodingLevel        byte
	CatalogingForm       byte
	MultipartLevel       byte
	LengthOFFieldPort    byte
	StartCharPos         byte
	LengthImplemenDefine byte
	Undefine             byte
}

The Leader structure is, honestly, nothing interesting: it is just a set of flags, and the unexported fields are used only for serialization / deserialization. Two methods are attached to it, one for serialization and one for deserialization; they are called from ReadRecord and Write (the same holds for the other structures).
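To make the role of the unexported fields concrete, here is a minimal sketch of leader deserialization. It assumes the standard MARC 21 layout (a 24-byte leader with the record length in positions 00-04 and the base address of data in positions 12-16). The leader type and parseLeader below are local stand-ins written for illustration, not the package's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// leader is a local stand-in for the package's Leader type; only a few of
// its fields are reproduced here.
type leader struct {
	length      int
	Status      byte
	Type        byte
	BibLevel    byte
	baseAddress int
}

// parseLeader decodes selected positions of a 24-byte MARC 21 leader.
func parseLeader(raw []byte) (*leader, error) {
	if len(raw) != 24 {
		return nil, errors.New("leader must be exactly 24 bytes")
	}
	length, err := strconv.Atoi(string(raw[0:5])) // positions 00-04
	if err != nil {
		return nil, fmt.Errorf("record length: %v", err)
	}
	base, err := strconv.Atoi(string(raw[12:17])) // positions 12-16
	if err != nil {
		return nil, fmt.Errorf("base address: %v", err)
	}
	return &leader{
		length:      length,
		Status:      raw[5],
		Type:        raw[6],
		BibLevel:    raw[7],
		baseAddress: base,
	}, nil
}

func main() {
	l, err := parseLeader([]byte("00714cam a2200205 a 4500"))
	if err != nil {
		panic(err)
	}
	fmt.Println(l.length, l.baseAddress, string(l.Status)) // prints "714 205 c"
}
```

The two numeric values depend on the rest of the record, so they have to be recomputed on write. That would explain why they are kept unexported.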

type VariableField struct {
    Tag           string
    HasIndicators bool
    Indicators    []byte
    RawData       []byte
    Subfields     []*SubField
}

The "variable field" structure. A few points are worth noting right away: tags are three characters long, and RawData could have been a string, but I found it more convenient to work with a byte slice. During serialization, if the field has no subfields (len(Subfields) == 0), RawData is written; otherwise RawData is ignored.
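The serialization rule can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the types are local stand-ins, fieldBody is a hypothetical name, and the byte values 0x1F (subfield delimiter) and 0x1E (field terminator) come from the MARC 21 record structure. Writing the indicators before the subfields is likewise assumed from the standard layout of data fields.

```go
package main

import "fmt"

const (
	subfieldDelimiter = 0x1F // precedes every subfield code
	fieldTerminator   = 0x1E // closes every field
)

// Local stand-ins that mirror the shapes shown in the article.
type subField struct {
	Name string
	Data []byte
}

type variableField struct {
	Tag        string
	Indicators []byte
	RawData    []byte
	Subfields  []*subField
}

// fieldBody builds the bytes of one field: RawData when there are no
// subfields, otherwise indicators plus subfields with RawData ignored.
func fieldBody(f *variableField) []byte {
	var out []byte
	if len(f.Subfields) == 0 {
		out = append(out, f.RawData...)
		return append(out, fieldTerminator)
	}
	out = append(out, f.Indicators...)
	for _, sf := range f.Subfields {
		out = append(out, subfieldDelimiter)
		out = append(out, sf.Name...)
		out = append(out, sf.Data...)
	}
	return append(out, fieldTerminator)
}

func main() {
	control := &variableField{Tag: "001", RawData: []byte("12345")}
	title := &variableField{
		Tag:        "245",
		Indicators: []byte("10"),
		RawData:    []byte("ignored"),
		Subfields:  []*subField{{Name: "a", Data: []byte("Title")}},
	}
	fmt.Printf("%q\n", fieldBody(control))
	fmt.Printf("%q\n", fieldBody(title))
}
```

Note that the second field carries RawData as well, and it does not appear in the output because subfields are present.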

type SubField struct {
    Name string
    Data []byte
}

Name is a single character.
Data is a byte slice; again, a string could have been used, but I decided ...

The package has no special quirks. I will say only one thing up front: before adding a field, make sure it contains at least something besides the tag. Otherwise you risk spending a lot of time contemplating higher matters and trying to understand why the export to OPAC / IRBIS does not work.
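Because tags may repeat, a lookup by tag has to return a list rather than a single value. A minimal sketch of such a lookup is shown below; fieldsByTag is a hypothetical helper (the package itself may not offer one), and variableField is a trimmed local stand-in.

```go
package main

import "fmt"

// variableField is a trimmed local stand-in for the package's type.
type variableField struct {
	Tag     string
	RawData []byte
}

// fieldsByTag returns every field with the given tag, in record order.
// A map[string]*variableField would silently keep only one of them.
func fieldsByTag(fields []*variableField, tag string) []*variableField {
	var found []*variableField
	for _, f := range fields {
		if f != nil && f.Tag == tag {
			found = append(found, f)
		}
	}
	return found
}

func main() {
	fields := []*variableField{
		{Tag: "650", RawData: []byte("first subject")},
		{Tag: "245", RawData: []byte("title")},
		{Tag: "650", RawData: []byte("second subject")},
	}
	for _, f := range fieldsByTag(fields, "650") {
		fmt.Println(string(f.RawData))
	}
}
```

The same linear scan applies to subfields within a field, since subfield names can repeat too.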
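One way to guard against tag-only fields is a small check before appending. The sketch below is only an illustration: hasContent and addField are hypothetical helpers, the types are local stand-ins, and "content" is interpreted here as non-empty RawData or at least one subfield with data, which is an assumption about what the author means.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror the shapes shown in the article.
type subField struct {
	Name string
	Data []byte
}

type variableField struct {
	Tag       string
	RawData   []byte
	Subfields []*subField
}

// hasContent reports whether the field carries anything besides its tag.
func hasContent(f *variableField) bool {
	if f == nil {
		return false
	}
	if len(f.RawData) > 0 {
		return true
	}
	for _, sf := range f.Subfields {
		if sf != nil && len(sf.Data) > 0 {
			return true
		}
	}
	return false
}

// addField appends f only when it is not empty.
func addField(fields []*variableField, f *variableField) ([]*variableField, error) {
	if !hasContent(f) {
		return fields, errors.New("refusing to add a field that has only a tag")
	}
	return append(fields, f), nil
}

func main() {
	var fields []*variableField
	fields, err := addField(fields, &variableField{Tag: "245"})
	fmt.Println(len(fields), err != nil) // prints "0 true"
	fields, err = addField(fields, &variableField{Tag: "001", RawData: []byte("42")})
	fmt.Println(len(fields), err != nil) // prints "1 false"
}
```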

Sample code that does not modify the data and simply copies the records from one file to another:

package main

import (
	"io"
	"os"

	"github.com/t0pep0/marc21"
)

func main() {
	// Usage: <program> <source file> <result file>
	origFile, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer origFile.Close()
	resultFile, err := os.Create(os.Args[2])
	if err != nil {
		panic(err)
	}
	defer resultFile.Close()
	for {
		rec, err := marc21.ReadRecord(origFile)
		if err == io.EOF {
			break // end of input
		}
		if err != nil {
			panic(err)
		}
		// Do whatever you want with rec here...
		if err = rec.Write(resultFile); err != nil {
			panic(err)
		}
	}
}

Now let's move on to https://github.com/HerzenLibRu/BatchMarc

In essence, this is the JS interpreter https://github.com/robertkrimen/otto/ combined with the library described above.

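func main() {
	// Usage: <program> <source file> <result file> <JS rules file>
	marcFile, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer marcFile.Close()
	outFile, err := os.Create(os.Args[2])
	if err != nil {
		panic(err)
	}
	defer outFile.Close()
	jsBytes, err := ioutil.ReadFile(os.Args[3])
	if err != nil {
		panic(err)
	}
	jsRules := string(jsBytes)
	for {
		rec, err := marc21.ReadRecord(marcFile)
		if err == io.EOF {
			break // end of input
		}
		if err != nil {
			panic(err)
		}
		if rec == nil {
			break
		}
		// Each record gets a fresh result record and a fresh JS machine.
		res := new(marc21.MarcRecord)
		js := NewJSMachine(rec, res)
		if err = js.Run(jsRules); err != nil {
			panic(err)
		}
		if err = res.Write(outFile); err != nil {
			panic(err)
		}
	}
}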

The difference from the code above is that here we also read the file with the JS rules and, for each record, create a JS machine and pass the rules to it.

Let's take a closer look at the JS machine and its constructor.

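// jsMachine couples an otto interpreter with the source record and the
// record that receives the result.
type jsMachine struct {
	otto        *otto.Otto
	source      *marc21.MarcRecord
	destination *marc21.MarcRecord
}

// NewJSMachine creates an interpreter, runs the built-in classJS script and
// registers the two Go functions that scripts can call.
func NewJSMachine(source, destination *marc21.MarcRecord) (js *jsMachine) {
	js = new(jsMachine)
	js.otto = otto.New()
	// classJS ships with the program, so a failure here is a programming error.
	if _, err := js.otto.Run(classJS); err != nil {
		panic(err)
	}
	js.otto.Set("LoadSource", js.fillSource)
	js.otto.Set("WriteResult", js.getResult)
	js.source = source
	js.destination = destination
	return js
}

// Run executes the user's rules and returns any script error.
func (js *jsMachine) Run(src string) (err error) {
	_, err = js.otto.Run(src)
	return err
}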

As you can see, everything is simple and unremarkable; embedding was deliberately not used.

Two functions are added to the standard otto distribution, LoadSource and WriteResult, plus the class constructors (MarcRecord, Leader, VariableField, VariableSubField).

I will not go into the implementation of these functions, but I will point out one interesting detail in otto: there is an Object type to which JS variables can be converted. Object has a Call method (the same goes for the Set / Get methods), which lets you call a method of the variable. The catch is that Object.Call does not allow calling a method on a nested object.

	source := call.Argument(0)
	if !source.IsObject() {
		return otto.FalseValue()
	}
	object := source.Object()
	// This is the right way
	jsValue, _ := object.Get("VariableField")
	jsVariableFields := jsValue.Object()
	jsValue, _ = jsVariableFields.Call("length")
	// And this is the wrong way
	jsValue, _ = object.Call("VariableField.length")

Notably, otto reports a type error in this case, which is why the right solution took a long time to come to mind.

A few words about JS. There are no artificially injected variables: just create an instance with the MarcRecord constructor and load it with LoadSource(instance); to send the changes back to Go, call WriteResult(instance) at the end of the script.

Pull requests and issues are welcome.
