[Go] [JS] Once again on processing MARC formats

    Greetings. I have already written two articles about MARC formats, both on Geektimes.

    Today's article covers the technical details. I have cleaned up the code of my solution, removed the magic from it, and generally tidied it up.

    Under the cut: the friendship of Go and JS, and hatred of MARC formats.



    Let's start with the “core” package for working with MARC formats. It is written in Go and has 63% test coverage.
    https://github.com/t0pep0/marc21

    The “head” of the whole package is the MarcRecord structure:

    type MarcRecord struct {
    	Leader         *Leader
    	directory      []*directory
    	VariableFields []*VariableField
    }
    


    Only two entry points work with it, a function and a method:

    func ReadRecord(r io.Reader) (record *MarcRecord, err error)
    func (mr *MarcRecord) Write(w io.Writer) (err error)
    


    Frankly, I see no point in dwelling on them. The only thing worth noting is that ReadRecord returns err == io.EOF when it reaches the end of the Reader.

    Moving on: we are interested in the Leader and VariableField structures, and in why VariableFields is a slice rather than a hash map. The reason is that, contrary to all standards and common sense, a record can contain two fields with the same tag but different content. Looking ahead, the same is true for SubField.
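
    A minimal sketch of what the slice makes possible: looking up every field that shares a tag while preserving record order. The types below are cut-down local stand-ins for the article's structures, and fieldsByTag is a hypothetical helper that the package is not claimed to provide.

```go
package main

import "fmt"

// Local stand-ins mirroring the article's types (only the fields used here).
type SubField struct {
	Name string
	Data []byte
}

type VariableField struct {
	Tag       string
	Subfields []*SubField
}

// fieldsByTag is a hypothetical helper: it returns every field with the
// given tag, in the order the fields appear in the record.
func fieldsByTag(fields []*VariableField, tag string) []*VariableField {
	var out []*VariableField
	for _, f := range fields {
		if f.Tag == tag {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fields := []*VariableField{
		{Tag: "245", Subfields: []*SubField{{Name: "a", Data: []byte("A title")}}},
		{Tag: "650", Subfields: []*SubField{{Name: "a", Data: []byte("Libraries")}}},
		{Tag: "650", Subfields: []*SubField{{Name: "a", Data: []byte("Cataloging")}}},
	}
	// A map[string]*VariableField would keep only one of the two 650 fields.
	for _, f := range fieldsByTag(fields, "650") {
		fmt.Printf("%s $%s %s\n", f.Tag, f.Subfields[0].Name, f.Subfields[0].Data)
	}
}
```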

    type Leader struct {
    	length               int
    	Status               byte
    	Type                 byte
    	BibLevel             byte
    	ControlType          byte
    	CharacterEncoding    byte
    	IndicatorCount       byte
    	SubfieldCodeCount    byte
    	baseAddress          int
    	EncodingLevel        byte
    	CatalogingForm       byte
    	MultipartLevel       byte
    	LengthOFFieldPort    byte
    	StartCharPos         byte
    	LengthImplemenDefine byte
    	Undefine             byte
    }
    


    This is the leader structure. Honestly, there is nothing interesting here: it is just a set of flags, and the unexported fields are used only for serialization and deserialization. Two methods are attached to it, one for serialization and one for deserialization, and they are called from ReadRecord and Write (the same is true for the other structures).
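
    For readers unfamiliar with the format, here is a rough sketch of what deserializing a leader involves. The leader is a fixed 24-byte block: positions 00-04 hold the record length, 05-07 the status, type, and bibliographic level, and 12-16 the base address of the data. The leader type below is a reduced local stand-in and parseLeader is a hypothetical function; the package's own method may be organized differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// leader is a local stand-in with a subset of the article's Leader fields.
type leader struct {
	length      int
	Status      byte
	Type        byte
	BibLevel    byte
	baseAddress int
}

// parseLeader is a hypothetical helper that decodes the fixed 24-byte leader.
func parseLeader(b []byte) (*leader, error) {
	if len(b) != 24 {
		return nil, errors.New("leader must be exactly 24 bytes")
	}
	// Positions 00-04: total record length as ASCII digits.
	length, err := strconv.Atoi(string(b[0:5]))
	if err != nil {
		return nil, fmt.Errorf("record length: %w", err)
	}
	// Positions 12-16: base address of data as ASCII digits.
	base, err := strconv.Atoi(string(b[12:17]))
	if err != nil {
		return nil, fmt.Errorf("base address: %w", err)
	}
	return &leader{
		length:      length,
		Status:      b[5],
		Type:        b[6],
		BibLevel:    b[7],
		baseAddress: base,
	}, nil
}

func main() {
	l, err := parseLeader([]byte("00714cam a2200205 a 4500"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("length=%d base=%d status=%c type=%c level=%c\n",
		l.length, l.baseAddress, l.Status, l.Type, l.BibLevel)
}
```

    This also shows why length and baseAddress need not be exported: they describe the byte layout of one particular serialized record and have to be recomputed on every write.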

    type VariableField struct {
        Tag           string
        HasIndicators bool
        Indicators    []byte
        RawData       []byte
        Subfields     []*SubField
    }
    


    This is the "variable field" structure. A few points worth noting right away: tags are three characters long, and RawData could have been a string, but I found a byte slice more convenient to work with. During serialization, if the field has no subfields (len(Subfields) == 0), RawData is written; otherwise RawData is ignored.
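
    The serialization rule can be sketched roughly as follows. The delimiter bytes are the standard ones for the format (0x1F before each subfield code, 0x1E at the end of each field). The function encodeField is hypothetical, and writing the indicators first whenever HasIndicators is set is my assumption rather than something confirmed by the package source.

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	subfieldDelimiter = 0x1F // precedes every subfield code
	fieldTerminator   = 0x1E // ends every variable field
)

// Local stand-ins mirroring the article's types.
type SubField struct {
	Name string
	Data []byte
}

type VariableField struct {
	Tag           string
	HasIndicators bool
	Indicators    []byte
	RawData       []byte
	Subfields     []*SubField
}

// encodeField is a hypothetical helper: it produces the body of one field
// (without its directory entry) following the RawData/Subfields rule.
func encodeField(f *VariableField) []byte {
	var buf bytes.Buffer
	if f.HasIndicators {
		buf.Write(f.Indicators) // assumption: indicators always come first
	}
	if len(f.Subfields) == 0 {
		buf.Write(f.RawData) // no subfields: RawData is the content
	} else {
		for _, sf := range f.Subfields { // subfields present: RawData is ignored
			buf.WriteByte(subfieldDelimiter)
			buf.WriteString(sf.Name)
			buf.Write(sf.Data)
		}
	}
	buf.WriteByte(fieldTerminator)
	return buf.Bytes()
}

func main() {
	control := &VariableField{Tag: "001", RawData: []byte("12345")}
	title := &VariableField{
		Tag: "245", HasIndicators: true, Indicators: []byte("10"),
		RawData:   []byte("ignored"),
		Subfields: []*SubField{{Name: "a", Data: []byte("A title")}},
	}
	fmt.Printf("%q\n%q\n", encodeField(control), encodeField(title))
}
```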

    type SubField struct {
        Name string
        Data []byte
    }
    


    Name is a single character.
    Data is a byte slice. Again, a string could have been used, but I decided ...

    The package has no special subtleties. I will say just one thing up front: before adding a field, make sure it contains at least something besides the tag. Otherwise you risk spending a lot of time contemplating the sublime while trying to understand why the export to OPAC / IRBIS does not work.
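
    One way to guard against this is to route every addition through a small check. The sketch below uses local stand-in types and a hypothetical addField helper. It assumes that "something besides the tag" means either raw data or at least one subfield.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins mirroring the article's types (only the fields used here).
type SubField struct {
	Name string
	Data []byte
}

type VariableField struct {
	Tag       string
	RawData   []byte
	Subfields []*SubField
}

type MarcRecord struct {
	VariableFields []*VariableField
}

var errEmptyField = errors.New("field has a tag but no data")

// addField is a hypothetical helper: it refuses fields that carry
// neither raw data nor subfields, and appends everything else.
func addField(rec *MarcRecord, f *VariableField) error {
	if len(f.RawData) == 0 && len(f.Subfields) == 0 {
		return errEmptyField
	}
	rec.VariableFields = append(rec.VariableFields, f)
	return nil
}

func main() {
	rec := &MarcRecord{}
	fmt.Println(addField(rec, &VariableField{Tag: "245"}))
	fmt.Println(addField(rec, &VariableField{Tag: "001", RawData: []byte("12345")}))
	fmt.Println(len(rec.VariableFields))
}
```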

    Sample code that does not change any data and simply copies records from one file to another:
    package main
    import (
    	"github.com/t0pep0/marc21"
    	"io"
    	"os"
    )
    func main() {
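    	orig := os.Args[1]
    	result := os.Args[2]
    	// Fail early if either file cannot be opened.
    	origFile, err := os.Open(orig)
    	if err != nil {
    		panic(err)
    	}
    	defer origFile.Close()
    	resultFile, err := os.Create(result)
    	if err != nil {
    		panic(err)
    	}
    	defer resultFile.Close()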
    	for {
    		rec, err := marc21.ReadRecord(origFile)
    		if err != nil {
    			if err == io.EOF {
    				break
    			}
    			panic(err)
    		}
    		// Do whatever you want with rec here...
    		err = rec.Write(resultFile)
    		if err != nil {
    			panic(err)
    		}
    	}
    }
    


    Now let's move on to https://github.com/HerzenLibRu/BatchMarc

    In essence, this is the JS interpreter https://github.com/robertkrimen/otto/ combined with the library described above.

    func main() {
    	// Check every error right where it can occur.
    	marcFile, err := os.Open(os.Args[1])
    	if err != nil {
    		panic(err)
    	}
    	outFile, err := os.Create(os.Args[2])
    	if err != nil {
    		panic(err)
    	}
    	jsBytes, err := ioutil.ReadFile(os.Args[3])
    	if err != nil {
    		panic(err)
    	}
    	jsRules := string(jsBytes)
    	for {
    		rec, err := marc21.ReadRecord(marcFile)
    		if err != nil {
    			if err == io.EOF {
    				break
    			}
    			panic(err)
    		}
    		if rec == nil {
    			break
    		}
    		res := new(marc21.MarcRecord)
    		js := NewJSMachine(rec, res)
    		err = js.Run(jsRules)
    		if err != nil {
    			panic(err)
    		}
    		if err = res.Write(outFile); err != nil {
    			panic(err)
    		}
    	}
    }
    


    The difference from the code above is that here we also read the file with the JS rules and, for each record, create a JS machine and run those rules in it.

    Let's take a closer look at the JS machine and its constructor.

    type jsMachine struct {
    	otto        *otto.Otto
    	source      *marc21.MarcRecord
    	destination *marc21.MarcRecord
    }
    func NewJSMachine(source, destination *marc21.MarcRecord) (js *jsMachine) {
    	js = new(jsMachine)
    	js.otto = otto.New()
    	js.otto.Run(classJS)
    	js.otto.Set("LoadSource", js.fillSource)
    	js.otto.Set("WriteResult", js.getResult)
    	js.source = source
    	js.destination = destination
    	return js
    }
    func (js *jsMachine) Run(src string) (err error) {
    	_, err = js.otto.Run(src)
    	return err
    }
    


    As you can see, everything is simple and mundane; embedding was deliberately not used.

    Two functions, LoadSource and WriteResult, are added to the stock otto environment, along with the class constructors (MarcRecord, Leader, VariableField, VariableSubField).

    I will not go into the implementation of these functions, but I do want to point out one interesting detail in otto: there is an Object type to which JS variables can be converted. Object has a Call method (as well as Set and Get), which lets you call a method on the variable. The catch is that Object.Call cannot call a method on a nested object:
    	source := call.Argument(0)
    	if !source.IsObject() {
    		return otto.FalseValue()
    	}
    	object := source.Object()
    	// This is the right way: get the nested object first, then call its method.
    	jsValue, _ := object.Get("VariableField")
    	jsVariableFields := jsValue.Object()
    	jsValue, _ = jsVariableFields.Call("length")
    	// And this is the wrong way: a dotted path is not resolved by Call.
    	jsValue, _ = object.Call("VariableField.length")
    

    Notably, otto reports this as a type error, which is why it took me a long time to arrive at the right solution.

    A few words about the JS side. There are no artificially injected variables: just create an instance with the MarcRecord constructor and fill it with LoadSource(instance). To send the changes back to Go, call WriteResult(instance) at the end of the script.

    Pull requests and issues are welcome.
