Loriowar September 8, 2016 at 11:13

How to work with MS Access on Linux

Transfer

Many people use Access ... even in production ... even to this day. Therefore, there are times when someone wants to connect to this database from some unexpected place. For example, from a Unix server. Of course, you want to connect not just like that, but to use the data from Access in a web application. And, without any doubt, there will be a desire to use this data together with information from other, more modern databases.

So, I want to describe several approaches to communicating with a creature called MS Access. Therefore, the initial task is as follows: establish a connection to MS Access from a Ruby on Rails application or from PostgreSQL (using FDW ) and gain access to data, preferably in real time.

Below I will try to collect all the information related to the problem described above and try to describe non-trivial cases and underwater stools. I hope this description saves someone time ... or simply, to some extent, amuses the respected public.

Immediately tldr is for those who care only about the facts and the opinion of the author on this issue.

Convert to CSV

To begin with, I will describe a simple working solution. It is guaranteed to work on Ubuntu 14.04. Should work on other Linux distributions. And it does not require any ~~higher / sacred~~ knowledge, skills and other magic.

There is such a thing mdbtools . It is put very simply:

sudo apt-get install mdbtools

Details on its dependencies, manual assembly, package features, and much more can be found on GitHub's page .

This package provides a bunch of different tools for working with Access. We will not consider the whole bunch, but dwell on one. The one that can turn mdb files into csv:

mdb-export 'mdb-file' 'table-name' > result.csv

As a result, we get a csv file with the contents of the specified table. Further, this file can be subjected to all conceivable and unthinkable processing and tortures, because csv is an insanely simple and widespread format.

Executing Queries in MS Access

Now a more difficult task: to execute an SQL query, having in your hands an mdb file and a machine with something unix. It is easy to guess that you need to put a couple more packages and create several configs.

First, you need ODBC . This is the standard API for communicating with the database. In Unix, unixODBC exists for these purposes . Its installation is very simple:

sudo apt-get install unixodbc libmdbodbc1

The second package contains libmdbodbc.so , which will be needed just below.

The next step is to find the appropriate ODBC driver for MS Access. The closest available is the driver from mdbtools. Next, you need to dig deeper into the configs: describe the driver and declare the database.

We describe the driver in /etc/odbcinst.ini :

[MDBTools]
Description     = MDBTools Driver
Driver          = libmdbodbc.so
Setup           = libmdbodbc.so
FileUsage       = 1
UsageCount      = 1

And the database is declared in /etc/odbc.ini:

[testdb]
Description = test
Driver      = MDBTools
Database    = /opt/db/MS_Access.mdb

It should be noted that in “Driver” you need to specify the name of the driver, which is described in odbcinst.ini.

More about odbcinst.ini and odbc.ini can be found here .

So, the configuration is done. Now you can start executing the queries. For these purposes, we will use the isql utility from the unixODBC package:

isql testdb

If everything is done correctly, then a console should appear to execute requests:

SQL> SELECT * from "Раздел"
+------------+-----------------------------------------------------+
| Код        | Раздел                                              |
+------------+-----------------------------------------------------+
| 1          | Документация                                        |
| 2          | Сборочные единицы                                   |
| 3          | Детали                                              |
| 4          | Комплекты                                           |
+------------+-----------------------------------------------------+
SQLRowCount returns 4
4 rows fetched

Lastly, it is worth noting that there is an analog of isql with Unicode support. It is called iusql.

Oddities isql

Honestly, the isql utility is pretty stub. It has a bunch of syntax restrictions and no user friendliness or understanding. For example: put a semicolon at the end of the expression - get an error and try to guess what it is because of. There are no hints, tips and other delights of modern design here. This is not PotgreSQL, which will kindly say that you made a mistake in the expression and offer the correct option. Here you will simply be sent and not even informed of the reason. Therefore, for at least some ease of working with isql, the pyodbc-cli shell was created . With its help, you can at least somehow weaken the struggle with isql and focus on writing queries.

Exotic table / column encodings

There are many rumors about the 'Charset' parameter, which affects the used stake page. Here is an example of using this parameter:

[testdb]
Description = test
Driver = MDBTools
Database = /opt/db/MS_Access.mdb
Charset = CP1251

The effect of this parameter on the operation of isql has not been noticed. In isql I can work both with mdb files containing Cyrillic and with regular Unicode mdb files. At the same time, the iusql utility, regardless of the 'Charset' parameter, generated a lot of question marks (like these: ) when working with a Cyrillic mdb file.

Alternatives to isql

An alternative for isql is mdb-sql from the mdbtools package. This utility does not need ini files. You just need to set it on a specific mdb file:

mdb-sql /opt/db/MS_Access.mdb

The man page will answer well all questions about using the utility . The only feature: the utility could not swallow the aforementioned Cyrillic mdb-file. There were no problems with unicode files.

Ruby / Rails Path

Now is the middle of 2016, the last release of MS Access was on September 22, 2015. But this is bad luck, the last work on the adapter for ActiveRecord is dated 2008. Therefore, I, as is customary, have two news: good and bad.

I'll start with the good one: there is odbc-rails and its reincarnation activerecord-odbc-adapter .

And now the bad one: as noted above, the last commits to the adapter repository are dated 2008 and support for Rails and ActiveRecord versions one and two are declared; therefore, I don’t know how to run it on Rails 3+ (and whether it can be done at all). The reasons for my ignorance are approximately the following. First: the adapter has bad documentation (but rather its absence). And secondly: there is no desire to go into the source, understand and bring them back to life. So if you have enough knowledge, experience and time - you can finish and describe how to use it. Good luck in this case!

Ruby-ODBC

Since everything is sad with the adapter, you can look the other way. One side is called ruby-odbc .

The last update of this gem is dated 2011, but, at the moment, it is more or less working. To install the gem, you need to perform simple steps:

sudo apt-get install unixodbc unixodbc-dev
gem install ruby-odbc

Without the unixodbc-dev package, compiling the native extension will fail with the error: ERROR: sql.h not found .

Further, we assume that ODBC is configured on the system (that is, the odbcinst.ini and odb.ini files are present ). In this case, you can open irb and do the following:

001 > require 'odbc'
 => true
002 > client = ODBC.connect("testdb")
 => #
003 > statement = client.prepare 'SELECT * FROM "Раздел"'
 => # 
004 > statement.execute
 => #
005 > first_row = statement.fetch
 => [1, "\xD0\x94\xD0\xBE\xD0\xBA\xD1\x83\xD0\xBC\xD0\xB5\xD0\xBD\xD1\x82\xD0\xB0\xD1\x86\xD0\xB8\xD1\x8F\x00"] 
006 > first_row[1].force_encoding("utf-8")
 => "Документация\u0000"

More information about the syntax and available ruby-odbc gem commands can be found in the ruby-odbc / test directory on GitHub.

Mdb gem

This gem provides a DSL for working with mdb files. And she looks pretty cute. But there is a nuance: a gem is just a Ruby wrapper over the above mdbtools. That is, the gem converts mdb to csv and processes this csv in memory. No magic or direct access to the database.

Alternative for ODBC driver

There is a commercial version of the ODBC driver for MS Access . But there is no factual information about him. In an optimistic version, this adapter will help with advanced queries in Access (the driver from mdbtools has a lot to do: no LIMIT, GROUP, AS, etc.). But this is only speculation. What will actually be possible to find out only by buying it, or by taking the 14-day trial, which is available after registration on the site. In addition to this information, there were no user reviews, no bug reports, or any mention that someone used the driver and he helped him with something.

PotgreSQL Path

For Postgres, there is an OGR extension . It is part of GDAL . Which, in turn, is a huge library for converting raster and vector geospatial data formats. For our current purposes, the purpose of the library has absolutely no meaning. The main thing is that it is stated that it is able to work with the mdb format.

Installation

First you need to put a few dependencies:

sudo apt-get install gdal-bin libgdal-dev
sudo apt-get install postgis postgresql-9.3-postgis-2.1

This team will pull a ton of addictions ... but that's fine. The first set of packages for ogr_fdw, the second for postgis.

Step two: collecting pgsql-ogr-fdw from source. Here is a small bash style manual:

git clone git@github.com:pramsey/pgsql-ogr-fdw.git
cd pgsql-ogr-fdw
sudo apt-get install postgresql-server-dev-9.3
sudo apt-get install checkinstall
make
sudo checkinstall

Yes, you can take make install, but we don’t want the cats to suffer . In the appeared dialog from checkinstall, it is necessary to correct the “version” parameter. You need to make it in the format of "numbers separated by dots" (for example: '0.1.0'). Otherwise, with default values, the package assembly will fail.

Step three: go and install extensions in Postgres:

CREATE EXTENSION ogr_fdw;
CREATE EXTENSION postgis;

There is a suspicion that postgis is superfluous here, but the Readme on GitHub says that both are needed, therefore I will leave this question to inquisitive readers.

Step four: time to create FDW. There are two possible ways to work with Access in ogr_fdw. The first uses system ODBC. Details about this option can be found here . The second one is more interesting, it uses the MDB format from OGR, which provides direct access to the file using Jackcess . Details about this option are here . Below I will describe both ways.

Finally, one remark: OGR is an extremely powerful thing; the ability to work with MS Access is a small part of the whole variety of available formats and, dear reader, it may quite reasonably be said that this is a shot from a cannon by sparrow ... but the choice is not great and besides this cannon no other tools could be found. And yes, here is a list of all formats supported by OGR .

ORG ODBC format

This approach uses the ODBC system settings and works similarly to the above osql and ruby-odbc, but inside the database. All available options for initializing FDW are presented on the GDAL ODBC driver page . Below I will give only a simple example of use.

Actually here it is:

postgres=# CREATE SERVER testdb_access
postgres-# FOREIGN DATA WRAPPER ogr_fdw
postgres-# OPTIONS(
postgres(#   datasource 'ODBC:testdb',
postgres(#   format 'ODBC');
CREATE SERVER
postgres=# CREATE FOREIGN TABLE access_sections (
postgres(#     "Код" decimal,
postgres(#     "Раздел" varchar)
postgres-#     SERVER testdb_access
postgres-#     OPTIONS (layer 'Раздел');
CREATE FOREIGN TABLE
postgres=# SELECT * FROM access_sections;
ERROR:  unable to connect to layer to "Раздел"
HINT:  Does the layer exist?

As far as I understand from the OGR documentation, layer - in our case, is equivalent to the database table.

A list of all layers can be obtained using the ogrinfo utility :

$ ogrinfo -al 'ODBC:testdb'
geometry_columns is not a table in this database
Got no result for 'SELECT f_table_name, f_geometry_column, geometry_type FROM geometry_columns' command
INFO: Open of `ODBC:testdb'
      using driver `ODBC' successful.

Based on this message, it can be assumed that everything works, but the target database (that is, an mdb file) does not contain the required Geo data format and OGR stumbles on this annoying misunderstanding. I don’t know how to wean him from forcibly checking the format of the provided database. But some write that this approach works great under Windows. In general, if you know how to reason OGR ODBC and make it work with an arbitrary mdb file, please tell about it, do not keep this knowledge in yourself.

A separate question: how will PG work with the Cyrillic (and indeed with any other non-Latin) names of tables and columns. On the one hand, to Postgres no matter what the table / column is called, wrap them in double quotes and at least special characters can be used. On the other hand: who knows whether this is applicable to FDW, but it has not yet been possible to verify with a concrete example.

Format MDG MDB

This approach is based on the Jackcess Java library . Since it is Java and it has its own rich inner world, this approach has no connections with system ODBC and, therefore, problems with drivers for MS Access are foreign to it. But there are other features that I will describe below.

I’ll immediately warn you that due to the "rich" documentation on the entire described process, the lack of extensive experience with Java and some monstrosity of the target package, the working version was able to be assembled in 3 days and ~ 20 complete rebuilds of the package. Therefore, I will immediately say about some things:

this approach is suitable only for working with unencrypted mdb-files (that is, with files without a password);
since this is a package assembly, all of the dependencies, paths, versions, and other attributes described below are valid for my particular case and environment, everything may not be right for you and mindlessly copying commands is not recommended.

So, all of the below is a more detailed version of the original official description of the GDAL ACCESS MDB database driver .

First: you need to install openjdk-6-jdk.

sudo apt-get install openjdk-6-jdk

After a quick and thoughtful reading of the GDAL sources , I got the feeling that it supports openjdk-7-jdk as well. But I couldn’t get it to work with version 7.

Next, you will need libgdal-dev.

sudo apt-get install libgdal-dev

Here you need to remember the package version. It is directly related to the version of the GDAL package. In my case, this is version 1.10.1 .

Note: mdb format support starts from version 1.9.0 .

Well, lastly, you need to demolish the gdal-bin package, since we are going to collect its extended version from the sources.

sudo apt-get remove gdal-bin

Secondly: you need to download several JARs (ancient and not very), namely: jackcess-1.2.2.jar, commons-lang-2.4.jar and commons-logging-1.1.1.jar; then put them in lib / ext. In my case, the full path to this directory is: / usr / lib / jvm / java-6-openjdk-amd64 / jre / lib / ext . The above versions of JARs can be found inside this utility . For me, everything works with any later version of commons-logging (1. *), with any other minor version of commons-lang (2. *) and jackcess (1. *). Errors appeared only when using the next major version of jaccess (2.1.4).

Third: you need to download and configure GDAL.

git clone git@github.com:OSGeo/gdal.git
cd gdal/gdal/
git checkout 1.10

Here you need to go to the branch corresponding to the version of the libgdal-dev package, which was installed in the item number of times. Otherwise, the assembled binary will be incompatible with libraries.

Next you need to call configure. There are two possible ways to call. Plain:

./configure --with-java=yes --with-jvm-lib-add-rpath=yes --with-mdb=yes

and with explicit paths:

./configure --with-java=/usr/lib/jvm/java-6-openjdk-amd64 \
            --with-jvm-lib=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server \
            --with-jvm-lib-add-rpath=yes \
            --with-mdb=yes

The second option can be useful if the system has several versions of Java (for example, openjdk-6-jdk and openjdk-7-jdk) or if the first option did not give the desired result.

After configure is complete, you need to find the treasured word 'yes' opposite the MDB format.

Fourth: you need to find a cup of tea / coffee or something stronger and start assembling the package.

sudo checkinstall

Here you need to answer a couple of simple questions and wait. In my case, it took about 10 minutes to wait.

It should be noted here that the package will turn out weighty, about 300mb. Of course, you can throw out everything superfluous from it, assemble it with your hands and get closer to the size of the gdal-bin package from the repository (~ 900Kb), but this is beyond the scope of the story and, therefore, will not be described.

Fifth: if something went wrong, the package assembly fell off, then Google and a bright mind will help you.

Sixth: if everything went well, then after checkinstall the package should be automatically installed and now you need to check whether the received binaries really support the mdb format:

$ ogrinfo --formats | grep MDB
-> "MDB" (readonly)

If there was no information about mdb in the output of ogrinfo, then go to the beginning of this section, re-read the manuals, look at the dependencies, system parameters, moon phase and other attributes that may affect compilation and the final binary, and try to rebuild all this economy again.

If the command and conclusion coincided, then all is well and the most muddy part is behind. Now ogrinfo can work with mdb files and provide information about their contents:

$ ogrinfo /opt/db/test-database.mdb
INFO: Open of `/opt/db/test-database.mdb'
      using driver `MDB' successful.
1: closeouts
2: economics

Seventh: now you can configure FDW in Postgres. Here is a small script with an example of this action:

postgres=# CREATE SERVER acc
    FOREIGN DATA WRAPPER ogr_fdw
    OPTIONS (
      datasource '/opt/db/test-database.mdb',
      format 'MDB' );
CREATE SERVER
postgres=# CREATE FOREIGN TABLE economics(
    ID integer)
    SERVER acc
    OPTIONS(layer 'economics');
CREATE FOREIGN TABLE
postgres=# SELECT * FROM economics;
 id 
----
  1
  2
  3
  4
  5
(5 rows)

And, in general, that's all. At the end of this section, I’ll say a few words about “encrypted” mdb files.

If FDW cannot pull data from Access, and ogrinfo swears as follows:

Exception in thread "main" com.healthmarketscience.jackcess.UnsupportedCodecException: Decoding not supported.
Please choose a CodecProvider which supports reading the current database encoding.
        at com.healthmarketscience.jackcess.DefaultCodecProvider$UnsupportedHandler.decodePage(DefaultCodecProvider.java:115)

then most likely you have a password-protected mdb file. In this case, you should look at the FAQ from Jaccess and think about the OGR Access driver dopil. As far as I understand, there is a Jackcess Encrypt project . This project provides CryptCodecProvider, which, in turn, provides an implementation of the CodecProvider interface for Jackess and supports some encryption formats for mdb files. But, unfortunately, the current driver from GDAL does not know how to work with Jackcess Encrypt and, therefore, does not support encrypted files. So, there is a good direction for working in the open source camp.

Other FDW

A list of all existing FDWs for Postgres can be found on the official wiki page . There is ZhengYang / odbc_fdw , in which the last commit is dated 2011. And CartoDB / odbc_fdw , which is actively developing and supporting Postgres 9.5+. So the choice is small.

Conclusion

Working with MS Access is painful ... doubly painful if you need to do it under Linux. So good advice at once: pull data from any access to any modern database and get rid of the problems car. If you can’t pull it out, then work with Windows access. There is a normal driver provided by Microsoft “out of the box,” about docking Access and Postgres on Windows, there are at least some articles and configuration examples, and generally the products of the same company usually work well with each other. If this is not possible, then you again have two options: turn everything into a CSV and work with it, or try to access the mdb file directly. The first option is simple, works out of the box and does not require special skills. The second option is much more complicated, it takes time, nerves, straight arms, has a set of restrictions, pitfalls and other unpleasant things. Therefore,

References

Blog of a certain girl SARA SAFAVI about GDAL / OGR under Ubuntu
Stackoverflow about MS Access under Ubuntu
OpenNet Theme
and many many links kindly provided by google good corporation

Tags: