Description of the Database GWIS

01.06.2016

Description of the Database GWIS*

Database to store chemical structures, any biological data and phys.-chem. data, designed for small pharmaceutical R&D organizations

Author: Peter Schneider

*Growing with Success

Peter Schneider, Bäumliackerstrasse 8, 4103 Bottmingen

1 Table of Contents

2 General Description
  2.1 Goal
  2.2 GWIS Components
  2.3 GWIS Functionality
  2.4 Inimitable GWIS Design
    2.4.1 Connectivity of Sample Number and Solution Number
    2.4.2 Connection of Solution Number to Biological Data

3 User Manual
  3.1 User Login VPN
  3.2 User Login Database
    3.2.1 Activation of the folder on the server
    3.2.2 Database login
  3.3 Excel Workbooks for each Tab on the Activity Form
    3.3.1 Register Chemistry Data
    3.3.2 Register Biological Data
    3.3.3 Retrieve all Sample Data
    3.3.4 Search all Structures
    3.3.5 Retrieve Biological Data
    3.3.6 Solution Data Register/Retrieve
    3.3.7 Powder Data Register/Retrieve
  3.4 Explanation of the Functionality of Excel File Templates
    3.4.1 Registry Chemistry Excel Files
    3.4.2 Registry Biological Data Excel File
    3.4.3 Retrieve all Sample Data Excel File
    3.4.4 Retrieve Biological Data Excel File
    3.4.5 Solution data Register/Retrieve Excel File
    3.4.6 Powder Data Register/Retrieve Excel File

4 Administrative Tools
  4.1 Connections to MySQL
    4.1.1 Connection of MySQL via Excel
    4.1.2 GWIS access of MySQL via VPN
    4.1.3 MySQL Workbench connection to MySQL
    4.1.4 MySQL DB connected to Instant JChem
    4.1.5 Data Management of MySQL
    4.1.6 MySQL DB on PC/Laptop
  4.2 Manage Database with GWIS
    4.2.1 Structure Transfer to Instant JChem
    4.2.2 Employees
    4.2.3 New Bio Test
    4.2.4 New Strain
    4.2.5 New Salt
    4.2.6 New Labjournal
    4.2.7 New Project
    4.2.8 New Chem Provider
    4.2.9 New Bio Provider
    4.2.10 New Registration Rights
    4.2.11 Patent Update
    4.2.12 Table/Column Changes
    4.2.13 General Tools
    4.2.14 General Queries
  4.3 Shell Commands for MySQL
  4.4 Add User to MySQL Database
  4.5 Working with InstantJChem
  4.6 Using DataWarrier as additional Frontend

5 Appendices
  5.1 “Bad Structures” Table
  5.2 Search Level Table
  5.3 Projects
  5.4 Assays
  5.5 Bacteria-Strain Collection
  5.6 DB Tables and Columns (selection)
  5.7 Templates to Register Biological Data into DB

2 General Description

2.1 Goal

Design and develop a low-cost database that can grow and from which data can be retrieved in a highly flexible layout. The concept is designed to help start-up companies organize the flood of data.

2.2 GWIS Components

• Data stored in MySQL on a server or PC/Laptop (external access by VPN)
• Security and access are managed by MySQL built-in tools and vb.net tools
• Flexible table design in MySQL with unique software; later migration to an ORACLE DB possible
• Excel and JChem for Excel (add-in from Chemaxon, license necessary) as front end to manage chemical structures
• Two pre-prepared Excel sheets for flexible queries (no programming necessary); the components for the queries are selected from drop-down lists, which guarantees a versatile table design adaptable to the user’s needs. Data can easily be exported as sd files for further manipulation.
• Easy-to-use registration sheets for structures, sample data, solutions, biological data and general data
• Action-sensitive Task Panels in the Excel Ribbon
• Built-in functionalities of Excel are available, e.g. conditional formatting, sorting, etc.
• Client to manage the database and the data (vb.net) for DB administrators
• Tested on Win7 and Win10 with Office 2010 and Office 2013, 32 and 64 bit

2.3 GWIS Functionality

• Data storing, data management and retrieving via Excel or vb.net tools
• Structure storing with automated number generation (no structure duplicates!)
• Sample storing with automated identifier generation for parent, salts and batches (no duplicate identifiers)
• Powder inventory keyed on e.g. barcodes
• Solution inventory keyed on e.g. barcodes (automated identifier generation in relation to the structure/sample number)
• Detailed biological data storing
• Expandable in any direction on the user’s demand

2.4 Inimitable GWIS Design

Database concept based on two identifiers: Sample Number and Solution Number.

Sample number: CC-xxx-xxxx-yyy-zz
    xxx-xxxx   7 digit number, Structure identifier
    yyy        3 digit number, Salt identifier
    zz         2 digit number, Batch identifier

Solution number: Sol-xxx-xxxx-vvv
    xxx-xxxx   same number as in the Sample number
    vvv        3 digit number for lot

2.4.1 Connectivity of Sample Number and Solution Number

Arbitrary example (entries listed in order of time advance):

Combination = unique identifier for samples:

CC-000-0001-001-01
CC-000-0001-002-01
CC-000-0001-001-02
CC-000-0001-001-03
CC-000-0001-003-01
etc.

Each solution is uniquely assigned to a sample.

Unique identifier for solutions:

Sol-000-0001-001
Sol-000-0001-002
Sol-000-0001-003
Sol-000-0001-004
Sol-000-0001-005
Sol-000-0001-006
Sol-000-0001-007
Sol-000-0001-008
etc.
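The identifier scheme of section 2.4 can be illustrated with a minimal Python sketch. It is not the GWIS implementation (which uses vb.net and MySQL); the function and class names are hypothetical, and the sketch only assumes the layouts CC-xxx-xxxx-yyy-zz and Sol-xxx-xxxx-vvv given above.

```python
import re
from typing import NamedTuple


class SampleNumber(NamedTuple):
    structure: int  # 7 digit structure identifier (xxx-xxxx)
    salt: int       # 3 digit salt identifier (yyy)
    batch: int      # 2 digit batch identifier (zz)


# Patterns derived from the layouts CC-xxx-xxxx-yyy-zz and Sol-xxx-xxxx-vvv.
SAMPLE_RE = re.compile(r"^CC-(\d{3})-(\d{4})-(\d{3})-(\d{2})$")
SOLUTION_RE = re.compile(r"^Sol-(\d{3})-(\d{4})-(\d{3})$")


def format_sample(structure: int, salt: int, batch: int) -> str:
    """Build a sample number from its three numeric parts."""
    s = f"{structure:07d}"
    return f"CC-{s[:3]}-{s[3:]}-{salt:03d}-{batch:02d}"


def format_solution(structure: int, lot: int) -> str:
    """Build a solution number; it reuses the 7 digit structure number."""
    s = f"{structure:07d}"
    return f"Sol-{s[:3]}-{s[3:]}-{lot:03d}"


def parse_sample(text: str) -> SampleNumber:
    """Split a sample number into structure, salt and batch."""
    m = SAMPLE_RE.match(text)
    if m is None:
        raise ValueError(f"not a sample number: {text!r}")
    return SampleNumber(int(m.group(1) + m.group(2)), int(m.group(3)), int(m.group(4)))


def solution_structure(text: str) -> int:
    """Return the structure number embedded in a solution number."""
    m = SOLUTION_RE.match(text)
    if m is None:
        raise ValueError(f"not a solution number: {text!r}")
    return int(m.group(1) + m.group(2))
```

Note that the solution number itself only carries the structure number and the lot; which salt and batch a solution belongs to has to be stored in the database.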

2.4.2 Connection of Solution Number to Biological Data

The Solution Number is connected to any type of biological data and therefore, logically, to the Sample Number. The position of the solution in the plate (or barcoded bin) and the (barcoded) plate number are uniquely connected with the Solution Number. The biological data are defined by the experimental date (timestamp), the lab journal and the position in the test plate, to mention some important parameters among many others.

Arbitrary example (entries listed in order of time advance):

Solution Number     Biological tests
Sol-000-0001-001    Test1 - Test5
Sol-000-0001-002    Test1 - Test3, Test6, Test9
Sol-000-0001-003    Test1 - Test3
Sol-000-0001-004    Test1 - Test3, Test7 - Test10
Sol-000-0001-005    Test10 - Test15
Sol-000-0001-006    Test1 - Test3, Test10 - Test15
Sol-000-0001-007    Test1 - Test3
Sol-000-0001-008    Test16 - Test20

The unique identifier concept of the Sample Number and the Solution Number guarantees that, for any sample, all biological and sample data can be traced back unambiguously, and vice versa.
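The tracing described above can be sketched as a simple join. The following Python example uses an in-memory SQLite database as a stand-in for MySQL; the table names, column names and result values are hypothetical demo data and do not reflect the real GWIS schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny demo schema: each solution points to exactly one sample."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE solution (
            sol_number TEXT PRIMARY KEY,
            cc_number  TEXT NOT NULL
        );
        CREATE TABLE biodata (
            sol_number   TEXT NOT NULL REFERENCES solution(sol_number),
            assay        TEXT NOT NULL,
            result_value REAL
        );
        """
    )
    con.executemany(
        "INSERT INTO solution VALUES (?, ?)",
        [
            ("Sol-000-0001-001", "CC-000-0001-001-01"),
            ("Sol-000-0001-002", "CC-000-0001-002-01"),
        ],
    )
    # Made-up results, only to have something to retrieve.
    con.executemany(
        "INSERT INTO biodata VALUES (?, ?, ?)",
        [
            ("Sol-000-0001-001", "Test1", 0.5),
            ("Sol-000-0001-001", "Test2", 1.5),
            ("Sol-000-0001-002", "Test1", 4.0),
        ],
    )
    return con


def biodata_for_sample(con: sqlite3.Connection, cc_number: str):
    """Sample -> solutions -> biological data."""
    cur = con.execute(
        """
        SELECT b.sol_number, b.assay, b.result_value
        FROM biodata AS b
        JOIN solution AS s ON s.sol_number = b.sol_number
        WHERE s.cc_number = ?
        ORDER BY b.sol_number, b.assay
        """,
        (cc_number,),
    )
    return cur.fetchall()


def sample_for_solution(con: sqlite3.Connection, sol_number: str):
    """Solution -> sample (the reverse direction)."""
    row = con.execute(
        "SELECT cc_number FROM solution WHERE sol_number = ?", (sol_number,)
    ).fetchone()
    return row[0] if row else None
```

Because every biological result is keyed on the Solution Number and every solution belongs to exactly one sample, both directions need only one lookup or join.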


3 User Manual

3.1 User Login VPN

This part is necessary only for external access.

The file “TPBasel-Bioversys.pcf” has to be placed into the folder “C:\Program Files (x86)\Cisco Systems\VPN Client\Profiles”. The connection has to be established with the “username” and the appropriate “password”.

3.2 User Login Database

3.2.1 Activation of the folder on the server

Before the connection to the DB is established, the folder on the server has to be activated in Windows Explorer on the PC/laptop, because the program needs some files (Excel templates) that are located on the server.

3.2.2 Database login

The DB is protected with the user’s password. It is possible to switch between two identical databases (Backup DB). After “Login” the next form shows the possible activity groups. Additional groups are defined by the administrator or the manager of the DB on request. Each employee can be a member of several activity groups. Currently six activity groups are defined (see New Registration Rights). Within each activity group the corresponding Tabs are activated and lead to new forms or Excel templates.


On the left side of this form, the Excel files of the different activated Tabs are listed; they are discussed later (see below). On the “Saved free Queries” side, the owner can define whether an Excel file (from the “Data Bio Excel Sheets” Tab, see Retrieve Biological Data Excel File) that is saved later is “private” (p) or accessible to all (a). Only the owner can change “p” to “a” and vice versa; besides the filename, of course, this is the only column that can be changed. By activating a row it is also possible to delete a file, provided the owner is logged in and the “privacy” is set to ‘p’. With the “Restore” Tab the corrections become visible. This gadget is useful for saving profiles for a set of compounds, e.g. to follow the progress of biological data determination or to add data collected later from additional Biotests.

Figure 1: Activity groups and their activated Tabs

Activity group        Activated Tabs
Administrator         All 9 Tabs are activated
Registry_admin        6 Tabs are activated, except Administrator Tools
Registry_biology      5 Tabs are activated, except Administrator Tools and related structure registering tools
Registry_chemistry    5 Tabs are activated, except Administrator Tools and biological data registering tools
User_biology          3 Tabs are activated, allowing retrieving but not changing data, except Solution Inventory
User_chemistry        3 Tabs are activated, allowing retrieving but not changing data, except Powder Inventory

After a single click on an activated Tab, the Excel workbook name(s) pop up in the data grid window. A single click on a name opens the Excel workbook, ready for action.

3.3 Excel Workbooks for each Tab on the Activity Form

The Tabs Administrator Tools and InstantJChem structure transfer are described extensively in the section Administrative Tools.

3.3.1 Register Chemistry Data

Two files are available. First, the file CCStructureReg is used to register new structures, samples, new salts of samples and new batches. This file constructs the unique CCNumber from the delivered data that are specific to the structure. Second, the file CCSampleDataReg adds sample-specific information to the DB. The Excel files are discussed in detail in the subsequent section Registry Chemistry Excel Files.

3.3.2 Register Biological Data

So far, many assay-dependent registry files are available to transfer the biological data from the lab to the DB with simple actions. These files are designed for each assay according to the wishes of Biology, with minimal coding. One goal of these transfer files is to make it possible to use Biology's laboratory files. First the data are transferred by the software from the lab Excel files to the registry file; then the registration is done with a mouse click. The Excel files are discussed in detail in the subsequent section Registry Biological Data Excel File.

3.3.3 Retrieve all Sample Data

This Excel file allows retrieving CCNumbers from structures and, according to the CCNumbers, data related to Samples, Biological Data, Solutions, Powder and Patent. It is also possible to retrieve the structures according to the CCNumbers. Additional information can be introduced on request. The Excel file is discussed in detail in the subsequent section Retrieve all Sample Data Excel File.

3.3.4 Search all Structures

The Excel file opens and retrieves all the structures and selected Sample Data (Structure, CCNumber, Provider Number and Individual Number). This file can be exported to an sdf file for further use. Important: substructure searches and other structure searches are not possible within this file and should be made in other applications such as InstantJChem (Chemaxon tool). This part is still under construction.

3.3.5 Retrieve Biological Data

This Excel file is a powerful template that allows profiling any combination of Samples with biological data, including the structures. The results can be exported to an sdf file for further use. The data can easily be transferred to a PowerPoint file as a reporting system including structures and biological data. It is also possible to view the data in different graphs. The Excel file is discussed in detail in the subsequent section Retrieve Biological Data Excel File.

3.3.6 Solution Data Register/Retrieve

This Excel file allows registering, updating and retrieving data for solutions. The file also covers the needs of an inventory (including barcode reading). The Excel file is discussed in detail in the subsequent section Solution data Register/Retrieve Excel File.

3.3.7 Powder Data Register/Retrieve

This Excel file allows updating and retrieving data of Powder samples. The file also covers the needs of an inventory (including barcode reading). The Excel file is discussed in detail in the subsequent section Powder Data Register/Retrieve Excel File.


3.4 Explanation of the Functionality of Excel File Templates

3.4.1 Registry Chemistry Excel Files

Two Excel files are designed to register the structure(s) and to update the data for the individual sample(s).

3.4.1.1 CCStructureReg

All orange columns must be filled with data; otherwise the structure cannot be registered. The white columns are filled by the system during registration. The grey columns need not be filled, but it can be very helpful later if this kind of data is registered. The most important column is the structure column (A). The structures are entered as SMILES code or via the JChem tools. For more details on drawing structures see Retrieve all CCNumbers from Structures. Most important: the structure can be drawn with correct optically active centers; a reference to the racemate, or vice versa, can be made in the structure comment column. Columns D to K are filled from drop-down lists. Only data from the drop-down lists can be registered. If a definition is missing, the administrator can add it at any time. This is most important for the salt definition. During registration the system checks whether the structure is already registered (enantiomers have different numbers from the racemate). If not, a new number is generated; otherwise a new batch of the sample, depending on the salt, is registered.
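The registration check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name is hypothetical, the structure is represented by a plain string key (for example a canonical SMILES, so that enantiomers and the racemate have different keys), and the salt identifier is assumed to come from the drop-down list. The real duplicate check is done by GWIS against the database.

```python
from typing import Dict, Tuple


class Registry:
    """Toy in-memory registry that assigns CCNumbers."""

    def __init__(self) -> None:
        self._structures: Dict[str, int] = {}          # structure key -> structure number
        self._batches: Dict[Tuple[int, int], int] = {}  # (structure, salt) -> last batch

    def register(self, structure_key: str, salt_id: int) -> str:
        # Known structure: reuse its number. Unknown structure: generate a new one.
        number = self._structures.get(structure_key)
        if number is None:
            number = len(self._structures) + 1
            self._structures[structure_key] = number
        # Every registration creates the next batch for this structure/salt pair.
        batch = self._batches.get((number, salt_id), 0) + 1
        self._batches[(number, salt_id)] = batch
        s = f"{number:07d}"
        return f"CC-{s[:3]}-{s[3:]}-{salt_id:03d}-{batch:02d}"
```

Registering the same structure key with salt identifiers 1, 2, 1, 1 and 3 in that order reproduces the sequence of sample numbers shown in the example of section 2.4.1.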

3.4.1.2 CCSampleDatReg

The newly generated CCNumbers are copied into this Excel file and the newly registered data are retrieved with Retrieve data. As many of the remaining data as are available are then entered. In the column Analytical Tests several tests can be added to the same cell. If the sample is also available as powder, the data are updated with Update sample data, register powder data; additional data (plate position, etc.) for powders are described in section Powder Data Register/Retrieve. The update of sample data can be repeated at any time with Update only sample data, once new data are available for the sample.


3.4.2 Registry Biological Data Excel File

Many Excel files are available to transfer the data first from the raw data into the Excel template file and second into the corresponding DB tables. For further details see Templates to Register Biological Data into DB.

3.4.3 Retrieve all Sample Data Excel File

After opening the Excel file, sample numbers can be retrieved according to certain criteria. The following paragraphs show some possibilities. The following criteria can be used:

1. Biological data, sample data, structure data etc.
2. From structures via structures or SMILES code
3. Provider numbers or individual numbers (e.g. lab numbers)
4. Sol numbers

3.4.3.1 CCNumbers with criteria from Biodata, structures and samples

1. Select from the drop-down list
   a. Retrieve all CCNumbers from Structures
   b. Retrieve Data from all CCNumbers or listed
   c. Retrieve Structures from all CCNumbers or listed
2. Select the table name (see 7) from the drop-down list.
3. Select the item to be searched for (table Column Name). If table name (2) = biodata and Assay (4) is selected from the drop-down list, then the column name can only be ResultValue or ResultValueAN, for numbers or text respectively. See DB Tables and Columns.
4. Assay is selected if table name (2) = biodata. For the list see DB Tables and Columns.
5. Add the query criteria. Numbers, text or date(s) (format: dd.mm.yyyy) can be added.
6. Add the query operator from the drop-down list
   a. No operator means “=”. Do not type “=”.
   b. Other operators are selected from the drop-down list.
   c. In the case of “between”, two data inputs are necessary, separated by a “-”.
   d. With the “like” operator, the placeholder(s) “%” can be placed anywhere in the criteria string (several placeholders are allowed). Several missing characters (numbers and/or letters) can be replaced by one placeholder as long as they are contiguous.
7. Data can be retrieved from seven tables. On request more tables can be added.
   a. Biodata, see DB Tables and Columns
   b. Assay, see DB Tables and Columns
   c. Patentinfo, see DB Tables and Columns
   d. Powder, see DB Tables and Columns
   e. Sample, see DB Tables and Columns
   f. Solution, see DB Tables and Columns
   g. Structure, see DB Tables and Columns
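How the entries of points 3, 5 and 6 could be turned into a SQL condition is sketched below in Python. This is not the code GWIS uses; the function names are hypothetical, only the operators named in the text (none, “between”, “like”) are handled, and the `%s` parameter markers assume a MySQL driver with the "format" parameter style.

```python
from datetime import datetime
from typing import List, Tuple, Union

Value = Union[float, str]


def parse_value(text: str) -> Value:
    """Interpret a criterion as a date (dd.mm.yyyy), a number, or plain text."""
    text = text.strip()
    try:
        # Dates are converted to ISO format, which MySQL accepts for DATE columns.
        return datetime.strptime(text, "%d.%m.%Y").date().isoformat()
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


def build_condition(column: str, operator: str, criteria: str) -> Tuple[str, List[Value]]:
    """Return a parameterized WHERE fragment and its parameter list."""
    if not column.isidentifier():
        raise ValueError(f"invalid column name: {column!r}")
    op = operator.strip().lower()
    if op == "":
        # No operator means "=".
        return f"{column} = %s", [parse_value(criteria)]
    if op == "between":
        # Two inputs separated by "-".
        low, sep, high = criteria.partition("-")
        if not sep:
            raise ValueError("'between' needs two values separated by '-'")
        return f"{column} BETWEEN %s AND %s", [parse_value(low), parse_value(high)]
    if op == "like":
        # The "%" placeholders are passed through unchanged.
        return f"{column} LIKE %s", [criteria]
    raise ValueError(f"unsupported operator: {operator!r}")
```

Passing the criteria as parameters instead of pasting them into the SQL text keeps user input out of the statement itself, which is a design choice of this sketch and not something the manual prescribes.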

3.4.3.2 Retrieve all CCNumbers from Structures

In a new template, the JChem tab in the Toolbar has to be activated and a new Ribbon shows up. Then click on a cell (e.g. A7). In the Ribbon click the tab Add/Edit and the window for drawing the structure opens; draw the structure and press OK to close the window, and the structure is transferred to cell A7. A set of structures can also be copied as SMILES code; then, after activating the cells (e.g. A6), select JChem in the toolbar and From Smiles in the Ribbon, and the SMILES code is converted to the structure. The search can be done with the structure or with the SMILES code without conversion to the structure. Change the cell Structure with all Sample Data to YES. Select Retrieve all CCNumbers from Structures (see Point 1). All samples with the same structure (batches, salts) are retrieved. The picture below illustrates the result of the CCNumber selection from structures.


3.4.3.3 CCNumbers from a set of Provider or Individual Numbers

A set of CHProvNumbers (provider numbers) can be converted into CCNumbers. The set has to be placed into column D. If the user does not know whether a number is a CHProvNumber or an IndivNumber and empty CCNumbers show up, the search can be repeated as an IndivNumber search. The user just changes Cell(D2) to IndivNumber (drop-down list) and starts the search again. The remaining CCNumbers are then filled in if available. Generally: if a number is not in the DB, the CCNumber cell stays empty.

A set of IndivNumbers (e.g. used in Labjournals) can be converted into CCNumbers. The set has to be placed into column D. If the user does not know whether a number is an IndivNumber or a CHProvNumber and empty CCNumbers show up, the search can be repeated as a CHProvNumber search. The user just changes Cell(D2) to CHProvNumber (drop-down list) and starts the search again. The remaining CCNumbers are then filled in if available. Generally: if a number is not in the DB, the CCNumber cell stays empty.
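The two-pass lookup described above (first as one number type, then the empty cells again as the other type) can be sketched in Python. The function name is hypothetical and the two dictionaries merely stand in for the DB lookups; in GWIS the second pass is started manually by changing Cell(D2).

```python
from typing import Dict, List, Optional


def to_cc_numbers(
    numbers: List[str],
    first_lookup: Dict[str, str],
    second_lookup: Dict[str, str],
) -> List[Optional[str]]:
    """Convert provider or individual numbers to CCNumbers.

    Numbers not found in the first lookup are tried in the second one;
    numbers found in neither stay empty (None).
    """
    result: List[Optional[str]] = []
    for number in numbers:
        cc = first_lookup.get(number)
        if cc is None:
            cc = second_lookup.get(number)
        result.append(cc)
    return result
```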

3.4.3.4 Selection of CC Numbers from a set of Solution Numbers

A set of SolNumbers can be converted into CCNumbers. The set has to be placed into column D. If empty CCNumbers show up, the corresponding SolNumber does not exist. Alternatively, change Keep CCNumbers to YES and start the search again; all CCNumbers registered in the DB, including their solution numbers, are then retrieved. Generally: if a number is not in the DB, the CCNumber cell stays empty.

3.4.3.5 Retrieve Structures from all CCNumbers or listed

For samples with the same structure, the structure is represented by the structure number to keep the output compact. A list of CCNumbers can be copied into the sheet under CCNumber (B6) and the structures can then be retrieved without first retrieving data.

3.4.3.6 Additional Features

“Keep CCNumbers”

For a normal query, NO is selected. A list of CCNumbers can be copied into the column “CC Number”, starting at Row 6. Then “NO” has to be changed to “YES” (drop-down list), and the data are retrieved only for the CCNumbers listed, independent of any query.

“Sort CCNumbers”

If a list of CCNumbers should remain unsorted (for good reason), NO is selected from the drop-down list.

“Structure with all Sample Data”

YES means that all Samples are retrieved with the same structure number (Batches, Parent, Salts)

“Remove Borders”

With this Tab the borders can be permanently removed


3.4.4 Retrieve Biological Data Excel File

1. Select from the drop-down list
   a. Retrieve data
   b. Retrieve structures
   c. Export for sdf
   d. Save as (saves the query as a profile)
2. Antibiotica concentration; any other criterion can be used instead of Antibiotica
3. Select the item to be searched (e.g. test data always end with “……pc”)
4. Test name (Bacteria/strain), see Bacteria-Strain Collection
5. Compound concentration
6. Add the query condition. Numbers, text or date(s) (format: dd.mm.yyyy) can be added.
7. Add the query operator from the drop-down list
   a. No operator means “=”. Do not type “=”.
   b. Other operators are selected from the drop-down list.
   c. In the case of “between”, two data inputs are necessary, separated by a “-”.
   d. With the “like” operator, the placeholder(s) “%” can be placed anywhere in the condition string (several placeholders are allowed). Several missing characters (numbers and/or letters) can be replaced by one placeholder as long as they are contiguous.
8. Instead of writing the AB conc or the compound conc into the corresponding row (as in points 2 and 5), a column is selected for each.

After completing the entries, select “Retrieve data” in 1; the SQL query is generated automatically and the data are retrieved.


3.4.4.1 Retrieve structures

3.4.4.2 Export for sdf

The data are first retrieved with Calculate Average set to YES; then the structures are retrieved by selecting “Retrieve structures” in 1, and the sdf file can be generated by selecting Export for sdf in 1. The result is transferred to the sheet “Transfer”.

Select the JChem Tab and choose Export, choose a file name and follow the prompts. The sdf file can be used for visualization in other applications such as DataWarrior (see Using DataWarrier as additional Frontend), Discovery Studio, Instant JChem etc.

3.4.4.3 Additional Features

“Antibiotica”

The antibiotic can be selected from the drop-down list and is then incorporated into the query; otherwise leave the cell empty.


“Structure Check”

Zero means that structures with a structural flag in the DB are not retrieved. In addition, at least 4 substructure levels can be defined to group structures for special SAR questions. For explanations see Structure Check.

“Keep CCNumbers”

A list of CCNumbers can be copied into the column “CCNumber”, starting at Row 7. Then “NO” has to be changed to “YES” (drop-down list), and the data are retrieved only for the listed CCNumbers, in the given order and independent of the query.

“Calculate average”

The biological data for a defined set of parameters (e.g. compound and/or antibiotic concentration(s)) are displayed as the average.

“Remove Borders”

With this Tab the borders can be permanently removed.

“Sort”

This Tab opens the Excel Sort Form and automatically selects the entire data range. Any kind of sort on one or more columns can be applied.

“Scatter Plot”

This Tab opens a form to select different opportunities arranging the data in different scatter-plots and to see the structures for the different data points. Explanations see Tab Scatter Plot.

“Data to Slides”

This Tab allows transferring structures and data to PowerPoint in different layouts as a reporting system. First the data have to be retrieved with Calculate Average set to “YES”; second, “Retrieve structures” has to be selected in 1. For explanations see Tab Data to Slides.
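The “Calculate average” feature listed above amounts to grouping the results by a defined set of parameters and averaging each group. A minimal Python sketch follows; the row layout (CCNumber, compound concentration, antibiotic concentration, result value) and the function name are assumptions for illustration, and the values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float, float, float]  # CCNumber, compound conc, antibiotic conc, result
Key = Tuple[str, float, float]


def average_results(rows: Iterable[Row]) -> Dict[Key, float]:
    """Average all result values that share the same parameter set."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for cc_number, compound_conc, antibiotic_conc, value in rows:
        groups[(cc_number, compound_conc, antibiotic_conc)].append(value)
    return {key: mean(values) for key, values in groups.items()}
```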

3.4.4.3.1 Structure Check

To see “bad structures”, type their numbers; to avoid seeing the “bad structures”, type 0; to see all the structures, enter 0-19 (depending on the bad structure numbers); to see all “bad structures”, enter 1-19. A mixed entry is also possible, e.g. 0,2,3,5-6,13-16. The numbers must be separated by a comma (see “Bad Structures” Table).

This tool also allows grouping structures according to substructures, which are defined by the chemists in advance. Four levels are proposed, but more can be added on request.

a) Searchlevel01: StrXX
b) Searchlevel02: StrXX_YY
c) Searchlevel03: StrXX_YY_ZZ
d) Searchlevel04: StrXX_YY_ZZ_WW

Combinations are also possible, such as Str04,Str03 or Str04,Str01_5_1,Str02_3. Any number of search levels can be combined. The levels must be separated by a comma. The two structure checks can be combined; in that case they have to be separated with /, e.g. 0,18-19/Str01_2,Str04. Such structure queries have to be planned carefully, otherwise they do not make sense (see Search Level Table)!
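The entry syntax of the Structure Check field can be illustrated with a small Python parser. It is a sketch of the syntax only (comma-separated numbers and ranges, optionally followed by / and comma-separated search levels); the function names are hypothetical and the way GWIS evaluates the entry may differ.

```python
from typing import List, Set, Tuple


def parse_flags(text: str) -> Set[int]:
    """Expand an entry such as '0,2,3,5-6' into the set {0, 2, 3, 5, 6}."""
    flags: Set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        low, sep, high = part.partition("-")
        if sep:
            flags.update(range(int(low), int(high) + 1))
        else:
            flags.add(int(low))
    return flags


def parse_structure_check(entry: str) -> Tuple[Set[int], List[str]]:
    """Split an entry into bad-structure numbers and search levels."""
    flag_part, sep, level_part = entry.partition("/")
    if not sep and flag_part.strip().lower().startswith("str"):
        # Only search levels were entered, e.g. 'Str04,Str03'.
        flag_part, level_part = "", flag_part
    levels = [p.strip() for p in level_part.split(",") if p.strip()]
    return parse_flags(flag_part), levels
```

For example, the combined entry 0,18-19/Str01_2,Str04 yields the numbers 0, 18 and 19 together with the two search levels Str01_2 and Str04.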


3.4.4.3.2 Tab Scatter Plot

On this form it is possible to select from three different plots. To incorporate the data, select the columns accordingly; the only restriction is that the column of the CCNumber is fixed. The rest of the columns can be selected freely depending on the plot type (the 3D cylinder plot is not available yet). Columns 4 and 5 are used only in conjunction with scatter Plot linear. It is also possible to set the Min and Max of the X- and Y-axes to reduce excessive overlap of data points. The bubble size can also be changed to improve the visibility of the data bubbles. A very helpful program is described in the section Using DataWarrier as additional Frontend.

3.4.4.3.3 Tab Data to Slides

This is a unique reporting tool that transfers structures and biological data to PowerPoint and colors the data according to preset conditions. One ppt slide can be filled either with 8 structures including 8 datasets or with 4 structures with 20 datasets. The Number of columns field must always be filled with the appropriate number of datasets per structure. Since the profile of a compound in the Excel sheet might contain many columns (e.g. > 20), it is possible to enter the column letters in any order in the row Column (the number of chosen column letters must be the same as the number in the Number of columns field). Three levels of criteria are possible for coloring the data (additional levels can be added on request). These criteria are saved individually for each dataset and can later be retrieved, changed and saved again. A Checkbox that is not activated means that, for coloring, the smaller number is superior to the higher number (e.g. IC50 data); a ticked Checkbox makes the system reverse the color criteria (only numbers are allowed). The small print screen shows how the different fields have to be filled.


3.4.5 Solution data Register/Retrieve Excel File

1. To register a new solution, the CCNumber(s) have to be copied from the file CCStructureReg and pasted into the CCNumber column. All the available data are entered, most importantly the date (Production Date). Usually the Residual Volume column is filled with the Start Volume values and the Used Volume column is filled with zeros. Then select Register new Sol number in the Ribbon.
2. There are four ways of retrieving data for solutions:
   a) Enter Sol Number(s) in column A and select Retrieve data.
   b) Enter a number or string into the Position Case column manually, then double click; the data are retrieved automatically.
   c) Enter a barcode into the Barcode Box column manually or via a barcode reader, then double click; the data are retrieved automatically.
   d) Enter a barcode into the Barcode Bin column manually or via a barcode reader, then double click; the data are retrieved automatically.
3. This template can also be used for INVENTORY purposes for solutions. After retrieving the data as described under point 2, the used volume can be entered into column F; after a double click on the cell the system calculates the remaining volume. The row in which the content of the cells was changed is automatically activated. The Used For column has to be filled to track all the tests for which this solution was used. By clicking Register used amounts in the Ribbon, the new volumes are registered. This way all the changes, uses etc. are permanently registered in the DB.
4. The Update fields in the Ribbon, e.g. barcodes, cannot be used for the INVENTORY (see Point 2). This action is only for updating the columns if those data were not available during registration.
5. After retrieving data according to point 2, the layout can be changed so that the Position Vial is organized according to Biology’s definition.

3.4.6 Powder Data Register/Retrieve Excel File

Part of the powder data are registered parallel with the sample data. The rest has to be updated in the specific columns e.g. Used for, Case, Box, etc. Later this tracking of uses is important during registration activities (e.g. NDA, IND). 1. There are two ways retrieving data for powders a. Enter CCNumber(s) in column A and check Retrieve data

19

b. Enter a number or string into Case column manually then double click, the data are retrieved automatically. c. Enter a barcode into Bin Barcode column manually or via a barcode reader then double click, the data are retrieved automatically. 2. This template can also be used for INVENTORY purpose for powders. After retrieving the data described under point 1 the used amount can be entered into the column D; the Used For column has to be filled to track all the tests for which this powder was used. After double click the cell the system calculates the remaining amount. The row where the content of the cells was changed is automatically activated. Clicking the Register used amounts in the Ribbon the new amounts are registered. Then the next row can be activated and registered. This way all the changes, uses etc. are permanently in the DB registered. 3. The Update data in the Ribbon cannot be used for the INVENTORY (see Point 2). This action is only for updating data if those data were not available during registration.
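The inventory bookkeeping described for solutions and powders (remaining amount = previous residual amount minus used amount, with a mandatory Used For entry) can be sketched as follows. This is only an illustration in Python: the class StockItem, its field names and the rule that a use may not exceed the residual amount are assumptions, not a description of how GWIS stores or checks the data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StockItem:
    """Hypothetical stand-in for one solution or powder row."""
    identifier: str            # e.g. a Sol Number or CCNumber
    residual: float            # remaining volume or amount
    uses: List[str] = field(default_factory=list)

    def register_use(self, used: float, used_for: str) -> float:
        """Subtract the used amount and record what it was used for."""
        if not used_for.strip():
            raise ValueError("the Used For entry must be filled")
        # Assumed sanity check: cannot use a negative amount or more than is left.
        if used < 0 or used > self.residual:
            raise ValueError("used amount must be between 0 and the residual amount")
        self.residual -= used
        self.uses.append(used_for)
        return self.residual


item = StockItem("SOL-1", residual=500.0)
print(item.register_use(125.0, "MIC assay"))
print(item.uses)
```

Recording the purpose together with each subtraction mirrors the requirement that every use stays traceable for later registration activities.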


4 Administrative Tools

4.1 Connections to MySQL

4.1.1 Connection of MySQL via Excel

Excel connects to the DB via DSN files. First install mysql-connector-odbc-5.2.6-win32; the administrator should then generate an ODBC data source via System DSN with C:/Windows/SysWOW64/odbcad32.exe. For a connection via a home network, an additional user/password hurdle usually has to be passed.

4.1.2 GWIS access of MySQL via VPN

The administrator needs to set up a secured VPN client. This connection is protected in addition by security features of vb.net (2010) and MySQL. Each user needs a login name and password set up by the administrator.

4.1.3 MySQL Workbench connection to MySQL

MySQL Workbench is a powerful tool to manage the database; see the handbook of this tool. Additional help for managing the database is available within GWIS.

4.1.4 MySQL DB connected to Instant JChem

Another helpful frontend besides Excel is Chemaxon’s Instant JChem. This tool can show structures and biological data and offers other interesting possibilities, but the GWIS tools are by far more interesting for medicinal chemists and database managers. See Working with InstantJChem.

4.1.5 Data Management of MySQL

The setups for the connections from GWIS or InstantJChem to MySQL are valid for Win7, Win8 and Win10. With the help of GWIS the DB can be managed on a laptop (PC) or directly on the server.

4.1.6 MySQL DB on PC/Laptop

All the data management can be done on the laptop or PC, with direct data transfer to the server DB with GWIS. Important: Users additionally have to be registered separately in MySQL with all necessary privileges. The MySQL DB on a laptop or PC has to be installed as a server (not as a development machine), e.g. with “mysql-installer-community-5.6.14.0”, and with MySQL Workbench as frontend to manage some data (e.g. new tables, new columns in tables, new users, etc.).

4.2 Manage Database with GWIS

On the form Activities (see page 7, Figure 1) a tab Administrator Tools leads to a form Administrator Tools with 13 tabs. These tabs and subtabs, including the Excel sheets, cover all the activities needed to manage data and the DB. GWIS does not handle the few activities that are easier to manage with MySQL Workbench.


4.2.1 Structure Transfer to Instant JChem

When the tab Structure Transfer Files is selected, two Excel files pop up. After the file New_structures_IJC_reg is selected, the newly registered structures in the DB are retrieved automatically. The sd file can be saved for further actions (see Working with InstantJChem) with Export in the JChem tab of the Excel sheet.

4.2.2 Employees

Enter all the available data in a new line. The term nameabbr is unique; the program reports whether the abbreviation is already taken. After registration of the data, the Registration rights have to be defined; another form shows up. Register Data, Update Data and Check new Entry are self-explanatory.

Select the Name Abbreviation first and then add all the other data. For each registry group a new line has to be filled with data.

4.2.3 New Bio Test

For a new biological test a reasonable name has to be defined and entered in the Assay Table; this is also the name of the corresponding database table. All the yellow cells have to be filled. For technical reasons a “%” has to be added at the end of the “assaynameabbr” without a space. The corresponding column names are defined in the Assay Value Names grid view. Column names should ideally incorporate some information about the assay. For technical reasons a “%” also has to be added at the end of each column name without a space. Here too, all the yellow cells have to be filled; the rest of the cells are filled automatically by the system. To register or update assay information, the Assay Table row and the corresponding Assay Value Names row(s) have to be activated (the color changes to blue).
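The “%” convention for assay abbreviations and column names is easy to get wrong when typing by hand. The following Python sketch shows one way to normalize a name before entering it; the helper name with_percent_suffix is hypothetical and the function is not part of GWIS.

```python
def with_percent_suffix(name: str) -> str:
    """Return the name ending in exactly one '%' with no space before it."""
    # Drop surrounding whitespace, any trailing '%' signs and the
    # whitespace that may precede them, then append a single '%'.
    base = name.strip().rstrip("%").rstrip()
    if not base:
        raise ValueError("the name must contain more than the '%' sign")
    return base + "%"


print(with_percent_suffix("MIC"))
print(with_percent_suffix("MIC %"))
```

Applying the same helper to the assay abbreviation and to every column name keeps both conventions consistent.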


The tab Fill the Tables opens a new form to fill invisible tables that allow an automated update of the server DB from the laptop/PC database. Important: First select the cell, adjust the right-hand side and then activate the row on the left-hand side. Then click the tab Insert, with or without ticking the radio button. The assay abbreviation is later used for the forms Select IDs and Tables to update Server and Retrieve last IDs and old IDs from Server for Update.

4.2.4 New Strain

New strains are registered by filling in all the data in the yellow columns. The data are based on the strain datasheet provided by Biology. A double-click on a cell of the column pdfile opens the datasheet from the server. The column filterassay holds a common number for a family of strains (e.g. 7 represents S. aureus).


4.2.5 New Salt

The data are not entered in this form because a structure is connected to the data. Therefore, an Excel file template is activated. For anions and cations a convention has to be followed as outlined in the header.

In the template the structure has to be drawn with the help of JChem. A reasonable salt name is given, together with the salt type according to the convention explained above. During the registration of a compound the molecular weight is completed automatically by the system. Important: A compound with a salt type that is not yet registered can only be registered after the salt itself has been registered.

4.2.6 New Labjournal

In this form the registered data are best retrieved via Name Abbreviation and, as a second choice, via Name. The data for the yellow-framed columns have to be added if available. The first two columns are entered by the system. Important: Wherever a status is required (e.g. LBJStatus), 1 means “active” and 2 means “inactive”. If a status is changed from 1 to 2, the “LBJDateEnd” has to be entered.

4.2.7 New Project

The data for the yellow-framed columns have to be added. If a status/activity is changed from 1 to 2, the “ProjectEndDate” has to be entered.


4.2.8 New Chem Provider

The data for the yellow-framed columns have to be added.

4.2.9 New Bio Provider

The data for the yellow-framed columns have to be added.

4.2.10 New Registration Rights

In this form the registered data are best retrieved via Name Abbreviation. Each employee can be a member of different activity groups. The data for the yellow-framed columns have to be added. All the information to be added can be taken from the drop-down lists.

4.2.11 Patent Update

An Excel template pops up and one or more sample numbers are entered in column A. If data exist for the sample(s), Retrieve data fills the columns. After new data have been added, the entries are updated with Update data.

4.2.12 Table/Column Changes

On the form Table Generation two tabs allow an automated generation and update of drop-down lists for Excel sheets. The tab General Data Tables generates drop-down lists for the Excel file CCSampleDataBioDataRetrxx (xx stands for 32 or 64 bit) for all tables in the DB; see Retrieve all Sample Data Excel File. The general name of this type of drop-down list has the form titlevalyy, where yy is a two-digit number.


The tab Bio Data Tables generates drop-down lists for the Excel file CCBioDataProfilexx for all assays; see Retrieve Biological Data Excel File. The general name of this type of drop-down list has the form titledataicyy, where yy is a two-digit number.

4.2.12.1 General Data tables

On loading, the left grid view of this form is filled with all the tables from the DB. When a row header is selected, the right grid view is filled with the column names of the corresponding table. If the lowest grid view stays empty, no titlevalyy table exists. The tab Generate New Table creates the new titlevalyy table in the DB. With the tab Update table the new table becomes visible in the lower grid view. The tab Update Excel writes the information into the Excel file CCSampleDataBioDataRetrxx to provide a new drop-down list.

4.2.12.2 Bio Data Tables

In this form two grid views are filled on opening (Figure 2). When the row header of biodataic12 is selected (arrow 1), the remaining empty data grids are filled. After row headers 2 and 3 have been selected, the tab Add table as drop-down list in Excel is used if the grid view Dropdown List Table Name is empty. The tab Update drop-down List overwrites the existing table. The tab Check Drop-Down List selects the entries in the existing drop-down list table. By selecting the row header(s) in the grid view Drop-Down List Content and deleting the lines, the content of the drop-down list of the Excel template can be changed. This action is fully reversible by selecting the row header biodataic12 (arrow 1) and the tab Update drop-down List.


Figure 2 Information from DB extracted

Figure 3 Activation row header biodataic12

4.2.13 General Tools

This form offers helpful tools to manage data. These tools emerged from the need to solve specific data problems. The red-colored functions are especially helpful for moving data from an external database (PC/laptop) to a server database. Important: First update the data, and then insert new data into the server DB.

4.2.13.1 Update single large Datasets

This tool allows entering or updating large data sets according to unique ID numbers.

4.2.13.2 CCNumbers from any Plate Positions

Sometimes only the plate positions and the plate numbers are available in Biology. With this tool the SolNumbers and the CCNumbers can be retrieved if available. If nothing is retrieved, the compounds are not registered as solutions. The same connections can be made starting from IndivNumbers, usually the chemist’s laboratory numbers.

4.2.13.3 Correcting Structures/Salts/Structure related Items

In this Excel sheet the red columns are used to enter the information needed to retrieve the stored data, either the Structure ID or the precise CCNumber. The data are retrieved with Retrieve Data. Only the orange and blue columns can be changed; the rest is filled by the system during the update with Change Structure/Data/Salt selection.

4.2.13.4 CCNumbers to SolNumbers and Plate Positions

Sometimes Biology has to dilute compounds into new plates. The positions are then filled with the Provider or Indiv Numbers according to a template setup. This file is opened from the Excel file Plate_to_Sol_Number_automated with action 1. All the yellow cells have to be filled and selected accordingly. Number of Plates means how many plates are already located on one Excel sheet of Biology. Select the appropriate reference number, Provider Nu or Indiv Nu, depending on availability. The tab action 2 retrieves the information from the biology plate. The next action 3 registers new Sol Numbers, and tab action 3 fills the Sol Numbers into the appropriate plate positions in the biology Excel sheet.

4.2.13.5 Select IDs and Tables to update Server

The Update Server tab in the form General Tools opens a new form. The last IDs from the update_check table in the laptop/PC DB and in the server DB are retrieved. In this table all updates are registered as table name and row number. If the two ID numbers differ, the left grid view is filled; otherwise the box stays empty. The right grid view presents the row(s) with the updated data. The row(s) shown have to be selected at the row header before processing. With this information the update of the server DB can be started with the tab Update server. After this action is finished the tab Insert Data to Server can be activated (next paragraph).

4.2.13.6 Retrieve last IDs and old IDs from Server for Update

This tool is used to load data from a laptop/PC into the server DB. The connection to the server has to be activated before choosing this tool. The system fills this form with the last IDs of the tables. If a table’s ID in the laptop/PC DB is larger than the table’s ID on the server, the cell turns yellow. Only the table(s) represented by yellow cell(s) are inserted by the tab Insert. Activating the form again afterwards checks whether all cells have turned white; if not, the procedure can be repeated without harm until all cells are white. Whether a single pass suffices depends on the speed of the network connection at the time of the update.
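The comparison behind the yellow cells can be sketched in a few lines of Python. The function name tables_needing_insert and the example ID values are assumptions for illustration; in GWIS the last IDs are read from the two databases, whereas here they are simply passed in as dictionaries.

```python
from typing import Dict, List


def tables_needing_insert(local_last_ids: Dict[str, int],
                          server_last_ids: Dict[str, int]) -> List[str]:
    """Return the tables whose last ID on the laptop/PC exceeds the server's.

    A table missing on the server side is treated as having last ID 0
    (an assumption of this sketch).
    """
    return sorted(
        table
        for table, local_id in local_last_ids.items()
        if local_id > server_last_ids.get(table, 0)
    )


local = {"biodata": 123060, "strain": 41, "projects": 12}
server = {"biodata": 123049, "strain": 41, "projects": 12}
print(tables_needing_insert(local, server))
```

Running the comparison again after an insert and getting an empty list corresponds to all cells having turned white.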

4.2.14 General Queries

This is a very general tool that at present manages specific data of one table. With this tool it is possible to insert and update data. It is also possible to delete whole rows; if the deleted rows are at the end of the table, the Reset Autonumber tab has to be clicked afterwards. Use the Drop Table tab with great care, or better never.

4.3 Shell Commands for MySQL

Select cmd.exe and send it to the Desktop (create shortcut). The commands are pasted into this window.

Examples:

1. Change to the MySQL folder: paste
   cd C:\Program Files\MySQL\MySQL Server 5.6\bin
2. Make a dump file: paste
   mysqldump -u pschneider -p xxDB02 chemprovider projects strain > C:/DumpPS/DumpCP_PR_STR.sql
   Important: The extension .sql must always be typed.
3. Add the dump file to CCDB01: paste
   mysql -u pschneider -p
   Type: use CCDB01; and press Enter.
   Type: source C:/DumpPS/DumpCP_PR_STR.sql; and press Enter.
   Important: The semicolon (;) is required after the change to the mysql level. The extension .sql must also always be typed.

To reach a server from a laptop/PC, a general command looks like:
   mysql -u pschneider -p -h xxx.yyy.zzz.ww

To make an update from a certain row number onwards, the following command is helpful:
   mysqldump -u pschneider -p xxDB02 biodata -w"biodataid>123049" --no-create-info > C:/DumpPS/filename.sql
Here xxDB02 is the DB name, biodata the table name, biodataid the column name and 123049 the row number.
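When such dumps are made regularly, the command line can be assembled by a small script instead of being pasted by hand. The Python sketch below only builds the argument list; the function name mysqldump_args is hypothetical, while the options used (-u, -p, -h, --where, --no-create-info) are standard mysqldump options. The script does not run anything itself.

```python
from typing import List, Optional, Sequence


def mysqldump_args(user: str, database: str, tables: Sequence[str],
                   where: Optional[str] = None, data_only: bool = False,
                   host: Optional[str] = None) -> List[str]:
    """Build a mysqldump argument list; -p makes mysqldump prompt for the password."""
    args = ["mysqldump", "-u", user, "-p"]
    if host:
        args += ["-h", host]
    if where:
        # Long form of the -w option used in the example above.
        args.append(f"--where={where}")
    if data_only:
        # Dump rows only, without CREATE TABLE statements.
        args.append("--no-create-info")
    args.append(database)
    args.extend(tables)
    return args


print(mysqldump_args("pschneider", "xxDB02", ["biodata"],
                     where="biodataid>123049", data_only=True))
```

The resulting list can be passed to subprocess.run with stdout redirected to a file whose name ends in .sql, which replaces the > redirection of the shell examples.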

4.4 Add User to MySQL Database

For this action refer to the manual of MySQL Workbench. Only the most important parts are described here.

The password is the same as for entering GWIS. Most important: Schema Privileges has to be filled with the appropriate privileges; otherwise the account is not active.
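For administrators who prefer the command line, the same result can be reached with plain SQL statements. The Python sketch below generates a CREATE USER and a GRANT statement in MySQL 5.6 syntax; the function name user_setup_statements, the example user jdoe, the default privilege list and the host pattern % are all assumptions for illustration and should be adapted to the privileges actually needed.

```python
import re
from typing import List, Sequence

# Restrict names to simple characters so they can be quoted safely.
_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def user_setup_statements(user: str, password: str, schema: str,
                          privileges: Sequence[str] = ("SELECT", "INSERT", "UPDATE"),
                          host: str = "%") -> List[str]:
    """Return SQL statements that create a user and grant schema privileges."""
    for label, value in (("user", user), ("schema", schema)):
        if not _NAME.match(value):
            raise ValueError(f"{label} name contains unsupported characters")
    if any(ch in password or ch in host for ch in ("'", "\\")):
        raise ValueError("quotes and backslashes are not supported in this sketch")
    if not privileges:
        # Mirrors the remark that an account without schema privileges is not usable.
        raise ValueError("at least one schema privilege is required")
    granted = ", ".join(privileges)
    return [
        f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{password}';",
        f"GRANT {granted} ON `{schema}`.* TO '{user}'@'{host}';",
    ]


for statement in user_setup_statements("jdoe", "secret", "CCDB01"):
    print(statement)
```

The statements can be pasted at the mysql level described in Shell Commands for MySQL.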

4.5 Working with InstantJChem

This product was developed by Chemaxon. It is very good at structure, substructure and similarity searches. For download and license conditions see https://www.chemaxon.com/download/instant-jchem/. The sd file from GWIS (Structure Transfer to Instant JChem) is used to update Instant JChem with new structures. Detailed information can be found in the user manuals of Instant JChem. In Instant JChem open Projects [CCDB01_local] and, with a right mouse click, select the tab Import File Into CCB01_allStructures_only to open the form for browsing to the structure sd file to be registered.


4.6 Using DataWarrior as additional Frontend

Another very helpful tool for visualizing data is “DataWarrior”, which has a free license (http://www.openmolecules.org/datawarrior/download.html). The sd file from the Excel sheet Export for sdf serves as data transfer to “DataWarrior”; the data first have to be collected with a query. For further information see the User Manual http://www.openmolecules.org/help/basics.html.


5 Appendices

5.1 “Bad Structures” Table Structurecheck

Structurecheck  Structure type           Number
0               All “good structures”
1               Nitrocompounds           984
2               Hydrazines               649
3               Imids                    668
4               Thiourea                 836
5               Urea                     653
6                                        7
7                                        53
8                                        245
9               Hydrazones               296
10                                       333
11                                       6
12                                       351
13                                       8
14                                       9
15
16                                       30 + ortho 3
17                                       97
18                                       116
19                                       10

5.2 Search Level Table

5.3 Projects

5.4 Assays

5.5 Bacteria-Strain Collection

The collection is available online within the GWIS tool; see New Strain.

5.6 DB Tables and Columns (selection)

5.7 Templates to Register Biological Data into DB
