Data Integration Developer (Senior)

Job description

We’re looking for an exceptional Ruby Developer to join OpenCorporates, the world’s largest open database of company info, and one of the most exciting and important data companies in the world.

OpenCorporates is revolutionising access to company data, and with it genuinely changing the world for the better. Our data is more trusted, more transparent and fresher, and is depended upon by everyone from investigative journalists and anti-corruption investigators to law enforcement, major banks and fintech unicorns. We are an innovative, fast-growing scale-up with aggressive goals and a public benefit mission at our core. We have a world-class set of trustees to ensure the long-term sustainability of our business model.

Main Responsibilities:

Your role will be to write, run and maintain the scrapers and bots that integrate with APIs, download open data dumps or scrape websites to collect data for OpenCorporates. Your main responsibilities include:

  • Support & expand our data pipeline by working hand in hand with the rest of the Data Acquisition team

  • Write bots to source publicly available data (scraping websites, consuming data published via APIs or CSV, or extracting data from PDFs) in order to create new data feeds, and also help solve problems with our existing feeds

  • Maintain high data quality by documenting and comparing datasets to their source to verify that the information is complete and error-free. You'll also suggest ways to make our processes more efficient.

  • Some weekend and/or out-of-hours work may be required


What we are looking for from the ideal person:

We're particularly looking for highly talented Ruby Devs who understand how to extract data from a variety of sources, from structured APIs to legacy, messy or plain broken public websites. Above all, we are looking for smart people who we think will fit in well. During the interview process you should be able to demonstrate that:

  • You're good at writing bots. Extracting information buried deep within government websites can often be challenging, but you know the right techniques to obtain the data without hacking or bombarding our sources with requests.

  • You know data. Our product is data – arranging it, linking it, making it accessible – so you should enjoy dealing with it, including handling complex concepts, managing big datasets and knowing how to keep it high quality.

  • You understand process. The OpenCorporates data pipeline has a variety of data flows and processes which need to work together seamlessly. You must understand and be able to deal with the challenges this raises.

  • You have a keen eye for detail. Accuracy, attention to detail and the ability to spot trends are key to keeping data quality levels high.

  • You know how to work in a team. The problems we deal with require a lot of collaboration and communication.

  • You will be part of a group of colleagues on an on-call rota to remediate issues with scrapers, exports, etc.

Required skills

  • You know and love Ruby

  • You have an interest in writing bots, scrapers or similar

  • Able to query & maintain structured data: SQL (SQLite/MySQL), JSON, XML

  • Regular expressions

  • Git or other version control software

  • Linux environment

Desirable skills / experience

  • ETL processes and data pipelines

  • Root cause analysis & data remediation experience

  • Excellent verbal and written communication skills

Our Values

Our values outline the shared principles that define the OpenCorporates culture and team environment. They underpin everything we do, from day-to-day decision making and teamwork to supporting our clients and evaluating individual and company performance. The core values are the lens we look through in everything we do.

All our employees are driven by our values and use them as a compass to guide their work and collaboration with colleagues and clients.

Be Bold & Beat The Odds

  • Our work is hard - and matters. We will succeed by being more ambitious, more imaginative and more daring than our competitors

We Put Users First

  • Success will only come if we focus obsessively on the success of our users in everything we do

Learn & Adapt

  • There is no straight line to success. We will excel by taking a scientific approach to all our work

One Team

  • We win together. We fail together. And diversity – of backgrounds, of views, of personalities – is a critical asset