OpenCorporates is revolutionising access to company data, and with it genuinely changing the world for the better. Our data is more trusted, more transparent and fresher, and is depended upon by everyone from investigative journalists and anti-corruption investigators to law enforcement, major banks and fintech unicorns. We are an innovative, fast-growing scale-up with ambitious goals and a public-benefit mission at our core. We have a world-class set of trustees to ensure the long-term sustainability of our business model.
Your role will be to write, run and maintain the scrapers and bots that integrate with APIs, download open data dumps or scrape websites to collect data for OpenCorporates. Your main responsibilities include:
Support and expand our data pipeline by working hand in hand with the rest of the Data Acquisition team
Write bots to source publicly available data (scraping websites, consuming data published via APIs or CSV, or extracting data from PDFs) in order to create new data feeds, and also help solve problems with our existing feeds
Maintain high data quality by documenting and comparing datasets to their source to verify that the information is complete and error-free. You'll also suggest ways to make our processes more efficient.
Some weekend and/or out of hours work may be required
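To give a flavour of the bot-writing work described above, here is a minimal sketch of a bot that normalises rows from an open-data CSV dump into company records. The field names and record shape are illustrative assumptions, not OpenCorporates' actual schema.

```ruby
require "csv"

# Parse a raw CSV feed (headers in the first row) into an array of
# tidy company records. Field names here are hypothetical.
def parse_company_csv(raw_csv)
  CSV.parse(raw_csv, headers: true).map do |row|
    {
      company_number: row["number"].to_s.strip,
      name:           row["name"].to_s.strip,
      status:         row["status"].to_s.strip.downcase
    }
  end
end

# A tiny sample feed, standing in for a downloaded open-data dump.
sample = <<~CSV
  number,name,status
  00123456, ACME LTD ,Active
CSV

records = parse_company_csv(sample)
```

A real bot would download the dump first (for example with Net::HTTP) and validate each record before loading it, but the parse-and-normalise step looks much like this.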
We're particularly looking for highly talented Ruby developers who understand how to extract data from a variety of sources, from structured APIs to legacy, messy or plain broken public websites. Above all, we are looking for smart people who we think will fit in well. During the interview process you should be able to demonstrate that:
You're good at writing bots. Extracting information buried deep within government websites can often be challenging, but you know the right techniques to obtain the data without hacking or bombarding our sources with requests.
You know data. Our product is data – arranging it, linking it, making it accessible – so you should enjoy dealing with it, including handling complex concepts, managing big datasets and knowing how to keep it high quality.
You understand process. The OpenCorporates data pipeline has a variety of data flows and processes which need to work together seamlessly. You must understand and be able to deal with the challenges this raises.
You have a keen eye for detail. Accuracy, attention to detail and the ability to spot trends are key to keeping data quality levels high.
You know how to work in a team. The problems we deal with require a lot of collaboration and communication.
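The point about obtaining data "without bombarding our sources with requests" can be sketched as a small rate limiter that enforces a minimum interval between successive requests. The class and method names below are our own assumptions for illustration; the actual HTTP call (e.g. via Net::HTTP) is left to the caller's block.

```ruby
# A minimal polite-scraping sketch: never issue two requests to a source
# closer together than min_interval seconds.
class PoliteFetcher
  def initialize(min_interval: 1.0)
    @min_interval = min_interval # seconds between successive requests
    @last_request_at = nil
  end

  # Sleeps if the previous request was too recent, then delegates the
  # real fetch to the given block (a real bot would call Net::HTTP here).
  def fetch(url)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if @last_request_at
      wait = @min_interval - (now - @last_request_at)
      sleep(wait) if wait > 0
    end
    @last_request_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield url
  end
end

fetcher = PoliteFetcher.new(min_interval: 0.05)
pages = %w[https://example.org/a https://example.org/b].map do |url|
  fetcher.fetch(url) { |u| "fetched #{u}" }
end
```

Production scrapers typically add retries with exponential backoff and honour robots.txt as well, but the throttling idea is the same.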
You will be part of a group of colleagues on an on-call rota to remediate issues with scrapers, exports and so on
You know and love Ruby
You have an interest in writing bots, scrapers or similar
You're able to query and maintain structured data: SQL (SQLite/MySQL), JSON, XML
You're familiar with Git or other version control software
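As a small example of the structured-data work listed above, the sketch below loads a hypothetical JSON feed and indexes records by company number, keeping the most recently updated entry when duplicates occur. The feed shape and field names are assumptions for illustration.

```ruby
require "json"

# Index a JSON array of company records by company number.
# Records are sorted by updated_at first, so when a number appears
# more than once the newest record wins.
def index_by_company_number(json_text)
  JSON.parse(json_text, symbolize_names: true)
      .sort_by { |rec| rec[:updated_at].to_s }
      .each_with_object({}) { |rec, idx| idx[rec[:company_number]] = rec }
end

feed = <<~JSON
  [
    {"company_number": "123", "name": "Old Name Ltd", "updated_at": "2020-01-01"},
    {"company_number": "123", "name": "New Name Ltd", "updated_at": "2021-06-01"}
  ]
JSON

index = index_by_company_number(feed)
```

The same dedupe-by-freshness pattern applies whether the feed arrives as JSON, XML or rows from a SQL query.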
Desirable skills / experience
ETL processes and data pipelines
Root cause analysis & data remediation experience
Excellent verbal and written communication skills
Our values outline the shared principles that define the OpenCorporates culture and team environment. They underpin everything we do, from day-to-day decision-making and teamwork to supporting our clients and evaluating individual and company performance; the core values are the lens we look through in everything we do.
All our employees are driven by our values and use them as a compass to guide their work and collaboration with colleagues and clients.
Be Bold & Beat The Odds
Our work is hard - and matters. We will succeed by being more ambitious, more imaginative and more daring than our competitors
We Put Users First
Success will only come if we focus obsessively on the success of our users in everything we do
Learn & Adapt
There is no straight line to success. We will excel by taking a scientific approach to all our work
We win together. We fail together. And diversity – of backgrounds, of views, of personalities – is a critical asset