Data Analysts (data acquisition/pipeline)
OpenCorporates exists to make information about companies and the corporate world more accessible, more discoverable, and more usable, and thus give citizens, community groups, journalists, other companies, and society as a whole the ability to understand, monitor and regulate them. We're based in Hoxton, in the heart of London's technology hub.
OpenCorporates is the largest open database of companies and company data in the world, with in excess of 170 million companies from over 130 jurisdictions. Our primary goal is to make information on companies more usable and more widely available for public benefit, particularly to tackle the use of companies for criminal or anti-social purposes, for example corruption, money laundering and organised crime. Our customers include the likes of Mastercard, PwC, Capital One and FactSet. By charging commercial users for proprietary access to the structured data, we can make our data available for free to journalists, NGOs and academics via our website and API. You can read more about the impact we have, and the data side of things, on our blog.
We are looking for talented Data Analysts to join us and help us continue OpenCorporates’ important mission. We need proactive, positive team players with a drive to learn and to improve our tech and processes, a keen eye for detail to ensure our data is reliable and provenanced, and a desire to become experts in our data sources. You'll be interested in handling industrial quantities of complex data: we have over 200 data feeds to keep running smoothly, and under our newly appointed Chief Data Officer we have embarked on an ambitious plan to rapidly expand the breadth and depth of the data we collect.
What you'll be doing
- Expanding our data. You will help accelerate our data acquisition, proactively and continually looking to bring on new registries and other data sources, and expanding the breadth and depth of our data based on client needs.
- Analysing data sources. You'll know how to make sense of complex data sources, such as company registers, and understand both the data they contain and their issues. You'll help us understand what is in our extensive datasets, analyse and transform new and existing data sources, and identify and remediate data quality issues and other anomalies.
- Managing and supporting our data pipeline. You will work with other members of the team to manage our data feeds and resolve issues with them, ensuring OpenCorporates data is of the highest quality.
- Becoming a company data expert. You will want to become an expert in the company data we hold (and in other sources), help clients with their queries, and support the Chief Data Officer and Head of Campaigns in engaging with governments and registries.
- Improving the way we work. You will be a key player in our ruthless focus on efficiency and productivity, helping us maintain our competitive advantage, scale at pace, provide better and fresher data, and minimise manual work (allowing the team to focus on data analysis and other areas where we add the most value). You will help us structure, systemise and automate our data pipeline, from data sourcing through to provision to clients.
- Providing great service. You'll ensure our clients and users receive high-quality, timely data and answers to their queries, and help keep our automated data feeds running smoothly. You will also help publish product and data expertise to demonstrate thought leadership and make our data more useful to users, starting with excellent documentation (manuals, data policies, data dictionaries and source documentation).
- Developing our products. You will work alongside the CDO and the Commercial and Tech teams to build compelling, innovative products, drawing on your understanding of the use cases, the data and the desired client experience.
Relevant technical skills
Depending on the level you are applying for, you will have one or more years of experience with data acquisition and ETL in data analyst, data engineering or master data management roles. Above all, we are looking for talented people we think will fit in well and who are happy to be part of a team, sharing knowledge and learning from each other.
Desirable skills (extent based on seniority):
- Query and understand structured data within SQLite, MySQL and JSON
- Software development knowledge in Ruby or Python, Git, Linux shell scripting & tools
- ETL processes and data pipelines, and data testing/quality assurance processes (using scripting languages rather than ETL tools)
- Data acquisition, analysis and quality management on large datasets
- Root cause analysis & data remediation experience
- Accuracy and attention to detail
- Excellent verbal and written communication skills
Useful but not a prerequisite:
- Knowledge about company data and master data management
- Process improvement and automation
- Web scraping
What we offer
- Competitive salary depending on experience.
- We offer 26 days holiday (plus public holidays).
- We will add you to our pension scheme.
- Are you interested in a conference, training course or book? You got it.
- If you want to work remotely sometimes because kids, Amazon deliveries, plumbers, sick cat etc., that's cool.
- We are an equal opportunity employer and value diversity at our company.
- We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status or disability status.